Thanks for the report. I have fired up another Windows machine (Windows 10 rather than Windows 7) and have again managed to get a failure that looks the same as yours. This should make it easier to track down.
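For reading long runs of console output like the one quoted below, a small script can list jobs that have a start line but no finish line, and tasks that were started but never killed. This is only a rough sketch in Python, not part of jcs or qrun. It assumes that "start: J T" and "finish: J T" mean job J on task T and that "kill: T" names a task, which is my reading of the output rather than anything documented. The function name `summarize` is made up for the example.

```python
import re

# Hypothetical helper, not part of jcs/qrun.
# Assumed line shapes: "start: JOB TASK", "finish: JOB TASK", "kill: TASK".
LINE = re.compile(r"(start|finish|kill):[ \t]*(\d+)(?:[ \t]+(\d+))?")


def summarize(text):
    """Return jobs that never finished and tasks that were never killed."""
    started = {}        # job number -> task number
    finished = set()    # job numbers with a finish line
    killed = set()      # task numbers with a kill line
    for kind, first, second in LINE.findall(text):
        if kind == "start":
            started[int(first)] = int(second)
        elif kind == "finish":
            finished.add(int(first))
        else:
            killed.add(int(first))
    unfinished_jobs = sorted(job for job in started if job not in finished)
    unkilled_tasks = sorted(set(started.values()) - killed)
    return {"unfinished_jobs": unfinished_jobs, "unkilled_tasks": unkilled_tasks}


if __name__ == "__main__":
    # Tiny made-up sample, not taken from a real run.
    sample = """
    start: 0 0
    start: 1 1
    start: 2 2
    finish: 0 0
    kill: 0
    finish: 2 2
    kill: 2
    poll 0:
    """
    print(summarize(sample))
```

The pattern is searched anywhere in the text, so it should cope with quoted lines ("> start: 0 0") as well as raw session text pasted from the console.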
On Fri, Oct 6, 2017 at 11:14 AM, 'Pascal Jasmin' via Programming <
[email protected]> wrote:

> the file is called jcs.log (attached, not sure if I'm allowed). I had 5
> extra jconsole processes on this run in jqt. (jconsole can also have more
> than 1 extra unclosed process sometimes when it fails)
>
> There's a continuation of the important previous pattern: all jobs were
> started and finished, and it is the last kill command that results in hang.
>
>
> One idea (thought to be unnecessary with zmq) might be to track finished
> status and PIDs, and clean up and terminate after all done.
> console output
>
> qrun 99 88 2
> start: 0 0
> start: 1 1
> start: 2 2
> start: 3 3
> start: 4 4
> start: 5 5
> start: 6 6
> start: 7 7
> start: 8 8
> start: 9 9
> start: 10 10
> start: 11 11
> start: 12 12
> start: 13 13
> start: 14 14
> start: 15 15
> start: 16 16
> start: 17 17
> start: 18 18
> start: 19 19
> start: 20 20
> start: 21 21
> start: 22 22
> start: 23 23
> start: 24 24
> start: 25 25
> start: 26 26
> start: 27 27
> start: 28 28
> start: 29 29
> start: 30 30
> start: 31 31
> start: 32 32
> start: 33 33
> start: 34 34
> start: 35 35
> start: 36 36
> start: 37 37
> start: 38 38
> start: 39 39
> start: 40 40
> start: 41 41
> start: 42 42
> start: 43 43
> start: 44 44
> start: 45 45
> start: 46 46
> start: 47 47
> start: 48 48
> start: 49 49
> start: 50 50
> start: 51 51
> start: 52 52
> start: 53 53
> start: 54 54
> start: 55 55
> start: 56 56
> start: 57 57
> start: 58 58
> start: 59 59
> start: 60 60
> start: 61 61
> start: 62 62
> start: 63 63
> start: 64 64
> start: 65 65
> start: 66 66
> start: 67 67
> start: 68 68
> start: 69 69
> start: 70 70
> start: 71 71
> start: 72 72
> start: 73 73
> start: 74 74
> start: 75 75
> start: 76 76
> start: 77 77
> start: 78 78
> start: 79 79
> start: 80 80
> start: 81 81
> start: 82 82
> start: 83 83
> start: 84 84
> start: 85 85
> start: 86 86
> start: 87 87
> finish: 0 0
> finish: 1 1
> finish: 2 2
> finish: 3 3
> finish: 4 4
> finish: 5 5
> finish: 6 6
> finish: 7 7
> finish: 8 8
> finish: 9 9
> finish: 10 10
> finish: 11 11
> finish: 12 12
> finish: 13 13
> finish: 14 14
> finish: 15 15
> finish: 16 16
> finish: 17 17
> finish: 18 18
> finish: 19 19
> finish: 20 20
> finish: 21 21
> finish: 22 22
> finish: 23 23
> finish: 24 24
> finish: 25 25
> finish: 26 26
> finish: 27 27
> finish: 28 28
> finish: 29 29
> finish: 30 30
> finish: 31 31
> finish: 32 32
> finish: 33 33
> finish: 34 34
> start: 88 0
> start: 89 1
> start: 90 2
> start: 91 3
> start: 92 4
> start: 93 5
> start: 94 6
> start: 95 7
> start: 96 8
> start: 97 9
> start: 98 10
> kill: 11
> kill: 12
> kill: 13
> kill: 14
> kill: 15
> kill: 16
> kill: 17
> kill: 18
> kill: 19
> kill: 20
> kill: 21
> kill: 22
> kill: 23
> kill: 24
> kill: 25
> kill: 26
> kill: 27
> kill: 28
> kill: 29
> kill: 30
> kill: 31
> kill: 32
> kill: 33
> kill: 34
> finish: 35 35
> finish: 36 36
> finish: 37 37
> finish: 38 38
> finish: 39 39
> finish: 40 40
> finish: 44 44
> finish: 46 46
> finish: 50 50
> finish: 51 51
> finish: 86 86
> kill: 35
> kill: 36
> kill: 37
> kill: 38
> kill: 39
> kill: 40
> kill: 44
> kill: 46
> kill: 50
> kill: 51
> kill: 86
> finish: 88 0
> finish: 89 1
> finish: 90 2
> finish: 91 3
> finish: 92 4
> finish: 93 5
> finish: 94 6
> finish: 95 7
> finish: 96 8
> finish: 97 9
> finish: 98 10
> finish: 42 42
> finish: 43 43
> finish: 45 45
> finish: 48 48
> finish: 54 54
> finish: 55 55
> finish: 56 56
> finish: 57 57
> finish: 58 58
> finish: 59 59
> finish: 60 60
> finish: 61 61
> finish: 62 62
> finish: 63 63
> finish: 64 64
> finish: 65 65
> finish: 66 66
> finish: 67 67
> finish: 68 68
> finish: 69 69
> finish: 70 70
> finish: 71 71
> finish: 72 72
> finish: 76 76
> finish: 78 78
> finish: 79 79
> finish: 80 80
> finish: 82 82
> finish: 83 83
> finish: 84 84
> kill: 0
> kill: 1
> kill: 2
> kill: 3
> kill: 4
> kill: 5
> kill: 6
> kill: 7
> kill: 8
> kill: 9
> kill: 10
> kill: 42
> kill: 43
> kill: 45
> kill: 48
> kill: 54
> kill: 55
> kill: 56
> kill: 57
> kill: 58
> kill: 59
> kill: 60
> kill: 61
> kill: 62
> kill: 63
> kill: 64
> kill: 65
> kill: 66
> kill: 67
> kill: 68
> kill: 69
> kill: 70
> kill: 71
> kill: 72
> kill: 76
> kill: 78
> kill: 79
> kill: 80
> kill: 82
> kill: 83
> kill: 84
> finish: 73 73
> finish: 75 75
> finish: 77 77
> finish: 85 85
> finish: 87 87
> kill: 73
> kill: 75
> kill: 77
> kill: 85
> kill: 87
> finish: 47 47
> finish: 52 52
> kill: 47
> kill: 52
> finish: 81 81
> kill: 81
> poll 0:
>
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, October 6, 2017 9:36 AM
> Subject: Re: [Jprogramming] qrun - jcs - zmq
>
>
> Pascal,
>
> The logfile_jcs_ includes writes from started tasks that are interspersed
> with the screen output. Both outputs are useful.
>
> Please get a simple failure and send me the text of the session as well as
> the text of the logfile_jcs_.
>
> At the same time, give me the output of the Windows command:
> ...> tasklist /FI "imagename eq jconsole.exe"
>
> Thanks.
>
> On Thu, Oct 5, 2017 at 9:52 PM, 'Pascal Jasmin' via Programming <
> [email protected]> wrote:
>
> > each failure leaves behind 1 stranded jconsole task
> >
> >
> > ________________________________
> > From: bill lam <[email protected]>
> > To: Programming forum <[email protected]>
> > Sent: Thursday, October 5, 2017 9:09 PM
> > Subject: Re: [Jprogramming] qrun - jcs - zmq
> >
> >
> > The mission of a stress test is to make it fail, and a large number of
> > tasks is important; try on jconsole
> >
> > qrun 99 99 1
> > or
> > 2 qrun 99 99 2
> > and eventually
> > qrun each 500#<99 99 1
> >
> > Any failure would mean it is unfit for serious production use.
> >
> > I don't think the number of cores would affect its stability.
> >
> > Did you check task manager for any stranded jconsole instances?
> >
> >
> > On Oct 6, 2017 8:43 AM, "'Pascal Jasmin' via Programming" <
> > [email protected]> wrote:
> >
> > > with a separate program running on 6 cores,
> > >
> > > I can run in jqt without problem,
> > >
> > > qrun each 10 # < 99 5 3
> > >
> > >
> > > However, most (many at least) runs with more tasks fail
> > >
> > > btw, your suggestions to use jconsole with ctrl-c apply just fine with
> > > jqt and jbreak.bat (and debug invoked at break)
> > >
> > > the logfile in ~temp seems to just repeat the console output.
> > >
> > > There is a pattern to nearly all of the current failures:
> > >
> > > 1. It is hanging on terminating the last task "kill 98". All runs always
> > > print "finished lastjob task", and hang on killing the task of the last
> > > finish. (not always the last job to finish last)
> > >
> > > there is no noticeable effect on success from adding an x parameter.
> > >
> > > ________________________________
> > > From: Eric Iverson <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Sent: Thursday, October 5, 2017 4:43 PM
> > > Subject: [Jprogramming] qrun - jcs - zmq
> > >
> > >
> > > Pascal (and others interested in the qrun problem),
> > >
> > > I was happy when I was able to repeat the hang on my Windows system. And
> > > then it went away. A race condition that depends on the weather?
> > >
> > > I have updated zmq/jcs addons with an improved qrun that logs more info.
> > >
> > > ctrl+c can be very useful in working with zmq. It is best to use jconsole
> > > in tracking down this problem. Jqt and JHS introduce unnecessary
> > > complications.
> > >
> > > Windows also complicates this as its support for ctrl+c has some problems
> > > vs zmq and sockets.
> > >
> > > In going over all the reports it seems that the problem is that one of the
> > > early tasks started never finishes its first request.
> > > The problem seems to be a race between starting the task and the first
> > > request to it.
> > >
> > > The new versions should help track this down.
> > >
> > > Please try the following and give back the results:
> > >
> > > 1. start jconsole
> > >
> > > load'~addons/net/jcs/qrun.ijs'
> > >
> > > qrun 99 99 1
> > >
> > > Poll now has a timeout. If you see the poll line repeated every 5 seconds,
> > > you are likely hung waiting for something that isn't going to happen. The
> > > good news is that your session should respond to ctrl+c within 5 seconds.
> > >
> > > qrun now writes a logfile that might have some hints as to the problem.
> > > After qrun has hung, and you have done ctrl+c, take a look at: fread
> > > logfile_jcs_
> > >
> > > Please pass the contents of that file to me as it might help track this
> > > down.
> > >
> > > ***
> > >
> > > If it is a race between starting a task and sending it the 1st request,
> > > the problem might 'go away' if we add a sleep between starting all the
> > > tasks and starting any jobs. This is not a fix, but provides more info.
> > >
> > > If you can get the hang repeatedly, please see if the following avoids
> > > the hang.
> > >
> > > 2 qrun 99 99 2 NB. sleep 2 seconds before starting requests
> > >
> > > ***
> > >
> > > Has anyone seen this problem on Linux? Can we say it is possibly a
> > > Windows-only problem?
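On the poll timeout described above: the general idea is a wait that wakes up at a fixed interval instead of blocking forever, so a hung run shows up as a repeating "poll" line and the session stays interruptible. The sketch below is plain Python with a thread-safe queue standing in for the task's reply channel. It is not how qrun or zmq implement polling, and `wait_for_reply`, `poll_seconds` and `max_polls` are names invented for the illustration.

```python
import queue
import threading


def wait_for_reply(replies, poll_seconds=5.0, max_polls=3):
    """Wait for one reply, waking every poll_seconds to report progress.

    Returns the reply, or None if nothing arrived after max_polls waits.
    """
    for n in range(max_polls):
        try:
            # Blocks for at most poll_seconds, then raises queue.Empty.
            return replies.get(timeout=poll_seconds)
        except queue.Empty:
            # A repeating line here is the hint that the reply may never come.
            print(f"poll {n}: still waiting")
    return None


if __name__ == "__main__":
    replies = queue.Queue()
    # Simulate a task that answers after a short delay.
    threading.Timer(0.1, replies.put, args=("done",)).start()
    print(wait_for_reply(replies, poll_seconds=0.05, max_polls=10))
```

Giving up after a bounded number of polls is a choice made for the example; a real driver might instead keep polling until interrupted with ctrl+c.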
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
