The file is called jcs.log (attached, though I'm not sure attachments are allowed). This run, in jqt, left 5 extra jconsole processes behind. (Runs from jconsole can also sometimes leave more than 1 unclosed process when they fail.)
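For counting leftover jconsole processes before and after a run, here is a rough Python sketch. It assumes the Windows tasklist command with its CSV output options (/FO CSV /NH) and the image name jconsole.exe; the function names are made up for illustration and are not part of jcs or qrun.

```python
import csv
import io
import subprocess


def count_processes(tasklist_csv, image_name="jconsole.exe"):
    """Count rows of `tasklist /FO CSV /NH` output whose first field
    (the image name) matches image_name, ignoring case.

    When nothing matches, tasklist prints a single "INFO: ..." line,
    which simply fails the comparison and counts as zero.
    """
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


def stranded_jconsoles():
    """Windows only (assumption): run tasklist filtered on jconsole.exe
    and count the processes it reports."""
    result = subprocess.run(
        ["tasklist", "/FI", "imagename eq jconsole.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout)
```

Calling stranded_jconsoles() once before qrun and once after the hang gives the number of processes left behind as the difference. If qrun itself is run from jconsole, that session shows up in both counts.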
This continues the important earlier pattern: all jobs were started and finished, and it is the last kill command that hangs. One idea (thought to be unnecessary with zmq) might be to track finished status and PIDs, and to clean up and terminate only after everything is done.

console output

qrun 99 88 2
start: 0 0 start: 1 1 start: 2 2 start: 3 3 start: 4 4 start: 5 5 start: 6 6 start: 7 7
start: 8 8 start: 9 9 start: 10 10 start: 11 11 start: 12 12 start: 13 13 start: 14 14 start: 15 15
start: 16 16 start: 17 17 start: 18 18 start: 19 19 start: 20 20 start: 21 21 start: 22 22 start: 23 23
start: 24 24 start: 25 25 start: 26 26 start: 27 27 start: 28 28 start: 29 29 start: 30 30 start: 31 31
start: 32 32 start: 33 33 start: 34 34 start: 35 35 start: 36 36 start: 37 37 start: 38 38 start: 39 39
start: 40 40 start: 41 41 start: 42 42 start: 43 43 start: 44 44 start: 45 45 start: 46 46 start: 47 47
start: 48 48 start: 49 49 start: 50 50 start: 51 51 start: 52 52 start: 53 53 start: 54 54 start: 55 55
start: 56 56 start: 57 57 start: 58 58 start: 59 59 start: 60 60 start: 61 61 start: 62 62 start: 63 63
start: 64 64 start: 65 65 start: 66 66 start: 67 67 start: 68 68 start: 69 69 start: 70 70 start: 71 71
start: 72 72 start: 73 73 start: 74 74 start: 75 75 start: 76 76 start: 77 77 start: 78 78 start: 79 79
start: 80 80 start: 81 81 start: 82 82 start: 83 83 start: 84 84 start: 85 85 start: 86 86 start: 87 87
finish: 0 0 finish: 1 1 finish: 2 2 finish: 3 3 finish: 4 4 finish: 5 5 finish: 6 6 finish: 7 7
finish: 8 8 finish: 9 9 finish: 10 10 finish: 11 11 finish: 12 12 finish: 13 13 finish: 14 14 finish: 15 15
finish: 16 16 finish: 17 17 finish: 18 18 finish: 19 19 finish: 20 20 finish: 21 21 finish: 22 22 finish: 23 23
finish: 24 24 finish: 25 25 finish: 26 26 finish: 27 27 finish: 28 28 finish: 29 29 finish: 30 30 finish: 31 31
finish: 32 32 finish: 33 33 finish: 34 34
start: 88 0 start: 89 1 start: 90 2 start: 91 3 start: 92 4 start: 93 5 start: 94 6 start: 95 7
start: 96 8 start: 97 9 start: 98 10
kill: 11 kill: 12 kill: 13 kill: 14 kill: 15 kill: 16 kill: 17 kill: 18 kill: 19 kill: 20 kill: 21 kill: 22
kill: 23 kill: 24 kill: 25 kill: 26 kill: 27 kill: 28 kill: 29 kill: 30 kill: 31 kill: 32 kill: 33 kill: 34
finish: 35 35 finish: 36 36 finish: 37 37 finish: 38 38 finish: 39 39 finish: 40 40 finish: 44 44 finish: 46 46
finish: 50 50 finish: 51 51 finish: 86 86
kill: 35 kill: 36 kill: 37 kill: 38 kill: 39 kill: 40 kill: 44 kill: 46 kill: 50 kill: 51 kill: 86
finish: 88 0 finish: 89 1 finish: 90 2 finish: 91 3 finish: 92 4 finish: 93 5 finish: 94 6 finish: 95 7
finish: 96 8 finish: 97 9 finish: 98 10
finish: 42 42 finish: 43 43 finish: 45 45 finish: 48 48 finish: 54 54 finish: 55 55 finish: 56 56 finish: 57 57
finish: 58 58 finish: 59 59 finish: 60 60 finish: 61 61 finish: 62 62 finish: 63 63 finish: 64 64 finish: 65 65
finish: 66 66 finish: 67 67 finish: 68 68 finish: 69 69 finish: 70 70 finish: 71 71 finish: 72 72 finish: 76 76
finish: 78 78 finish: 79 79 finish: 80 80 finish: 82 82 finish: 83 83 finish: 84 84
kill: 0 kill: 1 kill: 2 kill: 3 kill: 4 kill: 5 kill: 6 kill: 7 kill: 8 kill: 9 kill: 10
kill: 42 kill: 43 kill: 45 kill: 48 kill: 54 kill: 55 kill: 56 kill: 57 kill: 58 kill: 59 kill: 60 kill: 61
kill: 62 kill: 63 kill: 64 kill: 65 kill: 66 kill: 67 kill: 68 kill: 69 kill: 70 kill: 71 kill: 72 kill: 76
kill: 78 kill: 79 kill: 80 kill: 82 kill: 83 kill: 84
finish: 73 73 finish: 75 75 finish: 77 77 finish: 85 85 finish: 87 87
kill: 73 kill: 75 kill: 77 kill: 85 kill: 87
finish: 47 47 finish: 52 52
kill: 47 kill: 52
finish: 81 81
kill: 81
poll 0:

________________________________
From: Eric Iverson <[email protected]>
To: Programming forum <[email protected]>
Sent: Friday, October 6, 2017 9:36 AM
Subject: Re: [Jprogramming] qrun - jcs - zmq

Pascal,

The logfile_jcs_ includes writes from started tasks that are interspersed with the screen output. Both outputs are useful.
Please get a simple failure and send me the text of the session as well as the text of the logfile_jcs_. At the same time, give me the output of the Windows command:

...> tasklist /FI "imagename eq jconsole.exe"

Thanks.

On Thu, Oct 5, 2017 at 9:52 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:

> each failure leaves behind 1 stranded jconsole task
>
> ________________________________
> From: bill lam <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Thursday, October 5, 2017 9:09 PM
> Subject: Re: [Jprogramming] qrun - jcs - zmq
>
> The mission of a stress test is to make it fail, and a large number of tasks
> is important, try on jconsole
>
> qrun 99 99 1
> or
> 2 qrun 99 99 2
> and eventually
> qrun each 500#<99 99 1
>
> Any failure would mean it is unfit for serious production use.
>
> I don't think the number of cores would affect its stability.
>
> Did you check task manager for any stranded jconsole instances?
>
> On Oct 6, 2017 8:43 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
>
> > with a separate program running on 6 cores,
> >
> > I can run in jqt without problem,
> >
> > qrun each 10 # < 99 5 3
> >
> > However, most (many at least) runs with more tasks, fail
> >
> > btw, your suggestions to use jconsole with ctrl-c apply just fine with jqt
> > and jbreak.bat (and debug invoked at break)
> >
> > the logfile in ~temp, seems to just repeat the console output.
> >
> > There is a pattern to nearly all of the current failures:
> >
> > 1. It is hanging on terminating the last task "kill 98". All runs always
> > print "finished lastjob task", and hang on killing the task of the last
> > finish. (not always the last job to finish last)
> >
> > there is no noticeable effect on success from adding an x parameter.
> >
> > ________________________________
> > From: Eric Iverson <[email protected]>
> > To: Programming forum <[email protected]>
> > Sent: Thursday, October 5, 2017 4:43 PM
> > Subject: [Jprogramming] qrun - jcs - zmq
> >
> > Pascal (and others interested in the qrun problem),
> >
> > I was happy when I was able to repeat the hang on my windows system. And
> > then it went away. A race condition that depends on the weather?
> >
> > I have updated zmq/jcs addons with an improved qrun that logs more info.
> >
> > ctrl+c can be very useful in working with zmq. It is best to use jconsole
> > in tracking down this problem. Jqt and JHS introduce unnecessary
> > complications.
> >
> > Windows also complicates this as its support for ctrl+c has some problems
> > vs zmq and sockets.
> >
> > In going over all the reports it seems that the problem is that one of the
> > early tasks started never finishes its first request. The problem seems to
> > be a race between starting the task and the first request to it.
> >
> > The new versions should help track this down.
> >
> > Please try the following and give back the results:
> >
> > 1. start jconsole
> > load'~addons/net/jcs/qrun.ijs'
> > qrun 99 99 1
> >
> > Poll now has a timeout. If you see poll line repeated every 5 seconds, you
> > are likely hung waiting for something that isn't going to happen. The good
> > news is that your session should respond to ctrl+c within 5 seconds.
> >
> > qrun now writes a logfile that might have some hints as to the problem.
> > After qrun has hung, and you have done ctrl+c, take a look at:
> > fread logfile_jcs_
> >
> > Please pass the contents of that file to me as it might help track this
> > down.
> >
> > ***
> > if it is a race between starting a task and sending it the 1st request, the
> > problem might 'go away' if we add a sleep between starting all the tasks
> > and starting any jobs. This is not a fix, but provides more info.
> >
> > If you can get the hang repeatedly, please see if the following avoids
> > the hang.
> >
> > 2 qrun 99 99 2 NB. sleep 2 seconds before starting requests
> >
> > ***
> > Has anyone seen this problem on Linux? Can we say it is possibly a
> > Windows-only problem?
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
