Thanks for the report. I have fired up another Windows machine (Win10
rather than Win7) and have again managed to get a failure that looks the
same as yours. This should make it easier to track down.
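Pascal's suggestion in the quoted message below is to track finished status and PIDs, and to clean up and terminate only once everything is done. A minimal sketch of that idea follows. It is written in Python rather than J and uses threads as stand-ins for the jconsole worker processes. The names run_jobs and worker and the integer "pid" values are hypothetical, and this is not how jcs/qrun is implemented. The bounded results.get wait plays a role similar to the new poll timeout: a worker that never reports raises an error listing the outstanding jobs instead of hanging silently.

```python
import queue
import threading


def run_jobs(njobs, nworkers, timeout=5.0):
    """Run njobs on nworkers simulated workers, terminating workers only
    after every job has reported that it finished.

    Threads stand in for jconsole worker processes and the integer 'pid'
    is hypothetical; this is an illustration, not the jcs/qrun code.
    Returns the dispatcher's event log.
    """
    jobs = queue.Queue()
    for j in range(njobs):
        jobs.put(j)
    results = queue.Queue()    # workers post (job, pid) when a job finishes
    done = threading.Event()   # set by the dispatcher once all jobs are in
    log = []                   # written only by the dispatcher thread

    def worker(pid):
        # Pull jobs until none are left, reporting each one as finished.
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                break
            results.put((job, pid))
        # Stay alive (like an idle jconsole task) until told to shut down.
        done.wait()

    threads = {pid: threading.Thread(target=worker, args=(pid,))
               for pid in range(nworkers)}
    for t in threads.values():
        t.start()

    finished = set()
    while len(finished) < njobs:
        try:
            # Bounded wait, similar in spirit to a poll timeout.
            job, pid = results.get(timeout=timeout)
        except queue.Empty:
            done.set()  # release workers before reporting the problem
            missing = sorted(set(range(njobs)) - finished)
            raise RuntimeError(f"no report within {timeout}s; unfinished: {missing}")
        finished.add(job)
        log.append(("finish", job, pid))

    # Every job has reported; only now terminate the workers.
    done.set()
    for pid, t in threads.items():
        t.join()
        log.append(("kill", pid))
    return log


if __name__ == "__main__":
    for event in run_jobs(10, 3):
        print(*event)
```

Because kills are issued only after every job has reported a finish, no worker is torn down while it may still be handling a request. The final cleanup is then a single pass over the known PIDs.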

On Fri, Oct 6, 2017 at 11:14 AM, 'Pascal Jasmin' via Programming <
[email protected]> wrote:

> The file is called jcs.log (attached; I'm not sure attachments are allowed).
> I had 5 extra jconsole processes on this run in jqt. (jconsole can also leave
> more than one extra unclosed process when it fails.)
>
> The important pattern from before continues: all jobs were started and
> finished, and it is the last kill command that results in the hang.
>
> One idea (thought to be unnecessary with zmq) might be to track finished
> status and PIDs, and clean up and terminate only after all jobs are done.
>
> Console output:
>
> qrun 99 88 2
> start:  0 0
> start:  1 1
> start:  2 2
> start:  3 3
> start:  4 4
> start:  5 5
> start:  6 6
> start:  7 7
> start:  8 8
> start:  9 9
> start:  10 10
> start:  11 11
> start:  12 12
> start:  13 13
> start:  14 14
> start:  15 15
> start:  16 16
> start:  17 17
> start:  18 18
> start:  19 19
> start:  20 20
> start:  21 21
> start:  22 22
> start:  23 23
> start:  24 24
> start:  25 25
> start:  26 26
> start:  27 27
> start:  28 28
> start:  29 29
> start:  30 30
> start:  31 31
> start:  32 32
> start:  33 33
> start:  34 34
> start:  35 35
> start:  36 36
> start:  37 37
> start:  38 38
> start:  39 39
> start:  40 40
> start:  41 41
> start:  42 42
> start:  43 43
> start:  44 44
> start:  45 45
> start:  46 46
> start:  47 47
> start:  48 48
> start:  49 49
> start:  50 50
> start:  51 51
> start:  52 52
> start:  53 53
> start:  54 54
> start:  55 55
> start:  56 56
> start:  57 57
> start:  58 58
> start:  59 59
> start:  60 60
> start:  61 61
> start:  62 62
> start:  63 63
> start:  64 64
> start:  65 65
> start:  66 66
> start:  67 67
> start:  68 68
> start:  69 69
> start:  70 70
> start:  71 71
> start:  72 72
> start:  73 73
> start:  74 74
> start:  75 75
> start:  76 76
> start:  77 77
> start:  78 78
> start:  79 79
> start:  80 80
> start:  81 81
> start:  82 82
> start:  83 83
> start:  84 84
> start:  85 85
> start:  86 86
> start:  87 87
> finish: 0 0
> finish: 1 1
> finish: 2 2
> finish: 3 3
> finish: 4 4
> finish: 5 5
> finish: 6 6
> finish: 7 7
> finish: 8 8
> finish: 9 9
> finish: 10 10
> finish: 11 11
> finish: 12 12
> finish: 13 13
> finish: 14 14
> finish: 15 15
> finish: 16 16
> finish: 17 17
> finish: 18 18
> finish: 19 19
> finish: 20 20
> finish: 21 21
> finish: 22 22
> finish: 23 23
> finish: 24 24
> finish: 25 25
> finish: 26 26
> finish: 27 27
> finish: 28 28
> finish: 29 29
> finish: 30 30
> finish: 31 31
> finish: 32 32
> finish: 33 33
> finish: 34 34
> start:  88 0
> start:  89 1
> start:  90 2
> start:  91 3
> start:  92 4
> start:  93 5
> start:  94 6
> start:  95 7
> start:  96 8
> start:  97 9
> start:  98 10
> kill:  11
> kill:  12
> kill:  13
> kill:  14
> kill:  15
> kill:  16
> kill:  17
> kill:  18
> kill:  19
> kill:  20
> kill:  21
> kill:  22
> kill:  23
> kill:  24
> kill:  25
> kill:  26
> kill:  27
> kill:  28
> kill:  29
> kill:  30
> kill:  31
> kill:  32
> kill:  33
> kill:  34
> finish: 35 35
> finish: 36 36
> finish: 37 37
> finish: 38 38
> finish: 39 39
> finish: 40 40
> finish: 44 44
> finish: 46 46
> finish: 50 50
> finish: 51 51
> finish: 86 86
> kill:  35
> kill:  36
> kill:  37
> kill:  38
> kill:  39
> kill:  40
> kill:  44
> kill:  46
> kill:  50
> kill:  51
> kill:  86
> finish: 88 0
> finish: 89 1
> finish: 90 2
> finish: 91 3
> finish: 92 4
> finish: 93 5
> finish: 94 6
> finish: 95 7
> finish: 96 8
> finish: 97 9
> finish: 98 10
> finish: 42 42
> finish: 43 43
> finish: 45 45
> finish: 48 48
> finish: 54 54
> finish: 55 55
> finish: 56 56
> finish: 57 57
> finish: 58 58
> finish: 59 59
> finish: 60 60
> finish: 61 61
> finish: 62 62
> finish: 63 63
> finish: 64 64
> finish: 65 65
> finish: 66 66
> finish: 67 67
> finish: 68 68
> finish: 69 69
> finish: 70 70
> finish: 71 71
> finish: 72 72
> finish: 76 76
> finish: 78 78
> finish: 79 79
> finish: 80 80
> finish: 82 82
> finish: 83 83
> finish: 84 84
> kill:  0
> kill:  1
> kill:  2
> kill:  3
> kill:  4
> kill:  5
> kill:  6
> kill:  7
> kill:  8
> kill:  9
> kill:  10
> kill:  42
> kill:  43
> kill:  45
> kill:  48
> kill:  54
> kill:  55
> kill:  56
> kill:  57
> kill:  58
> kill:  59
> kill:  60
> kill:  61
> kill:  62
> kill:  63
> kill:  64
> kill:  65
> kill:  66
> kill:  67
> kill:  68
> kill:  69
> kill:  70
> kill:  71
> kill:  72
> kill:  76
> kill:  78
> kill:  79
> kill:  80
> kill:  82
> kill:  83
> kill:  84
> finish: 73 73
> finish: 75 75
> finish: 77 77
> finish: 85 85
> finish: 87 87
> kill:  73
> kill:  75
> kill:  77
> kill:  85
> kill:  87
> finish: 47 47
> finish: 52 52
> kill:  47
> kill:  52
> finish: 81 81
> kill:  81
> poll 0:
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, October 6, 2017 9:36 AM
> Subject: Re: [Jprogramming] qrun - jcs - zmq
>
>
>
> Pascal,
>
> The logfile_jcs_ includes writes from started tasks that are interspersed
> with the screen output. Both outputs are useful.
>
> Please get a simple failure and send me the text of the session as well as
> the text of the logfile_jcs_.
>
> At the same time, give me the output of the Windows command:
> ...> tasklist /FI "imagename eq jconsole.exe"
>
> Thanks.
>
> On Thu, Oct 5, 2017 at 9:52 PM, 'Pascal Jasmin' via Programming <
> [email protected]> wrote:
>
> > Each failure leaves behind one stranded jconsole task.
> >
> > ________________________________
> > From: bill lam <[email protected]>
> > To: Programming forum <[email protected]>
> > Sent: Thursday, October 5, 2017 9:09 PM
> > Subject: Re: [Jprogramming] qrun - jcs - zmq
> >
> >
> >
> > The purpose of a stress test is to make it fail, and a large number of
> > tasks is important. Try on jconsole:
> >
> > qrun 99 99 1
> > or
> > 2 qrun 99 99 2
> > and eventually
> > qrun each 500#<99 99 1
> >
> > Any failure would mean it is unfit for serious production use.
> >
> > I don't think the number of cores would affect its stability.
> >
> > Did you check Task Manager for any stranded jconsole instances?
> >
> >
> > On Oct 6, 2017 8:43 AM, "'Pascal Jasmin' via Programming" <
> > [email protected]> wrote:
> >
> > > With a separate program running on 6 cores, I can run in jqt without
> > > problem:
> > >
> > > qrun each 10 # < 99 5 3
> > >
> > >
> > > However, most (or at least many) runs with more tasks fail.
> > >
> > > BTW, your suggestions to use jconsole with ctrl-c apply just as well to jqt
> > > with jbreak.bat (and debug invoked at the break).
> > >
> > > The logfile in ~temp seems to just repeat the console output.
> > >
> > > There is a pattern to nearly all of the current failures:
> > >
> > > 1. It hangs on terminating the last task ("kill 98"). All runs always
> > > print "finished lastjob task", and then hang on killing the task that
> > > finished last (the last job started is not always the last to finish).
> > >
> > > Adding an x parameter has no noticeable effect on success.
> > >
> > > ________________________________
> > > From: Eric Iverson <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Sent: Thursday, October 5, 2017 4:43 PM
> > > Subject: [Jprogramming] qrun - jcs - zmq
> > >
> > >
> > >
> > > Pascal (and others interested in the qrun problem),
> > >
> > >
> > > I was happy when I was able to repeat the hang on my Windows system. And
> > > then it went away. A race condition that depends on the weather?
> > >
> > >
> > > I have updated the zmq/jcs addons with an improved qrun that logs more info.
> > >
> > >
> > > ctrl+c can be very useful in working with zmq. It is best to use jconsole
> > > in tracking down this problem. Jqt and JHS introduce unnecessary
> > > complications.
> > >
> > >
> > > Windows also complicates this, as its support for ctrl+c has some problems
> > > with zmq and sockets.
> > >
> > >
> > > In going over all the reports, it seems that the problem is that one of the
> > > early tasks started never finishes its first request. The problem seems to
> > > be a race between starting the task and the first request to it.
> > >
> > >
> > > The new versions should help track this down.
> > >
> > >
> > > Please try the following and give back the results:
> > >
> > >
> > > 1. start jconsole
> > >
> > >    load'~addons/net/jcs/qrun.ijs'
> > >
> > >    qrun 99 99 1
> > >
> > >
> > > Poll now has a timeout. If you see the poll line repeated every 5 seconds,
> > > you are likely hung waiting for something that isn't going to happen. The
> > > good news is that your session should respond to ctrl+c within 5 seconds.
> > >
> > >
> > > qrun now writes a logfile that might have some hints as to the problem.
> > > After qrun has hung, and you have pressed ctrl+c, take a look at:
> > >
> > >    fread logfile_jcs_
> > >
> > > Please pass the contents of that file to me, as it might help track this
> > > down.
> > >
> > >
> > > ***
> > >
> > > If it is a race between starting a task and sending it the first request,
> > > the problem might 'go away' if we add a sleep between starting all the
> > > tasks and starting any jobs. This is not a fix, but provides more info.
> > >
> > >
> > > If you can get the hang repeatedly, please see if the following avoids
> > > the hang.
> > >
> > >
> > >    2 qrun 99 99 2 NB. sleep 2 seconds before starting requests
> > >
> > >
> > > ***
> > >
> > > Has anyone seen this problem on Linux? Can we say it is possibly a
> > > Windows-only problem?
> > >
> > > ----------------------------------------------------------------------
> > >
> > > For information about J forums see http://www.jsoftware.com/forums.htm