Pascal,

Thanks for your patience on this.
I just saw something in qrun that seems wrong. Take a time out while I study it. I will get back to you.

On Wed, Oct 4, 2017 at 2:25 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:

> yes the break was issued when hung.
>
> I did what you recommended, but with jqt as the "initial client"
>
> tasklist /FI lists 13 jconsole instances.
>
> killing them all (one at a time) did not unfreeze jqt.
>
> A new jqt session did unfreeze it, but froze itself. No jconsole pids
> while frozen on qrun 99 5 3 at "end task: 4" (though finish 98 4)
>
> By far the most common freeze in all tests is on the very last job.
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Wednesday, October 4, 2017 1:59 PM
> Subject: Re: [Jprogramming] jcs/zmq addons updated
>
> The break call stack in your last report is interesting.
>
> Is this a break when it was hung? If it was a break when it was running
> normally, then it is just normal.
>
> If when hung, it indicates the client is waiting (zmq_poll) for a state
> change in one of the servers. In this case the tasklist/taskkill from the
> previous message would be of interest.
>
> On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]> wrote:
>
> > Thanks for the additional info. No insight yet.
> >
> > Please try the following:
> > 1. clean system
> > 2. start jconsole
> > 3. 2!:6'' NB. pid for future reference
> > 4. qrun until hung
> > 5. windows command window:
> >    tasklist /FI "imagename eq jconsole.exe"
> > 6. above should list the pid from earlier - and perhaps other pids
> > 7. windows command window:
> >    taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> > 8. this should let the main task run again and perhaps give more info
> >
> > I would expect this to
> >
> > On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >
> > > sometimes disabling smt in bios can increase performance or avoid such
> > > problems (I didn't do this, but ran with 5 threads, i.e. < cores)
> > >
> > > following sequence,
> > >
> > > jconsole
> > > 99 5 2 fine
> > > 99 5 3 fine
> > > 99 5 4 hangs at "end task 12"
> > > ctrl c no immediate result
> > >
> > > jqt,
> > > 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
> > >
> > > |break: cdx
> > > |   r[check _1~:>{.r=.x cdx y
> > >
> > > in jqt, rerunning 99 5 3 several times on 4th try with debug (ctrl-k) on,
> > > hangs
> > >
> > > in jconsole (attempt to unblock jqt)
> > >
> > > qrun 99 5 2
> > > |port already in use in this task: assert
> > > |   'port already in use in this task' assert-.port e.>1{"1 jcs''
> > > this error never occurred before (when debug in jqt wasn't on).
> > >
> > > ________________________________
> > > From: Eric Iverson <[email protected]>
> > > To: Programming forum <[email protected]>
> > > Sent: Wednesday, October 4, 2017 12:58 PM
> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> > >
> > > Thanks for clarifying things.
> > >
> > > On your system, in a clean state, jconsole qrun 99 99 2 hangs.
> > >
> > > When you have the clean state hang in jconsole, please try ctrl+c (if you
> > > have not already done so) as this should break out of some socket hangs. If
> > > this breaks, it would provide important info.
> > >
> > > It would be useful if you could get the hang with smaller args. For
> > > example, can you get the hang with: qrun each 10#<40 4 2
> > >
> > > Unfortunately I can not reproduce this on my windows system. I can loop
> > > through 100s of this test without problem. Also on Linux and OSX.
> > >
> > > On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> > >
> > > > running qrun in a single session hangs. One semi-solution that sometimes
> > > > works is to then launch another session (jqt or jconsole) and run qrun,
> > > > which will unhang the original session. If both sessions are hung,
> > > > launching a 3rd session may unfreeze them.
> > > >
> > > > A single run of 99 99 x does not always work. My initial claim that first
> > > > runs always worked was based on using a tasks number lower than the
> > > > hardware SMT capabilities.
> > > >
> > > > after clean start in jconsole
> > > >
> > > > qrun 99 99 2
> > > >
> > > > hangs at
> > > >
> > > > "end task: 98"
> > > >
> > > > since this fails, I'm not trying the 5# or 10# version.
> > > >
> > > > with the above hung, doing the same run in jqt, in this case,
> > > >
> > > > failed to unhang jconsole
> > > >
> > > > hangs at "end task: 13"
> > > >
> > > > ________________________________
> > > > From: Eric Iverson <[email protected]>
> > > > To: Programming forum <[email protected]>
> > > > Sent: Wednesday, October 4, 2017 12:28 PM
> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> > > >
> > > > I am confused by your message.
> > > >
> > > > Are you trying to run qrun at the same time in different J sessions? This
> > > > will definitely not work and is not the intended use for qrun.
> > > >
> > > > We need to narrow down to a simple case that fails.
> > > >
> > > > You indicate you get failures in jconsole, so let's focus on that.
> > > >
> > > > I thought you had indicated that a single run always worked and that the
> > > > problem only occurred in repeated runs. If that is correct, then your test
> > > > must be something like the example I gave: qrun each 10#<99 99 2.
> > > >
> > > > Please give me the exact steps that fail and how it fails.
> > > >
> > > > For example:
> > > > 1. clean system start
> > > > 2. start jconsole
> > > > 3. load'~addons/net/jcs/jcs.ijs'
> > > > 4. load'~addons/net/jcs/qrun.ijs'
> > > > 5. qrun each 10#<99 99 2
> > > > 6. what happens?
> > > >
> > > > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> > > >
> > > > > I also had the avast virus chest issue, reran tests with shields disabled,
> > > > > after restart.
> > > > >
> > > > > qrun 99 99 2 is the main test I've used. Though 99 11 has more success (I
> > > > > have a 6 core 12 hyperthread AMD Ryzen processor), it still fails.
> > > > >
> > > > > the tests also fail in jconsole. There is "forward momentum" interaction
> > > > > between jqt and jconsole sessions running the same qrun parameters.
> > > > >
> > > > > I've tried the following modifications to kill__
> > > > >
> > > > > kill=: 3 : 0
> > > > > access=: su
> > > > > runa'exit 0'
> > > > > destroy''
> > > > > killp PORT
> > > > > if. IFQT do. wd 'msgs' end.
> > > > > i.0 0
> > > > > )
> > > > >
> > > > > though these modifications give no improvement, or potentially slightly
> > > > > worse, "getting through" performance.
> > > > >
> > > > > Engine: j806/j64avx/windows
> > > > > Beta-6: commercial/2017-09-26T14:05:48
> > > > > Library: 8.06.07
> > > > > Qt IDE: 1.6.1/5.6.3
> > > > > Platform: Win 64
> > > > > Installer: J806 install
> > > > > InstallPath: d:/j64-806
> > > > >
> > > > > ________________________________
> > > > > From: Eric Iverson <[email protected]>
> > > > > To: Programming forum <[email protected]>
> > > > > Sent: Wednesday, October 4, 2017 10:39 AM
> > > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> > > > >
> > > > > Pascal (qrun),
> > > > >
> > > > > I have run many tests on windows. The tests always run clean with jconsole
> > > > > and JHS. There have been a few hiccups with Jqt. A few hangs as you
> > > > > describe and one crash where avast put jqt.exe in its virus chest.
> > > > >
> > > > > Jqt is probably fine vs qrun but that is the only place I have seen
> > > > > problems with the latest code changes. A possible suspicion is wd'msgs'. I
> > > > > can't imagine why running a new Jqt session with qrun would have the effect
> > > > > you describe.
> > > > >
> > > > > Remember that the linger bug was fixed and so things run more reliably than
> > > > > in your tests with the first release.
> > > > >
> > > > > Please do the following:
> > > > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
> > > > > 2. ensure you have the latest base, net, and qtide
> > > > > 3. run your tests in jconsole or JHS until you have a failure or are
> > > > >    satisfied
> > > > > 4. run your tests in Jqt
> > > > > 5. let us know your findings
> > > > >
> > > > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> > > > >
> > > > > > was running with 1e2.
> > > > > >
> > > > > > The reason the different sessions were unblocking each other is that they
> > > > > > were using the same ports (as best as I can guess).
> > > > > >
> > > > > > qrun hard codes the start addresses.
> > > > > >
> > > > > > ________________________________
> > > > > > From: bill lam <[email protected]>
> > > > > > To: Programming forum <[email protected]>
> > > > > > Sent: Tuesday, October 3, 2017 10:55 PM
> > > > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> > > > > >
> > > > > > Let's take out the memory constraint factor first, say qrun with sentence
> > > > > > 1e3. I am not sure running in different jqt instances is a good idea since
> > > > > > the range of 100 ports used by jcs is hardcoded and is the same for each
> > > > > > jqt.
> > > > > >
> > > > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
> > > > > >
> > > > > > in a 4th jqt session, yes it hung on first run, though pretty far in.
> > > > > >
> > > > > > I started getting memory errors (without hanging), at 80 80, and 22 22. I
> > > > > > have 4 hung jqt sessions now, but any new one lets the others progress.
> > > > > > Task manager reports very low memory use.
> > > > > >
> > > > > > 99 11 finishes just fine. It seems that in order to unblock another
> > > > > > session, the tasks attempted have to number the same as in the blocked
> > > > > > session, and it has to make it up to (near) the blocked task number.
> > > > > >
> > > > > > ________________________________
> > > > > > From: bill lam <[email protected]>
> > > > > > To: Programming forum <[email protected]>
> > > > > > Sent: Tuesday, October 3, 2017 10:06 PM
> > > > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> > > > > >
> > > > > > Did qrun 99 99 hang in the first run?
> > > > > >
> > > > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
> > > > > >
> > > > > > > qrun still hangs for me. Never on the first run though. In 5 of 6 tries,
> > > > > > > it hangs on the 3rd run. On the other it hung on the 2nd run. 3rd parameter
> > > > > > > always 6.
> > > > > > >
> > > > > > > I don't think I ever hit memory/swap issues in these or previous
> > > > > > > tests.
> > > > > > >
> > > > > > > I found a way to unhang it though.
> > > > > > >
> > > > > > > start 2nd jqt session, and run qrun in it. It may hang, but other session
> > > > > > > will unfreeze. If it did hang, then repeat in other session until both
> > > > > > > unfrozen. Though, doing this enough can result in both sessions frozen
> > > > > > > (especially if using uneven task balances)... A 3rd jqt session to the
> > > > > > > rescue of both frozen ones.
> > > > > > >
> > > > > > > the show command and immediate jqt console output is a nice change.
> > > > > > >
> > > > > > > ________________________________
> > > > > > > From: Eric Iverson <[email protected]>
> > > > > > > To: Programming forum <[email protected]>
> > > > > > > Sent: Tuesday, October 3, 2017 5:41 PM
> > > > > > > Subject: [Jprogramming] jcs/zmq addons updated
> > > > > > >
> > > > > > > A few cosmetic changes and perhaps fixes for qrun and related task
> > > > > > > problems.
> > > > > > >
> > > > > > > Note: qrun now defined in jcs/qrun.ijs
> > > > > > >
> > > > > > > The main problem was that a task ending could have a delayed close of the
> > > > > > > associated socket port and this could, depending on timing, prevent the
> > > > > > > proper start of the next task trying to use the same port.
> > > > > > >
> > > > > > > The jcs sockets now set LINGER 0. This should avoid that class of problem.
> > > > > > >
> > > > > > > Initial stress tests all run clean on Linux and Windows.
> > > > > > >
> > > > > > > The other problem was that a server error in qrun caused a hang. This
> > > > > > > wouldn't happen normally if the jobs were well defined and ran to
> > > > > > > completion. A way to trigger the qrun server error in Windows was to run a
> > > > > > > large number of tasks with large (memory consumption) jobs. This could
> > > > > > > exhaust windows swap memory and get an out-of-memory error.
> > > > > > >
> > > > > > > qrun now catches the server error, reports the lse error, and continues.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
