The break call stack in your last report is interesting. Was that break taken while the session was hung? If it was taken while things were running normally, then it is just normal.
If the break was taken while hung, it indicates the client is waiting (zmq_poll) for a state change in one of the servers. In that case the tasklist/taskkill results from the previous message would be of interest.

On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]> wrote:

> Thanks for the additional info. No insight yet,
>
> Please try the following:
> 1. clean system
> 2. start jconsole
> 3. 2!:6'' NB. pid for future reference
> 4. qrun until hung
> 5. windows command window:
> tasklist /FI "imagename eq jconsole.exe"
> 6. above should list the pid from earlier - and perhaps other pids
> 7. windows command window:
> taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> 8. this should let the main task run again and perhaps give more info
>
> I would expect this to
>
>
> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>
>> sometimes disabling smt in bios can increase performance or avoid such
>> problems ( I didn't do this, but ran with 5 threads ie < cores)
>>
>> following sequence,
>>
>> jconsole
>> 99 5 2 fine
>> 99 5 3 fine
>> 99 5 4 hangs at "end task 12"
>> ctrl c no immediate result
>>
>> jqt,
>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
>>
>> |break: cdx
>> | r[check _1~:>{.r=.x cdx y
>>
>>
>> in jqt, rerunning 99 5 3 several times on 4th try with debug (ctrl-k) on,
>> hangs
>>
>> in jconsole (attempt to unblock jqt)
>>
>> qrun 99 5 2
>> |port already in use in this task: assert
>> | 'port already in use in this task' assert-.port e.>1{"1 jcs''
>> this error never occurred before, (when debug in jqt wasn't on).
>>
>> ________________________________
>> From: Eric Iverson <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Wednesday, October 4, 2017 12:58 PM
>> Subject: Re: [Jprogramming] jcs/zmq addons updated
>>
>>
>> Thanks for clarifying things.
>>
>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
>>
>> When you have the clean state hang in jconsole, please try ctrl+c (if you
>> have not already done so) as this should break out of some socket hangs.
>> If this breaks, it would provide important info.
>>
>> It would be useful if you could get the hang with smaller args. For
>> example, can you get the hang with: qrun each 10#40 4 2
>>
>> Unfortunately I can not reproduce this on my windows system. I can loop
>> through 100s of this test without problem. Also on Linux and OSX.
>>
>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>>
>> > running qrun in a single session hangs. One semi-solution that sometimes
>> > works is to then launch another session (jqt or jconsole) and run qrun,
>> > which will unhang the original session. If both sessions are hung,
>> > launching a 3rd session may unfreeze them.
>> >
>> > A single run of 99 99 x does not always work. My initial claim that first
>> > runs always worked was based on using a tasks number lower than the
>> > hardware SMT capabilities.
>> >
>> > after clean start in jconsole
>> >
>> > qrun 99 99 2
>> >
>> > hangs at
>> >
>> > "end task: 98"
>> >
>> > since this fails, I'm not trying the 5# or 10# version.
>> >
>> > with the above hanged, doing the same run in jqt, in this case,
>> >
>> > failed to unhang jconsole
>> >
>> > hangs at "end task: 13"
>> > ________________________________
>> >
>> > From: Eric Iverson <[email protected]>
>> > To: Programming forum <[email protected]>
>> > Sent: Wednesday, October 4, 2017 12:28 PM
>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> >
>> >
>> > I am confused by your message.
>> >
>> > Are you trying to run qrun at the same time in different J sessions? This
>> > will definitely not work and is not the intended use for qrun.
>> >
>> > We need to narrow down to a simple case that fails.
>> >
>> > You indicate you get failures in jconsole, so let's focus on that.
>> >
>> > I thought you had indicated that a single run always worked and that the
>> > problem only occurred in repeated runs. If that is correct, then your test
>> > must be something like the example I gave: qrun each 10#<99 99 2.
>> >
>> > Please give me the exact steps that fail and how it fails.
>> >
>> > For example:
>> > 1. clean system start
>> > 2. start jconsole
>> > 3. load'~addons/net/jcs/jcs.ijs'
>> > 4. load'~addons/net/jcs/qrun.ijs'
>> > 5. qrun each 10#<99 99 2
>> > 6. what happens?
>> >
>> >
>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>> >
>> > > I also had the avast virus chest issue, reran tests with shields disabled,
>> > > after restart.
>> > >
>> > >
>> > > qrun 99 99 2 is the main test I've used. Though 99 11 has more success (I
>> > > have 6 core 12 hyperthread AMD Ryzen processor), it still fails.
>> > >
>> > > the tests also fail in jconsole. There is "forward momentum" interaction
>> > > between jqt and jconsole sessions running the same qrun parameters.
>> > >
>> > > I've tried the following modifications to kill__
>> > >
>> > >
>> > > kill=: 3 : 0
>> > > access=: su
>> > > runa'exit 0'
>> > > destroy''
>> > > killp PORT
>> > > if. IFQT do. wd 'msgs' end.
>> > > i.0 0
>> > > )
>> > >
>> > > though these modifications have no to potentially slightly worse "getting
>> > > through" performance.
>> > >
>> > >
>> > > Engine: j806/j64avx/windows
>> > > Beta-6: commercial/2017-09-26T14:05:48
>> > > Library: 8.06.07
>> > > Qt IDE: 1.6.1/5.6.3
>> > > Platform: Win 64
>> > > Installer: J806 install
>> > > InstallPath: d:/j64-806
>> > >
>> > > ________________________________
>> > > From: Eric Iverson <[email protected]>
>> > > To: Programming forum <[email protected]>
>> > > Sent: Wednesday, October 4, 2017 10:39 AM
>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > >
>> > >
>> > > Pascal (qrun),
>> > >
>> > > I have run many tests on windows. The tests always run clean with jconsole
>> > > and JHS. There have been a few hiccups with Jqt. A few hangs as you
>> > > describe and one crash where avast put jqt.exe in its virus chest.
>> > >
>> > > Jqt is probably fine vs qrun but that is the only place I have seen
>> > > problems with the latest code changes. A possible suspicion is wd'msgs'. I
>> > > can't imagine why running a new Jqt session with qrun would have the effect
>> > > you describe,
>> > >
>> > > Remember that the linger bug was fixed and so things run more reliably than
>> > > in your tests with the first release.
>> > >
>> > > Please do the following:
>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
>> > > 2. ensure you have the latest base, net, and qtide
>> > > 3. run your tests in jconsole or JHS until you have a failure or are
>> > > satisfied
>> > > 4. run your tests in Jqt
>> > > 5. let us know your findings
>> > >
>> > >
>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>> > >
>> > > > was running with 1e2.
>> > > >
>> > > > The reason the different sessions were unblocking each other is that they
>> > > > were using the same ports. (as best as I can guess).
>> > > >
>> > > > qrun hard codes the start addresses.
>> > > >
>> > > >
>> > > > ________________________________
>> > > > From: bill lam <[email protected]>
>> > > > To: Programming forum <[email protected]>
>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > > >
>> > > >
>> > > > Let's take out the memory constraint factor first, say qrun with sentence
>> > > > 1e3. I am not sure running in different jqt instances is a good idea since
>> > > > the range of 100 ports used by jcs is hardcoded and are the same for each
>> > > > jqt.
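
For readers following the port discussion: the sketch below is a Python illustration (not jcs code, which is written in J) of why two sessions that take their ports from the same hard-coded range can get in each other's way. The helper names `try_bind`, `claim_ports`, and `release` are hypothetical, and plain TCP sockets stand in for the zmq sockets that jcs actually uses, so treat it only as a model of the "port already in use" situation.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a socket bound to (host, port), or None if the port is taken."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return s
    except OSError:
        # Typically "address already in use": someone else holds the port.
        s.close()
        return None


def claim_ports(ports):
    """Try to bind every port in `ports`.

    Returns (held, busy): a dict of port -> bound socket, and a list of
    ports that could not be bound because they were already in use.
    """
    held, busy = {}, []
    for port in ports:
        s = try_bind(port)
        if s is None:
            busy.append(port)
        else:
            held[port] = s
    return held, busy


def release(held):
    """Close every socket in `held` so the ports become available again."""
    for s in held.values():
        s.close()
    held.clear()
```

If one "session" calls `claim_ports` on a range and a second one calls it on the same range before the first has called `release`, the second gets those ports back in `busy`. Giving each session its own range (an assumption about a possible remedy, not something jcs is stated to do) would avoid the overlap.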
>> > > >
>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
>> > > >
>> > > > in a 4th jqt session, yes it hung on first run, though pretty far in.
>> > > >
>> > > > I started getting memory errors (without hanging), at 80 80, and 22 22. I
>> > > > have 4 hung jqt sessions now, but any new one lets the others progress.
>> > > > Task manager reports very low memory use.
>> > > >
>> > > > 99 11 finishes just fine. It seems that in order to unblock another
>> > > > session, the tasks attempted have to number the same as in the blocked
>> > > > session, and it has to make it up to (near) the blocked task number.
>> > > >
>> > > > ________________________________
>> > > > From: bill lam <[email protected]>
>> > > > To: Programming forum <[email protected]>
>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > > >
>> > > >
>> > > > Did qrun 99 99 hang in the first run?
>> > > >
>> > > >
>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
>> > > >
>> > > > > qrun still hangs for me. Never on the first run though. In 5 of 6 tries,
>> > > > > it hangs on the 3rd run. On the other it hung on the 2nd run. 3rd parameter
>> > > > > always 6.
>> > > > >
>> > > > > I don't think I ever breached memory/swap limits in these or previous
>> > > > > tests.
>> > > > >
>> > > > > I found a way to unhang it though.
>> > > > >
>> > > > > start 2nd jqt session, and run qrun in it. It may hang, but other session
>> > > > > will unfreeze. If it did hang, then repeat in other session until both
>> > > > > unfrozen. Though, doing this enough can result in both sessions frozen
>> > > > > (especially if using uneven task balances)... A 3rd jqt session to the
>> > > > > rescue of both frozen ones.
>> > > > >
>> > > > > the show command and immediate jqt console output is a nice change.
>> > > > >
>> > > > >
>> > > > > ________________________________
>> > > > > From: Eric Iverson <[email protected]>
>> > > > > To: Programming forum <[email protected]>
>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
>> > > > >
>> > > > >
>> > > > > A few cosmetic changes and perhaps fixes for qrun and related task
>> > > > > problems.
>> > > > >
>> > > > > Note: qrun now defined in jcs/qrun.ijs
>> > > > >
>> > > > > The main problem was that a task ending could have a delayed close of the
>> > > > > associated socket port and this could, depending on timing, prevent the
>> > > > > proper start of the next task trying to use the same port.
>> > > > >
>> > > > > The jcs sockets now set LINGER 0. This should avoid that class of problem.
>> > > > >
>> > > > > Initial stress tests all run clean on Linux and Windows.
>> > > > >
>> > > > > The other problem was that a server error in qrun caused a hang. This
>> > > > > wouldn't happen normally if the jobs were well defined and ran to
>> > > > > completion. A way to trigger the qrun server error in Windows was to run a
>> > > > > large number of tasks with large (memory consumption) jobs. This could
>> > > > > exhaust windows swap memory and get an out-of-memory error.
>> > > > >
>> > > > > qrun now catches the server error, reports the lse error, and continues.
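
As a rough illustration of the catch, report, and continue behaviour described above, here is a minimal Python sketch. It is not the qrun implementation (that is J code in jcs/qrun.ijs); `run_jobs` is a hypothetical name, and Python threads stand in for the J server tasks. The point is only that a failing job produces an error record instead of stalling the collection of the remaining results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, workers=2):
    """Run each zero-argument callable in `jobs` on a small worker pool.

    Returns one entry per job, in submission order:
      ("ok", value)      if the job returned normally
      ("error", message) if the job raised; the error is recorded and the
                         remaining jobs are still collected.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        for fut in futures:
            try:
                results.append(("ok", fut.result()))
            except Exception as exc:
                # Report the failure and keep going rather than waiting forever.
                results.append(("error", f"{type(exc).__name__}: {exc}"))
    return results
```

For example, calling `run_jobs` with a list where one job raises `MemoryError` still returns results for the jobs before and after it, with the failure reported in its slot.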
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
