Pascal,

Please edit qrun to have the following 2 additional lines:

 'reads writes errors'=. poll_jcs_ 180000;'';<tasks
 assert 1<:(#reads)+#writes
 assert 0=#errors
 for_n. writes do.

On Wed, Oct 4, 2017 at 1:59 PM, Eric Iverson wrote:

> The break call stack in your last report is interesting.
>
> Is this a break when it was hung? If it was a break when it was running
> normally, then it is just normal.
>
> If when hung, it indicates the client is waiting (zmq_poll) for a state
> change in one of the servers. In this case the tasklist/taskkill from the
> previous message would be of interest.
>
> On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson wrote:
>
>> Thanks for the additional info. No insight yet,
>>
>> Please try the following:
>> 1. clean system
>> 2. start jconsole
>> 3. 2!:6'' NB. pid for future reference
>> 4. qrun until hung
>> 5. windows command window:
>>    tasklist /FI "imagename eq jconsole.exe"
>> 6. above should list the pid from earlier - and perhaps other pids
>> 7. windows command window:
>>    taskkill /PID nnnnn - where nnnn is NOT the pid reported by 2!:6''
>> 8. this should let the main task run again and perhaps give more info
>>
>> I would expect this to
>>
>> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming wrote:
>>
>>> sometimes disabling smt in bios can increase performance or avoid such
>>> problems (I didn't do this, but ran with 5 threads ie < cores)
>>>
>>> following sequence,
>>>
>>> jconsole
>>> 99 5 2 fine
>>> 99 5 3 fine
>>> 99 5 4 hangs at "end task 12"
>>> ctrl c no immediate result
>>>
>>> jqt,
>>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
>>>
>>> |break: cdx
>>> | r[check _1~:>{.r=.x cdx y
>>>
>>> in jqt, rerunning 99 5 3 several times on 4th try with debug (ctrl-k)
>>> on, hangs
>>>
>>> in jconsole (attempt to unblock jqt)
>>>
>>> qrun 99 5 2
>>> |port already in use in this task: assert
>>> | 'port already in use in this task' assert-.port e.>1{"1 jcs''
>>> this error never occurred before, (when debug in jqt wasn't on).
>>>
>>> ________________________________
>>> From: Eric Iverson
>>> To: Programming forum
>>> Sent: Wednesday, October 4, 2017 12:58 PM
>>> Subject: Re: [Jprogramming] jcs/zmq addons updated
>>>
>>> Thanks for clarifying things.
>>>
>>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
>>>
>>> When you have the clean state hang in jconsole, please try ctrl+c (if
>>> you have not already done so) as this should break out of some socket
>>> hangs. If this breaks, it would provide important info.
>>>
>>> It would be useful if you could get the hang with smaller args. For
>>> example, can you get the hang with: qrun each 10#40 4 2
>>>
>>> Unfortunately I can not reproduce this on my windows system. I can loop
>>> through 100s of this test without problem. Also on Linux and OSX.
>>>
>>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming wrote:
>>>
>>> > running qrun in a single session hangs.
>>> > One semi-solution that sometimes works is to then launch another
>>> > session (jqt or jconsole) and run qrun, which will unhang the original
>>> > session. If both sessions are hung, launching a 3rd session may
>>> > unfreeze them.
>>> >
>>> > A single run of 99 99 x does not always work. My initial claim that
>>> > first runs always worked was based on using a tasks number lower than
>>> > the hardware SMT capabilities.
>>> >
>>> > after clean start in jconsole
>>> >
>>> > qrun 99 99 2
>>> >
>>> > hangs at
>>> >
>>> > "end task: 98"
>>> >
>>> > since this fails, I'm not trying the 5# or 10# version.
>>> >
>>> > with the above hanged, doing the same run in jqt, in this case,
>>> >
>>> > failed to unhang jconsole
>>> >
>>> > hangs at "end task: 13"
>>> > ________________________________
>>> >
>>> > From: Eric Iverson
>>> > To: Programming forum
>>> > Sent: Wednesday, October 4, 2017 12:28 PM
>>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> >
>>> > I am confused by your message.
>>> >
>>> > Are you trying to run qrun at the same time in different J sessions?
>>> > This will definitely not work and is not the intended use for qrun.
>>> >
>>> > We need to narrow down to a simple case that fails.
>>> >
>>> > You indicate you get failures in jconsole, so let's focus on that.
>>> >
>>> > I thought you had indicated that a single run always worked and that
>>> > the problem only occurred in repeated runs. If that is correct, then
>>> > your test must be something like the example I gave:
>>> > qrun each 10#<99 99 2.
>>> >
>>> > Please give me the exact steps that fail and how it fails.
>>> >
>>> > For example:
>>> > 1. clean system start
>>> > 2. start jconsole
>>> > 3. load'~addons/net/jcs/jcs.ijs'
>>> > 4. load'~addons/net/jcs/qrun.ijs'
>>> > 5. qrun each 10#<99 99 2
>>> > 6. what happens?
>>> >
>>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming wrote:
>>> >
>>> > > I also had the avast virus chest issue, reran tests with shields
>>> > > disabled, after restart.
>>> > >
>>> > > qrun 99 99 2 is the main test I've used. Though 99 11 has more
>>> > > success (I have 6 core 12 hyperthread AMD Ryzen processor), it still
>>> > > fails.
>>> > >
>>> > > the tests also fail in jconsole. There is "forward momentum"
>>> > > interaction between jqt and jconsole sessions running the same qrun
>>> > > parameters.
>>> > >
>>> > > I've tried the following modifications to kill__
>>> > >
>>> > > kill=: 3 : 0
>>> > > access=: su
>>> > > runa'exit 0'
>>> > > destroy''
>>> > > killp PORT
>>> > > if. IFQT do. wd 'msgs' end.
>>> > > i.0 0
>>> > > )
>>> > >
>>> > > though these modifications have no to potentially slightly worse
>>> > > "getting through" performance.
>>> > >
>>> > > Engine: j806/j64avx/windows
>>> > > Beta-6: commercial/2017-09-26T14:05:48
>>> > > Library: 8.06.07
>>> > > Qt IDE: 1.6.1/5.6.3
>>> > > Platform: Win 64
>>> > > Installer: J806 install
>>> > > InstallPath: d:/j64-806
>>> > >
>>> > > ________________________________
>>> > > From: Eric Iverson
>>> > > To: Programming forum
>>> > > Sent: Wednesday, October 4, 2017 10:39 AM
>>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > >
>>> > > Pascal (qrun),
>>> > >
>>> > > I have run many tests on windows. The tests always run clean with
>>> > > jconsole and JHS. There have been a few hiccups with Jqt. A few hangs
>>> > > as you describe and one crash where avast put jqt.exe in its virus
>>> > > chest.
>>> > >
>>> > > Jqt is probably fine vs qrun but that is the only place I have seen
>>> > > problems with the latest code changes. A possible suspicion is
>>> > > wd'msgs'.
>>> > > I can't imagine why running a new Jqt session with qrun would have
>>> > > the effect you describe,
>>> > >
>>> > > Remember that the linger bug was fixed and so things run more
>>> > > reliably than in your tests with the first release.
>>> > >
>>> > > Please do the following:
>>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
>>> > > 2. ensure you have the latest base, net, and qtide
>>> > > 3. run your tests in jconsole or JHS until you have a failure or are
>>> > >    satisfied
>>> > > 4. run your tests in Jqt
>>> > > 5. let us know your findings
>>> > >
>>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming wrote:
>>> > >
>>> > > > was running with 1e2.
>>> > > >
>>> > > > The reason the different sessions were unblocking each other is
>>> > > > that they were using the same ports. (as best as I can guess).
>>> > > >
>>> > > > qrun hard codes the start addresses.
>>> > > >
>>> > > > ________________________________
>>> > > > From: bill lam
>>> > > > To: Programming forum
>>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
>>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > > >
>>> > > > Let's take out the memory constraint factor first, say qrun with
>>> > > > sentence 1e3. I am not sure running in different jqt instances is
>>> > > > a good idea since the range of 100 ports used by jcs is hardcoded
>>> > > > and are the same for each jqt.
>>> > > >
>>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" wrote:
>>> > > >
>>> > > > in a 4th jqt session, yes it hung on first run, though pretty far
>>> > > > in.
>>> > > >
>>> > > > I started getting memory errors (without hanging), at 80 80, and
>>> > > > 22 22.
>>> > > > I have 4 hung jqt sessions now, but any new one lets the others
>>> > > > progress. Task manager reports very low memory use.
>>> > > >
>>> > > > 99 11 finishes just fine. It seems that in order to unblock another
>>> > > > session, the tasks attempted have to number the same as in the
>>> > > > blocked session, and it has to make it up to (near) the blocked
>>> > > > task number.
>>> > > >
>>> > > > ________________________________
>>> > > > From: bill lam
>>> > > > To: Programming forum
>>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
>>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > > >
>>> > > > Did qrun 99 99 hang in the first run?
>>> > > >
>>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" wrote:
>>> > > >
>>> > > > > qrun still hangs for me. Never on the first run though. In 5 of
>>> > > > > 6 tries, it hangs on the 3rd run. On the other it hung on the
>>> > > > > 2nd run. 3rd parameter always 6.
>>> > > > >
>>> > > > > I don't think I ever breached memory/swap issues in these or
>>> > > > > previous tests.
>>> > > > >
>>> > > > > I found a way to unhang it though.
>>> > > > >
>>> > > > > start 2nd jqt session, and run qrun in it. It may hang, but other
>>> > > > > session will unfreeze. If it did hang, then repeat in other
>>> > > > > session until both unfrozen. Though, doing this enough can result
>>> > > > > in both sessions frozen (especially if using uneven task
>>> > > > > balances)... A 3rd jqt session to the rescue of both frozen ones.
>>> > > > >
>>> > > > > the show command and immediate jqt console output is a nice
>>> > > > > change.
>>> > > > >
>>> > > > > ________________________________
>>> > > > > From: Eric Iverson
>>> > > > > To: Programming forum
>>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
>>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
>>> > > > >
>>> > > > > A few cosmetic changes and perhaps fixes for qrun and related
>>> > > > > task problems.
>>> > > > >
>>> > > > > Note: qrun now defined in jcs/qrun.ijs
>>> > > > >
>>> > > > > The main problem was that a task ending could have a delayed
>>> > > > > close of the associated socket port and this could, depending on
>>> > > > > timing, prevent the proper start of the next task trying to use
>>> > > > > the same port.
>>> > > > >
>>> > > > > The jcs sockets now set LINGER 0. This should avoid that class of
>>> > > > > problem.
>>> > > > >
>>> > > > > Initial stress tests all run clean on Linux and Windows.
>>> > > > >
>>> > > > > The other problem was that a server error in qrun caused a hang.
>>> > > > > This wouldn't happen normally if the jobs were well defined and
>>> > > > > ran to completion. A way to trigger the qrun server error in
>>> > > > > Windows was to run a large number of tasks with large (memory
>>> > > > > consumption) jobs. This could exhaust windows swap memory and get
>>> > > > > an out-of-memory error.
>>> > > > >
>>> > > > > qrun now catches the server error, reports the lse error, and
>>> > > > > continues.
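
As a rough illustration of the delayed-close problem described above, here is a minimal Python sketch. It is not the jcs or ZeroMQ code: it uses the TCP-level SO_LINGER option from the Python standard library as a stand-in for the LINGER 0 setting on the jcs sockets (ZeroMQ's own linger option governs how long unsent messages are kept at close, a related but different mechanism). The names set_linger_zero and run_task_once are hypothetical. The point is only that a socket closed with a zero linger time is torn down at once, so the next task can bind the same port right away.

```python
import socket
import struct

# l_onoff=1, l_linger=0 seconds. This two-int layout is an assumption that
# holds on Linux and macOS; Windows declares the fields as 16-bit values.
LINGER_ZERO = struct.pack("ii", 1, 0)


def set_linger_zero(sock):
    """Ask the OS to drop the connection at close() instead of lingering."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, LINGER_ZERO)


def run_task_once(port=0):
    """Hypothetical 'task': listen, accept one local connection, close hard.

    port=0 lets the OS pick a free port. Returns the port that was used.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_linger_zero(listener)
    listener.bind(("127.0.0.1", port))
    listener.listen(1)
    used_port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", used_port))
    conn, _ = listener.accept()
    set_linger_zero(conn)
    conn.close()       # zero linger: the connection is reset, not left pending
    listener.close()
    client.close()
    return used_port


if __name__ == "__main__":
    first = run_task_once()
    # The "next task" binds the very same port immediately afterwards.
    second = run_task_once(first)
    print("reused port:", first == second)
```

Without the zero linger setting, the closed connection can stay in a pending state for a while, and a bind to the same port in that window may fail depending on timing, which matches the symptom described in the announcement.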
>>> > > > >
>>> > > > > ----------------------------------------------------------------------
>>> > > > > For information about J forums see http://www.jsoftware.com/forums.htm
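
The qrun change requested at the top of this message bounds the poll with a timeout and then asserts on the result, so that a hang surfaces as an error instead of blocking forever. A minimal sketch of that idea in Python follows. It uses select.select on ordinary sockets rather than poll_jcs_ or zmq_poll, the name poll_or_fail is hypothetical, and reading 180000 as a timeout in milliseconds is an assumption.

```python
import select
import socket


def poll_or_fail(readers, writers, timeout_ms):
    """Wait for activity on the given sockets; raise instead of hanging.

    Mirrors the two checks in the requested change: at least one socket
    must be ready, and no socket may be in an error state.
    """
    watched = list(readers) + list(writers)
    reads, writes, errors = select.select(
        readers, writers, watched, timeout_ms / 1000.0
    )
    if errors:
        raise RuntimeError("socket error reported while polling")
    if not reads and not writes:
        raise TimeoutError("no task changed state within %d ms" % timeout_ms)
    return reads, writes


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        poll_or_fail([a], [], 100)      # nothing has been sent yet
    except TimeoutError as exc:
        print("timed out:", exc)
    b.sendall(b"done")
    reads, _ = poll_or_fail([a], [], 1000)
    print("ready sockets:", len(reads))
    a.close()
    b.close()
```

With a bounded wait like this, a stuck server shows up as an error at a known point in the client, which is easier to report than a session that never returns.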
