Again, thanks for your patience. I have finally stumbled on a way to reproduce the hang on my system. This should make it easier for both of us. I will keep you posted.
On Wed, Oct 4, 2017 at 2:28 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> inserted the middle 2 lines, no assert failures, still hangs. (jqt)
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Wednesday, October 4, 2017 2:14 PM
> Subject: Re: [Jprogramming] jcs/zmq addons updated
>
> Pascal,
>
> Please edit qrun to have the following 2 additional lines:
>
> 'reads writes errors'=. poll_jcs_ 180000;'';<tasks
> assert 1<:(#reads)+#writes
> assert 0=#errors
> for_n. writes do.
>
> On Wed, Oct 4, 2017 at 1:59 PM, Eric Iverson <[email protected]> wrote:
>
> > The break call stack in your last report is interesting.
> >
> > Was this break taken while it was hung? If it was taken while it was
> > running normally, then it is just normal.
> >
> > If it was taken while hung, it indicates that the client is waiting
> > (zmq_poll) for a state change in one of the servers. In that case the
> > tasklist/taskkill steps from the previous message would be of interest.
> >
> > On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]> wrote:
> >
> >> Thanks for the additional info. No insight yet.
> >>
> >> Please try the following:
> >> 1. clean system
> >> 2. start jconsole
> >> 3. 2!:6'' NB. pid for future reference
> >> 4. qrun until hung
> >> 5. windows command window:
> >>    tasklist /FI "imagename eq jconsole.exe"
> >> 6. above should list the pid from earlier - and perhaps other pids
> >> 7. windows command window:
> >>    taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> >> 8. this should let the main task run again and perhaps give more info
> >>
> >> I would expect this to
> >>
> >> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>
> >>> sometimes disabling SMT in the BIOS can increase performance or avoid such
> >>> problems (I didn't do this, but ran with 5 threads, i.e. fewer than the cores)
> >>>
> >>> following sequence,
> >>>
> >>> jconsole
> >>> 99 5 2 fine
> >>> 99 5 3 fine
> >>> 99 5 4 hangs at "end task 12"
> >>> ctrl c no immediate result
> >>>
> >>> jqt,
> >>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
> >>>
> >>> |break: cdx
> >>> | r[check _1~:>{.r=.x cdx y
> >>>
> >>> in jqt, rerunning 99 5 3 several times with debug (ctrl-k) on, it hangs
> >>> on the 4th try
> >>>
> >>> in jconsole (attempt to unblock jqt)
> >>>
> >>> qrun 99 5 2
> >>> |port already in use in this task: assert
> >>> | 'port already in use in this task' assert-.port e.>1{"1 jcs''
> >>> this error never occurred before (when debug in jqt wasn't on).
> >>>
> >>> ________________________________
> >>> From: Eric Iverson <[email protected]>
> >>> To: Programming forum <[email protected]>
> >>> Sent: Wednesday, October 4, 2017 12:58 PM
> >>> Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>>
> >>> Thanks for clarifying things.
> >>>
> >>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
> >>>
> >>> When you have the clean state hang in jconsole, please try ctrl+c (if you
> >>> have not already done so) as this should break out of some socket hangs.
> >>> If this breaks, it would provide important info.
> >>>
> >>> It would be useful if you could get the hang with smaller args. For
> >>> example, can you get the hang with: qrun each 10#<40 4 2
> >>>
> >>> Unfortunately I cannot reproduce this on my windows system. I can loop
> >>> through 100s of this test without problem. Also on Linux and OSX.
> >>>
> >>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>>
> >>> > running qrun in a single session hangs. One semi-solution that sometimes
> >>> > works is to then launch another session (jqt or jconsole) and run qrun,
> >>> > which will unhang the original session. If both sessions are hung,
> >>> > launching a 3rd session may unfreeze them.
> >>> >
> >>> > A single run of 99 99 x does not always work. My initial claim that first
> >>> > runs always worked was based on using a task count lower than the
> >>> > hardware SMT capabilities.
> >>> >
> >>> > after clean start in jconsole
> >>> >
> >>> > qrun 99 99 2
> >>> >
> >>> > hangs at
> >>> >
> >>> > "end task: 98"
> >>> >
> >>> > since this fails, I'm not trying the 5# or 10# version.
> >>> >
> >>> > with the above hung, doing the same run in jqt in this case failed to
> >>> > unhang jconsole
> >>> >
> >>> > hangs at "end task: 13"
> >>> > ________________________________
> >>> > From: Eric Iverson <[email protected]>
> >>> > To: Programming forum <[email protected]>
> >>> > Sent: Wednesday, October 4, 2017 12:28 PM
> >>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> >
> >>> > I am confused by your message.
> >>> >
> >>> > Are you trying to run qrun at the same time in different J sessions? This
> >>> > will definitely not work and is not the intended use for qrun.
> >>> >
> >>> > We need to narrow down to a simple case that fails.
> >>> >
> >>> > You indicate you get failures in jconsole, so let's focus on that.
> >>> >
> >>> > I thought you had indicated that a single run always worked and that the
> >>> > problem only occurred in repeated runs. If that is correct, then your test
> >>> > must be something like the example I gave: qrun each 10#<99 99 2.
> >>> >
> >>> > Please give me the exact steps that fail and how it fails.
> >>> >
> >>> > For example:
> >>> > 1. clean system start
> >>> > 2. start jconsole
> >>> > 3. load'~addons/net/jcs/jcs.ijs'
> >>> > 4. load'~addons/net/jcs/qrun.ijs'
> >>> > 5. qrun each 10#<99 99 2
> >>> > 6. what happens?
> >>> >
> >>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>> >
> >>> > > I also had the avast virus chest issue, and reran the tests with shields
> >>> > > disabled, after a restart.
> >>> > >
> >>> > > qrun 99 99 2 is the main test I've used. Though 99 11 has more success (I
> >>> > > have a 6 core, 12 hyperthread AMD Ryzen processor), it still fails.
> >>> > >
> >>> > > the tests also fail in jconsole. There is "forward momentum" interaction
> >>> > > between jqt and jconsole sessions running the same qrun parameters.
> >>> > >
> >>> > > I've tried the following modifications to kill__
> >>> > >
> >>> > > kill=: 3 : 0
> >>> > > access=: su
> >>> > > runa'exit 0'
> >>> > > destroy''
> >>> > > killp PORT
> >>> > > if. IFQT do. wd 'msgs' end.
> >>> > > i.0 0
> >>> > > )
> >>> > >
> >>> > > though these modifications range from no effect to potentially slightly
> >>> > > worse "getting through" performance.
> >>> > >
> >>> > > Engine: j806/j64avx/windows
> >>> > > Beta-6: commercial/2017-09-26T14:05:48
> >>> > > Library: 8.06.07
> >>> > > Qt IDE: 1.6.1/5.6.3
> >>> > > Platform: Win 64
> >>> > > Installer: J806 install
> >>> > > InstallPath: d:/j64-806
> >>> > >
> >>> > > ________________________________
> >>> > > From: Eric Iverson <[email protected]>
> >>> > > To: Programming forum <[email protected]>
> >>> > > Sent: Wednesday, October 4, 2017 10:39 AM
> >>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > >
> >>> > > Pascal (qrun),
> >>> > >
> >>> > > I have run many tests on windows. The tests always run clean with jconsole
> >>> > > and JHS. There have been a few hiccups with Jqt: a few hangs as you
> >>> > > describe and one crash where avast put jqt.exe in its virus chest.
> >>> > >
> >>> > > Jqt is probably fine vs qrun but that is the only place I have seen
> >>> > > problems with the latest code changes. A possible suspicion is wd'msgs'. I
> >>> > > can't imagine why running a new Jqt session with qrun would have the effect
> >>> > > you describe.
> >>> > >
> >>> > > Remember that the linger bug was fixed and so things run more reliably than
> >>> > > in your tests with the first release.
> >>> > >
> >>> > > Please do the following:
> >>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
> >>> > > 2. ensure you have the latest base, net, and qtide
> >>> > > 3. run your tests in jconsole or JHS until you have a failure or are
> >>> > >    satisfied
> >>> > > 4. run your tests in Jqt
> >>> > > 5. let us know your findings
> >>> > >
> >>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>> > >
> >>> > > > was running with 1e2.
> >>> > > >
> >>> > > > The reason the different sessions were unblocking each other is that they
> >>> > > > were using the same ports (as best as I can guess).
> >>> > > >
> >>> > > > qrun hard-codes the start addresses.
> >>> > > >
> >>> > > > ________________________________
> >>> > > > From: bill lam <[email protected]>
> >>> > > > To: Programming forum <[email protected]>
> >>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
> >>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > > >
> >>> > > > Let's take out the memory constraint factor first, say qrun with sentence
> >>> > > > 1e3. I am not sure running in different jqt instances is a good idea since
> >>> > > > the range of 100 ports used by jcs is hardcoded and is the same for each
> >>> > > > jqt.
> >>> > > >
> >>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
> >>> > > >
> >>> > > > in a 4th jqt session, yes, it hung on the first run, though pretty far in.
> >>> > > >
> >>> > > > I started getting memory errors (without hanging) at 80 80 and 22 22. I
> >>> > > > have 4 hung jqt sessions now, but any new one lets the others progress.
> >>> > > > Task manager reports very low memory use.
> >>> > > >
> >>> > > > 99 11 finishes just fine. It seems that in order to unblock another
> >>> > > > session, the number of tasks attempted has to be the same as in the
> >>> > > > blocked session, and it has to make it up to (near) the blocked task number.
> >>> > > >
> >>> > > > ________________________________
> >>> > > > From: bill lam <[email protected]>
> >>> > > > To: Programming forum <[email protected]>
> >>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
> >>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > > >
> >>> > > > Did qrun 99 99 hang in the first run?
> >>> > > >
> >>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <[email protected]> wrote:
> >>> > > >
> >>> > > > > qrun still hangs for me. Never on the first run though. In 5 of 6 tries,
> >>> > > > > it hangs on the 3rd run. On the other it hung on the 2nd run. 3rd
> >>> > > > > parameter always 6.
> >>> > > > >
> >>> > > > > I don't think I ever ran into memory/swap issues in these or previous
> >>> > > > > tests.
> >>> > > > >
> >>> > > > > I found a way to unhang it though.
> >>> > > > >
> >>> > > > > start a 2nd jqt session, and run qrun in it. It may hang, but the other
> >>> > > > > session will unfreeze. If it did hang, then repeat in the other session
> >>> > > > > until both are unfrozen.
> >>> > > > > Though, doing this enough can result in both sessions being frozen
> >>> > > > > (especially if using uneven task balances)... A 3rd jqt session to the
> >>> > > > > rescue of both frozen ones.
> >>> > > > >
> >>> > > > > the show command and immediate jqt console output is a nice change.
> >>> > > > >
> >>> > > > > ________________________________
> >>> > > > > From: Eric Iverson <[email protected]>
> >>> > > > > To: Programming forum <[email protected]>
> >>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
> >>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
> >>> > > > >
> >>> > > > > A few cosmetic changes and perhaps fixes for qrun and related task
> >>> > > > > problems.
> >>> > > > >
> >>> > > > > Note: qrun now defined in jcs/qrun.ijs
> >>> > > > >
> >>> > > > > The main problem was that a task ending could have a delayed close of the
> >>> > > > > associated socket port and this could, depending on timing, prevent the
> >>> > > > > proper start of the next task trying to use the same port.
> >>> > > > >
> >>> > > > > The jcs sockets now set LINGER 0. This should avoid that class of problem.
> >>> > > > >
> >>> > > > > Initial stress tests all run clean on Linux and Windows.
> >>> > > > >
> >>> > > > > The other problem was that a server error in qrun caused a hang. This
> >>> > > > > wouldn't happen normally if the jobs were well defined and ran to
> >>> > > > > completion. A way to trigger the qrun server error in Windows was to run a
> >>> > > > > large number of tasks with large (memory consumption) jobs. This could
> >>> > > > > exhaust windows swap memory and get an out-of-memory error.
> >>> > > > >
> >>> > > > > qrun now catches the server error, reports the lse error, and continues.
> >>> > > > >
> >>> > > > > ----------------------------------------------------------------------
> >>> > > > > For information about J forums see http://www.jsoftware.com/forums.htm
