The break call stack in your last report is interesting.

Was this break taken while the session was hung? If it was taken while
everything was running normally, then the stack is nothing unusual.

If it was taken while hung, it indicates the client is waiting (in zmq_poll)
for a state change in one of the servers. In that case the tasklist/taskkill
results from my previous message would be of interest.
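
The tasklist step could be scripted along the following lines. This is only a minimal sketch: it assumes Python is available and that tasklist is run with the /FO CSV /NH options so the output is easy to parse. The function name other_pids and the sample pids are made up for illustration and are not part of jcs or qrun.

```python
import csv
import io


def other_pids(tasklist_csv, main_pid):
    """Return pids listed by tasklist that are not the main jconsole pid.

    tasklist_csv is the text printed by:
        tasklist /FI "imagename eq jconsole.exe" /FO CSV /NH
    main_pid is the value reported by 2!:6'' in the main session.
    """
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Each CSV row is: image name, pid, session name, session#, mem usage.
        # Skip blank lines and the "INFO: No tasks ..." message.
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        pid = int(row[1])
        if pid != main_pid:
            pids.append(pid)
    return pids


# Hypothetical sample output; the pids are invented for illustration.
sample = (
    '"jconsole.exe","4120","Console","1","25,300 K"\n'
    '"jconsole.exe","7788","Console","1","18,104 K"\n'
)
leftover = other_pids(sample, main_pid=4120)
for pid in leftover:
    # Print the command rather than running it, so nothing is killed by accident.
    print("taskkill /PID %d" % pid)
```

The sketch only prints the taskkill commands, so the pids can be checked against the one from 2!:6'' before anything is killed.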



On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]>
wrote:

> Thanks for the additional info. No insight yet.
>
> Please try the following:
> 1. clean system
> 2. start jconsole
> 3.    2!:6'' NB. pid for future reference
> 4.    qrun until hung
> 5. windows command window:
>     tasklist /FI "imagename eq jconsole.exe"
> 6. above should list the pid from earlier - and perhaps other pids
> 7. windows command window:
>     taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> 8. this should let the main task run again and perhaps give more info
>
> I would expect this to
>
>
> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <
> [email protected]> wrote:
>
>> Sometimes disabling SMT in the BIOS can increase performance or avoid such
>> problems. (I didn't do this, but ran with 5 threads, i.e. fewer than the
>> number of cores.)
>>
>> following sequence,
>>
>> jconsole
>> 99 5 2 fine
>> 99 5 3 fine
>> 99 5 4 hangs at "end task 12"
>> ctrl c no immediate result
>>
>> jqt,
>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
>>
>> |break: cdx
>> |   r[check _1~:>{.r=.x     cdx y
>>
>>
>> in jqt, rerunning 99 5 3 several times: on the 4th try, with debug (ctrl-k)
>> on, it hangs
>>
>> in jconsole (attempt to unblock jqt)
>>
>> qrun 99 5 2
>> |port already in use in this task: assert
>> |   'port already in use in this task'    assert-.port e.>1{"1 jcs''
>> this error never occurred before, (when debug in jqt wasn't on).
>>
>> ________________________________
>> From: Eric Iverson <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Wednesday, October 4, 2017 12:58 PM
>> Subject: Re: [Jprogramming] jcs/zmq addons updated
>>
>>
>>
>> Thanks for clarifying things.
>>
>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
>>
>> When you have the clean state hang in jconsole, please try ctrl+c (if you
>> have not already done so) as this should break out of some socket hangs.
>> If this breaks, it would provide important info.
>>
>> It would be useful if you could get the hang with smaller args. For
>> example, can you get the hang with: qrun each 10#40 4 2
>>
>> Unfortunately I can not reproduce this on my windows system. I can loop
>> through 100s of this test without problem. Also on Linux and OSX.
>>
>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <
>> [email protected]> wrote:
>>
>> > Running qrun in a single session hangs. One semi-solution that sometimes
>> > works is to then launch another session (jqt or jconsole) and run qrun,
>> > which will unhang the original session. If both sessions are hung,
>> > launching a 3rd session may unfreeze them.
>> >
>> > A single run of 99 99 x does not always work. My initial claim that first
>> > runs always worked was based on using a number of tasks lower than the
>> > hardware SMT capabilities.
>> >
>> > after clean start in jconsole
>> >
>> > qrun 99 99 2
>> >
>> > hangs at
>> >
>> > "end task: 98"
>> >
>> > since this fails, I'm not trying the 5# or 10# version.
>> >
>> > with the above hung, doing the same run in jqt, in this case,
>> >
>> > failed to unhang jconsole
>> >
>> > hangs at "end task: 13"
>> > ________________________________
>> >
>> > From: Eric Iverson <[email protected]>
>> > To: Programming forum <[email protected]>
>> > Sent: Wednesday, October 4, 2017 12:28 PM
>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> >
>> >
>> >
>> > I am confused by your message.
>> >
>> > Are you trying to run qrun at the same time in different J sessions? This
>> > will definitely not work and is not the intended use for qrun.
>> >
>> > We need to narrow down to a simple case that fails.
>> >
>> > You indicate you get failures in jconsole, so let's focus on that.
>> >
>> > I thought you had indicated that a single run always worked and that the
>> > problem only occurred in repeated runs. If that is correct, then your test
>> > must be something like the example I gave: qrun each 10#<99 99 2.
>> >
>> > Please give me the exact steps that fail and how it fails.
>> >
>> > For example:
>> > 1. clean system start
>> > 2. start jconsole
>> > 3.    load'~addons/net/jcs/jcs.ijs'
>> > 4.    load'~addons/net/jcs/qrun.ijs'
>> > 5.    qrun each 10#<99 99 2
>> > 6. what happens?
>> >
>> >
>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <
>> > [email protected]> wrote:
>> >
>> > > I also had the avast virus chest issue, so I reran the tests with shields
>> > > disabled, after a restart.
>> > >
>> > >
>> > > qrun 99 99 2 is the main test I've used. Though 99 11 has more success (I
>> > > have a 6 core, 12 hyperthread AMD Ryzen processor), it still fails.
>> > >
>> > > The tests also fail in jconsole. There is a "forward momentum" interaction
>> > > between jqt and jconsole sessions running the same qrun parameters.
>> > >
>> > > I've tried the following modifications to kill__
>> > >
>> > >
>> > > kill=: 3 : 0
>> > > access=: su
>> > > runa'exit 0'
>> > > destroy''
>> > > killp PORT
>> > > if. IFQT do. wd 'msgs' end.
>> > > i.0 0
>> > > )
>> > >
>> > > though these modifications give no improvement in, and potentially
>> > > slightly worse, "getting through" performance.
>> > >
>> > >
>> > > Engine: j806/j64avx/windows
>> > > Beta-6: commercial/2017-09-26T14:05:48
>> > > Library: 8.06.07
>> > > Qt IDE: 1.6.1/5.6.3
>> > > Platform: Win 64
>> > > Installer: J806 install
>> > > InstallPath: d:/j64-806
>> > >
>> > > ________________________________
>> > > From: Eric Iverson <[email protected]>
>> > > To: Programming forum <[email protected]>
>> > > Sent: Wednesday, October 4, 2017 10:39 AM
>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > >
>> > >
>> > >
>> > > Pascal (qrun),
>> > >
>> > > I have run many tests on windows. The tests always run clean with jconsole
>> > > and JHS. There have been a few hiccups with Jqt: a few hangs as you
>> > > describe and one crash where avast put jqt.exe in its virus chest.
>> > >
>> > > Jqt is probably fine vs qrun but that is the only place I have seen
>> > > problems with the latest code changes. A possible suspect is wd'msgs'. I
>> > > can't imagine why running a new Jqt session with qrun would have the effect
>> > > you describe.
>> > >
>> > > Remember that the linger bug was fixed and so things run more reliably than
>> > > in your tests with the first release.
>> > >
>> > > Please do the following:
>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
>> > > 2. ensure you have the latest base, net, and qtide
>> > > 3. run your tests in jconsole or JHS until you have a failure or are
>> > > satisfied
>> > > 4. run your tests in Jqt
>> > > 5. let us know your findings
>> > >
>> > >
>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <
>> > > [email protected]> wrote:
>> > >
>> > > > was running with 1e2.
>> > > >
>> > > > The reason the different sessions were unblocking each other is that they
>> > > > were using the same ports (as best as I can guess).
>> > > >
>> > > > qrun hard codes the start addresses.
>> > > >
>> > > >
>> > > >
>> > > > ________________________________
>> > > > From: bill lam <[email protected]>
>> > > > To: Programming forum <[email protected]>
>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > > >
>> > > >
>> > > >
>> > > > Let's take out the memory constraint factor first, say qrun with sentence
>> > > > 1e3. I am not sure running in different jqt instances is a good idea since
>> > > > the range of 100 ports used by jcs is hardcoded and is the same for each
>> > > > jqt.
>> > > >
>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <
>> > > > [email protected]> wrote:
>> > > >
>> > > > In a 4th jqt session, yes, it hung on the first run, though pretty far in.
>> > > >
>> > > > I started getting memory errors (without hanging) at 80 80 and 22 22. I
>> > > > have 4 hung jqt sessions now, but any new one lets the others progress.
>> > > > Task manager reports very low memory use.
>> > > >
>> > > > 99 11 finishes just fine. It seems that in order to unblock another
>> > > > session, the number of tasks attempted has to be the same as in the blocked
>> > > > session, and it has to make it up to (near) the blocked task number.
>> > > >
>> > > > ________________________________
>> > > > From: bill lam <[email protected]>
>> > > > To: Programming forum <[email protected]>
>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>> > > >
>> > > >
>> > > >
>> > > > Did qrun 99 99 hang in the first run?
>> > > >
>> > > >
>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > qrun still hangs for me. Never on the first run though. In 5 of 6 tries,
>> > > > > it hangs on the 3rd run. On the other it hung on the 2nd run. 3rd parameter
>> > > > > always 6.
>> > > > >
>> > > > > I don't think I ever breached memory/swap limits in these or previous
>> > > > > tests.
>> > > > >
>> > > > > I found a way to unhang it though.
>> > > > >
>> > > > > Start a 2nd jqt session and run qrun in it. It may hang, but the other
>> > > > > session will unfreeze. If it did hang, then repeat in the other session
>> > > > > until both are unfrozen. Though, doing this enough can result in both
>> > > > > sessions frozen (especially if using uneven task balances)... A 3rd jqt
>> > > > > session to the rescue of both frozen ones.
>> > > > >
>> > > > >
>> > > > >
>> > > > > The show command and immediate jqt console output are a nice change.
>> > > > >
>> > > > >
>> > > > >
>> > > > > ________________________________
>> > > > > From: Eric Iverson <[email protected]>
>> > > > > To: Programming forum <[email protected]>
>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
>> > > > >
>> > > > >
>> > > > >
>> > > > > A few cosmetic changes and perhaps fixes for qrun and related task
>> > > > > problems.
>> > > > >
>> > > > > Note: qrun is now defined in jcs/qrun.ijs
>> > > > >
>> > > > > The main problem was that a task ending could have a delayed close of the
>> > > > > associated socket port and this could, depending on timing, prevent the
>> > > > > proper start of the next task trying to use the same port.
>> > > > >
>> > > > > The jcs sockets now set LINGER 0. This should avoid that class of problem.
>> > > > > Initial stress tests all run clean on Linux and Windows.
>> > > > >
>> > > > > The other problem was that a server error in qrun caused a hang. This
>> > > > > wouldn't happen normally if the jobs were well defined and ran to
>> > > > > completion. A way to trigger the qrun server error in Windows was to run a
>> > > > > large number of tasks with large (memory consumption) jobs. This could
>> > > > > exhaust windows swap memory and get an out-of-memory error.
>> > > > >
>> > > > > qrun now catches the server error, reports the lse error, and continues.
>> > > > >
>> > > > > ----------------------------------------------------------------------
>> > > > > For information about J forums see http://www.jsoftware.com/forums.htm