Pascal,

Please edit qrun to have the following 2 additional lines:

  'reads writes errors'=. poll_jcs_ 180000;'';<tasks
  assert 1<:(#reads)+#writes
  assert 0=#errors
  for_n. writes do.
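
A minimal sketch of what the two added assert lines check. It assumes that poll_jcs_ returns three boxed lists (readable, writable, and errored tasks) and that 180000 is a timeout in milliseconds; neither is confirmed in this thread. The poll result below is made up for illustration:

```j
NB. Hypothetical poll result standing in for the value of poll_jcs_:
NB. no readable tasks, two writable tasks, no errored tasks.
'reads writes errors'=. (i.0);(0 1);(i.0)

assert 1<:(#reads)+#writes   NB. at least one task is ready to read or write
assert 0=#errors             NB. no task reported an error
```

If reads and writes were both empty (for example after a timeout), the first assert would signal an assertion failure instead of letting qrun carry on silently.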


On Wed, Oct 4, 2017 at 1:59 PM, Eric Iverson <[email protected]>
wrote:

> The break call stack in your last report is interesting.
>
> Is this a break when it was hung? If it was a break when it was running
> normally, then it is just normal.
>
> If when hung, it indicates the client is waiting (zmq_poll) for a state
> change in one of the servers. In this case the tasklist/taskkill from the
> previous message would be of interest.
>
>
>
> On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]>
> wrote:
>
>> Thanks for the additional info. No insight yet.
>>
>> Please try the following:
>> 1. clean system
>> 2. start jconsole
>> 3.    2!:6'' NB. pid for future reference
>> 4.    qrun until hung
>> 5. windows command window:
>>     tasklist /FI "imagename eq jconsole.exe"
>> 6. above should list the pid from earlier - and perhaps other pids
>> 7. windows command window:
>>     taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
>> 8. this should let the main task run again and perhaps give more info
>>
>> I would expect this to
>>
>>
>> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <
>> [email protected]> wrote:
>>
>>> Sometimes disabling SMT in the BIOS can increase performance or avoid such
>>> problems (I didn't do this, but ran with 5 threads, i.e. fewer than the
>>> number of cores).
>>>
>>> following sequence,
>>>
>>> jconsole
>>> 99 5 2 fine
>>> 99 5 3 fine
>>> 99 5 4 hangs at "end task 12"
>>> ctrl c no immediate result
>>>
>>> jqt,
>>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
>>>
>>> |break: cdx
>>> |   r[check _1~:>{.r=.x     cdx y
>>>
>>>
>>> in jqt, rerunning 99 5 3 several times on 4th try with debug (ctrl-k)
>>> on, hangs
>>>
>>> in jconsole (attempt to unblock jqt)
>>>
>>> qrun 99 5 2
>>> |port already in use in this task: assert
>>> |   'port already in use in this task'    assert-.port e.>1{"1 jcs''
>>> this error never occurred before, (when debug in jqt wasn't on).
>>>
>>> ________________________________
>>> From: Eric Iverson <[email protected]>
>>> To: Programming forum <[email protected]>
>>> Sent: Wednesday, October 4, 2017 12:58 PM
>>> Subject: Re: [Jprogramming] jcs/zmq addons updated
>>>
>>>
>>>
>>> Thanks for clarifying things.
>>>
>>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
>>>
>>> When you have the clean state hang in jconsole, please try ctrl+c (if you
>>> have not already done so) as this should break out of some socket hangs.
>>> If this breaks, it would provide important info.
>>>
>>> It would be useful if you could get the hang with smaller args. For
>>> example, can you get the hang with: qrun each 10#40 4 2
>>>
>>> Unfortunately I can not reproduce this on my windows system. I can loop
>>> through 100s of this test without problem. Also on Linux and OSX.
>>>
>>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <
>>> [email protected]> wrote:
>>>
>>> > running qrun in a single session hangs.  One semi-solution that sometimes
>>> > works is to then launch another session (jqt or jconsole) and run qrun,
>>> > which will unhang the original session.  If both sessions are hung,
>>> > launching a 3rd session may unfreeze them.
>>> >
>>> > A single run of 99 99 x does not always work.  My initial claim that first
>>> > runs always worked was based on using a tasks number lower than the
>>> > hardware SMT capabilities.
>>> >
>>> > after clean start in jconsole
>>> >
>>> > qrun 99 99 2
>>> >
>>> > hangs at
>>> >
>>> > "end task: 98"
>>> >
>>> > since this fails, I'm not trying the 5# or 10# version.
>>> >
>>> > With the above hung, doing the same run in jqt failed, in this case, to
>>> > unhang jconsole.
>>> >
>>> > hangs at "end task: 13"
>>> > ________________________________
>>> >
>>> > From: Eric Iverson <[email protected]>
>>> > To: Programming forum <[email protected]>
>>> > Sent: Wednesday, October 4, 2017 12:28 PM
>>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> >
>>> >
>>> >
>>> > I am confused by your message.
>>> >
>>> > Are you trying to run qrun at the same time in different J sessions? This
>>> > will definitely not work and is not the intended use for qrun.
>>> >
>>> > We need to narrow down to a simple case that fails.
>>> >
>>> > You indicate you get failures in jconsole, so let's focus on that.
>>> >
>>> > I thought you had indicated that a single run always worked and that the
>>> > problem only occurred in repeated runs. If that is correct, then your test
>>> > must be something like the example I gave: qrun each 10#<99 99 2.
>>> >
>>> > Please give me the exact steps that fail and how it fails.
>>> >
>>> > For example:
>>> > 1. clean system start
>>> > 2. start jconsole
>>> > 3.    load'~addons/net/jcs/jcs.ijs'
>>> > 4.    load'~addons/net/jcs/qrun.ijs'
>>> > 5.    qrun each 10#<99 99 2
>>> > 6. what happens?
>>> >
>>> >
>>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <
>>> > [email protected]> wrote:
>>> >
>>> > > I also had the avast virus chest issue, and reran the tests with shields
>>> > > disabled after a restart.
>>> > >
>>> > > qrun 99 99 2 is the main test I've used.  Though 99 11 has more success
>>> > > (I have a 6-core, 12-thread AMD Ryzen processor), it still fails.
>>> > >
>>> > > the tests also fail in jconsole.  There is "forward momentum" interaction
>>> > > between jqt and jconsole sessions running the same qrun parameters.
>>> > >
>>> > > I've tried the following modifications to kill__
>>> > >
>>> > >
>>> > > kill=: 3 : 0
>>> > > access=: su
>>> > > runa'exit 0'
>>> > > destroy''
>>> > > killp PORT
>>> > > if. IFQT do. wd 'msgs' end.
>>> > > i.0 0
>>> > > )
>>> > >
>>> > > though these modifications make no difference to, or potentially slightly
>>> > > worsen, "getting through" performance.
>>> > >
>>> > >
>>> > > Engine: j806/j64avx/windows
>>> > > Beta-6: commercial/2017-09-26T14:05:48
>>> > > Library: 8.06.07
>>> > > Qt IDE: 1.6.1/5.6.3
>>> > > Platform: Win 64
>>> > > Installer: J806 install
>>> > > InstallPath: d:/j64-806
>>> > >
>>> > > ________________________________
>>> > > From: Eric Iverson <[email protected]>
>>> > > To: Programming forum <[email protected]>
>>> > > Sent: Wednesday, October 4, 2017 10:39 AM
>>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > >
>>> > >
>>> > >
>>> > > Pascal (qrun),
>>> > >
>>> > > I have run many tests on windows. The tests always run clean with
>>> > > jconsole and JHS. There have been a few hiccups with Jqt. A few hangs as
>>> > > you describe and one crash where avast put jqt.exe in its virus chest.
>>> > >
>>> > > Jqt is probably fine vs qrun but that is the only place I have seen
>>> > > problems with the latest code changes. A possible suspicion is wd'msgs'.
>>> > > I can't imagine why running a new Jqt session with qrun would have the
>>> > > effect you describe.
>>> > >
>>> > > Remember that the linger bug was fixed and so things run more reliably
>>> > > than in your tests with the first release.
>>> > >
>>> > > Please do the following:
>>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
>>> > > 2. ensure you have the latest base, net, and qtide
>>> > > 3. run your tests in jconsole or JHS until you have a failure or are
>>> > > satisfied
>>> > > 4. run your tests in Jqt
>>> > > 5. let us know your findings
>>> > >
>>> > >
>>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <
>>> > > [email protected]> wrote:
>>> > >
>>> > > > was running with 1e2.
>>> > > >
>>> > > > The reason the different sessions were unblocking each other is that
>>> > > > they were using the same ports (as best I can guess).
>>> > > >
>>> > > > qrun hard codes the start addresses.
>>> > > >
>>> > > >
>>> > > >
>>> > > > ________________________________
>>> > > > From: bill lam <[email protected]>
>>> > > > To: Programming forum <[email protected]>
>>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
>>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > > >
>>> > > >
>>> > > >
>>> > > > Let's take out the memory constraint factor first, say qrun with
>>> > > > sentence 1e3. I am not sure running in different jqt instances is a good
>>> > > > idea since the range of 100 ports used by jcs is hardcoded and is the
>>> > > > same for each jqt.
>>> > > >
>>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <
>>> > > > [email protected]> wrote:
>>> > > >
>>> > > > in a 4th jqt session, yes it hung on first run, though pretty far in.
>>> > > >
>>> > > > I started getting memory errors (without hanging), at 80 80, and 22 22.
>>> > > > I have 4 hung jqt sessions now, but any new one lets the others progress.
>>> > > > Task manager reports very low memory use.
>>> > > >
>>> > > > 99 11 finishes just fine.  It seems that in order to unblock another
>>> > > > session, the tasks attempted have to number the same as in the blocked
>>> > > > session, and it has to make it up to (near) the blocked task number.
>>> > > >
>>> > > > ________________________________
>>> > > > From: bill lam <[email protected]>
>>> > > > To: Programming forum <[email protected]>
>>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
>>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
>>> > > >
>>> > > >
>>> > > >
>>> > > > Did qrun 99 99 hang in the first run?
>>> > > >
>>> > > >
>>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <
>>> > > > [email protected]> wrote:
>>> > > >
>>> > > > > qrun still hangs for me.  Never on the first run though.  In 5 of 6
>>> > > > > tries, it hangs on the 3rd run. On the other it hung on the 2nd run.
>>> > > > > 3rd parameter always 6.
>>> > > > >
>>> > > > > I don't think I ever ran into memory/swap issues in these or previous
>>> > > > > tests.
>>> > > > >
>>> > > > > I found a way to unhang it though.
>>> > > > >
>>> > > > > start 2nd jqt session, and run qrun in it.  It may hang, but the other
>>> > > > > session will unfreeze.  If it did hang, then repeat in the other session
>>> > > > > until both are unfrozen.  Though, doing this enough can result in both
>>> > > > > sessions frozen (especially if using uneven task balances)... A 3rd jqt
>>> > > > > session to the rescue of both frozen ones.
>>> > > > >
>>> > > > > the show command and immediate jqt console output is a nice change.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > ________________________________
>>> > > > > From: Eric Iverson <[email protected]>
>>> > > > > To: Programming forum <[email protected]>
>>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
>>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > A few cosmetic changes and perhaps fixes for qrun and related task
>>> > > > > problems.
>>> > > > >
>>> > > > > Note: qrun now defined in jcs/qrun.ijs
>>> > > > >
>>> > > > > The main problem was that a task ending could have a delayed close of
>>> > > > > the associated socket port and this could, depending on timing, prevent
>>> > > > > the proper start of the next task trying to use the same port.
>>> > > > >
>>> > > > > The jcs sockets now set LINGER 0. This should avoid that class of
>>> > > > > problem.
>>> > > > >
>>> > > > > Initial stress tests all run clean on Linux and Windows.
>>> > > > >
>>> > > > > The other problem was that a server error in qrun caused a hang. This
>>> > > > > wouldn't happen normally if the jobs were well defined and ran to
>>> > > > > completion. A way to trigger the qrun server error in Windows was to
>>> > > > > run a large number of tasks with large (memory consumption) jobs. This
>>> > > > > could exhaust windows swap memory and get an out-of-memory error.
>>> > > > >
>>> > > > > qrun now catches the server error, reports the lse error, and
>>> > > > > continues.
>>> > > > >
>>> > > > > ----------------------------------------------------------------------
>>> > > > > For information about J forums see http://www.jsoftware.com/forums.htm