Again, thanks for your patience.

I have finally stumbled on a way to reproduce the hang on my system. This
should make it easier for both of us. I will keep you posted.

On Wed, Oct 4, 2017 at 2:28 PM, 'Pascal Jasmin' via Programming <
[email protected]> wrote:

> inserted the middle 2 lines, no assert failures, still hangs. (jqt)
>
>
>
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Wednesday, October 4, 2017 2:14 PM
> Subject: Re: [Jprogramming] jcs/zmq addons updated
>
>
>
> Pascal,
>
> Please edit qrun to have the following 2 additional lines:
>
>   'reads writes errors'=. poll_jcs_ 180000;'';<tasks
>   assert 1<:(#reads)+#writes
>   assert 0=#errors
>   for_n. writes do.
>
>
>
> On Wed, Oct 4, 2017 at 1:59 PM, Eric Iverson <[email protected]>
> wrote:
>
> > The break call stack in your last report is interesting.
> >
> > Is this a break when it was hung? If it was a break when it was running
> > normally, then it is just normal.
> >
> > If when hung, it indicates the client is waiting (zmq_poll) for a state
> > change in one of the servers. In this case the tasklist/taskkill from the
> > previous message would be of interest.
> >
> >
> >
> > On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]>
> > wrote:
> >
> >> Thanks for the additional info. No insight yet.
> >>
> >> Please try the following:
> >> 1. clean system
> >> 2. start jconsole
> >> 3.    2!:6'' NB. pid for future reference
> >> 4.    qrun until hung
> >> 5. windows command window:
> >>     tasklist /FI "imagename eq jconsole.exe"
> >> 6. above should list the pid from earlier - and perhaps other pids
> >> 7. windows command window:
> >>     taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> >> 8. this should let the main task run again and perhaps give more info
> >>
> >> I would expect this to
> >>
> >>
> >> On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <
> >> [email protected]> wrote:
> >>
> >>> sometimes disabling SMT in BIOS can increase performance or avoid such
> >>> problems (I didn't do this, but ran with 5 threads, i.e. fewer than the
> >>> number of cores)
> >>>
> >>> following sequence,
> >>>
> >>> jconsole
> >>> 99 5 2 fine
> >>> 99 5 3 fine
> >>> 99 5 4 hangs at "end task 12"
> >>> ctrl c no immediate result
> >>>
> >>> jqt,
> >>> 99 5 2 finishes, and jconsole unhangs to produce ctrl c output:
> >>>
> >>> |break: cdx
> >>> |   r[check _1~:>{.r=.x     cdx y
> >>>
> >>>
> >>> in jqt, rerunning 99 5 3 several times on 4th try with debug (ctrl-k)
> >>> on, hangs
> >>>
> >>> in jconsole (attempt to unblock jqt)
> >>>
> >>> qrun 99 5 2
> >>> |port already in use in this task: assert
> >>> |   'port already in use in this task'    assert-.port e.>1{"1 jcs''
> >>> this error never occurred before, (when debug in jqt wasn't on).
> >>>
> >>> ________________________________
> >>> From: Eric Iverson <[email protected]>
> >>> To: Programming forum <[email protected]>
> >>> Sent: Wednesday, October 4, 2017 12:58 PM
> >>> Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>>
> >>>
> >>>
> >>> Thanks for clarifying things.
> >>>
> >>> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
> >>>
> >>> When you have the clean state hang in jconsole, please try ctrl+c (if you
> >>> have not already done so) as this should break out of some socket hangs.
> >>> If this breaks, it would provide important info.
> >>>
> >>> It would be useful if you could get the hang with smaller args. For
> >>> example, can you get the hang with: qrun each 10#40 4 2
> >>>
> >>> Unfortunately I can not reproduce this on my windows system. I can loop
> >>> through 100s of this test without problem. Also on Linux and OSX.
> >>>
> >>> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <
> >>> [email protected]> wrote:
> >>>
> >>> > running qrun in a single session hangs.  One semi-solution that
> >>> > sometimes works is to then launch another session (jqt or jconsole)
> >>> > and run qrun, which will unhang the original session.  If both
> >>> > sessions are hung, launching a 3rd session may unfreeze them.
> >>> >
> >>> > A single run of 99 99 x does not always work.  My initial claim that
> >>> > first runs always worked was based on using a tasks number lower than
> >>> > the hardware SMT capabilities.
> >>> >
> >>> > after clean start in jconsole
> >>> >
> >>> > qrun 99 99 2
> >>> >
> >>> > hangs at
> >>> >
> >>> > "end task: 98"
> >>> >
> >>> > since this fails, I'm not trying the 5# or 10# version.
> >>> >
> >>> > with the above hung, doing the same run in jqt, in this case,
> >>> > failed to unhang jconsole
> >>> >
> >>> > hangs at "end task: 13"
> >>> > ________________________________
> >>> >
> >>> > From: Eric Iverson <[email protected]>
> >>> > To: Programming forum <[email protected]>
> >>> > Sent: Wednesday, October 4, 2017 12:28 PM
> >>> > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> >
> >>> >
> >>> >
> >>> > I am confused by your message.
> >>> >
> >>> > Are you trying to run qrun at the same time in different J sessions?
> >>> > This will definitely not work and is not the intended use for qrun.
> >>> >
> >>> > We need to narrow down to a simple case that fails.
> >>> >
> >>> > You indicate you get failures in jconsole, so let's focus on that.
> >>> >
> >>> > I thought you had indicated that a single run always worked and that
> >>> > the problem only occurred in repeated runs. If that is correct, then
> >>> > your test must be something like the example I gave:
> >>> > qrun each 10#<99 99 2.
> >>> >
> >>> > Please give me the exact steps that fail and how it fails.
> >>> >
> >>> > For example:
> >>> > 1. clean system start
> >>> > 2. start jconsole
> >>> > 3.    load'~addons/net/jcs/jcs.ijs'
> >>> > 4.    load'~addons/net/jcs/qrun.ijs'
> >>> > 5.    qrun each 10#<99 99 2
> >>> > 6. what happens?
> >>> >
> >>> >
> >>> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <
> >>> > [email protected]> wrote:
> >>> >
> >>> > > I also had the avast virus chest issue, reran tests with shields
> >>> > > disabled, after restart.
> >>> > >
> >>> > >
> >>> > > qrun 99 99 2 is the main test I've used.  Though 99 11 has more
> >>> > > success (I have 6 core 12 hyperthread AMD Ryzen processor), it still
> >>> > > fails.
> >>> > >
> >>> > > the tests also fail in jconsole.  There is "forward momentum"
> >>> > > interaction between jqt and jconsole sessions running the same qrun
> >>> > > parameters.
> >>> > >
> >>> > > I've tried the following modifications to kill__
> >>> > >
> >>> > >
> >>> > > kill=: 3 : 0
> >>> > > access=: su
> >>> > > runa'exit 0'
> >>> > > destroy''
> >>> > > killp PORT
> >>> > > if. IFQT do. wd 'msgs' end.
> >>> > > i.0 0
> >>> > > )
> >>> > >
> >>> > > though these modifications make no difference to, or potentially
> >>> > > slightly worsen, "getting through" performance.
> >>> > >
> >>> > >
> >>> > > Engine: j806/j64avx/windows
> >>> > > Beta-6: commercial/2017-09-26T14:05:48
> >>> > > Library: 8.06.07
> >>> > > Qt IDE: 1.6.1/5.6.3
> >>> > > Platform: Win 64
> >>> > > Installer: J806 install
> >>> > > InstallPath: d:/j64-806
> >>> > >
> >>> > > ________________________________
> >>> > > From: Eric Iverson <[email protected]>
> >>> > > To: Programming forum <[email protected]>
> >>> > > Sent: Wednesday, October 4, 2017 10:39 AM
> >>> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > >
> >>> > >
> >>> > >
> >>> > > Pascal (qrun),
> >>> > >
> >>> > > I have run many tests on windows. The tests always run clean with
> >>> > > jconsole and JHS. There have been a few hiccups with Jqt. A few hangs
> >>> > > as you describe and one crash where avast put jqt.exe in its virus
> >>> > > chest.
> >>> > >
> >>> > > Jqt is probably fine vs qrun but that is the only place I have seen
> >>> > > problems with the latest code changes. A possible suspicion is
> >>> > > wd'msgs'. I can't imagine why running a new Jqt session with qrun
> >>> > > would have the effect you describe.
> >>> > >
> >>> > > Remember that the linger bug was fixed and so things run more
> >>> > > reliably than in your tests with the first release.
> >>> > >
> >>> > > Please do the following:
> >>> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
> >>> > > 2. ensure you have the latest base, net, and qtide
> >>> > > 3. run your tests in jconsole or JHS until you have a failure or are
> >>> > >    satisfied
> >>> > > 4. run your tests in Jqt
> >>> > > 5. let us know your findings
> >>> > >
> >>> > >
> >>> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <
> >>> > > [email protected]> wrote:
> >>> > >
> >>> > > > was running with 1e2.
> >>> > > >
> >>> > > > The reason the different sessions were unblocking each other is
> >>> > > > that they were using the same ports (as best as I can guess).
> >>> > > >
> >>> > > > qrun hard codes the start addresses.
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > ________________________________
> >>> > > > From: bill lam <[email protected]>
> >>> > > > To: Programming forum <[email protected]>
> >>> > > > Sent: Tuesday, October 3, 2017 10:55 PM
> >>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > Let's take out the memory constraint factor first, say qrun with
> >>> > > > sentence 1e3. I am not sure running in different jqt instances is
> >>> > > > a good idea since the range of 100 ports used by jcs is hardcoded
> >>> > > > and is the same for each jqt.
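
Editorial illustration of the port point above: a minimal Python sketch,
assuming plain TCP sockets on localhost, of what happens when two processes
are hard-coded to the same port. This is not jcs or ZeroMQ code; the helper
name try_bind is hypothetical, and the sketch only shows a bind conflict. It
does not reproduce or explain the hang/unblock behavior reported in this
thread.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Hypothetical helper: try to bind a TCP socket to host:port.

    Returns the bound socket on success, or None if the port is in use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


# The first "session" takes a free port chosen by the OS (port 0 = any free).
first = try_bind(0)
port = first.getsockname()[1]

# A second "session" hard-coded to that same port cannot bind it.
second = try_bind(port)
conflict = second is None

first.close()
```

Here conflict ends up true because the second bind is refused while the
first socket still holds the port. Two sessions that share one fixed port
range can collide in this way.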
> >>> > > >
> >>> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <
> >>> > > > [email protected]> wrote:
> >>> > > >
> >>> > > > in a 4th jqt session, yes it hung on first run, though pretty far
> >>> > > > in.
> >>> > > >
> >>> > > > I started getting memory errors (without hanging), at 80 80, and
> >>> > > > 22 22. I have 4 hung jqt sessions now, but any new one lets the
> >>> > > > others progress. Task manager reports very low memory use.
> >>> > > >
> >>> > > > 99 11 finishes just fine.  It seems that in order to unblock
> >>> > > > another session, the tasks attempted have to number the same as in
> >>> > > > the blocked session, and it has to make it up to (near) the blocked
> >>> > > > task number.
> >>> > > >
> >>> > > > ________________________________
> >>> > > > From: bill lam <[email protected]>
> >>> > > > To: Programming forum <[email protected]>
> >>> > > > Sent: Tuesday, October 3, 2017 10:06 PM
> >>> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > Did qrun 99 99 hang in the first run?
> >>> > > >
> >>> > > >
> >>> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <
> >>> > > > [email protected]> wrote:
> >>> > > >
> >>> > > > > qrun still hangs for me.  Never on the first run though.  In 5
> >>> > > > > of 6 tries, it hangs on the 3rd run. On the other it hung on the
> >>> > > > > 2nd run. 3rd parameter always 6.
> >>> > > > >
> >>> > > > > I don't think I ever hit memory/swap issues in these or
> >>> > > > > previous tests.
> >>> > > > >
> >>> > > > > I found a way to unhang it though.
> >>> > > > >
> >>> > > > > start 2nd jqt session, and run qrun in it.  It may hang, but the
> >>> > > > > other session will unfreeze.  If it did hang, then repeat in
> >>> > > > > other session until both unfrozen.  Though, doing this enough
> >>> > > > > can result in both sessions frozen (especially if using uneven
> >>> > > > > task balances)... A 3rd jqt session to the rescue of both frozen
> >>> > > > > ones.
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > the show command and immediate jqt console output is a nice
> >>> > > > > change.
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > ________________________________
> >>> > > > > From: Eric Iverson <[email protected]>
> >>> > > > > To: Programming forum <[email protected]>
> >>> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
> >>> > > > > Subject: [Jprogramming] jcs/zmq addons updated
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > A few cosmetic changes and perhaps fixes for qrun and related
> >>> > > > > task problems.
> >>> > > > >
> >>> > > > > Note: qrun now defined in jcs/qrun.ijs
> >>> > > > >
> >>> > > > > The main problem was that a task ending could have a delayed
> >>> > > > > close of the associated socket port and this could, depending
> >>> > > > > on timing, prevent the proper start of the next task trying to
> >>> > > > > use the same port.
> >>> > > > >
> >>> > > > > The jcs sockets now set LINGER 0. This should avoid that class
> >>> > > > > of problem. Initial stress tests all run clean on Linux and
> >>> > > > > Windows.
> >>> > > > >
> >>> > > > > The other problem was that a server error in qrun caused a
> >>> > > > > hang. This wouldn't happen normally if the jobs were well
> >>> > > > > defined and ran to completion. A way to trigger the qrun server
> >>> > > > > error in Windows was to run a large number of tasks with large
> >>> > > > > (memory consumption) jobs. This could exhaust windows swap
> >>> > > > > memory and get an out-of-memory error.
> >>> > > > >
> >>> > > > > qrun now catches the server error, reports the lse error, and
> >>> > > > > continues.
> >>> > > > >
> >>> > > > > ----------------------------------------------------------------------
> >>> > > > > For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
