Pascal,

Thanks for your patience on this.

I just saw something in qrun that seems wrong. Take a time out while I
study it. I will get back to you.


On Wed, Oct 4, 2017 at 2:25 PM, 'Pascal Jasmin' via Programming <
[email protected]> wrote:

> Yes, the break was issued when it was hung.
>
> I did what you recommended, but with jqt as the "initial client".
>
> tasklist /FI lists 13 jconsole instances.
>
> Killing them all (one at a time) did not unfreeze jqt.
>
> A new jqt session did unfreeze it, but then froze itself. No jconsole
> pids were listed while it was frozen on qrun 99 5 3 at "end task: 4"
> (though finish 98 4).
>
> By far the most common freeze in all tests is on the very last job.
>
> ________________________________
> From: Eric Iverson <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Wednesday, October 4, 2017 1:59 PM
> Subject: Re: [Jprogramming] jcs/zmq addons updated
>
>
>
> The break call stack in your last report is interesting.
>
> Is this a break when it was hung? If it was a break when it was running
> normally, then it is just normal.
>
> If it was when hung, it indicates the client is waiting (zmq_poll) for a state
> change in one of the servers. In this case the tasklist/taskkill from the
> previous message would be of interest.
>
>
>
>
> On Wed, Oct 4, 2017 at 1:54 PM, Eric Iverson <[email protected]>
> wrote:
>
> > Thanks for the additional info. No insight yet.
> >
> > Please try the following:
> > 1. clean system
> > 2. start jconsole
> > 3.    2!:6'' NB. pid for future reference
> > 4.    qrun until hung
> > 5. windows command window:
> >     tasklist /FI "imagename eq jconsole.exe"
> > 6. above should list the pid from earlier - and perhaps other pids
> > 7. windows command window:
> >     taskkill /PID nnnnn - where nnnnn is NOT the pid reported by 2!:6''
> > 8. this should let the main task run again and perhaps give more info
> >
> > I would expect this to
> >
> >
> > On Wed, Oct 4, 2017 at 1:35 PM, 'Pascal Jasmin' via Programming <
> > [email protected]> wrote:
> >
> >> Sometimes disabling SMT in the BIOS can increase performance or avoid
> >> such problems (I didn't do this, but ran with 5 threads, i.e. fewer
> >> than the number of cores).
> >>
> >> Following sequence:
> >>
> >> jconsole:
> >> 99 5 2 fine
> >> 99 5 3 fine
> >> 99 5 4 hangs at "end task 12"
> >> ctrl+c: no immediate result
> >>
> >> jqt:
> >> 99 5 2 finishes, and jconsole unhangs to produce the ctrl+c output:
> >>
> >> |break: cdx
> >> |   r[check _1~:>{.r=.x     cdx y
> >>
> >>
> >> In jqt, rerunning 99 5 3 several times, it hangs on the 4th try with
> >> debug (ctrl-k) on.
> >>
> >> In jconsole (attempting to unblock jqt):
> >>
> >> qrun 99 5 2
> >> |port already in use in this task: assert
> >> |   'port already in use in this task'    assert-.port e.>1{"1 jcs''
> >> This error never occurred before (when debug in jqt wasn't on).
> >>
> >> ________________________________
> >> From: Eric Iverson <[email protected]>
> >> To: Programming forum <[email protected]>
> >> Sent: Wednesday, October 4, 2017 12:58 PM
> >> Subject: Re: [Jprogramming] jcs/zmq addons updated
> >>
> >>
> >>
> >> Thanks for clarifying things.
> >>
> >> On your system, in a clean state, jconsole qrun 99 99 2 hangs.
> >>
> >> When you have the clean-state hang in jconsole, please try ctrl+c (if
> >> you have not already done so), as this should break out of some socket
> >> hangs. If this breaks, it would provide important info.
> >>
> >> It would be useful if you could get the hang with smaller args. For
> >> example, can you get the hang with: qrun each 10#<40 4 2
> >>
> >> Unfortunately, I cannot reproduce this on my Windows system. I can loop
> >> through hundreds of runs of this test without a problem, and the same
> >> holds on Linux and OS X.
> >>
> >> On Wed, Oct 4, 2017 at 12:45 PM, 'Pascal Jasmin' via Programming <
> >> [email protected]> wrote:
> >>
> >> > Running qrun in a single session hangs. One semi-solution that
> >> > sometimes works is to then launch another session (jqt or jconsole)
> >> > and run qrun, which will unhang the original session. If both
> >> > sessions are hung, launching a 3rd session may unfreeze them.
> >> >
> >> > A single run of 99 99 x does not always work. My initial claim that
> >> > first runs always worked was based on using a number of tasks lower
> >> > than the hardware SMT capability.
> >> >
> >> > After a clean start in jconsole:
> >> >
> >> > qrun 99 99 2
> >> >
> >> > hangs at
> >> >
> >> > "end task: 98"
> >> >
> >> > Since this fails, I'm not trying the 5# or 10# version.
> >> >
> >> > With the above hung, doing the same run in jqt failed to unhang
> >> > jconsole in this case, and the jqt run hangs at "end task: 13".
> >> > ________________________________
> >> >
> >> > From: Eric Iverson <[email protected]>
> >> > To: Programming forum <[email protected]>
> >> > Sent: Wednesday, October 4, 2017 12:28 PM
> >> > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >> >
> >> >
> >> >
> >> > I am confused by your message.
> >> >
> >> > Are you trying to run qrun at the same time in different J sessions?
> >> > This will definitely not work and is not the intended use for qrun.
> >> >
> >> > We need to narrow down to a simple case that fails.
> >> >
> >> > You indicate you get failures in jconsole, so let's focus on that.
> >> >
> >> > I thought you had indicated that a single run always worked and that
> >> > the problem only occurred in repeated runs. If that is correct, then
> >> > your test must be something like the example I gave: qrun each 10#<99 99 2.
> >> >
> >> > Please give me the exact steps that fail and how it fails.
> >> >
> >> > For example:
> >> > 1. clean system start
> >> > 2. start jconsole
> >> > 3.    load'~addons/net/jcs/jcs.ijs'
> >> > 4.    load'~addons/net/jcs/qrun.ijs'
> >> > 5.    qrun each 10#<99 99 2
> >> > 6. what happens?
> >> >
> >> >
> >> > On Wed, Oct 4, 2017 at 12:01 PM, 'Pascal Jasmin' via Programming <
> >> > [email protected]> wrote:
> >> >
> >> > > I also had the Avast virus chest issue, so I reran the tests with
> >> > > shields disabled, after a restart.
> >> > >
> >> > >
> >> > > qrun 99 99 2 is the main test I've used. Though 99 11 has more
> >> > > success (I have a 6-core, 12-thread AMD Ryzen processor), it still
> >> > > fails.
> >> > >
> >> > > The tests also fail in jconsole. There is "forward momentum"
> >> > > interaction between jqt and jconsole sessions running the same qrun
> >> > > parameters.
> >> > >
> >> > > I've tried the following modifications to kill__
> >> > >
> >> > >
> >> > > kill=: 3 : 0
> >> > > access=: su
> >> > > runa'exit 0'
> >> > > destroy''
> >> > > killp PORT
> >> > > if. IFQT do. wd 'msgs' end.
> >> > > i.0 0
> >> > > )
> >> > >
> >> > > Though with these modifications, "getting through" performance is
> >> > > unchanged or possibly slightly worse.
> >> > >
> >> > >
> >> > > Engine: j806/j64avx/windows
> >> > > Beta-6: commercial/2017-09-26T14:05:48
> >> > > Library: 8.06.07
> >> > > Qt IDE: 1.6.1/5.6.3
> >> > > Platform: Win 64
> >> > > Installer: J806 install
> >> > > InstallPath: d:/j64-806
> >> > >
> >> > > ________________________________
> >> > > From: Eric Iverson <[email protected]>
> >> > > To: Programming forum <[email protected]>
> >> > > Sent: Wednesday, October 4, 2017 10:39 AM
> >> > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >> > >
> >> > >
> >> > >
> >> > > Pascal (qrun),
> >> > >
> >> > > I have run many tests on Windows. The tests always run clean with
> >> > > jconsole and JHS. There have been a few hiccups with Jqt: a few hangs
> >> > > as you describe, and one crash where Avast put jqt.exe in its virus
> >> > > chest.
> >> > >
> >> > > Jqt is probably fine with qrun, but it is the only place I have seen
> >> > > problems with the latest code changes. A possible suspect is
> >> > > wd'msgs'. I can't imagine why running a new Jqt session with qrun
> >> > > would have the effect you describe.
> >> > >
> >> > > Remember that the linger bug was fixed, so things run more reliably
> >> > > than in your tests with the first release.
> >> > >
> >> > > Please do the following:
> >> > > 1. let us know exactly what test you run (I use: qrun each 5#<99 99 2)
> >> > > 2. ensure you have the latest base, net, and qtide
> >> > > 3. run your tests in jconsole or JHS until you have a failure or are
> >> > > satisfied
> >> > > 4. run your tests in Jqt
> >> > > 5. let us know your findings
> >> > >
> >> > >
> >> > > On Wed, Oct 4, 2017 at 8:58 AM, 'Pascal Jasmin' via Programming <
> >> > > [email protected]> wrote:
> >> > >
> >> > > > I was running with 1e2.
> >> > > >
> >> > > > The reason the different sessions were unblocking each other is that
> >> > > > they were using the same ports (as best as I can guess).
> >> > > >
> >> > > > qrun hard-codes the start addresses.
> >> > > >
> >> > > >
> >> > > >
> >> > > > ________________________________
> >> > > > From: bill lam <[email protected]>
> >> > > > To: Programming forum <[email protected]>
> >> > > > Sent: Tuesday, October 3, 2017 10:55 PM
> >> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >> > > >
> >> > > >
> >> > > >
> >> > > > Let's take out the memory constraint factor first: say, qrun with
> >> > > > sentence 1e3. I am not sure running in different jqt instances is a
> >> > > > good idea, since the range of 100 ports used by jcs is hardcoded and
> >> > > > is the same for each jqt.
> >> > > >
> >> > > > On Oct 4, 2017 10:41 AM, "'Pascal Jasmin' via Programming" <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > > In a 4th jqt session, yes, it hung on the first run, though pretty
> >> > > > far in.
> >> > > >
> >> > > > I started getting memory errors (without hanging) at 80 80 and 22 22.
> >> > > > I have 4 hung jqt sessions now, but any new one lets the others
> >> > > > progress. Task Manager reports very low memory use.
> >> > > >
> >> > > > 99 11 finishes just fine. It seems that in order to unblock another
> >> > > > session, the number of tasks attempted has to match the blocked
> >> > > > session, and the run has to make it up to (near) the blocked task
> >> > > > number.
> >> > > >
> >> > > > ________________________________
> >> > > > From: bill lam <[email protected]>
> >> > > > To: Programming forum <[email protected]>
> >> > > > Sent: Tuesday, October 3, 2017 10:06 PM
> >> > > > Subject: Re: [Jprogramming] jcs/zmq addons updated
> >> > > >
> >> > > >
> >> > > >
> >> > > > Did qrun 99 99 hang in the first run?
> >> > > >
> >> > > >
> >> > > > On Oct 4, 2017 9:16 AM, "'Pascal Jasmin' via Programming" <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > > > qrun still hangs for me, though never on the first run. In 5 of 6
> >> > > > > tries, it hangs on the 3rd run; in the other, it hung on the 2nd
> >> > > > > run. The 3rd parameter was always 6.
> >> > > > >
> >> > > > > I don't think I ever breached memory/swap limits in these or
> >> > > > > previous tests.
> >> > > > >
> >> > > > > I found a way to unhang it, though.
> >> > > > >
> >> > > > > Start a 2nd jqt session and run qrun in it. It may hang, but the
> >> > > > > other session will unfreeze. If it did hang, then repeat in the
> >> > > > > other session until both are unfrozen. However, doing this enough
> >> > > > > can result in both sessions being frozen (especially if using
> >> > > > > uneven task balances)... A 3rd jqt session comes to the rescue of
> >> > > > > both frozen ones.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > The show command and immediate jqt console output are a nice change.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > ________________________________
> >> > > > > From: Eric Iverson <[email protected]>
> >> > > > > To: Programming forum <[email protected]>
> >> > > > > Sent: Tuesday, October 3, 2017 5:41 PM
> >> > > > > Subject: [Jprogramming] jcs/zmq addons updated
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > A few cosmetic changes and perhaps fixes for qrun and related task
> >> > > > > problems.
> >> > > > >
> >> > > > >
> >> > > > > Note: qrun is now defined in jcs/qrun.ijs.
> >> > > > >
> >> > > > >
> >> > > > > The main problem was that a task ending could have a delayed close of
> >> > > > > the associated socket port and this could, depending on timing,
> >> > > > > prevent the proper start of the next task trying to use the same port.
> >> > > > >
> >> > > > >
> >> > > > > The jcs sockets now set LINGER 0. This should avoid that class of
> >> > > > > problem.
> >> > > > >
> >> > > > > Initial stress tests all run clean on Linux and Windows.
> >> > > > >
> >> > > > >
> >> > > > > The other problem was that a server error in qrun caused a hang. This
> >> > > > > wouldn't happen normally if the jobs were well defined and ran to
> >> > > > > completion. A way to trigger the qrun server error in Windows was to
> >> > > > > run a large number of tasks with large (memory-consuming) jobs. This
> >> > > > > could exhaust Windows swap memory and cause an out-of-memory error.
> >> > > > >
> >> > > > >
> >> > > > > qrun now catches the server error, reports the lse error, and
> >> > > > > continues.
> >> > > > >
> >> > > > > ----------------------------------------------------------------------
> >> > > > > For information about J forums see http://www.jsoftware.com/forums.htm