Turns out my methodology was wrong. I re-ran the tests, logging each
test's output to a file, and found five different instances where J
crashed. I had not rigged up my tests to a debugger, so there was no
command line to fall back to.

I can start narrowing this down now.

Thanks,

-- 
Raul

On Sat, Nov 18, 2023 at 8:21 AM Raul Miller <[email protected]> wrote:
>
> Well... I've run 19 batches of tests, breaking up the test cases
> preceding g300.ijs, grouping the test files by the first three
> characters of their file names (and then running g300 after each of
> the batches, but starting a new jconsole for the next batch). All
> tests passed successfully (jconsole did not unlock for input in any of
> these cases).
>
> So this is looking more like an exhaustion or overflow issue than
> something with a direct trigger.
>
> (Or maybe something intermittent? But the error seemed reliable when
> running the full test suite.)
>
> --
> Raul
>
> On Fri, Nov 17, 2023 at 3:44 PM Henry Rich <[email protected]> wrote:
> >
> > That's what I was trying to say. Run the last 20 testcases that come before
> > g300.  If that fails, reduce the number as far as you can.
> >
> > hhr
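
A minimal sketch of the "reduce the number as far as you can" step, assuming the failure is monotonic (if the last k scripts before g300 trigger it, so do the last k+1). That assumption may not hold for an exhaustion-style bug. The names `fails_fn`, `smallest_failing_suffix`, and `fails_at_threshold` are hypothetical; a real predicate would start a fresh jconsole, run the last k scripts followed by g300, and report whether the audit fired.

```c
#include <stddef.h>

/* Hypothetical predicate type: nonzero if running the last k scripts
   before g300, then g300 itself, reproduces the failure. */
typedef int (*fails_fn)(size_t k, void *ctx);

/* Stand-in predicate for illustration only: "fails" once k reaches the
   threshold stored in ctx. A real one would launch the test run. */
int fails_at_threshold(size_t k, void *ctx)
{
    return k >= *(size_t *)ctx;
}

/* Binary search for the smallest k in [0, max_k] that still fails.
   Returns max_k + 1 if even the last max_k scripts do not fail. */
size_t smallest_failing_suffix(size_t max_k, fails_fn fails, void *ctx)
{
    if (!fails(max_k, ctx)) return max_k + 1;
    size_t lo = 0, hi = max_k;           /* invariant: fails(hi) is true */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fails(mid, ctx)) hi = mid;   /* still fails: try fewer scripts */
        else lo = mid + 1;               /* passes: need more scripts */
    }
    return hi;
}
```

With runs this slow, halving the window needs about five runs for 20 scripts instead of up to 20 when dropping one script at a time.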
> >
> > On Fri, Nov 17, 2023, 12:34 PM Raul Miller <[email protected]> wrote:
> >
> > >    RUN4 testfiles 'g300'
> > >
> > > completes without error.
> > >
> > > So I think something in the testga.ijs testfiles before g300 is
> > > necessary to condition the namespaces (or whatever) so that this
> > > double free error is triggered.
> > >
> > > --
> > > Raul
> > >
> > > On Fri, Nov 17, 2023 at 12:21 PM Henry Rich <[email protected]> wrote:
> > > >
> > > > It is not unusual for the failing script to run by itself. Run the last
> > > > few scripts ending in g300 and see how many it takes to make it fail.
> > > >
> > > > hhr
> > > >
> > > > On Fri, Nov 17, 2023, 11:54 AM Raul Miller <[email protected]> wrote:
> > > >
> > > > > Sadly, the error is not totally repeatable.
> > > > >
> > > > > (Running g300 by itself worked.)
> > > > >
> > > > > So I'm looking at a couple of days for each attempt to reproduce the
> > > > > issue.
> > > > >
> > > > > --
> > > > > Raul
> > > > >
> > > > > On Fri, Nov 17, 2023 at 9:41 AM Henry Rich <[email protected]> wrote:
> > > > > >
> > > > > > Looking at the stack, it crashes while creating a nameref, presumably
> > > > > > for eqf. That suggests that the error is during the assignment to m.
> > > > > > I would put calls around the parser to get an earlier failure.
> > > > > >
> > > > > > If the failure is totally repeatable you can turn off the audits until
> > > > > > jt->parsercalls gets near the error.
> > > > > >
> > > > > > hhr
> > > > > >
> > > > > > On Fri, Nov 17, 2023, 9:21 AM Raul Miller <[email protected]> wrote:
> > > > > >
> > > > > > > The test suite under MEMAUDIT=0x2 is quite slow, even with QKTEST=:1.
> > > > > > > And, sometimes Windows will reboot itself and/or lose configuration
> > > > > > > information. (I'm running Debian 12.2 under Windows WSL2.)
> > > > > > >
> > > > > > > But, I've found a line that fails the test suite on line 264 of g300.
> > > > > > > Unfortunately, it has not been reproducible running g300 in isolation
> > > > > > > under the default j64/jconsole instance.
> > > > > > >
> > > > > > > But, for now at least, I have a live session with the stack at the
> > > > > > > point where the fault was detected, in case further inspection of
> > > > > > > details might be useful here:
> > > > > > >
> > > > > > >    (-"+/ .* eqf -/ .*) m=: %/1+?2 4 4$200x
> > > > > > > trap : file ../../../../jsrc/m.c line 553
> > > > > > >
> > > > > > > Thread 1 "jconsole" received signal SIGILL, Illegal instruction.
> > > > > > > 0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200,
> > > > > > > w=0x555556496d40) at ../../../../jsrc/m.c:553
> > > > > > > 553      if((delct = ((AFLAG(w)+=AFAUDITUC)>>AFAUDITUCX))>ACUC(w))SEGFAULT;   // hang if too many deletes
> > > > > > >
> > > > > > > (gdb) where
> > > > > > > #0  0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200,
> > > > > > > w=0x555556496d40) at ../../../../jsrc/m.c:553
> > > > > > > #1  0x00007ffff293b6da in auditsimdelete (jt=0x7ffff1438200,
> > > > > > > w=0x5555556da4c0) at ../../../../jsrc/m.c:570
> > > > > > > #2  0x00007ffff293ab14 in audittstack (jt=0x7ffff1438200) at
> > > > > > > ../../../../jsrc/m.c:648
> > > > > > > #3  0x00007ffff2970041 in jtnamerefacv (jt=0x7ffff1438200,
> > > > > > > a=0x5555564b8840, val=0x5555564c31d8)
> > > > > > >     at ../../../../jsrc/sc.c:355
> > > > > > > #4  0x00007ffff2944f9e in jtparsea (jt=0x7ffff1438200,
> > > > > > > queue=0x55555598b838, nwds=23) at ../../../../jsrc/p.c:618
> > > > > > > #5  0x00007ffff2944314 in jtparse (jt=0x7ffff1438200,
> > > > > > > w=0x55555598b7c0) at ../../../../jsrc/p.c:290
> > > > > > > #6  0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200,
> > > > > > > w=0x55555598b7c0) at ../../../../jsrc/px.c:54
> > > > > > > #7  0x00007ffff2952c23 in jtimmea (jt=0x7ffff1438200,
> > > > > > > w=0x55555598b7c0) at ../../../../jsrc/px.c:63
> > > > > > > #8  0x00007ffff2fb7517 in jtline (jt=0x7ffff1438200, w=0x555555b5d300,
> > > > > > > si=159, ce=3 '\003', tso=1 '\001')
> > > > > > >     at ../../../../jsrc/xs.c:87
> > > > > > > #9  0x00007ffff2fb7b2d in jtlinf (jt=0x7ffff1438200, a=0x7ffff3a25e80
> > > > > > > <Bmark>, w=0x7fffffffb6c0, ce=3 '\003',
> > > > > > >     tso=1 '\001') at ../../../../jsrc/xs.c:142
> > > > > > > #10 0x00007ffff2fb83e6 in jtscy1 (jt=0x7ffff1438200, w=0x7fffffffb6c0,
> > > > > > > self=0x5555558f9a00)
> > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > #11 0x00007ffff28a8306 in jtrank1ex0 (jt=0x7ffff1438200,
> > > > > > > w=0x555555aaf5c0, fs=0x5555558f9a00,
> > > > > > >     f1=0x7ffff2fb8290 <jtscy1>) at ../../../../jsrc/cr.c:192
> > > > > > > #12 0x00007ffff2fb8366 in jtscy1 (jt=0x7ffff1438200, w=0x555555aaf5c0,
> > > > > > > self=0x5555558f9a00)
> > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > #13 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200,
> > > > > > > a=0x555555aaf5c0, w=0x5555558f9a00, self=0x5555558f3f00)
> > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > #14 0x00007ffff283fbc3 in jtcasei12 (jt=0x7ffff1438200,
> > > > > > > a=0x555555aaf5c0, w=0x555555aaf5c0, self=0x5555558f3d80)
> > > > > > >     at ../../../../jsrc/cg.c:345
> > > > > > > #15 0x00007ffff27f26a6 in on1cell (jt=0x7ffff1438200,
> > > > > > > w=0x555555aaf5c1, self=0x5555558f3a80)
> > > > > > >     at ../../../../jsrc/ca.c:102
> > > > > > > #16 0x00007ffff27f2853 in on1cell (jt=0x7ffff1438200, w=0x0,
> > > > > > > self=0x5555558f3a00) at ../../../../jsrc/ca.c:102
> > > > > > > #17 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200,
> > > > > > > a=0x555555aaf5c0, w=0x5555558f3a00, self=0x5555555c9140)
> > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > #18 0x00007ffff2945a3f in jtparsea (jt=0x7ffff1438200,
> > > > > > > queue=0x5555555c7688, nwds=5) at ../../../../jsrc/p.c:751
> > > > > > > #19 0x00007ffff2944314 in jtparse (jt=0x7ffff1438200,
> > > > > > > w=0x5555555c7640) at ../../../../jsrc/p.c:290
> > > > > > > #20 0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200,
> > > > > > > w=0x5555558d2300) at ../../../../jsrc/px.c:54
> > > > > > > #21 0x00007ffff291d409 in jtimmexexecct (jt=0x7ffff1438200,
> > > > > > > x=0x5555558d2300) at ../../../../jsrc/io.c:382
> > > > > > > #22 0x00007ffff291d21a in runiep (jjt=0x7ffff1438000,
> > > > > > > jt=0x7ffff1438200, old=0x5555555a1008, savcallstack=0)
> > > > > > >     at ../../../../jsrc/io.c:395
> > > > > > > ...
> > > > > > >
> > > > > > > I have some more experiments I could try (for example, maybe setting
> > > > > > > LC_ALL=fr_FR.UTF-8 would do something that triggers this error -- it's
> > > > > > > not a locale that my machine recognizes...) But I'm running blind here
> > > > > > > and some informed guessing would be better than my current guesswork.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --
> > > > > > > Raul
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 13, 2023 at 8:13 AM Henry Rich <[email protected]> wrote:
> > > > > > > >
> > > > > > > > I think you are saying that you have a double free: you have an argument
> > > > > > > > that contains deadbeef, indicating that it has been freed.  I don't
> > > > > > > > think you need to write any code.
> > > > > > > >
> > > > > > > > You need to turn on 0x2 in MEMAUDIT, to engage tstack auditing. Remember
> > > > > > > > that the tstack contains death warrants for blocks that have been
> > > > > > > > allocated.  The double free happens when there is an erroneous free that
> > > > > > > > frees the block while the death warrant is still active; the application
> > > > > > > > of the death warrant is the double free, but the error happened earlier.
> > > > > > > >
> > > > > > > > tstack auditing goes through the tstack, counting the number of death
> > > > > > > > warrants for each block.  If that number exceeds the usecount of the
> > > > > > > > block, an erroneous free has occurred and the audit segfaults. You can
> > > > > > > > add calls to the tstack auditor in your code to narrow down the source
> > > > > > > > of the error.
> > > > > > > >
> > > > > > > > If you still want to see all allocations, they go through jtgaf() in m.c.
> > > > > > > >
> > > > > > > > hhr
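
A simplified sketch of the counting check described above, not the code in m.c. The `Block` type and `audit_tstack` are hypothetical stand-ins, and the sketch recounts entries with a nested loop, whereas the m.c line quoted earlier in the thread accumulates the count in the block's flag bits.

```c
#include <stddef.h>

/* Hypothetical stand-in for a J block header: only the use count. */
typedef struct {
    int usecount;   /* number of frees this block can legitimately absorb */
} Block;

/* Walk a stack of pending "death warrants" (pointers to blocks that will
   be freed when the stack is popped). Returns the index of the first
   entry at which some block has more warrants than its use count, or -1
   if the stack is consistent. */
int audit_tstack(Block *const *tstack, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int warrants = 0;
        for (size_t j = 0; j <= i; j++)          /* count warrants so far */
            if (tstack[j] == tstack[i]) warrants++;
        if (warrants > tstack[i]->usecount)      /* too many pending frees */
            return (int)i;
    }
    return -1;
}
```

The point of the check is timing: it reports the inconsistency when the audit runs, which can be well before the pop that would perform the second free.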
> > > > > > > >
> > > > > > > > On 11/13/2023 12:51 AM, Raul Miller wrote:
> > > > > > > > > The problem I'm experiencing is that I'm adding two extended precision
> > > > > > > > > numbers and I get a segfault because one has a corrupted memory
> > > > > > > > > address. I turn on MEMAUDIT=0x1d and I get an ARGCHK failure because
> > > > > > > > > one of the arguments has low bits set in flags (and is 0xdeadbeef).
> > > > > > > > >
> > > > > > > > > So, I have a memory address which was allocated and I need to "go back
> > > > > > > > > in time" to see what's happening with that memory address. (If it
> > > > > > > > > changes in response to my code update, it presumably would only change
> > > > > > > > > once - not when I only change the numeric value that I'm searching
> > > > > > > > > for.)
> > > > > > > > >
> > > > > > > > > In other words, I want to create a routine deadcheck() which reports
> > > > > > > > > when it's being called with an address which matches the failing
> > > > > > > > > address, along with a counter. Once I have this information, I can
> > > > > > > > > perform a run where I stop when I reach a certain count of the
> > > > > > > > > appearance of that memory address (or maybe every time, if the total
> > > > > > > > > count is low), and inspect the stack to see what's going on there. I
> > > > > > > > > am hoping that with this information I can zero in on what is being
> > > > > > > > > corrupted, and when.
> > > > > > > > >
> > > > > > > > > But, to do this, I need to run deadcheck every time memory gets
> > > > > > > > > allocated / handed to J as a new empty array. (I also put in a
> > > > > > > > > deadcheck in the gmp memory allocator, of course.)
> > > > > > > > >
> > > > > > > > > So, ... I'm wondering where I can check all memory allocations.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Raul
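
A minimal sketch of the deadcheck() idea described above. Only the name deadcheck comes from the message; `deadwatch`, `deadcheck_break`, and the static variables are hypothetical, and nothing here is taken from the J sources. Where to call it from is the open question in the message, so the sketch leaves that out.

```c
#include <stdio.h>

/* Hypothetical watch state, not part of the J sources. */
static const void *dead_addr;        /* address that was corrupt in the failing run */
static unsigned long dead_hits;      /* times that address has been seen so far */
static unsigned long dead_stop_at;   /* hit count to stop at; 0 means never stop */

/* Empty on purpose: a place to set a debugger breakpoint. */
void deadcheck_break(void) {}

/* Arm the watch for one address and reset the counter. */
void deadwatch(const void *addr, unsigned long stop_at)
{
    dead_addr = addr;
    dead_hits = 0;
    dead_stop_at = stop_at;
}

/* Call on every freshly allocated pointer. Returns the running hit count
   when p is the watched address, and 0 otherwise. */
unsigned long deadcheck(const void *p)
{
    if (dead_addr == NULL || p != dead_addr) return 0;
    dead_hits++;
    fprintf(stderr, "deadcheck: %p seen, count %lu\n", (void *)p, dead_hits);
    if (dead_hits == dead_stop_at) deadcheck_break();
    return dead_hits;
}
```

A first run with `stop_at` set to 0 only logs the counts; a second run can then stop at a chosen count by breaking on `deadcheck_break` in the debugger. This only helps if the allocator hands out the same address on both runs.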
> > > > > > > > >
> > > > > > > > > On Sun, Nov 12, 2023 at 8:40 PM Henry Rich <[email protected]> wrote:
> > > > > > > > > >> I'm not sure I understand what you want to know.  The tpop stack holds
> > > > > > > > > >> the death warrants for recently allocated blocks.
> > > > > > > > >>
> > > > > > > > >> hhr
> > > > > > > > >>
> > > > > > > > >> On 11/12/2023 6:56 PM, Raul Miller wrote:
> > > > > > > > > >>> If I want to test all the addresses of "newly allocated pointers to
> > > > > > > > > >>> memory" from m.c, where should I do that?
> > > > > > > > >>>
> > > > > > > > >>> Thanks,
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > > >> ----------------------------------------------------------------------
> > > > > > > > > >> For information about J forums see http://www.jsoftware.com/forums.htm
> > > > > > > > >
> > > > > ----------------------------------------------------------------------
> > > > > > > > > For information about J forums see
> > > > > http://www.jsoftware.com/forums.htm
> > > > > > > >
> > > > > > > >
> > > > > ----------------------------------------------------------------------
> > > > > > > > For information about J forums see
> > > > > http://www.jsoftware.com/forums.htm
> > > > > > >
> > > ----------------------------------------------------------------------
> > > > > > > For information about J forums see
> > > http://www.jsoftware.com/forums.htm
> > > > > > >
> > > > > >
> > > ----------------------------------------------------------------------
> > > > > > For information about J forums see
> > > http://www.jsoftware.com/forums.htm
> > > > > ----------------------------------------------------------------------
> > > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > > > >
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to