I forgot that the mail server filters out .ijs files.

Hopefully, the attachment will come through this time, with the
extension changed to .txt.

(Also, I have seen some odd behavior in my testing, and I have some
concern that this issue might be hardware-dependent. So it would be
good to find out whether there are machines which do not segfault
with this script when MEMAUDIT is enabled on the gmp-redo0 branch.)

Thanks,

-- 
Raul


On Tue, Nov 21, 2023 at 5:42 PM Raul Miller <rauldmil...@gmail.com> wrote:
>
> The attached script is a relatively minimal script which triggers the
> error I have been encountering in the gmp-redo0 branch with
> -DMEMAUDIT=0x2. I do not yet know if anyone else can reproduce this
> error. The setup to trigger the problem is rather extensive.
>
> It consists of lines from test/g000i.ijs and test/g300.ijs (and in a
> few cases I've also trimmed the lines slightly, but mostly they are
> unaltered).
>
> This script needs about 2 and a half minutes on the little laptop I've
> been using. Most of that time is in the line with the BLAS comment.
>
> FYI,
>
> --
> Raul
>
> On Sun, Nov 19, 2023 at 12:23 AM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > Turns out my methodology was wrong (I re-ran the tests logging each
> > test's output to file, and found five different instances where J
> > crashed - I had not rigged up my tests to a debugger, so there was no
> > command line to fall back to).
> >
> > I can start narrowing this down now,
> >
> > Thanks,
> >
> > --
> > Raul
> >
> > On Sat, Nov 18, 2023 at 8:21 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > >
> > > Well... I've run 19 batches of tests, breaking up the test cases
> > > preceding g300.ijs, grouping the test files by the first three
> > > characters of their file names (and then running g300 after each of
> > > the batches, but starting a new jconsole for the next batch). All
> > > tests passed successfully (jconsole did not unlock for input in any of
> > > these cases).
> > >
> > > So this is looking more like an exhaustion or overflow issue than
> > > something with a direct trigger.
> > >
> > > (Or maybe something intermittent? But the error seemed reliable when
> > > running the full test suite.)
> > >
> > > --
> > > Raul
> > >
> > > On Fri, Nov 17, 2023 at 3:44 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > >
> > > > That's what I was trying to say. Run the last 20 testcases that come
> > > > before g300. If that fails, reduce the number as far as you can.
> > > >
> > > > hhr
> > > >
> > > > On Fri, Nov 17, 2023, 12:34 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > >
> > > > >    RUN4 testfiles 'g300'
> > > > >
> > > > > completes without error.
> > > > >
> > > > > So I think something in the testga.ijs testfiles before g300 is
> > > > > necessary to condition the namespaces (or whatever) so that this
> > > > > double free error is triggered.
> > > > >
> > > > > --
> > > > > Raul
> > > > >
> > > > > On Fri, Nov 17, 2023 at 12:21 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > >
> > > > > > It is not unusual for the failing script to run by itself. Run the
> > > > > > last few scripts ending in g300 and see how many it takes to make it
> > > > > > fail.
> > > > > >
> > > > > > hhr
> > > > > >
> > > > > > On Fri, Nov 17, 2023, 11:54 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > >
> > > > > > > Sadly, the error is not totally repeatable.
> > > > > > >
> > > > > > > (Running g300 by itself worked.)
> > > > > > >
> > > > > > > So I'm looking at a couple of days for each attempt to reproduce the
> > > > > > > issue.
> > > > > > >
> > > > > > > --
> > > > > > > Raul
> > > > > > >
> > > > > > > On Fri, Nov 17, 2023 at 9:41 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Looking at the stack, it crashes while creating a nameref,
> > > > > > > > presumably for eqf. That suggests that the error is during the
> > > > > > > > assignment to m. I would put calls around the parser to get an
> > > > > > > > earlier failure.
> > > > > > > >
> > > > > > > > If the failure is totally repeatable you can turn off the audits
> > > > > > > > until jt->parsercalls gets near the error.
> > > > > > > >
> > > > > > > > hhr
> > > > > > > >
> > > > > > > > On Fri, Nov 17, 2023, 9:21 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > The test suite under MEMAUDIT=0x2 is quite slow, even with
> > > > > > > > > > QKTEST=:1. And, sometimes Windows will reboot itself and/or lose
> > > > > > > > > > configuration information. (I'm running Debian 12.2 under Windows
> > > > > > > > > > WSL2.)
> > > > > > > > > >
> > > > > > > > > > But, I've found a line that fails the test suite on line 264 of
> > > > > > > > > > g300. Unfortunately, it has not been reproducible running g300 in
> > > > > > > > > > isolation under the default j64/jconsole instance.
> > > > > > > > > >
> > > > > > > > > > But, for now at least, I have a live session with the stack at the
> > > > > > > > > > point where the fault was detected, in case further inspection of
> > > > > > > > > > details might be useful here:
> > > > > > > > >
> > > > > > > > > >    (-"+/ .* eqf -/ .*) m=: %/1+?2 4 4$200x
> > > > > > > > > > trap : file ../../../../jsrc/m.c line 553
> > > > > > > > > >
> > > > > > > > > > Thread 1 "jconsole" received signal SIGILL, Illegal instruction.
> > > > > > > > > > 0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200, w=0x555556496d40) at ../../../../jsrc/m.c:553
> > > > > > > > > > 553      if((delct = ((AFLAG(w)+=AFAUDITUC)>>AFAUDITUCX))>ACUC(w))SEGFAULT;   // hang if too many deletes
> > > > > > > > > >
> > > > > > > > > > (gdb) where
> > > > > > > > > > #0  0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200, w=0x555556496d40) at ../../../../jsrc/m.c:553
> > > > > > > > > > #1  0x00007ffff293b6da in auditsimdelete (jt=0x7ffff1438200, w=0x5555556da4c0) at ../../../../jsrc/m.c:570
> > > > > > > > > > #2  0x00007ffff293ab14 in audittstack (jt=0x7ffff1438200) at ../../../../jsrc/m.c:648
> > > > > > > > > > #3  0x00007ffff2970041 in jtnamerefacv (jt=0x7ffff1438200, a=0x5555564b8840, val=0x5555564c31d8)
> > > > > > > > > >     at ../../../../jsrc/sc.c:355
> > > > > > > > > > #4  0x00007ffff2944f9e in jtparsea (jt=0x7ffff1438200, queue=0x55555598b838, nwds=23) at ../../../../jsrc/p.c:618
> > > > > > > > > > #5  0x00007ffff2944314 in jtparse (jt=0x7ffff1438200, w=0x55555598b7c0) at ../../../../jsrc/p.c:290
> > > > > > > > > > #6  0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200, w=0x55555598b7c0) at ../../../../jsrc/px.c:54
> > > > > > > > > > #7  0x00007ffff2952c23 in jtimmea (jt=0x7ffff1438200, w=0x55555598b7c0) at ../../../../jsrc/px.c:63
> > > > > > > > > > #8  0x00007ffff2fb7517 in jtline (jt=0x7ffff1438200, w=0x555555b5d300, si=159, ce=3 '\003', tso=1 '\001')
> > > > > > > > > >     at ../../../../jsrc/xs.c:87
> > > > > > > > > > #9  0x00007ffff2fb7b2d in jtlinf (jt=0x7ffff1438200, a=0x7ffff3a25e80 <Bmark>, w=0x7fffffffb6c0, ce=3 '\003',
> > > > > > > > > >     tso=1 '\001') at ../../../../jsrc/xs.c:142
> > > > > > > > > > #10 0x00007ffff2fb83e6 in jtscy1 (jt=0x7ffff1438200, w=0x7fffffffb6c0, self=0x5555558f9a00)
> > > > > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > > > > #11 0x00007ffff28a8306 in jtrank1ex0 (jt=0x7ffff1438200, w=0x555555aaf5c0, fs=0x5555558f9a00,
> > > > > > > > > >     f1=0x7ffff2fb8290 <jtscy1>) at ../../../../jsrc/cr.c:192
> > > > > > > > > > #12 0x00007ffff2fb8366 in jtscy1 (jt=0x7ffff1438200, w=0x555555aaf5c0, self=0x5555558f9a00)
> > > > > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > > > > #13 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x5555558f9a00, self=0x5555558f3f00)
> > > > > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > > > > #14 0x00007ffff283fbc3 in jtcasei12 (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x555555aaf5c0, self=0x5555558f3d80)
> > > > > > > > > >     at ../../../../jsrc/cg.c:345
> > > > > > > > > > #15 0x00007ffff27f26a6 in on1cell (jt=0x7ffff1438200, w=0x555555aaf5c1, self=0x5555558f3a80)
> > > > > > > > > >     at ../../../../jsrc/ca.c:102
> > > > > > > > > > #16 0x00007ffff27f2853 in on1cell (jt=0x7ffff1438200, w=0x0, self=0x5555558f3a00) at ../../../../jsrc/ca.c:102
> > > > > > > > > > #17 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x5555558f3a00, self=0x5555555c9140)
> > > > > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > > > > #18 0x00007ffff2945a3f in jtparsea (jt=0x7ffff1438200, queue=0x5555555c7688, nwds=5) at ../../../../jsrc/p.c:751
> > > > > > > > > > #19 0x00007ffff2944314 in jtparse (jt=0x7ffff1438200, w=0x5555555c7640) at ../../../../jsrc/p.c:290
> > > > > > > > > > #20 0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200, w=0x5555558d2300) at ../../../../jsrc/px.c:54
> > > > > > > > > > #21 0x00007ffff291d409 in jtimmexexecct (jt=0x7ffff1438200, x=0x5555558d2300) at ../../../../jsrc/io.c:382
> > > > > > > > > > #22 0x00007ffff291d21a in runiep (jjt=0x7ffff1438000, jt=0x7ffff1438200, old=0x5555555a1008, savcallstack=0)
> > > > > > > > > >     at ../../../../jsrc/io.c:395
> > > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > > I have some more experiments I could try (for example, maybe
> > > > > > > > > > setting LC_ALL=fr_FR.UTF-8 would do something that triggers this
> > > > > > > > > > error -- it's not a locale that my machine recognizes...). But I'm
> > > > > > > > > > running blind here, and some informed guessing would be better
> > > > > > > > > > than my current guesswork.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Raul
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Mon, Nov 13, 2023 at 8:13 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I think you are saying that you have a double free: you have an
> > > > > > > > > > > argument that contains deadbeef, indicating that it has been
> > > > > > > > > > > freed. I don't think you need to write any code.
> > > > > > > > > >
> > > > > > > > > > > You need to turn on 0x2 in MEMAUDIT, to engage tstack auditing.
> > > > > > > > > > > Remember that the tstack contains death warrants for blocks that
> > > > > > > > > > > have been allocated. The double free happens when there is an
> > > > > > > > > > > erroneous free that frees the block while the death warrant is
> > > > > > > > > > > still active; the application of the death warrant is the double
> > > > > > > > > > > free, but the error happened earlier.
> > > > > > > > > >
> > > > > > > > > > > tstack auditing goes through the tstack, counting the number of
> > > > > > > > > > > death warrants for each block. If that number exceeds the usecount
> > > > > > > > > > > of the block, an erroneous free has occurred and the audit
> > > > > > > > > > > segfaults. You can add calls to the tstack auditor in your code to
> > > > > > > > > > > narrow down the source of the error.
> > > > > > > > > >
> > > > > > > > > > > If you still want to see all allocations, they go through jtgaf()
> > > > > > > > > > > in m.c.
> > > > > > > > > >
> > > > > > > > > > hhr
> > > > > > > > > >
> > > > > > > > > > On 11/13/2023 12:51 AM, Raul Miller wrote:
> > > > > > > > > > > > The problem I'm experiencing is that I'm adding two extended
> > > > > > > > > > > > precision numbers and I get a segfault because one has a
> > > > > > > > > > > > corrupted memory address. I turn on MEMAUDIT=0x1d and I get an
> > > > > > > > > > > > ARGCHK failure because one of the arguments has low bits set in
> > > > > > > > > > > > flags (and is 0xdeadbeef).
> > > > > > > > > > >
> > > > > > > > > > > > So, I have a memory address which was allocated and I need to
> > > > > > > > > > > > "go back in time" to see what's happening with that memory
> > > > > > > > > > > > address. (If it changes in response to my code update, it
> > > > > > > > > > > > presumably would only change once - not when I only change the
> > > > > > > > > > > > numeric value that I'm searching for.)
> > > > > > > > > > >
> > > > > > > > > > > > In other words, I want to create a routine deadcheck() which
> > > > > > > > > > > > reports when it's being called with an address which matches the
> > > > > > > > > > > > failing address, along with a counter. Once I have this
> > > > > > > > > > > > information, I can perform a run where I stop when I reach a
> > > > > > > > > > > > certain count of the appearance of that memory address (or maybe
> > > > > > > > > > > > every time, if the total count is low), and inspect the stack to
> > > > > > > > > > > > see what's going on there. I am hoping that with this information
> > > > > > > > > > > > I can zero in on what is being corrupted, and when.
> > > > > > > > > > >
> > > > > > > > > > > > But, to do this, I need to run deadcheck every time memory gets
> > > > > > > > > > > > allocated / handed to J as a new empty array. (I also put in a
> > > > > > > > > > > > deadcheck in the gmp memory allocator, of course.)
> > > > > > > > > > >
> > > > > > > > > > > So, ... I'm wondering where I can check all memory 
> > > > > > > > > > > allocations.
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Raul
> > > > > > > > > > >
> > > > > > > > > > > > On Sun, Nov 12, 2023 at 8:40 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > > > >> I'm not sure I understand what you want to know. The tpop stack
> > > > > > > > > > > >> holds the death warrants for recently allocated blocks.
> > > > > > > > > > >>
> > > > > > > > > > >> hhr
> > > > > > > > > > >>
> > > > > > > > > > >> On 11/12/2023 6:56 PM, Raul Miller wrote:
> > > > > > > > > > > >>> If I want to test all the addresses of "newly allocated pointers
> > > > > > > > > > > >>> to memory" from m.c, where should I do that?
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks,
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > > >> ----------------------------------------------------------------------
> > > > > > > > > > > >> For information about J forums see http://www.jsoftware.com/forums.htm
eq=: 4 : 'x=y'
x=:?3 8 32$2
f=: 3 : '(=/ -: eq/) ?y$2'
,f"1 x=:7 8 9,."0 1 [ _1 0 1+  255
test =: +/ .* -: +/@(*"1 _)
f=: 4 : 0 NB. test float variants
 p =. 1 + ? 32
 xx=: (x,p) ?@$ 0
 yy=: (p,y) ?@$ 0
 assert. xx test yy
 1
)
(200000 + i. 1) f"0 ] 10 + i. 1  NB. BLAS - just once, since the test is slow
(-/ .* -:!.5e_11 */@((<0 1)&|:)) m=:(<:/~i.6) * 6 6 ?@$ 0
(-/ .* -:!.5e_11 */@((<0 1)&|:)) m=:(<:/~i.7) * 7 7 ?@$ 0
eqf=: 4 : 0
 (x -:!.t y) +. (t>|x) *. t>|y [ t=. 2^_34
)
(-"+/ .* eqf -/ .*) m=: _100+?7 7$200
(-"+/ .* eqf -/ .*) m=: _100+?7 7$200x
(-"+/ .* eqf -/ .*) m=: %/1+?2 4 4$200x  NB. <-- segfault here


(-"+/ .* eqf -/ .*) m=: %/1+?2 5 5$200x
(-"+/ .* eqf -/ .*) m=: %/1+?2 6 6$200x
_= (-/ .*) x: 4 4$_ __ 0 0 1 1 0 0 0 0 1 0 0 0 0 1         NB. test for crash
1= (+/ .*) ::1: x: 4 4$_ __ 0 0 1 1 0 0 0 0 1 0 0 0 0 1    NB. test for crash
exit''
