I forgot that the mail server filters out .ijs files. Hopefully the attachment will come through this time, with the extension changed to .txt.

(Also, I have seen some odd behavior in my testing, and I am somewhat concerned that this issue might be hardware-dependent. So it would be good to find out whether there are machines that do not segfault with this script when MEMAUDIT is enabled on the gmp-redo0 branch.)

Thanks,

--
Raul

On Tue, Nov 21, 2023 at 5:42 PM Raul Miller <rauldmil...@gmail.com> wrote:
>
> The attached script is a relatively minimal script which triggers the
> error I have been encountering in the gmp-redo0 branch with
> -DMEMAUDIT=0x2. I do not yet know if anyone else can reproduce this
> error. The setup to trigger the problem is rather extensive.
>
> It consists of lines from test/g000i.ijs and test/g300.ijs (in a
> few cases I've trimmed the lines slightly, but mostly they are
> unaltered).
>
> This script needs about two and a half minutes on the little laptop I've
> been using. Most of that time is in the line with the BLAS comment.
>
> FYI,
>
> --
> Raul
>
> On Sun, Nov 19, 2023 at 12:23 AM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > It turns out my methodology was wrong (I re-ran the tests logging each
> > test's output to a file, and found five different instances where J
> > crashed; I had not rigged up my tests to a debugger, so there was no
> > command line to fall back to).
> >
> > I can start narrowing this down now.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> > On Sat, Nov 18, 2023 at 8:21 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > >
> > > Well... I've run 19 batches of tests, breaking up the test cases
> > > preceding g300.ijs by grouping the test files by the first three
> > > characters of their file names (and running g300 after each
> > > batch, but starting a new jconsole for the next batch). All
> > > tests passed (jconsole did not unlock for input in any of
> > > these cases).
> > >
> > > So this is looking more like an exhaustion or overflow issue than
> > > something with a direct trigger.
> > >
> > > (Or maybe something intermittent? But the error seemed reliable when
> > > running the full test suite.)
> > >
> > > --
> > > Raul
> > >
> > > On Fri, Nov 17, 2023 at 3:44 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > >
> > > > That's what I was trying to say. Run the last 20 testcases that come
> > > > before g300. If that fails, reduce the number as far as you can.
> > > >
> > > > hhr
> > > >
> > > > On Fri, Nov 17, 2023, 12:34 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > >
> > > > >    RUN4 testfiles 'g300'
> > > > >
> > > > > completes without error.
> > > > >
> > > > > So I think something in the testga.ijs testfiles before g300 is
> > > > > necessary to condition the namespaces (or whatever) so that this
> > > > > double free error is triggered.
> > > > >
> > > > > --
> > > > > Raul
> > > > >
> > > > > On Fri, Nov 17, 2023 at 12:21 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > >
> > > > > > It is not unusual for the failing script to run by itself. Run the
> > > > > > last few scripts ending in g300 and see how many it takes to make
> > > > > > it fail.
> > > > > >
> > > > > > hhr
> > > > > >
> > > > > > On Fri, Nov 17, 2023, 11:54 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > >
> > > > > > > Sadly, the error is not totally repeatable.
> > > > > > >
> > > > > > > (Running g300 by itself worked.)
> > > > > > >
> > > > > > > So I'm looking at a couple of days for each attempt to reproduce
> > > > > > > the issue.
> > > > > > >
> > > > > > > --
> > > > > > > Raul
> > > > > > >
> > > > > > > On Fri, Nov 17, 2023 at 9:41 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Looking at the stack, it crashes while creating a nameref,
> > > > > > > > presumably for eqf. That suggests that the error is during the
> > > > > > > > assignment to m.
> > > > > > > > I would put calls around the parser to get an earlier failure.
> > > > > > > >
> > > > > > > > If the failure is totally repeatable you can turn off the audits
> > > > > > > > until jt->parsercalls gets near the error.
> > > > > > > >
> > > > > > > > hhr
> > > > > > > >
> > > > > > > > On Fri, Nov 17, 2023, 9:21 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > The test suite under MEMAUDIT=0x2 is quite slow, even with
> > > > > > > > > QKTEST=:1. And sometimes Windows will reboot itself and/or lose
> > > > > > > > > configuration information. (I'm running Debian 12.2 under
> > > > > > > > > Windows WSL2.)
> > > > > > > > >
> > > > > > > > > But I've found the line that fails the test suite: line 264 of
> > > > > > > > > g300. Unfortunately, the failure has not been reproducible running
> > > > > > > > > g300 in isolation under the default j64/jconsole instance.
> > > > > > > > >
> > > > > > > > > But, for now at least, I have a live session with the stack at the
> > > > > > > > > point where the fault was detected, in case further inspection of
> > > > > > > > > details might be useful here:
> > > > > > > > >
> > > > > > > > >    (-"+/ .* eqf -/ .*) m=: %/1+?2 4 4$200x
> > > > > > > > > trap : file ../../../../jsrc/m.c line 553
> > > > > > > > >
> > > > > > > > > Thread 1 "jconsole" received signal SIGILL, Illegal instruction.
> > > > > > > > > 0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200, w=0x555556496d40)
> > > > > > > > >     at ../../../../jsrc/m.c:553
> > > > > > > > > 553         if((delct = ((AFLAG(w)+=AFAUDITUC)>>AFAUDITUCX))>ACUC(w))SEGFAULT;  // hang if too many deletes
> > > > > > > > >
> > > > > > > > > (gdb) where
> > > > > > > > > #0  0x00007ffff293b2ae in auditsimdelete (jt=0x7ffff1438200, w=0x555556496d40)
> > > > > > > > >     at ../../../../jsrc/m.c:553
> > > > > > > > > #1  0x00007ffff293b6da in auditsimdelete (jt=0x7ffff1438200, w=0x5555556da4c0)
> > > > > > > > >     at ../../../../jsrc/m.c:570
> > > > > > > > > #2  0x00007ffff293ab14 in audittstack (jt=0x7ffff1438200)
> > > > > > > > >     at ../../../../jsrc/m.c:648
> > > > > > > > > #3  0x00007ffff2970041 in jtnamerefacv (jt=0x7ffff1438200, a=0x5555564b8840, val=0x5555564c31d8)
> > > > > > > > >     at ../../../../jsrc/sc.c:355
> > > > > > > > > #4  0x00007ffff2944f9e in jtparsea (jt=0x7ffff1438200, queue=0x55555598b838, nwds=23)
> > > > > > > > >     at ../../../../jsrc/p.c:618
> > > > > > > > > #5  0x00007ffff2944314 in jtparse (jt=0x7ffff1438200, w=0x55555598b7c0)
> > > > > > > > >     at ../../../../jsrc/p.c:290
> > > > > > > > > #6  0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200, w=0x55555598b7c0)
> > > > > > > > >     at ../../../../jsrc/px.c:54
> > > > > > > > > #7  0x00007ffff2952c23 in jtimmea (jt=0x7ffff1438200, w=0x55555598b7c0)
> > > > > > > > >     at ../../../../jsrc/px.c:63
> > > > > > > > > #8  0x00007ffff2fb7517 in jtline (jt=0x7ffff1438200, w=0x555555b5d300, si=159, ce=3 '\003', tso=1 '\001')
> > > > > > > > >     at ../../../../jsrc/xs.c:87
> > > > > > > > > #9  0x00007ffff2fb7b2d in jtlinf (jt=0x7ffff1438200, a=0x7ffff3a25e80 <Bmark>, w=0x7fffffffb6c0, ce=3 '\003', tso=1 '\001')
> > > > > > > > >     at ../../../../jsrc/xs.c:142
> > > > > > > > > #10 0x00007ffff2fb83e6 in jtscy1 (jt=0x7ffff1438200, w=0x7fffffffb6c0, self=0x5555558f9a00)
> > > > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > > > #11 0x00007ffff28a8306 in jtrank1ex0 (jt=0x7ffff1438200, w=0x555555aaf5c0, fs=0x5555558f9a00, f1=0x7ffff2fb8290 <jtscy1>)
> > > > > > > > >     at ../../../../jsrc/cr.c:192
> > > > > > > > > #12 0x00007ffff2fb8366 in jtscy1 (jt=0x7ffff1438200, w=0x555555aaf5c0, self=0x5555558f9a00)
> > > > > > > > >     at ../../../../jsrc/xs.c:174
> > > > > > > > > #13 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x5555558f9a00, self=0x5555558f3f00)
> > > > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > > > #14 0x00007ffff283fbc3 in jtcasei12 (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x555555aaf5c0, self=0x5555558f3d80)
> > > > > > > > >     at ../../../../jsrc/cg.c:345
> > > > > > > > > #15 0x00007ffff27f26a6 in on1cell (jt=0x7ffff1438200, w=0x555555aaf5c1, self=0x5555558f3a80)
> > > > > > > > >     at ../../../../jsrc/ca.c:102
> > > > > > > > > #16 0x00007ffff27f2853 in on1cell (jt=0x7ffff1438200, w=0x0, self=0x5555558f3a00)
> > > > > > > > >     at ../../../../jsrc/ca.c:102
> > > > > > > > > #17 0x00007ffff296e556 in jtunquote (jt=0x7ffff1438200, a=0x555555aaf5c0, w=0x5555558f3a00, self=0x5555555c9140)
> > > > > > > > >     at ../../../../jsrc/sc.c:163
> > > > > > > > > #18 0x00007ffff2945a3f in jtparsea (jt=0x7ffff1438200, queue=0x5555555c7688, nwds=5)
> > > > > > > > >     at ../../../../jsrc/p.c:751
> > > > > > > > > #19 0x00007ffff2944314 in jtparse (jt=0x7ffff1438200, w=0x5555555c7640)
> > > > > > > > >     at ../../../../jsrc/p.c:290
> > > > > > > > > #20 0x00007ffff2952b35 in jtimmex (jt=0x7ffff1438200, w=0x5555558d2300)
> > > > > > > > >     at ../../../../jsrc/px.c:54
> > > > > > > > > #21 0x00007ffff291d409 in jtimmexexecct (jt=0x7ffff1438200, x=0x5555558d2300)
> > > > > > > > >     at ../../../../jsrc/io.c:382
> > > > > > > > > #22 0x00007ffff291d21a in runiep (jjt=0x7ffff1438000, jt=0x7ffff1438200, old=0x5555555a1008, savcallstack=0)
> > > > > > > > >     at ../../../../jsrc/io.c:395
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > I have some more experiments I could try (for example, maybe
> > > > > > > > > setting LC_ALL=fr_FR.UTF-8 would do something that triggers this
> > > > > > > > > error -- it's not a locale that my machine recognizes...). But I'm
> > > > > > > > > running blind here, and some informed guessing would be better
> > > > > > > > > than my current guesswork.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Raul
> > > > > > > > >
> > > > > > > > > On Mon, Nov 13, 2023 at 8:13 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > I think you are saying that you have a double free: you have an
> > > > > > > > > > argument that contains deadbeef, indicating that it has been
> > > > > > > > > > freed. I don't think you need to write any code.
> > > > > > > > > >
> > > > > > > > > > You need to turn on 0x2 in MEMAUDIT to engage tstack auditing.
> > > > > > > > > > Remember that the tstack contains death warrants for blocks that
> > > > > > > > > > have been allocated. The double free happens when there is an
> > > > > > > > > > erroneous free that frees the block while the death warrant is
> > > > > > > > > > still active; the application of the death warrant is the double
> > > > > > > > > > free, but the error happened earlier.
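To make the death-warrant explanation above concrete, here is a minimal C model. It is a sketch under stated assumptions, not J's implementation: `Block`, `alloc_block`, `release`, and `execute_warrants` are hypothetical names, and blocks live in caller-provided storage so the model can notice a second release without touching freed memory. What it shows is that the bug (an early `release`) and the detected failure (when the pending warrant is executed) happen at different times.

```c
#include <assert.h>

/* Hypothetical model, not J's real data structures: a block with a use
   count and a flag recording whether it has already been released. */
typedef struct {
    int usecount;
    int freed;
} Block;

#define TSTACK_MAX 64
static Block *tstack[TSTACK_MAX];  /* pending death warrants */
static int tstack_top = 0;

/* "Allocate" a block in caller-provided storage and push a death warrant
   for it, so it is released automatically when the warrants execute. */
static Block *alloc_block(Block *storage) {
    assert(tstack_top < TSTACK_MAX);
    storage->usecount = 1;
    storage->freed = 0;
    tstack[tstack_top++] = storage;
    return storage;
}

/* Drop one reference; mark the block released when the count hits zero.
   Returns -1 if the block was already released (a double free). */
static int release(Block *b) {
    if (b->freed) return -1;
    if (--b->usecount == 0) b->freed = 1;
    return 0;
}

/* Execute every pending warrant, newest first (loosely analogous to tpop).
   Returns how many double frees were detected along the way. */
static int execute_warrants(void) {
    int errors = 0;
    while (tstack_top > 0)
        if (release(tstack[--tstack_top]) != 0) errors++;
    return errors;
}
```

Allocating two blocks, releasing one of them by mistake, and then calling `execute_warrants()` reports one double free, detected at the warrant rather than at the erroneous `release` call that caused it.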
> > > > > > > > > >
> > > > > > > > > > tstack auditing goes through the tstack, counting the number of
> > > > > > > > > > death warrants for each block. If that number exceeds the
> > > > > > > > > > usecount of the block, an erroneous free has occurred and the
> > > > > > > > > > audit segfaults. You can add calls to the tstack auditor in your
> > > > > > > > > > code to narrow down the source of the error.
> > > > > > > > > >
> > > > > > > > > > If you still want to see all allocations, they go through jtgaf()
> > > > > > > > > > in m.c.
> > > > > > > > > >
> > > > > > > > > > hhr
> > > > > > > > > >
> > > > > > > > > > On 11/13/2023 12:51 AM, Raul Miller wrote:
> > > > > > > > > > > The problem I'm experiencing is that I'm adding two extended
> > > > > > > > > > > precision numbers and I get a segfault because one has a
> > > > > > > > > > > corrupted memory address. I turn on MEMAUDIT=0x1d and I get an
> > > > > > > > > > > ARGCHK failure because one of the arguments has low bits set in
> > > > > > > > > > > flags (and is 0xdeadbeef).
> > > > > > > > > > >
> > > > > > > > > > > So, I have a memory address which was allocated and I need to
> > > > > > > > > > > "go back in time" to see what's happening with that memory
> > > > > > > > > > > address. (If it changes in response to my code update, it
> > > > > > > > > > > presumably would only change once - not when I only change the
> > > > > > > > > > > numeric value that I'm searching for.)
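Henry's description of tstack auditing can be modelled the same way. The following is a hypothetical sketch (`Block`, `audit_deletes`, and `audit_tstack` are illustrative names, not J's): it counts the pending warrants for each block and flags the first block whose count exceeds its usecount. The m.c:553 line in the gdb trace appears to make a similar comparison, with the counter packed into the block's flag word (`AFLAG(w)`) and checked against `ACUC(w)`. Unlike this sketch, the real auditor also seems to recurse into contents (`auditsimdelete` calls itself in frame #1) and traps instead of returning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block: a use count plus a scratch counter that is only
   meaningful during an audit. */
typedef struct {
    int usecount;
    int audit_deletes;
} Block;

/* Count the death warrants on a tstack for each block and compare each
   count with the block's usecount. Returns the first block that has more
   pending warrants than references (an erroneous free has happened), or
   NULL if the stack is consistent. */
static Block *audit_tstack(Block *const *tstack, size_t n) {
    size_t i;
    Block *bad = NULL;
    for (i = 0; i < n; i++) {          /* reset all counters first */
        assert(tstack[i] != NULL);
        tstack[i]->audit_deletes = 0;
    }
    for (i = 0; i < n; i++) {          /* one simulated delete per warrant */
        Block *b = tstack[i];
        if (++b->audit_deletes > b->usecount && bad == NULL)
            bad = b;
    }
    return bad;
}
```

Because the audit only reads the stack, it can be called at several points (for example before and after a parser step) to bracket where the extra free first appears.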
> > > > > > > > > > >
> > > > > > > > > > > In other words, I want to create a routine deadcheck() which
> > > > > > > > > > > reports when it's being called with an address which matches the
> > > > > > > > > > > failing address, along with a counter. Once I have this
> > > > > > > > > > > information, I can perform a run where I stop when I reach a
> > > > > > > > > > > certain count of the appearances of that memory address (or maybe
> > > > > > > > > > > every time, if the total count is low), and inspect the stack to
> > > > > > > > > > > see what's going on there. I am hoping that with this information
> > > > > > > > > > > I can zero in on what is being corrupted, and when.
> > > > > > > > > > >
> > > > > > > > > > > But, to do this, I need to run deadcheck every time memory gets
> > > > > > > > > > > allocated / handed to J as a new empty array. (I also put a
> > > > > > > > > > > deadcheck in the gmp memory allocator, of course.)
> > > > > > > > > > >
> > > > > > > > > > > So, ... I'm wondering where I can check all memory allocations.
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Raul
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Nov 12, 2023 at 8:40 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > > > > I'm not sure I understand what you want to know. The tpop stack
> > > > > > > > > > > > holds the death warrants for recently allocated blocks.
> > > > > > > > > > > >
> > > > > > > > > > > > hhr
> > > > > > > > > > > >
> > > > > > > > > > > > On 11/12/2023 6:56 PM, Raul Miller wrote:
> > > > > > > > > > > > > If I want to test all the addresses of "newly allocated pointers
> > > > > > > > > > > > > to memory" from m.c, where should I do that?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
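The deadcheck() routine Raul describes in his Nov 13 message is easy to sketch. Everything below is hypothetical and not part of the J sources: `deadcheck_arm()` records the suspect address and the hit number to stop at, `deadcheck()` would be called on each newly allocated block (Henry notes that allocations go through jtgaf() in m.c), and `deadcheck_break()` is an empty function intended as a gdb breakpoint target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging hook, not part of the J sources. */
static uintptr_t deadcheck_target = 0;    /* suspect address */
static unsigned long deadcheck_stop = 0;  /* hit number to stop at; 0 = never */
static unsigned long deadcheck_hits = 0;  /* times the suspect address was seen */

/* Empty on purpose: set a breakpoint here with `break deadcheck_break`.
   Build without optimisation so the call is not inlined away. */
void deadcheck_break(void) {}

/* Arm the hook with the failing address (e.g. taken from a crash report)
   and the hit number at which deadcheck_break() should be called. */
void deadcheck_arm(const void *addr, unsigned long stop_at) {
    assert(addr != NULL);
    deadcheck_target = (uintptr_t)addr;
    deadcheck_stop = stop_at;
    deadcheck_hits = 0;
}

/* Call on every newly allocated block. Returns the running hit count when
   p is the suspect address, or 0 otherwise. */
unsigned long deadcheck(const void *p) {
    if (deadcheck_target == 0 || (uintptr_t)p != deadcheck_target) return 0;
    deadcheck_hits++;
    fprintf(stderr, "deadcheck: %p seen, hit %lu\n", (void *)p, deadcheck_hits);
    if (deadcheck_hits == deadcheck_stop) deadcheck_break();
    return deadcheck_hits;
}
```

Counting hits matters because the allocator may hand out the same address many times during a run; the first pass gives the total count, and a second run can stop at the specific reuse that precedes the corruption.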
eq=: 4 : 'x=y'
x=:?3 8 32$2
f=: 3 : '(=/ -: eq/) ?y$2'
,f"1 x=:7 8 9,."0 1 [ _1 0 1+ 255
test =: +/ .* -: +/@(*"1 _)
f=: 4 : 0 NB. test float variants
p =. 1 + ? 32
xx=: (x,p) ?@$ 0
yy=: (p,y) ?@$ 0
assert. xx test yy
1
)
(200000 + i. 1) f"0 ] 10 + i. 1 NB. BLAS - just 2 times, since the test is slow
(-/ .* -:!.5e_11 */@((<0 1)&|:)) m=:(<:/~i.6) * 6 6 ?@$ 0
(-/ .* -:!.5e_11 */@((<0 1)&|:)) m=:(<:/~i.7) * 7 7 ?@$ 0
eqf=: 4 : 0
(x -:!.t y) +. (t>|x) *. t>|y [ t=. 2^_34
)
(-"+/ .* eqf -/ .*) m=: _100+?7 7$200
(-"+/ .* eqf -/ .*) m=: _100+?7 7$200x
(-"+/ .* eqf -/ .*) m=: %/1+?2 4 4$200x NB. <-- segfault here
(-"+/ .* eqf -/ .*) m=: %/1+?2 5 5$200x
(-"+/ .* eqf -/ .*) m=: %/1+?2 6 6$200x
_= (-/ .*) x: 4 4$_ __ 0 0 1 1 0 0 0 0 1 0 0 0 0 1 NB. test for crash
1= (+/ .*) ::1: x: 4 4$_ __ 0 0 1 1 0 0 0 0 1 0 0 0 0 1 NB. test for crash
exit''
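For readers less familiar with J, eqf in the script above accepts two results if they match within a relative tolerance t = 2^_34, or if both are smaller than t in magnitude. A scalar C sketch of that rule follows, assuming J's usual definition of tolerant equality (|x-y| no greater than t times the larger magnitude). The names `mag`, `eqf_t`, and `eqf` are hypothetical, and the J version works on whole results via `-:` rather than on single doubles.

```c
#include <assert.h>

/* Magnitude without <math.h>, so no libm linkage is needed. */
static double mag(double v) { return v < 0 ? -v : v; }

/* Scalar sketch of eqf with an explicit tolerance t (t > 0): x and y are
   accepted if they agree within relative tolerance t, or if both are
   smaller than t in magnitude (so tiny values near zero still match). */
static int eqf_t(double x, double y, double t) {
    assert(t > 0);
    double larger = mag(x) > mag(y) ? mag(x) : mag(y);
    int tolerant_match = mag(x - y) <= t * larger;
    int both_tiny = mag(x) < t && mag(y) < t;
    return tolerant_match || both_tiny;
}

/* eqf as used in the script: t = 2^-34 (J: 2^_34). */
static int eqf(double x, double y) {
    return eqf_t(x, y, 1.0 / 17179869184.0);   /* 2^34 = 17179869184 */
}
```

The absolute fallback is what lets two determinants that should both be zero, but come out as tiny values of opposite sign, still compare equal; a purely relative test would reject them.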
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm