If it is freed during tpop, it must be when a recursive XNUM is freed. You can verify that by looking at the stack when it crashes. I am assuming you do not create death warrants for gmp blocks.
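A minimal sketch of what freeing a recursive XNUM implies, assuming each component block carries its own ownership tag: the free routine walks the components and picks a deallocator per block, never per container. Every name here (`owner_t`, `blk`, `free_components`, the counters) is hypothetical and not taken from m.c, and `malloc`/`free` stand in for both the J and the libgmp deallocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ownership tag; the real J block header encodes this differently. */
typedef enum { OWNER_J, OWNER_GMP, OWNER_PERMANENT } owner_t;

typedef struct { owner_t owner; } blk;

static int j_frees = 0, gmp_frees = 0;  /* counters so the dispatch is observable */

static void j_free(blk *b)   { j_frees++;   free(b); }  /* stand-in for the J deallocator */
static void gmp_free(blk *b) { gmp_frees++; free(b); }  /* stand-in for the libgmp deallocator */

static blk *new_blk(owner_t owner) {
  blk *b = malloc(sizeof *b);
  assert(b != NULL);
  b->owner = owner;
  return b;
}

/* Free each component according to its own tag; the container type is never consulted. */
static void free_components(blk **items, size_t n) {
  for (size_t i = 0; i < n; i++) {
    switch (items[i]->owner) {
      case OWNER_GMP:       gmp_free(items[i]); break;
      case OWNER_J:         j_free(items[i]);   break;
      case OWNER_PERMANENT: break;              /* permanent blocks are never freed */
    }
    items[i] = NULL;
  }
}
```

With a container holding one J block, one gmp block and one permanent block, `free_components` would call each deallocator exactly once and leave the permanent block alone.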
There is nothing wrong with having a gmp allocation in an XNUM. Why are you crashing then? If you think there should never be a gmp allocation there, you need to see when it happens - during word formation probably.

hhr

On Tue, Nov 7, 2023, 3:01 PM Raul Miller <rauldmil...@gmail.com> wrote:

> Yes to both questions.
>
> But there's only two J allocations which occur here, and in my testing
> neither of them generated that address.
>
> Like I said, I'm baffled.
>
> --
> Raul
>
> On Tue, Nov 7, 2023 at 8:49 AM Henry Rich <henryhr...@gmail.com> wrote:
> >
> > That's certainly a gmp block. Was it being freed during EPILOG? If so, was
> > it freed during tpop? That would be an error - no such block should be on
> > the tpop stack. The only way that could happen is if you allocate a j
> > block and then change it to a gmp block, which would be bad.
> >
> > hhr
> >
> > On Tue, Nov 7, 2023, 2:16 PM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > > Here's an example of the header block from the offending free:
> > >
> > >    12345x
> > > trap : file ../../../../jsrc/m.c line 1418
> > >
> > > Program received signal SIGILL, Illegal instruction.
> > > 0x00007ffff293e230 in jtmf (jt=0x7ffff1440200, w=0x5555556b6e50,
> > >     hrh=16384) at ../../../../jsrc/m.c:1418
> > > 1418 if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
> > > (gdb) p *w
> > > $39 = {kchain = {k = 64, chain = 0x40, globalst = 0x40, locpath = 0x40},
> > >   flag = 0, mback = {m = 93824992546896, back = 0x5555555a1050,
> > >   jobpyx = 0x5555555a1050, zaploc = 0x5555555a1050, aarg = 0x5555555a1050},
> > >   tproxy = {t = 2, proxychain = 0x2}, c = 1, n = 6, r = 1 '\001',
> > >   filler = 48 '0', h = 16384, origin = 0, lock = 13360, s = {842346800}}
> > >
> > > The code (without the line 1418 segfault test which I added to m.c) is
> > > in the gmp-redo0 branch, which I have pushed to the jsoftware jsource
> > > repo. (This branch has not been mirrored to the github repo.)
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > > On Tue, Nov 7, 2023 at 2:51 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > >
> > > > I don't have my machine & this is more than I can handle with my phone. If
> > > > you put your code into a branch I can check out, I can work with it on
> > > > Saturday.
> > > >
> > > > Can you show me the entire header of the failing block? Garbage shape does
> > > > not matter to the free code.
> > > >
> > > > hhr
> > > >
> > > > On Tue, Nov 7, 2023, 12:05 AM Raul Miller <rauldmil...@gmail.com> wrote:
> > > >
> > > > > So... further "refinement" of my problem:
> > > > >
> > > > >    12345x
> > > > > (segfault)
> > > > >
> > > > > This is from the EPILOG of jtthorn1main where Z is rank 1, n=5, shape
> > > > > 5 and contains the characters '12345'
> > > > >
> > > > > The value being freed is a garbage LIT value, rank 1, n=6, garbage
> > > > > shape, and contains the characters '12345',NUL which was produced by
> > > > > libgmp in jtthx1 (the backing store for result returned by the macro
> > > > > SgetX). This garbage LIT value is being handled correctly by memory
> > > > > management code in J9.4 and the current J9.5 beta builds. And, I
> > > > > didn't touch anything in m.c (other than adding this conditional
> > > > > SEGFAULT).
> > > > >
> > > > > So I think this means that there's some bit of accounting that I
> > > > > screwed up in my re-implementation of jtxplus?
> > > > >
> > > > > But that's the GAX line here:
> > > > >
> > > > > XF2(jtxplus){ // a+w
> > > > >  ARGCHK2(a,w);
> > > > >  if (unlikely(ISX0(a))) R w;
> > > > >  if (unlikely(ISX0(w))) R a;
> > > > >  // X z= XaddXX(a,w); // XaddXX could become [temporary] syntactic sugar for jtxplus
> > > > >  mp_size_t an= XLIMBLEN(a), wn= XLIMBLEN(w); // arg sizes
> > > > >  mp_size_t m= MIN(an, wn), n= MAX(an, wn); // result sizes
> > > > >  X z; GAX(z, n+1); const mp_ptr zd= voidAV1(z), ad= voidAV1(a), wd= voidAV1(w); // data locations
> > > > >  B ap= XSGN(a)>0; B wp= XSGN(w)>0; // positive or negative?
> > > > >  B ax= an>= wn; // when w and a have different lengths, larger should be first arg to mpn_add/mpn_sub
> > > > >  if (ap==wp) { // work with unsigned magnitudes:
> > > > >   if (jmpn_add(zd, ax?ad:wd, n, ax?wd:ad, m)) zd[n++]= 1;
> > > > >   XSGN(z)= ap ?n :-n; // signs matched, result has sign of both args
> > > > >  } else {
> > > > >   B zp= ax ?ap :wp;
> > > > >   if (jmpn_sub(zd, ax?ad:wd, n, ax?wd:ad, m)) {
> > > > >    zp= 1-zp; // borrow means need to negate result
> > > > >    if(unlikely(!jmpn_neg(zd, zd, n))) {fr(z); R X0; /* this X0 presumably never happens */}
> > > > >   }
> > > > >   while (likely(n) && unlikely(!zd[n-1])) n--; /* trim leading zeros */
> > > > >   if (unlikely(!n)) {fr(z); R X0;} /* this X0 is presumably the one that happens */
> > > > >   XSGN(z)= zp ?n :-n;
> > > > >  }
> > > > >  if (XLIMBLEN(z)>100000) SEGFAULT; // try to catch a bug...
> > > > >  R z;
> > > > > }
> > > > >
> > > > > Where GAX is currently
> > > > >
> > > > > #define GAX(z, n) GA10(z, LIT, SZI*n) // like GAGMP but native J alloc
> > > > >
> > > > > Anyways... as best I understand it, none of this work should have
> > > > > resulted in this error state.
> > > > >
> > > > > I'm baffled, and do not know what I should be looking at.
> > > > >
> > > > > --
> > > > > Raul
> > > > >
> > > > > On Mon, Nov 6, 2023 at 9:25 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > >
> > > > > > This level of detail is hard to write commentary for. I try, but miss large
> > > > > > areas.
> > > > > >
> > > > > > I feel certain that there is no assumption of homogeneity within an XNUM.
> > > > > > Indirect types recur in each component independently.
> > > > > >
> > > > > > And I think that some components may be PERMANENT and thus not eligible to
> > > > > > be freed.
> > > > > >
> > > > > > When you get your first crash on i. 10x, learn what's going on. That
> > > > > > shouldn't crash, but perhaps one of the components is anomalous.
> > > > > >
> > > > > > hhr
> > > > > >
> > > > > > On Mon, Nov 6, 2023, 2:36 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > >
> > > > > > > So...
> > > > > > >
> > > > > > > It turns out that it's fairly easy to find a simple example of this
> > > > > > > problem.
> > > > > > >
> > > > > > > In m.c, add:
> > > > > > >   if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
> > > > > > >
> > > > > > > as the first line of jtmf. This will crash on i.10x
> > > > > > >
> > > > > > > That said, replacing the SEGFAULT with {gmpfree(w);R;} (or, since ja.h
> > > > > > > isn't available here, its definition, changing x for w) still results
> > > > > > > in a crash, it just takes longer.
> > > > > > >
> > > > > > > This behavior roughly matches an earlier suspicion (that the decision
> > > > > > > to use the gmp deallocator vs the j deallocator was being bypassed -
> > > > > > > that there was an assumption that all members of an XNUM (or a RAT)
> > > > > > > were constructed using the same allocator), but I don't know my way
> > > > > > > around the memory management code to see where I should be looking to
> > > > > > > find this decision.
> > > > > > >
> > > > > > > That said, also, I was expecting a segfault in the gmp deallocator,
> > > > > > > not in m.c, but that corresponding SEGFAULT doesn't seem to trigger.
> > > > > > >
> > > > > > > So... do you have any suggestions on how I should approach looking in
> > > > > > > m.c to find the assumptions about homogeneous XNUMs that I'm breaking?
> > > > > > >
> > > > > > > (My current level of ignorance on this subject feels like the kind of
> > > > > > > architectural detail ignorance that routinely trips people up in many,
> > > > > > > many "enterprise" contexts. I can presumably work through it by
> > > > > > > performing inspection and experiments, but it seems more fruitful to
> > > > > > > just ask.)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --
> > > > > > > Raul
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 3, 2023 at 6:22 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > This sounds like a good plan.
> > > > > > > >
> > > > > > > > Unfortunately, my machine is crashing (and sometimes failing to
> > > > > > > > reboot) just doing step 1.
> > > > > > > >
> > > > > > > > So I'm not even sure that the problem I'm encountering is a problem in
> > > > > > > > my changes - it might be a problem in my machine. (That said, my code
> > > > > > > > is still the prime suspect.)
> > > > > > > >
> > > > > > > > So... I've pushed a copy of my changes to a new branch (gmp-redo0).
> > > > > > > > This is partially to guard against a complete loss of my machine, and
> > > > > > > > partially to give someone else a chance of looking at the problem.
> > > > > > > >
> > > > > > > > I've not given up, but I have expanded the scope of my concerns, which
> > > > > > > > is going to slow me down.
> > > > > > > >
> > > > > > > > FYI,
> > > > > > > >
> > > > > > > > --
> > > > > > > > Raul
> > > > > > > >
> > > > > > > > On Fri, Nov 3, 2023 at 1:41 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > You have an unknown memory corruption running the test suite. This is how I
> > > > > > > > > debug those:
> > > > > > > > >
> > > > > > > > > 1. RECHO ddall to see where it crashes.
> > > > > > > > > 2. Run the scripts before the crash to see if you can crash with a shorter
> > > > > > > > >    run.
> > > > > > > > > 3. When you have the crash as small as you can, set MEMAUDIT to 1d and see
> > > > > > > > >    if you get an audit failure.
> > > > > > > > > 4. Once you get an audit failure you want to increase the frequency of
> > > > > > > > >    audits. This is where 6!:5 (1) comes in. Once you execute that, it audits
> > > > > > > > >    the free pool very frequently. That slows things down so you want to set it
> > > > > > > > >    as close to the actual error as possible.
> > > > > > > > > 5. When you have found the first failure, it will be soon after the errant
> > > > > > > > >    code. Add calls to auditmemchains liberally until you have isolated the
> > > > > > > > >    error line.
> > > > > > > > > 6. If at any point you need to know what J sentence is executing, set
> > > > > > > > >    TRACKINFO to 1 and look at the name track* in any routine that defines
> > > > > > > > >    them.
> > > > > > > > >
> > > > > > > > > hhr
> > > > > > > > >
> > > > > > > > > On Fri, Nov 3, 2023, 5:15 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Can you give me some hints on using 6!:5?
> > > > > > > > > >
> > > > > > > > > > I don't see any comments on jtpeekdata, and I haven't visualized what
> > > > > > > > > > I'd be looking for, nor when, for that matter.
> > > > > > > > > >
> > > > > > > > > > I got the 0xdeadbeef stuff running MEMAUDIT=0xd.
> > > > > > > > > >
> > > > > > > > > > I was running MEMAUDIT=0xff overnight, to see if I could catch the
> > > > > > > > > > problem earlier, but my machine rebooted. I don't know if that was
> > > > > > > > > > windows update or if that was some other issue. I'll give it another
> > > > > > > > > > shot.
> > > > > > > > > >
> > > > > > > > > > Basically, though, this isn't a problem which I've figured out a good
> > > > > > > > > > way of triggering, so it's slow going.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Raul
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 3, 2023 at 6:18 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Set MEMAUDIT and 6!:5 to see where the free chain is corrupted.
> > > > > > > > > > >
> > > > > > > > > > > hhr
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 2, 2023, 3:32 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This is turning out to be more difficult than I had anticipated.
> > > > > > > > > > > >
> > > > > > > > > > > > I've modified jtxplus to use mpn_add (and mpn_sub and a macro
> > > > > > > > > > > > workalike for the inlined mpn_neg, which in turn uses mpn_com -
> > > > > > > > > > > > necessary because the mpn_ family of routines works on unsigned limb
> > > > > > > > > > > > sequences). And, it *mostly* works.
> > > > > > > > > > > >
> > > > > > > > > > > > However, when running script/testga.sh, I encounter a double free
> > > > > > > > > > > > problem.
> > > > > > > > > > > >
> > > > > > > > > > > > My current best guess is that somewhere I'm relying on a container
> > > > > > > > > > > > test (XNUM/RAT) instead of relying on the ISGMP() test. But I looked
> > > > > > > > > > > > through m.c and I'm not seeing anything there that looks plausible.
> > > > > > > > > > > >
> > > > > > > > > > > > I did notice that the frgmp() macro is not referenced anywhere, and I
> > > > > > > > > > > > used the corresponding fr() macro in my implementation rather than
> > > > > > > > > > > > mf() - but if that's an issue, I need a better understanding of this
> > > > > > > > > > > > part of the internal api.
> > > > > > > > > > > >
> > > > > > > > > > > > So... anyways... before I dig this hole too deep, I figure I should
> > > > > > > > > > > > ask for advice on how to proceed.
> > > > > > > > > > > >
> > > > > > > > > > > > Thoughts?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Raul
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 25, 2023 at 3:12 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > OK, I see now. The implementation using low-level GMP would be more
> > > > > > > > > > > > > parallel with Roger's original version, right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > That would end up being simpler than having reserve memory, as well as
> > > > > > > > > > > > > stabler.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since you currently mark blocks that are to be freed by GMP, you could
> > > > > > > > > > > > > make this change piecemeal, right? When you rewrite addition to use the
> > > > > > > > > > > > > low-level routines, you mark the blocks allocated by addition as J not
> > > > > > > > > > > > > GMP, and everything else follows automatically.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's a great idea.
> > > > > > > > > > > > >
> > > > > > > > > > > > > hhr
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 10/24/2023 8:34 PM, Raul Miller wrote:
> > > > > > > > > > > > > > libzahl is not thread safe, and even if it was, it's not clear to me
> > > > > > > > > > > > > > that it adequately supports enough architectures.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Meanwhile, libgmp's problems are addressable. I just have to use a
> > > > > > > > > > > > > > different part of its API.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Also, on windows, we're using mpir rather than libgmp.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (J currently uses parts of the libgmp high level API, which performs
> > > > > > > > > > > > > > memory allocations within the libgmp library routines, using callbacks
> > > > > > > > > > > > > > whose implementation I supply.
> > > > > > > > > > > > > > But it also exposes the low level
> > > > > > > > > > > > > > routines used to build those high level routines, and those low level
> > > > > > > > > > > > > > routines do not perform memory allocation, which means that we can
> > > > > > > > > > > > > > manage the memory outside of the API.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ((The problem with libgmp's high level API is that if a memory
> > > > > > > > > > > > > > allocation fails, it exits the program. So we came up with a
> > > > > > > > > > > > > > workaround which reserves a memory pool, and limits arguments to
> > > > > > > > > > > > > > certain routines, so successful memory allocations will succeed even
> > > > > > > > > > > > > > under low memory conditions. That's not ideal, but it has been "good
> > > > > > > > > > > > > > enough, so far". But libgmp supports another approach.
> > > > > > > > > > > > > > It's a little
> > > > > > > > > > > > > > more work, but not an excessive amount of work.))
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I hope this makes sense,
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ----------------------------------------------------------------------
> > > > > > > > > > > > > For information about J forums see http://www.jsoftware.com/forums.htm
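The approach described in the oldest message of this thread (use the low-level routines that never allocate, and manage result memory on the caller's side) can be sketched without libgmp at all. What follows is a hedged, self-contained illustration of the sign-and-magnitude addition that the quoted jtxplus performs, not the J implementation: `limb_t`, `LIMB_BYTES`, `mag_add`, `mag_sub`, `mag_neg` and `signed_add` are hypothetical names, and plain loops stand in for mpn_add, mpn_sub and mpn_neg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;                      /* stand-in for mp_limb_t (assumption) */
#define LIMB_BYTES(n) (sizeof(limb_t) * (n))  /* argument parenthesized on purpose */

/* zd = xd + yd over magnitudes, xn >= yn; returns the carry out of the top limb. */
static limb_t mag_add(limb_t *zd, const limb_t *xd, size_t xn,
                      const limb_t *yd, size_t yn) {
  limb_t carry = 0;
  assert(xn >= yn);
  for (size_t i = 0; i < xn; i++) {
    limb_t s = xd[i] + carry;
    limb_t c1 = s < carry;                    /* overflow from adding the carry */
    limb_t t = s + (i < yn ? yd[i] : 0);
    limb_t c2 = t < s;                        /* overflow from adding the limb */
    zd[i] = t;
    carry = c1 | c2;
  }
  return carry;
}

/* zd = xd - yd over magnitudes, xn >= yn; returns the borrow out of the top limb. */
static limb_t mag_sub(limb_t *zd, const limb_t *xd, size_t xn,
                      const limb_t *yd, size_t yn) {
  limb_t borrow = 0;
  assert(xn >= yn);
  for (size_t i = 0; i < xn; i++) {
    limb_t y = i < yn ? yd[i] : 0;
    limb_t d = xd[i] - y;
    limb_t b1 = xd[i] < y;
    limb_t b2 = d < borrow;
    zd[i] = d - borrow;
    borrow = b1 | b2;
  }
  return borrow;
}

/* Two's-complement negate of an n-limb value in place. */
static void mag_neg(limb_t *zd, size_t n) {
  limb_t carry = 1;
  for (size_t i = 0; i < n; i++) {
    limb_t t = ~zd[i] + carry;
    carry = t < carry;
    zd[i] = t;
  }
}

/* Sizes are signed limb counts (negative means a negative number, 0 means zero).
   The caller owns zd, which must hold max(|asn|,|bsn|)+1 limbs.
   Returns the signed limb count of the result. */
static ptrdiff_t signed_add(limb_t *zd, const limb_t *ad, ptrdiff_t asn,
                            const limb_t *bd, ptrdiff_t bsn) {
  size_t an = (size_t)(asn < 0 ? -asn : asn), bn = (size_t)(bsn < 0 ? -bsn : bsn);
  int ap = asn > 0, bp = bsn > 0, a_first = an >= bn, zp;
  const limb_t *xd = a_first ? ad : bd, *yd = a_first ? bd : ad;
  size_t n = a_first ? an : bn, m = a_first ? bn : an;
  if (ap == bp) {                             /* same sign: add magnitudes */
    if (mag_add(zd, xd, n, yd, m)) zd[n++] = 1;
    zp = ap;
  } else {                                    /* different signs: subtract */
    zp = a_first ? ap : bp;
    if (mag_sub(zd, xd, n, yd, m)) { zp = !zp; mag_neg(zd, n); }
    while (n && !zd[n - 1]) n--;              /* trim leading zero limbs */
  }
  return zp ? (ptrdiff_t)n : -(ptrdiff_t)n;
}
```

For example, adding the one-limb values 5 and -7 would return -1 with `zd[0] == 2`. Note that `LIMB_BYTES` parenthesizes its argument. The quoted `#define GAX(z, n) GA10(z, LIT, SZI*n)` does not, so by ordinary C macro expansion `GAX(z, n+1)` passes `SZI*n+1` rather than `SZI*(n+1)`; whether that is related to the crash discussed here is not something this thread establishes.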