I don't have my machine & this is more than I can handle with my phone. If you put your code into a branch I can check out, I can work with it on Saturday.
Can you show me the entire header of the failing block? Garbage shape does not matter to the free code.

hhr

On Tue, Nov 7, 2023, 12:05 AM Raul Miller <rauldmil...@gmail.com> wrote:

> So... further "refinement" of my problem:
>
>    12345x
> (segfault)
>
> This is from the EPILOG of jtthorn1main where Z is rank 1, n=5, shape
> 5 and contains the characters '12345'
>
> The value being freed is a garbage LIT value, rank 1, n=6, garbage
> shape, and contains the characters '12345',NUL which was produced by
> libgmp in jtthx1 (the backing store for result returned by the macro
> SgetX). This garbage LIT value is being handled correctly by memory
> management code in J9.4 and the current J9.5 beta builds. And, I
> didn't touch anything in m.c (other than adding this conditional
> SEGFAULT).
>
> So I think this means that there's some bit of accounting that I
> screwed up in my re-implementation of jtxplus?
>
> But that's the GAX line here:
>
> XF2(jtxplus){ // a+w
>  ARGCHK2(a,w);
>  if (unlikely(ISX0(a))) R w;
>  if (unlikely(ISX0(w))) R a;
>  // X z= XaddXX(a,w); // XaddXX could become [temporary] syntactic sugar for jtxplus
>  mp_size_t an= XLIMBLEN(a), wn= XLIMBLEN(w); // arg sizes
>  mp_size_t m= MIN(an, wn), n= MAX(an, wn); // result sizes
>  X z; GAX(z, n+1); const mp_ptr zd= voidAV1(z), ad= voidAV1(a), wd= voidAV1(w); // data locations
>  B ap= XSGN(a)>0; B wp= XSGN(w)>0; // positive or negative?
>  B ax= an>= wn; // when w and a have different lengths, larger should be first arg to mpn_add/mpn_sub
>  if (ap==wp) { // work with unsigned magnitudes:
>   if (jmpn_add(zd, ax?ad:wd, n, ax?wd:ad, m)) zd[n++]= 1;
>   XSGN(z)= ap ?n :-n; // signs matched, result has sign of both args
>  } else {
>   B zp= ax ?ap :wp;
>   if (jmpn_sub(zd, ax?ad:wd, n, ax?wd:ad, m)) {
>    zp= 1-zp; // borrow means need to negate result
>    if(unlikely(!jmpn_neg(zd, zd, n))) {fr(z); R X0; /* this X0 presumably never happens */}
>   }
>   while (likely(n) && unlikely(!zd[n-1])) n--; /* trim leading zeros */
>   if (unlikely(!n)) {fr(z); R X0;} /* this X0 is presumably the one that happens */
>   XSGN(z)= zp ?n :-n;
>  }
>  if (XLIMBLEN(z)>100000) SEGFAULT; // try to catch a bug...
>  R z;
> }
>
> Where GAX is currently
>
> #define GAX(z, n) GA10(z, LIT, SZI*n) // like GAGMP but native J alloc
>
> Anyways... as best I understand it, none of this work should have
> resulted in this error state.
>
> I'm baffled, and do not know what I should be looking at.
>
> --
> Raul
>
> On Mon, Nov 6, 2023 at 9:25 AM Henry Rich <henryhr...@gmail.com> wrote:
>
> > This level of detail is hard to write commentary for. I try, but miss
> > large areas.
> >
> > I feel certain that there is no assumption of homogeneity within an XNUM.
> > Indirect types recur in each component independently.
> >
> > And I think that some components may be PERMANENT and thus not eligible
> > to be freed.
> >
> > When you get your first crash on i. 10x, learn what's going on. That
> > shouldn't crash, but perhaps one of the components is anomalous.
> >
> > hhr
> >
> > On Mon, Nov 6, 2023, 2:36 PM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > > So...
> > >
> > > It turns out that it's fairly easy to find a simple example of this
> > > problem.
> > >
> > > In m.c, add:
> > >  if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
> > >
> > > as the first line of jtmf.
> > > This will crash on i.10x
> > >
> > > That said, replacing the SEGFAULT with {gmpfree(w);R;} (or, since ja.h
> > > isn't available here, its definition, changing x for w) still results
> > > in a crash, it just takes longer.
> > >
> > > This behavior roughly matches an earlier suspicion (that the decision
> > > to use the gmp deallocator vs the j deallocator was being bypassed -
> > > that there was an assumption that all members of an XNUM (or a RAT)
> > > were constructed using the same allocator), but I don't know my way
> > > around the memory management code to see where I should be looking to
> > > find this decision.
> > >
> > > That said, also, I was expecting a segfault in the gmp deallocator,
> > > not in m.c, but that corresponding SEGFAULT doesn't seem to trigger.
> > >
> > > So... do you have any suggestions on how I should approach looking in
> > > m.c to find the assumptions about homogeneous XNUMs that I'm breaking?
> > >
> > > (My current level of ignorance on this subject feels like the kind of
> > > architectural detail ignorance that routinely trips people up in many,
> > > many "enterprise" contexts. I can presumably work through it by
> > > performing inspection and experiments, but it seems more fruitful to
> > > just ask.)
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > > On Fri, Nov 3, 2023 at 6:22 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > >
> > > > This sounds like a good plan.
> > > >
> > > > Unfortunately, my machine is crashing (and sometimes failing to
> > > > reboot) just doing step 1.
> > > >
> > > > So I'm not even sure that the problem I'm encountering is a problem in
> > > > my changes - it might be a problem in my machine. (That said, my code
> > > > is still the prime suspect.)
> > > >
> > > > So... I've pushed a copy of my changes to a new branch (gmp-redo0).
> > > > This is partially to guard against a complete loss of my machine, and
> > > > partially to give someone else a chance of looking at the problem.
> > > >
> > > > I've not given up, but I have expanded the scope of my concerns, which
> > > > is going to slow me down.
> > > >
> > > > FYI,
> > > >
> > > > --
> > > > Raul
> > > >
> > > > On Fri, Nov 3, 2023 at 1:41 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > >
> > > > > You have an unknown memory corruption running the test suite. This is
> > > > > how I debug those:
> > > > >
> > > > > 1. RECHO ddall to see where it crashes.
> > > > > 2. Run the scripts before the crash to see if you can crash with a
> > > > >    shorter run
> > > > > 3. When you have the crash as small as you can, set MEMAUDIT to 1d and
> > > > >    see if you get an audit failure.
> > > > > 4. Once you get an audit failure you want to increase the frequency of
> > > > >    audits. This is where 6!:5 (1) comes in. Once you execute that, it
> > > > >    audits the free pool very frequently. That slows things down so you
> > > > >    want to set it as close to the actual error as possible.
> > > > > 5. When you have found the first failure, it will be soon after the
> > > > >    errant code. Add calls to auditmemchains liberally until you have
> > > > >    isolated the error line.
> > > > > 6. If at any point you need to know what J sentence is executing, set
> > > > >    TRACKINFO to 1 and look at the name track* in any routine that
> > > > >    defines them.
> > > > >
> > > > > hhr
> > > > >
> > > > > On Fri, Nov 3, 2023, 5:15 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > >
> > > > > > Can you give me some hints on using 6!:5?
> > > > > >
> > > > > > I don't see any comments on jtpeekdata, and I haven't visualized what
> > > > > > I'd be looking for, nor when, for that matter.
> > > > > >
> > > > > > I got the 0xdeadbeef stuff running MEMAUDIT=0xd.
> > > > > >
> > > > > > I was running MEMAUDIT=0xff overnight, to see if I could catch the
> > > > > > problem earlier, but my machine rebooted. I don't know if that was
> > > > > > windows update or if that was some other issue. I'll give it another
> > > > > > shot.
> > > > > >
> > > > > > Basically, though, this isn't a problem which I've figured out a good
> > > > > > way of triggering, so it's slow going.
> > > > > >
> > > > > > --
> > > > > > Raul
> > > > > >
> > > > > > On Fri, Nov 3, 2023 at 6:18 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > >
> > > > > > > Set MEMAUDIT and 6!:5 to see where the free chain is corrupted.
> > > > > > >
> > > > > > > hhr
> > > > > > >
> > > > > > > On Thu, Nov 2, 2023, 3:32 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > >
> > > > > > > > This is turning out to be more difficult than I had anticipated.
> > > > > > > >
> > > > > > > > I've modified jtxplus to use mpn_add (and mpn_sub and a macro
> > > > > > > > workalike for the inlined mpn_neg, which in turn uses mpn_com -
> > > > > > > > necessary because the mpn_ family of routines works on unsigned limb
> > > > > > > > sequences). And, it *mostly* works.
> > > > > > > >
> > > > > > > > However, when running script/testga.sh, I encounter a double free
> > > > > > > > problem.
> > > > > > > >
> > > > > > > > My current best guess is that somewhere I'm relying on a container
> > > > > > > > test (XNUM/RAT) instead of relying on the ISGMP() test. But I looked
> > > > > > > > through m.c and I'm not seeing anything there that looks plausible.
> > > > > > > >
> > > > > > > > I did notice that the frgmp() macro is not referenced anywhere, and I
> > > > > > > > used the corresponding fr() macro in my implementation rather than
> > > > > > > > mf() - but if that's an issue, I need a better understanding of this
> > > > > > > > part of the internal api.
> > > > > > > >
> > > > > > > > So... anyways... before I dig this hole too deep, I figure I should
> > > > > > > > ask for advice on how to proceed.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > --
> > > > > > > > Raul
> > > > > > > >
> > > > > > > > On Wed, Oct 25, 2023 at 3:12 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > OK, I see now. The implementation using low-level GMP would be more
> > > > > > > > > parallel with Roger's original version, right?
> > > > > > > > >
> > > > > > > > > That would end up being simpler than having reserve memory, as well
> > > > > > > > > as stabler.
> > > > > > > > >
> > > > > > > > > Since you currently mark blocks that are to be freed by GMP, you
> > > > > > > > > could make this change piecemeal, right? When you rewrite addition
> > > > > > > > > to use the low-level routines, you mark the blocks allocated by
> > > > > > > > > addition as J not GMP, and everything else follows automatically.
> > > > > > > > >
> > > > > > > > > That's a great idea.
> > > > > > > > >
> > > > > > > > > hhr
> > > > > > > > >
> > > > > > > > > On 10/24/2023 8:34 PM, Raul Miller wrote:
> > > > > > > > > > libzahl is not thread safe, and even if it was, it's not clear to
> > > > > > > > > > me that it adequately supports enough architectures.
> > > > > > > > > >
> > > > > > > > > > Meanwhile, libgmp's problems are addressable. I just have to use a
> > > > > > > > > > different part of its API.
> > > > > > > > > >
> > > > > > > > > > (Also, on windows, we're using mpir rather than libgmp.)
> > > > > > > > > >
> > > > > > > > > > (J currently uses parts of the libgmp high level API, which
> > > > > > > > > > performs memory allocations within the libgmp library routines,
> > > > > > > > > > using callbacks whose implementation I supply. But it also exposes
> > > > > > > > > > the low level routines used to build those high level routines,
> > > > > > > > > > and those low level routines do not perform memory allocation,
> > > > > > > > > > which means that we can manage the memory outside of the API.)
> > > > > > > > > >
> > > > > > > > > > ((The problem with libgmp's high level API is that if a memory
> > > > > > > > > > allocation fails, it exits the program. So we came up with a
> > > > > > > > > > workaround which reserves a memory pool, and limits arguments to
> > > > > > > > > > certain routines, so successful memory allocations will succeed
> > > > > > > > > > even under low memory conditions. That's not ideal, but it has
> > > > > > > > > > been "good enough, so far". But libgmp supports another approach.
> > > > > > > > > > It's a little more work, but not an excessive amount of work.))
> > > > > > > > > >
> > > > > > > > > > I hope this makes sense,

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
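
For anyone following the thread without the J source at hand, here is a minimal standalone sketch of the sign-magnitude addition scheme that the quoted jtxplus listing follows: add the magnitudes when the signs match, otherwise subtract the shorter operand from the longer one, negate on borrow, and trim leading zero limbs. This is not the J implementation and it does not call libgmp. The names limb, mag_add, mag_sub, mag_neg and signed_add are hypothetical stand-ins (for mpn_add, mpn_sub and mpn_neg respectively), and the "signed limb count" convention is only an assumption modelled on how XSGN is used in the listing. As in the listing's GAX(z, n+1), the caller owns the result buffer and must size it at max(|as|,|bs|)+1 limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb;  /* hypothetical limb type; least significant limb first */

/* r[0..an) = a[0..an) + b[0..bn), requires an >= bn; returns the carry out. */
limb mag_add(limb *r, const limb *a, size_t an, const limb *b, size_t bn) {
    assert(an >= bn);
    limb carry = 0;
    for (size_t i = 0; i < an; i++) {
        limb bi = i < bn ? b[i] : 0;
        limb s = a[i] + carry;
        limb c1 = s < carry;            /* overflow from adding the carry */
        r[i] = s + bi;
        carry = c1 | (r[i] < bi);       /* overflow from adding b's limb */
    }
    return carry;
}

/* r[0..an) = a[0..an) - b[0..bn), requires an >= bn; returns the borrow out. */
limb mag_sub(limb *r, const limb *a, size_t an, const limb *b, size_t bn) {
    assert(an >= bn);
    limb borrow = 0;
    for (size_t i = 0; i < an; i++) {
        limb bi = i < bn ? b[i] : 0;
        limb d = a[i] - bi;
        limb b1 = a[i] < bi;            /* borrow from subtracting b's limb */
        r[i] = d - borrow;
        borrow = b1 | (d < borrow);     /* borrow from subtracting the old borrow */
    }
    return borrow;
}

/* Two's-complement negate r[0..n) in place (complement, then add one). */
void mag_neg(limb *r, size_t n) {
    limb carry = 1;
    for (size_t i = 0; i < n; i++) {
        r[i] = ~r[i] + carry;
        carry = carry & (r[i] == 0);
    }
}

/* z = a + b. as/bs are signed limb counts: the magnitude is |as| limbs and the
   sign of the count is the sign of the number (0 means zero). z needs room for
   max(|as|,|bs|)+1 limbs. Returns the signed limb count of the result. */
ptrdiff_t signed_add(limb *z, const limb *a, ptrdiff_t as,
                     const limb *b, ptrdiff_t bs) {
    size_t an = (size_t)(as < 0 ? -as : as), bn = (size_t)(bs < 0 ? -bs : bs);
    if (an < bn) {                      /* longer operand goes first */
        const limb *tp = a; a = b; b = tp;
        size_t tn = an; an = bn; bn = tn;
        ptrdiff_t ts = as; as = bs; bs = ts;
    }
    int a_pos = as > 0, b_pos = bs > 0;
    size_t n = an;
    if (a_pos == b_pos) {               /* same sign: add magnitudes */
        if (mag_add(z, a, an, b, bn)) z[n++] = 1;
        return a_pos ? (ptrdiff_t)n : -(ptrdiff_t)n;
    }
    int z_pos = a_pos;                  /* differing signs: subtract magnitudes */
    if (mag_sub(z, a, an, b, bn)) {     /* borrow: |b| > |a|, so flip and negate */
        z_pos = !z_pos;
        mag_neg(z, n);
    }
    while (n && !z[n - 1]) n--;         /* trim leading zero limbs */
    return z_pos ? (ptrdiff_t)n : -(ptrdiff_t)n;
}
```

A zero operand (count 0) needs no special case in this sketch: it ends up as the shorter operand and contributes no limbs. The point relevant to the thread is that none of these routines allocates, so whoever allocated z (here the caller, in J the GAX macro) is also the only party that should free it.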