That's certainly a gmp block. Was it being freed during EPILOG? If so, was
it freed during tpop? That would be an error: no such block should be on
the tpop stack. The only way that could happen is if you allocate a J
block and then change it to a gmp block, which would be bad.

hhr
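
A minimal sketch of the failure mode described above, under loudly stated assumptions: the block header, the `owner_tag` field, the `OWNER_GMP` bit, `TempStack`, `alloc_j` and `pop_all` are all hypothetical stand-ins invented for illustration, not the real J header layout or the real tpop code. It only shows why re-tagging a J-allocated block as a gmp block leaves a gmp block on the temp stack, where the pop-time free then finds it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ownership bit and stack size; not taken from the J source. */
enum { OWNER_GMP = 1, STACK_MAX = 16 };

/* Toy block header: only the fields this sketch needs. */
typedef struct {
    unsigned owner_tag;  /* OWNER_GMP set means "must be freed by the gmp side" */
    size_t   n;          /* payload size in bytes */
} Block;

/* Toy temp stack: every J-side allocation is recorded for a later pop. */
typedef struct {
    Block *slot[STACK_MAX];
    size_t top;
} TempStack;

/* Allocate a J-owned block and record it on the temp stack. */
static Block *alloc_j(TempStack *ts, size_t n) {
    if (ts->top == STACK_MAX) return NULL;
    Block *b = malloc(sizeof *b + n);
    if (!b) return NULL;
    b->owner_tag = 0;
    b->n = n;
    ts->slot[ts->top++] = b;
    return b;
}

/* Pop and free everything on the stack. Returns how many popped blocks
   carried the gmp tag, i.e. how many were re-tagged after allocation. */
static size_t pop_all(TempStack *ts) {
    size_t anomalies = 0;
    while (ts->top > 0) {
        Block *b = ts->slot[--ts->top];
        if (b->owner_tag & OWNER_GMP) anomalies++;  /* should never happen */
        free(b);  /* the toy frees with malloc's free either way */
    }
    return anomalies;
}
```

In this toy, allocating two blocks with `alloc_j`, setting `OWNER_GMP` on one of them, and then calling `pop_all` reports one anomaly: the same situation as a gmp-tagged block turning up in the free path during tpop.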

On Tue, Nov 7, 2023, 2:16 PM Raul Miller <rauldmil...@gmail.com> wrote:

> Here's an example of the header block from the offending free:
>
>    12345x
> trap : file ../../../../jsrc/m.c line 1418
>
> Program received signal SIGILL, Illegal instruction.
> 0x00007ffff293e230 in jtmf (jt=0x7ffff1440200, w=0x5555556b6e50, hrh=16384) at ../../../../jsrc/m.c:1418
> 1418    if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
> (gdb) p *w
> $39 = {kchain = {k = 64, chain = 0x40, globalst = 0x40, locpath = 0x40}, flag = 0, mback = {m = 93824992546896,
>     back = 0x5555555a1050, jobpyx = 0x5555555a1050, zaploc = 0x5555555a1050, aarg = 0x5555555a1050}, tproxy = {t = 2,
>     proxychain = 0x2}, c = 1, n = 6, r = 1 '\001', filler = 48 '0', h = 16384, origin = 0, lock = 13360, s = {
>     842346800}}
>
> The code (without the line 1418 segfault test which I added to m.c) is
> in the gmp-redo0 branch, which I have pushed to the jsoftware jsource
> repo. (This branch has not been mirrored to the github repo.)
>
> Thanks,
>
> --
> Raul
>
> On Tue, Nov 7, 2023 at 2:51 AM Henry Rich <henryhr...@gmail.com> wrote:
> >
> > I don't have my machine & this is more than I can handle with my phone.
> > If you put your code into a branch I can check out, I can work with it on
> > Saturday.
> >
> > Can you show me the entire header of the failing block?  Garbage shape
> > does not matter to the free code.
> >
> > hhr
> >
> > On Tue, Nov 7, 2023, 12:05 AM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > > So... further "refinement" of my problem:
> > >
> > >    12345x
> > > (segfault)
> > >
> > > This is from the EPILOG of jtthorn1main where Z is rank 1, n=5, shape
> > > 5 and contains the characters '12345'
> > >
> > > The value being freed is a garbage LIT value, rank 1, n=6, garbage
> > > shape, and contains the characters '12345',NUL which was produced by
> > > libgmp in jtthx1 (the backing store for the result returned by the macro
> > > SgetX). This garbage LIT value is being handled correctly by memory
> > > management code in J9.4 and the current J9.5 beta builds. And, I
> > > didn't touch anything in m.c (other than adding this conditional
> > > SEGFAULT).
> > >
> > > So I think this means that there's some bit of accounting that I
> > > screwed up in my re-implementation of jtxplus?
> > >
> > > But that's the GAX line here:
> > >
> > > XF2(jtxplus){ // a+w
> > >  ARGCHK2(a,w);
> > >  if (unlikely(ISX0(a))) R w;
> > >  if (unlikely(ISX0(w))) R a;
> > >  // X z= XaddXX(a,w); // XaddXX could become [temporary] syntactic sugar for jtxplus
> > >  mp_size_t an= XLIMBLEN(a), wn= XLIMBLEN(w); // arg sizes
> > >  mp_size_t m= MIN(an, wn), n= MAX(an, wn); // result sizes
> > >  X z; GAX(z, n+1); const mp_ptr zd= voidAV1(z), ad= voidAV1(a), wd= voidAV1(w); // data locations
> > >  B ap= XSGN(a)>0; B wp= XSGN(w)>0; // positive or negative?
> > >  B ax= an>= wn; // when w and a have different lengths, larger should be first arg to mpn_add/mpn_sub
> > >  if (ap==wp) { // work with unsigned magnitudes:
> > >   if (jmpn_add(zd, ax?ad:wd, n, ax?wd:ad, m)) zd[n++]= 1;
> > >   XSGN(z)= ap ?n :-n; // signs matched, result has sign of both args
> > >  } else {
> > >   B zp= ax ?ap :wp;
> > >   if (jmpn_sub(zd, ax?ad:wd, n, ax?wd:ad, m)) {
> > >    zp= 1-zp; // borrow means need to negate result
> > >    if(unlikely(!jmpn_neg(zd, zd, n))) {fr(z); R X0; /* this X0 presumably never happens */}
> > >   }
> > >   while (likely(n) && unlikely(!zd[n-1])) n--; /* trim leading zeros */
> > >   if (unlikely(!n)) {fr(z); R X0;} /* this X0 is presumably the one that happens */
> > >   XSGN(z)= zp ?n :-n;
> > >  }
> > >  if (XLIMBLEN(z)>100000) SEGFAULT; // try to catch a bug...
> > >  R z;
> > > }
> > >
> > > Where GAX is currently
> > >
> > > #define GAX(z, n) GA10(z, LIT, SZI*n)   // like GAGMP but native J alloc
> > >
> > > Anyways... as best I understand it, none of this work should have
> > > resulted in this error state.
> > >
> > > I'm baffled, and do not know what I should be looking at.
> > >
> > > --
> > > Raul
> > >
> > > On Mon, Nov 6, 2023 at 9:25 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > >
> > > > This level of detail is hard to write commentary for. I try, but miss
> > > > large areas.
> > > >
> > > > I feel certain that there is no assumption of homogeneity within an
> > > > XNUM. Indirect types recur in each component independently.
> > > >
> > > > And I think that some components may be PERMANENT and thus not
> > > > eligible to be freed.
> > > >
> > > > When you get your first crash on i. 10x, learn what's going on. That
> > > > shouldn't crash, but perhaps one of the components is anomalous.
> > > >
> > > > hhr
> > > >
> > > > On Mon, Nov 6, 2023, 2:36 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > >
> > > > > So...
> > > > >
> > > > > It turns out that it's fairly easy to find a simple example of this
> > > > > problem.
> > > > >
> > > > > In m.c, add:
> > > > > if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
> > > > >
> > > > > as the first line of jtmf. This will crash on i.10x
> > > > >
> > > > > That said, replacing the SEGFAULT with {gmpfree(w);R;} (or, since
> > > > > ja.h isn't available here, its definition with w substituted for x)
> > > > > still results in a crash; it just takes longer.
> > > > >
> > > > > This behavior roughly matches an earlier suspicion (that the decision
> > > > > to use the gmp deallocator vs the j deallocator was being bypassed -
> > > > > that there was an assumption that all members of an XNUM (or a RAT)
> > > > > were constructed using the same allocator), but I don't know my way
> > > > > around the memory management code to see where I should be looking
> > > > > to find this decision.
> > > > >
> > > > > That said, also, I was expecting a segfault in the gmp deallocator,
> > > > > not in m.c, but that corresponding SEGFAULT doesn't seem to trigger.
> > > > >
> > > > > So... do you have any suggestions on how I should approach looking
> > > > > in m.c to find the assumptions about homogeneous XNUMs that I'm
> > > > > breaking?
> > > > >
> > > > > (My current level of ignorance on this subject feels like the kind
> > > > > of architectural detail ignorance that routinely trips people up in
> > > > > many, many "enterprise" contexts. I can presumably work through it by
> > > > > performing inspection and experiments, but it seems more fruitful
> > > > > to just ask.)
> > > > >
> > > > > Thanks,
> > > > >
> > > > > --
> > > > > Raul
> > > > >
> > > > >
> > > > > On Fri, Nov 3, 2023 at 6:22 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > >
> > > > > > This sounds like a good plan.
> > > > > >
> > > > > > Unfortunately, my machine is crashing (and sometimes failing to
> > > > > > reboot) just doing step 1.
> > > > > >
> > > > > > So I'm not even sure that the problem I'm encountering is a problem
> > > > > > in my changes - it might be a problem in my machine. (That said, my
> > > > > > code is still the prime suspect.)
> > > > > >
> > > > > > So... I've pushed a copy of my changes to a new branch (gmp-redo0).
> > > > > > This is partially to guard against a complete loss of my machine, and
> > > > > > partially to give someone else a chance of looking at the problem.
> > > > > >
> > > > > > I've not given up, but I have expanded the scope of my concerns,
> > > > > > which is going to slow me down.
> > > > > >
> > > > > > FYI,
> > > > > >
> > > > > > --
> > > > > > Raul
> > > > > >
> > > > > > On Fri, Nov 3, 2023 at 1:41 PM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > >
> > > > > > > You have an unknown memory corruption running the test suite. This is
> > > > > > > how I debug those:
> > > > > > >
> > > > > > > 1. RECHO ddall to see where it crashes.
> > > > > > > 2. Run the scripts before the crash to see if you can crash with a
> > > > > > > shorter run.
> > > > > > > 3. When you have the crash as small as you can, set MEMAUDIT to 1d and
> > > > > > > see if you get an audit failure.
> > > > > > > 4. Once you get an audit failure you want to increase the frequency of
> > > > > > > audits. This is where 6!:5 (1) comes in. Once you execute that, it
> > > > > > > audits the free pool very frequently. That slows things down so you
> > > > > > > want to set it as close to the actual error as possible.
> > > > > > > 5. When you have found the first failure, it will be soon after the
> > > > > > > errant code. Add calls to auditmemchains liberally until you have
> > > > > > > isolated the error line.
> > > > > > > 6. If at any point you need to know what J sentence is executing, set
> > > > > > > TRACKINFO to 1 and look at the name track* in any routine that
> > > > > > > defines them.
> > > > > > >
> > > > > > > hhr
> > > > > > >
> > > > > > > On Fri, Nov 3, 2023, 5:15 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Can you give me some hints on using 6!:5?
> > > > > > > >
> > > > > > > > I don't see any comments on jtpeekdata, and I haven't visualized what
> > > > > > > > I'd be looking for, nor when, for that matter.
> > > > > > > >
> > > > > > > > I got the 0xdeadbeef stuff running MEMAUDIT=0xd.
> > > > > > > >
> > > > > > > > I was running MEMAUDIT=0xff overnight, to see if I could catch the
> > > > > > > > problem earlier, but my machine rebooted. I don't know if that was
> > > > > > > > windows update or if that was some other issue. I'll give it another
> > > > > > > > shot.
> > > > > > > >
> > > > > > > > Basically, though, this isn't a problem which I've figured out a good
> > > > > > > > way of triggering, so it's slow going.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Raul
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 3, 2023 at 6:18 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Set MEMAUDIT and 6!:5 to see where the free chain is corrupted.
> > > > > > > > >
> > > > > > > > > hhr
> > > > > > > > >
> > > > > > > > > On Thu, Nov 2, 2023, 3:32 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > This is turning out to be more difficult than I had anticipated.
> > > > > > > > > >
> > > > > > > > > > I've modified jtxplus to use mpn_add (and mpn_sub and a macro
> > > > > > > > > > workalike for the inlined mpn_neg, which in turn uses mpn_com -
> > > > > > > > > > necessary because the mpn_ family of routines works on unsigned
> > > > > > > > > > limb sequences). And, it *mostly* works.
> > > > > > > > > >
> > > > > > > > > > However, when running script/testga.sh, I encounter a double free
> > > > > > > > > > problem.
> > > > > > > > > >
> > > > > > > > > > My current best guess is that somewhere I'm relying on a container
> > > > > > > > > > test (XNUM/RAT) instead of relying on the ISGMP() test. But I
> > > > > > > > > > looked through m.c and I'm not seeing anything there that looks
> > > > > > > > > > plausible.
> > > > > > > > > >
> > > > > > > > > > I did notice that the frgmp() macro is not referenced anywhere,
> > > > > > > > > > and I used the corresponding fr() macro in my implementation
> > > > > > > > > > rather than mf() - but if that's an issue, I need a better
> > > > > > > > > > understanding of this part of the internal api.
> > > > > > > > > >
> > > > > > > > > > So... anyways... before I dig this hole too deep, I figure I
> > > > > > > > > > should ask for advice on how to proceed.
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Raul
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 25, 2023 at 3:12 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > OK, I see now.  The implementation using low-level GMP would be more
> > > > > > > > > > > parallel with Roger's original version, right?
> > > > > > > > > > >
> > > > > > > > > > > That would end up being simpler than having reserve memory, as well
> > > > > > > > > > > as stabler.
> > > > > > > > > > >
> > > > > > > > > > > Since you currently mark blocks that are to be freed by GMP, you
> > > > > > > > > > > could make this change piecemeal, right?  When you rewrite addition
> > > > > > > > > > > to use the low-level routines, you mark the blocks allocated by
> > > > > > > > > > > addition as J not GMP, and everything else follows automatically.
> > > > > > > > > > >
> > > > > > > > > > > That's a great idea.
> > > > > > > > > > >
> > > > > > > > > > > hhr
> > > > > > > > > > >
> > > > > > > > > > > On 10/24/2023 8:34 PM, Raul Miller wrote:
> > > > > > > > > > > > libzahl is not thread safe, and even if it was, it's not clear to me
> > > > > > > > > > > > that it adequately supports enough architectures.
> > > > > > > > > > > >
> > > > > > > > > > > > Meanwhile, libgmp's problems are addressable. I just have to use a
> > > > > > > > > > > > different part of its API.
> > > > > > > > > > > >
> > > > > > > > > > > > (Also, on windows, we're using mpir rather than libgmp.)
> > > > > > > > > > > >
> > > > > > > > > > > > (J currently uses parts of the libgmp high level API, which performs
> > > > > > > > > > > > memory allocations within the libgmp library routines, using callbacks
> > > > > > > > > > > > whose implementation I supply. But it also exposes the low level
> > > > > > > > > > > > routines used to build those high level routines, and those low level
> > > > > > > > > > > > routines do not perform memory allocation, which means that we can
> > > > > > > > > > > > manage the memory outside of the API.)
> > > > > > > > > > > >
> > > > > > > > > > > > ((The problem with libgmp's high level API is that if a memory
> > > > > > > > > > > > allocation fails, it exits the program. So we came up with a
> > > > > > > > > > > > workaround which reserves a memory pool, and limits arguments to
> > > > > > > > > > > > certain routines, so that memory allocations will succeed even
> > > > > > > > > > > > under low memory conditions. That's not ideal, but it has been
> > > > > > > > > > > > "good enough, so far". But libgmp supports another approach. It's a
> > > > > > > > > > > > little more work, but not an excessive amount of work.))
> > > > > > > > > > > >
> > > > > > > > > > > > I hope this makes sense,
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > >
> ----------------------------------------------------------------------
> > > > > > > > > > > For information about J forums see
> > > > > > > > http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
