This level of detail is hard to write commentary for. I try, but miss large areas.
I feel certain that there is no assumption of homogeneity within an XNUM. Indirect types recur in each component independently. And I think that some components may be PERMANENT and thus not eligible to be freed. When you get your first crash on i. 10x, learn what's going on. That shouldn't crash, but perhaps one of the components is anomalous.

hhr

On Mon, Nov 6, 2023, 2:36 PM Raul Miller <rauldmil...@gmail.com> wrote:

> So...
>
> It turns out that it's fairly easy to find a simple example of this
> problem.
>
> In m.c, add:
>
>     if(unlikely(ISGMP(w))) SEGFAULT; // do not free libgmp managed memory here
>
> as the first line of jtmf. This will crash on i.10x
>
> That said, replacing the SEGFAULT with {gmpfree(w);R;} (or, since ja.h
> isn't available here, its definition, changing x for w) still results
> in a crash; it just takes longer.
>
> This behavior roughly matches an earlier suspicion (that the decision
> to use the gmp deallocator vs the j deallocator was being bypassed -
> that there was an assumption that all members of an XNUM (or a RAT)
> were constructed using the same allocator), but I don't know my way
> around the memory management code to see where I should be looking to
> find this decision.
>
> That said, also, I was expecting a segfault in the gmp deallocator,
> not in m.c, but that corresponding SEGFAULT doesn't seem to trigger.
>
> So... do you have any suggestions on how I should approach looking in
> m.c to find the assumptions about homogeneous XNUMs that I'm breaking?
>
> (My current level of ignorance on this subject feels like the kind of
> architectural detail ignorance that routinely trips people up in many,
> many "enterprise" contexts. I can presumably work through it by
> performing inspection and experiments, but it seems more fruitful to
> just ask.)
>
> Thanks,
>
> --
> Raul
>
>
> On Fri, Nov 3, 2023 at 6:22 PM Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > This sounds like a good plan.
> >
> > Unfortunately, my machine is crashing (and sometimes failing to
> > reboot) just doing step 1.
> >
> > So I'm not even sure that the problem I'm encountering is a problem in
> > my changes - it might be a problem in my machine. (That said, my code
> > is still the prime suspect.)
> >
> > So... I've pushed a copy of my changes to a new branch (gmp-redo0).
> > This is partially to guard against a complete loss of my machine, and
> > partially to give someone else a chance of looking at the problem.
> >
> > I've not given up, but I have expanded the scope of my concerns, which
> > is going to slow me down.
> >
> > FYI,
> >
> > --
> > Raul
> >
> > On Fri, Nov 3, 2023 at 1:41 PM Henry Rich <henryhr...@gmail.com> wrote:
> > >
> > > You have an unknown memory corruption running the test suite. This is
> > > how I debug those:
> > >
> > > 1. RECHO ddall to see where it crashes.
> > > 2. Run the scripts before the crash to see if you can crash with a
> > > shorter run.
> > > 3. When you have the crash as small as you can, set MEMAUDIT to 1d and
> > > see if you get an audit failure.
> > > 4. Once you get an audit failure you want to increase the frequency of
> > > audits. This is where 6!:5 (1) comes in. Once you execute that, it
> > > audits the free pool very frequently. That slows things down so you
> > > want to set it as close to the actual error as possible.
> > > 5. When you have found the first failure, it will be soon after the
> > > errant code. Add calls to auditmemchains liberally until you have
> > > isolated the error line.
> > > 6. If at any point you need to know what J sentence is executing, set
> > > TRACKINFO to 1 and look at the name track* in any routine that defines
> > > them.
> > >
> > > hhr
> > >
> > > On Fri, Nov 3, 2023, 5:15 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > >
> > > > Can you give me some hints on using 6!:5?
> > > >
> > > > I don't see any comments on jtpeekdata, and I haven't visualized what
> > > > I'd be looking for, nor when, for that matter.
> > > >
> > > > I got the 0xdeadbeef stuff running MEMAUDIT=0xd.
> > > >
> > > > I was running MEMAUDIT=0xff overnight, to see if I could catch the
> > > > problem earlier, but my machine rebooted. I don't know if that was
> > > > Windows Update or if that was some other issue. I'll give it another
> > > > shot.
> > > >
> > > > Basically, though, this isn't a problem which I've figured out a good
> > > > way of triggering, so it's slow going.
> > > >
> > > > --
> > > > Raul
> > > >
> > > >
> > > > On Fri, Nov 3, 2023 at 6:18 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > >
> > > > > Set MEMAUDIT and 6!:5 to see where the free chain is corrupted.
> > > > >
> > > > > hhr
> > > > >
> > > > > On Thu, Nov 2, 2023, 3:32 PM Raul Miller <rauldmil...@gmail.com> wrote:
> > > > >
> > > > > > This is turning out to be more difficult than I had anticipated.
> > > > > >
> > > > > > I've modified jtxplus to use mpn_add (and mpn_sub and a macro
> > > > > > workalike for the inlined mpn_neg, which in turn uses mpn_com -
> > > > > > necessary because the mpn_ family of routines works on unsigned
> > > > > > limb sequences). And, it *mostly* works.
> > > > > >
> > > > > > However, when running script/testga.sh, I encounter a double free
> > > > > > problem.
> > > > > >
> > > > > > My current best guess is that somewhere I'm relying on a container
> > > > > > test (XNUM/RAT) instead of relying on the ISGMP() test. But I
> > > > > > looked through m.c and I'm not seeing anything there that looks
> > > > > > plausible.
> > > > > >
> > > > > > I did notice that the frgmp() macro is not referenced anywhere, and
> > > > > > I used the corresponding fr() macro in my implementation rather
> > > > > > than mf() - but if that's an issue, I need a better understanding
> > > > > > of this part of the internal api.
> > > > > >
> > > > > > So...
> > > > > > anyways... before I dig this hole too deep, I figure I should
> > > > > > ask for advice on how to proceed.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --
> > > > > > Raul
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Oct 25, 2023 at 3:12 AM Henry Rich <henryhr...@gmail.com> wrote:
> > > > > > >
> > > > > > > OK, I see now. The implementation using low-level GMP would be
> > > > > > > more parallel with Roger's original version, right?
> > > > > > >
> > > > > > > That would end up being simpler than having reserve memory, as
> > > > > > > well as stabler.
> > > > > > >
> > > > > > > Since you currently mark blocks that are to be freed by GMP, you
> > > > > > > could make this change piecemeal, right? When you rewrite
> > > > > > > addition to use the low-level routines, you mark the blocks
> > > > > > > allocated by addition as J not GMP, and everything else follows
> > > > > > > automatically.
> > > > > > >
> > > > > > > That's a great idea.
> > > > > > >
> > > > > > > hhr
> > > > > > >
> > > > > > > On 10/24/2023 8:34 PM, Raul Miller wrote:
> > > > > > > > libzahl is not thread safe, and even if it were, it's not clear
> > > > > > > > to me that it adequately supports enough architectures.
> > > > > > > >
> > > > > > > > Meanwhile, libgmp's problems are addressable. I just have to
> > > > > > > > use a different part of its API.
> > > > > > > >
> > > > > > > > (Also, on Windows, we're using mpir rather than libgmp.)
> > > > > > > >
> > > > > > > > (J currently uses parts of the libgmp high level API, which
> > > > > > > > performs memory allocations within the libgmp library routines,
> > > > > > > > using callbacks whose implementation I supply.
> > > > > > > > But it also exposes the low level routines used to build those
> > > > > > > > high level routines, and those low level routines do not
> > > > > > > > perform memory allocation, which means that we can manage the
> > > > > > > > memory outside of the API.)
> > > > > > > >
> > > > > > > > ((The problem with libgmp's high level API is that if a memory
> > > > > > > > allocation fails, it exits the program. So we came up with a
> > > > > > > > workaround which reserves a memory pool, and limits arguments
> > > > > > > > to certain routines, so that memory allocations will succeed
> > > > > > > > even under low memory conditions. That's not ideal, but it has
> > > > > > > > been "good enough, so far". But libgmp supports another
> > > > > > > > approach. It's a little more work, but not an excessive amount
> > > > > > > > of work.))
> > > > > > > >
> > > > > > > > I hope this makes sense,
> > > > > > > >
> > > > > > > ----------------------------------------------------------------------
> > > > > > > For information about J forums see http://www.jsoftware.com/forums.htm
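The top of this thread notes that there is no assumption of homogeneity within an XNUM: each component is handled independently, and some components may be PERMANENT and never freed. A minimal standalone sketch of that per-component dispatch can be pictured as follows. It includes the J-versus-GMP block marking suggested in the Oct 25 message. Every name in it (`enum owner`, `struct component`, `struct composite`, `j_free`, `gmp_free`, `composite_free`) is hypothetical and does not come from J's m.c. The two deallocators only count their calls before calling `free`, so the dispatch can be observed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tags recording which allocator produced a component.
   This is an illustration, not J's actual block header. */
enum owner { OWN_J, OWN_GMP, OWN_PERMANENT };

struct component {
    enum owner owner;
    void *mem;
};

/* Hypothetical stand-in for an XNUM/RAT-like container whose
   components may come from different allocators. */
struct composite {
    size_t n;
    struct component *items;
};

/* Call counters make the dispatch observable. */
static int j_frees, gmp_frees;

static void j_free(void *p)   { j_frees++;   free(p); }
static void gmp_free(void *p) { gmp_frees++; free(p); }

/* Free every component by its own tag, never by the container's type.
   PERMANENT components are skipped, and each pointer is cleared so a
   repeated call cannot free the same block twice. */
static void composite_free(struct composite *c)
{
    for (size_t i = 0; i < c->n; i++) {
        struct component *it = &c->items[i];
        if (it->mem == NULL)
            continue;                      /* already released */
        switch (it->owner) {
        case OWN_J:         j_free(it->mem);   break;
        case OWN_GMP:       gmp_free(it->mem); break;
        case OWN_PERMANENT: break;         /* shared; never freed */
        }
        it->mem = NULL;
    }
}
```

Consider a container with one J-owned, one GMP-owned and one permanent component. A single `composite_free` call makes exactly one call to each deallocator and skips the permanent component; a second call does nothing. The point of the sketch is only that the decision is made per component. That is where relying on a container test (XNUM/RAT) instead of a per-block test such as `ISGMP()` would go wrong.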
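The Nov 2 message mentions a macro workalike for GMP's inlined `mpn_neg`, built on `mpn_com`. It is needed because the `mpn_` routines work on unsigned limb sequences. The idea can be sketched in plain C without GMP, assuming 64-bit limbs with no nail bits. `limb_t`, `limb_com` and `limb_neg` are hypothetical names, not GMP's API. The sketch returns the borrow (0 or 1), which is how GMP documents `mpn_neg`. The limb-by-limb method is one straightforward way to get that result: low zero limbs stay zero, the lowest nonzero limb is negated, and higher limbs are complemented. The caller supplies the result buffer, so nothing inside the routine allocates. That is the property of the low-level API that the Oct 24 message relies on.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit limbs with no nail bits. */
typedef uint64_t limb_t;

/* Bitwise complement of {sp, n} into {rp, n}; same role as mpn_com. */
static void limb_com(limb_t *rp, const limb_t *sp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ~sp[i];
}

/* Two's-complement negation of {sp, n} into {rp, n}, with n >= 1.
   Returns the borrow: 0 if the operand was zero, 1 otherwise.
   rp may equal sp. The caller owns rp; nothing is allocated here. */
static limb_t limb_neg(limb_t *rp, const limb_t *sp, size_t n)
{
    /* Low zero limbs are unchanged by negation. */
    while (*sp == 0) {
        *rp = 0;
        if (--n == 0)
            return 0;                  /* the whole operand was zero */
        sp++;
        rp++;
    }
    *rp = (limb_t)0 - *sp;             /* negate the lowest nonzero limb */
    if (--n != 0)
        limb_com(rp + 1, sp + 1, n);   /* complement the higher limbs */
    return 1;
}
```

Take the three-limb value {0, 5, 0}, that is 5 * 2^64. The result is {0, 2^64 - 5, 2^64 - 1} with a borrow of 1, and adding it back to the operand wraps to zero across all three limbs.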