I would like to have a Guide For Coders giving the rules for using the memory system to perform specific tasks, such as
1. Creating an array of boxes
2. Modifying an array of boxes
3. Including a preexisting box in an array of boxes
4. Creating/modifying/including other indirect forms, such as XINT and RAT


Now that it looks like this is going to work, we need to think about deployment. We need to convince ourselves that there is no problem. The debug mode I mentioned below seems to me the way to go - items 1 & 2 for sure. And we could also do

5. The memory system will keep a list of all the A blocks that are active, and from time to time go through them, checking for consistency. Each block has enough flags in it to tell what kind of block it is; the audit would ensure that pointers to A blocks point only to valid A blocks. This test would not be 100% thorough, since some fields have variable use, but enough would be checkable to give us a real feeling of security. In particular, it would check boxed arrays. A B-tree would be the best data structure for the block list, I reckon.

Would you create such a debug mode for the memory system? I can help if you like.

Henry






On 5/25/2016 6:36 PM, Henry Rich wrote:
I haven't got into this in detail yet, but I am wondering what tools we could have to verify correct operation. We need a debug mode for the memory system. Maybe we have them already.

1. In debug mode the allocator should scramble the contents of a memory block when it is freed.

2. In debug mode each freed-and-scrambled block should be checksummed so that when it is next allocated it can be checked to see that there were no stores to it.

3. Debug mode could alter the order of the queue of free blocks, to change its FIFO/LIFO character.

4. How hard would it be to keep track of the number of bytes that should be allocated? What can tie up memory long-term? I can think of
* assigned names
* preparsed scripts
* locales including their symbol tables
* 15!:3 and 15!:4

If the list is not too long, couldn't we keep track of how many bytes should be assigned to each use, and do a consistency check from time to time (at each entry to immex for sure, but more often if possible)?

Henry

On 5/25/2016 12:51 PM, Marshall Lochbaum wrote:
I assume you mean derecursivize (derec), not dereference. With the new
formula and tweak to derec, I think it is safe to derec an object at any
time. However, it is important to wait until EPILOG is called to do a
derec, since it isn't safe to add a descendant to a non-recursive object
and a J function can add a descendant at any time before EPILOG.

Here's the last example I gave with derec moved to the beginning rather
than the end. Since a and b have one reference, which isn't propagated,
before calling derec, they have zero propagated references and derec
actually adds one to the reference count of each descendant.

   initial
   |      |      |
   a 1 -> b 1 -> c 1

   derec(a)
   |      |      |
   a.1 -> b.2 -> c.2

   ra(a)
   |      |      |
   a.2 -> b.2 -> c.2

   tpop(old)
          |      |
   a.1 -> b.2 -> c.2

   tpush(a)
   |      |      |
   a.1 -> b.2 -> c.2

Here ra->tpop->tpush doesn't appear to do anything, but it actually
clears all of the unused objects from the stack, which is the entire
purpose of EPILOG. ra should be thought of as making a temporary copy of
a (so that a's reference count appears one too high after ra and tpop),
and tpush should be thought of as transferring that reference to the
stack.

This order doesn't use AFCLR at all, or rather AFCLR is on exactly when
AFREC is off. It may be that with this order we can get rid of AFCLR.
I'll have to think about whether that is always the case.

Marshall

On Wed, May 25, 2016 at 12:27:12PM -0400, Raul Miller wrote:
If I understand your essay here correctly (and please tell me if I do
not), the only time it is safe to dereference objects based on
reference counts is late in EPILOG?

(Or is the impact more subtle than that? Technically, all we need is a
guarantee that the reference count only falls to zero when it is safe
to recover its memory, and another "guarantee" that that will usually
happen in a reasonable time frame. And if the reference count is "too
high" or even "too low but not zero" for pending calculations in older
parts of the stack, we can sort of ignore that.... but that's as far
as I can see into this... and I might be overlooking some important
perspective.)

Thanks,

--
Raul



On Wed, May 25, 2016 at 12:06 PM, Marshall Lochbaum
<[email protected]> wrote:
After finding a major mismatch between the model I outlined and how J
actually works, I have fixed all remaining bugs with the new memory
model. The only work left now is to rewrite 7!:0 when not all objects
reside on the stack. Long explanation of the changes follows. They've
been pushed to the refcount branch again.

In my original proof, I assumed that

2. When EPILOG(z) is called, z and its descendants obey the reference
    count formula.

This is actually never true, except in the trivial case that z has no
descendants. Instead, the following holds:

2. After executing the funny series of operations in EPILOG that come
before derec, z and those of its descendants popped by tpop obey the
    reference count formula.

Not as good. Let's look at this series of operations (best viewed in
monospace):

   initial state (b is a descendent of a)
   |      |
   a 1 -> b 1

   ra(a)
   |      |
   a 2 -> b 2

   tpop(old)

   a 1 -> b 1

   tpush(a)
   |
   a 1 -> b 2

   derec(a)
   |      |
   a.1 -> b.2

The vertical bars are references from the stack to objects. The number
next to each object is its reference count (with a dot for non-recursive
counts), and arrows between them are references. The reference count
formula states that the count of each object is equal to the sum, for
each referrer that refers to it, of 1 if that referrer is the stack or a
non-recursive object, and that object's reference count if it is a
recursive object. This formula is wrong until after tpush! Before tpop,
b's reference count is too high, and after it a's count is too high.

Here's a problem that occurred in the tests various times. It happens if b and c are created in a function with no EPILOG, and then a is created
in a function with an EPILOG.

   initial (assume only a is above old on the stack)
   |      |      |
   a 1 -> b 1 -> c 1

   ra(a)
   |      |      |
   a 2 -> b 2 -> c 2

   tpop(old)
          |      |
   a 1 -> b 2 -> c 2

   tpush(a)
   |      |      |
   a 1 -> b 2 -> c 2

   derec(a)
   |      |      |
   a.1 -> b.2 -> c.1

Our formula is wrong all the way through! Because b is never popped from
the stack, its reference count is higher than 1 when tpush is called,
and this isn't reflected in c's reference count, which should be 3. So
when derec is called, it removes excess references from c which were
never there, and c's count ends up being one too low. Once b and c are
taken off the stack (say at the end of the current sentence), b's count
goes to 1 but c's goes to 0, and c is freed leaving a dangling
reference.

The problem was that I had treated the discrepancy in reference counts
like an "implicit reference" from a to b. Thus to fix it, we should call
ra(a), making it into a real reference, and clear the stack under it.
But this ra call never happens for objects which aren't EPILOGged. Their descendants end up with no indication of the reference. Instead, I need
a different reference count formula, which will actually hold when
EPILOG is called. This requires a new flag, although there are perhaps
other methods in which it wouldn't.

We define the flag AFCLR. The absence of this flag indicates that the
stack reference is not propagated to descendants (this only makes sense
for recursive objects. The AFCLR flag will always be on for
non-recursive objects). Initially AFCLR is off. There are two situations
in which we should turn AFCLR on for object w, and the discrepancy is
corrected in each case:

- w is popped from the stack. tpop uses fa, which non-recursively
   decreases w's reference count.
- w is made non-recursive. We account for the missing reference by
decrementing the counts of w's descendants by one less than we usually
   would.

Here's how that works for our problem scenario before. Objects with
AFCLR and AFREC set are marked with *.

   initial
   |      |      |
   a 1 -> b 1 -> c 1

   ra
   |      |      |
   a 2 -> b 2 -> c 2

   tpop
          |      |
   a*1 -> b 2 -> c 2

   tpush
   |      |      |
   a*1 -> b 2 -> c 2

   derec
   |      |      |
   a.1 -> b.2 -> c.2

You can verify that the modified reference count formula is true all the way through. The reference count of each object is the sum of the counts of its referrers, minus one for each referrer without a *. Once derec is
called, c's count of 2 matches with its two referrers.

Marshall

On Mon, May 23, 2016 at 06:29:13PM -0400, Marshall Lochbaum wrote:
I have modified cp.c, cx.c, and vgauss.c and am back to no new test
failures. Test failures in all three sparse tests have mysteriously
vanished. This is not incredibly surprising because those failures could have been caused by omitting gc on the simpler functions, some of which
deal with sparse arrays. However, it would be nice to get Valgrind
working to see if there are use-after-frees which are still there but
aren't causing any bugs in those particular tests.

For cx.c and vgauss.c I just replaced gc with a version that doesn't use derec. This is essentially the same as the old functionality except that
objects are not pushed to the stack recursively, so we still get some
improvement in performance. For cp.c I actually rewrote the relevant
operations.

It turns out it's not actually safe to call derec on an object on the
stack of unknown origin, because it may have implicit references to
children which are on the stack. The only known safe way to
derecursivize an object is with gc (or gc3), which is safe by
construction. Here are the rules for performing a running update of
object x:

- Ensure that x is not recursive. Calling derec immediately after GA is
   safe, since x has no children yet.
- After creating a value z which will be added to x, call gc(z,old).
Stack index old should be should be set late enough so that nothing
   important is clobbered (in particular, the original copy of x).
- Before adding z to x, call ra(z). Since ra(z) returns z, you can just
   write xv[k]=ra(z) or similar.

Changes have been pushed to the refcount branch again. Remaining
problems are that 7!:0 doesn't work and that tests g13x, g5x2, and gq
fail.

Marshall

On Mon, May 23, 2016 at 04:17:52PM -0400, Marshall Lochbaum wrote:
w should be made non-recursive before doing anything else--otherwise we
would have to keep track of whether it is or not during the loop. So
adding it to the stack just increments the reference count, and popping it decrements it again. No danger of messing with child reference counts
or anything.

I made the changes to the two files I mentioned, and naturally there are still test failures beyond those with no-op gc. They shouldn't take too long to work out, though. I'm tired enough of actually reading code that
I think I'll just do a binary search on instances of gc...

Marshall

On Mon, May 23, 2016 at 03:59:25PM -0400, Henry Rich wrote:
You are right, provided you are sure w is off the stack. If you can
be sure of that, it's better to derec the child as you add it.

I also answered my question about the use of jt->tbase.

Henry

On 5/23/2016 1:56 PM, Marshall Lochbaum wrote:
I don't see why tpopnotw would be any better than just leaving w off the stack (or, perhaps, why leaving w off the stack is flawed). The only
reason these functions break our current system is that they add a
recursive object as a child of a non-recursive one. So calling derec on
the children solves the problem, and any other changes are for
performance. And tpopexcept is clearly going to be slower than just
tpop. So what's the benefit?

Marshall

On Sun, May 22, 2016 at 12:17:19PM -0400, Henry Rich wrote:
no, tpopnotw would not leave the w dangling - it would move it (see my latest). It would work like this (you will have to reflect any
changes you made to tpop/tpush):

tpopexcept(old,w){
  I wfound = 0; // set if there was a free of w
while(old<jt->tbase+jt->ttop){ // till we have freed back to 'old' //?? question: why is 'old <' used rather than 'old !='? Aren't the // blocks of the stack in accidental order, so that this test
would fail
// if the stack spans a block boundary, and the second block is at a
      // lower address than the first?
if(1<jt->ttop) { // if the stack item is a pointer to a free block A blocktofree = jt->tstack[--jt->ttop]; // fetch address to free if(blocktofree==w)++wfound;else fr(blocktofree); // free it,
unless it's w
elsetf(); // stack item is a stack-block chain pointer - free
the stack block
  }
DO(wfound, tpush(w)); // restore any occurrences of w that were
suppressed
R old; // return free-to pointer - it may not be the top, if w was
restored
}

I agree that if it's w being changed, nothing special is needed, but
this would be a good idea for the other places.

What testcase would you like me to look at first?

Henry


On 5/22/2016 11:52 AM, Marshall Lochbaum wrote:
Replying to both of your comments here.

Almost all uses of this pattern actually replace w rather than append to it. I haven't tested, but I think it's fine in all of these cases to use gc or gc3. The exceptions which do modify w are in cp.c and vgauss.c,
although it's possible I missed some.

tpopnotw isn't possible--the stack is stored in an array, so ignoring w would just leave it dangling beyond the end of the stack. Your first suggestion still works, except that derec, rather than ra, makes objects
non-recursive.

I've pushed my changes so far to the unbox branch refcount:
https://github.com/iocane/unbox/commits/refcount

Failing tests are

- Because 7!:0 is no longer accurate
     core/g202.ijs
     core/g202b.ijs
     core/g7x.ijs
     core/glocale.ijs

- Problem with in-place sparse amends
     sparse/gsp530i.ijs
     sparse/gsp530l.ijs
     sparse/gsp530n.ijs

- Miscellaneous
     core/g13x.ijs
     core/g5x2.ijs
     core/gq.ijs

Presumably the last two groups are use-after-frees. Standard tools don't work to identify these because of J's pool allocation, and making J use mallocs instead of pool allocations causes bugs for reasons that are probably even harder to track down. I have tried to configure Valgrind
to recognize pool allocations (using the requests described at
http://www.network-theory.co.uk/docs/valgrind/valgrind_57.html), but it seems to only recognize that certain reads/writes are invalid, not the
free that caused them to be invalid. Useless.

Marshall

On Sat, May 21, 2016 at 04:29:03PM -0400, Henry Rich wrote:
   store the current stack top in variable old.
   for(...) {
      add something to boxed array w.
     call gc(w,old), putting w in non-recursive form.
   }

What I was suggesting is:

    store the current stack top in variable old.
    ra(w);
    for(...) {
      add something to boxed array w.
ra(something); // putting something in non-recursive form.
      tpop(old);
   }
    tpush(w);

Now I suggest

    store the current stack top in variable old.
    for(...) {
      add something to boxed array w.
ra(something); // putting something in non-recursive form?
      tpopnotw(old,w);
   }

Where tpopnotw is like tpop but simply ignores stack entries that match w.

If this idea is sound, I think you might as well do it, because

* you will have to inspect each place gc() is used anyway
* assuming that the current code is there because the free was needed, this preserves that behavior. It will be hard to be confident that this is not needed (you'd have to deploy and wait for user complaints -
    I wouldn't want to take that chance)
* No usecount will be modified or block freed that doesn't need to be
* it is necessary to increment the usecount of (something)

Henry












On 5/20/2016 10:06 AM, Marshall Lochbaum wrote:
A number of spots in the J source follow this pattern, or similar:
   store the current stack top in variable old.
   for(...) {
     add something to boxed array w.
     call gc(w,old), putting w in non-recursive form.
   }

(A particular example is cp.c, lines 91--97, which performs (v^:l x)
where l is an integer list).
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to