From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 29 April 2016 at 09:51
To: jeng...@jsoftware.com


I am currently working on changing J's current reference counting scheme
to a much more efficient and standard method.

J stores a reference count in every object (AC(w)) indicating the number
of times it is referenced. The count is incremented when a copy is made
and decremented when one is deleted. Once the reference count drops to
zero, J knows the object can be freed, and does so.

Currently, J uses an unusual recursive procedure for reference counts.
If a boxed array (or verb with arguments, or other object with children)
is copied, then the reference counts of all its children (items in
boxes) are incremented, and so on recursively. This makes copies of
composite objects take a long time.

The typical procedure is to treat the parent/child relationship as a
single reference. Thus the reference counts of children are incremented
when they are added to the parent, and decremented when the parent is
deleted, and otherwise not changed. To copy a composite object, simply
increment its reference count, and to delete, decrement it. I am
currently trying to convince the J engine to use this procedure.

The hangup is another idiosyncratic J feature, which in contrast to the
weird reference count scheme is actually useful. J maintains a stack of
local nouns, whose function (but not implementation) is equivalent to
that of C's stack. Any time an object is created, a pointer to it is
pushed to the stack. Thus J functions can ignore all the reference count
arithmetic (which from experience I know is quite painful) while using
objects. Instead, they call PROLOG, which saves the location of the
beginning of the stack, before doing anything, and EPILOG(z), where z is
the value to be returned, after everything. EPILOG copies z, deletes
everything added to the stack since PROLOG was called, and returns z.

A complication arises if z is a composite object. In this case we want
to avoid deleting any of z's children, despite the fact that they may be
on the stack. In the current paradigm this is automatic: on copying z,
all of its children are also copied, and deleting the copies on the
stack leaves those copies. But if incrementing the reference count is
not recursive, this doesn't happen, and the children are deleted,
leaving z with dangling pointers.

In a sense, the problem arises because when verbs connect z to its
children, they don't bother to inform the children. The reference from z
to its children cannot be reflected in their reference counts. So the
correct fix might be to increment the reference counts as they are added
to z. This is not feasible: the places where children are added to
objects just look like assignments, and are impossible to detect
automatically.

There is a workaround: EPILOG should do the pointer-incrementing job for
us. So rather than a simple copy of z, it actually goes through all of
z's children and increments pointers. Specifically, it should increment
the descendents which are children of objects on the stack. This
requires an extra flag: AFSTK, which indicates whether an object is on
the stack.

I have implemented all the above, and am still getting bugs. I think
that I have made all the necessary architecture changes, and that the
remainder is just a few spots that relied in some way on the old system,
but it's impossible to be sure how much work there is left to do. I will
update on future progress.

Marshall

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 29 April 2016 at 12:40
To: jeng...@jsoftware.com


Your workaround seems sound.  Let me know what you find the problems are.

To see if I understand the situation, would this work: keep a long long
counter of PROLOGs, giving each PROLOG a unique number; store the current
number in the block header when a block is created.  Then at EPILOG,
increment use-counts in the children of Z that have the stored value
matching the current number.  This would add 8 bytes to the header, so I
don't like it, but it would keep you from having to match children-of-z
with children-of-the-noun-stack.

Henry

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 29 April 2016 at 17:18
To: jeng...@jsoftware.com


Is one way to think of this problem that the use-counts in z need to be the
old recursive style, so that they can be freed by the epilogue?

If so, a solution would be to traverse z, converting its use-counts to
recursive style (and incrementing - I guess that's what you referred to as
'copying'); decrement them in recursive style; then traverse again,
converting back to nonrecursive style.

Would that work?

If so, a more elegant solution would be to have two use counts: a count
that has yet to be propagated to descendants, and a count that has been
propagated.  The use-count routines would handle these correctly.  EPILOG
would traverse to push unpropagated use-count to descendants, and then free
the blocks as before.  The conversion back to nonrecursive style would not
be required, because both styles would coexist.

Implementation: I have my eye on the AF() field, which seems to have just 4
bits assigned - is that right?  Reserve 6 bits for flags, and use the upper
bits for the unpropagated usecount.  At the same time, make the
corresponding change to AC(), leaving 6 bits of flags and the rest
propagated usecount.


Henry


On 4/29/2016 12:51 PM, Marshall Lochbaum wrote:

----------
From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 29 April 2016 at 21:56
To: jeng...@jsoftware.com


I'm not sure whether the first solution would work. Converting between
the two styles is simple enough--recursively add or remove the reference
count of the parent from all of its descendents--but the overall
procedure is kind of complicated, and I still don't know exactly how the
stack is used all of the time.

I don't think that nouns are ever copied while on the stack, just used
without regard for their reference count. So the second method is the
same as mine, but mine combines the two counts.

I assume you mean AFLAG for the field. There are 17 extra flags, defined
only for verbs, further on in jtype.h ("type V flag values"). These end
at 2^24, leaving the top 32 bits untouched. But given how quickly we
seem to be proposing new flags, it may not be a good idea to use those.

Marshall

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 30 April 2016 at 04:34
To: jeng...@jsoftware.com


[I thought the V flags were for the flag field of the V struct, not the A
struct.]

I have better ideas about this problem now.

First, though, let me say this.  When there were still problems after you
put in the nonrecursive use-counts, and you suspected some code that relied
on the old methods: that's almost a fatal problem, isn't it?  Because we
can't hope to find all the old code, and would never want to release until
we were sure we had.  We need a design that guarantees compatibility.



Here's what you have learned: execution freely allocates blocks, putting
them on the stack, and combines them promiscuously, producing objects, one
of which becomes the result z for use in EPILOG(z).  The objects produced
by execution all have the old recursive use-counts, and thus must be freed
using the recursive method.

Yet we would very much like objects to use nonrecursive use-counts.  This
suggests the following design:

Blocks are either nonrecursive or semirecursive and flagged to indicate
which kind they are.  The use-count of a nonrecursive block has not fully
been included in the use-counts of its descendants, and need not be, while
the use-count of a semirecursive block has been propagated to descendants.

The descendants of a nonrecursive block are all nonrecursive.  The
descendants of a semirecursive block can be either type.
Incrementing/decrementing the use count involves traversing the tree, but
the traversal can be terminated when a nonrecursive block is encountered.

Nonrecursive blocks are created ONLY by EPILOG(z).  After the stack has
been freed, the result z is traversed to transform it from semirecursive to
nonrecursive.

Important lemma (please check!): No block pointed to by the stack will ever
be contained in a nonrecursive block. Proof: All blocks are allocated as
semirecursive, and execution only combines these into larger semirecursive
aggregates. (replacing-in-place is not implemented for indirect data types
- yet).



That's the design.  It's saying that before EPILOG, anything goes.
Whatever the Old Code produces is OK.  At EPILOG() time, the surviving z is
then put into efficient nonrecursive form, after the stack has been taken
care of.

Examples:

J sentence:   a (; <@(+/)"1) b

<@(+/)"1 will run on b to produce its result, in semirecursive form.
EPILOG will turn this into the nonrecursive result, call it c.

a starts out nonrecursive (if it is an indirect type), because it is the
result of some earlier EPILOG.

a ; c   will join the two nonrecursive values into a semirecursive result.
But traversing that result during EPILOG will be fast, because the
traversal may stop after one level of recursion.

J sentence: ({. a) ; 2 {:: b

(2 {:: b) selects the subtree of b (and presumably increments its use
count, but that's in the Old Code and we don't have to worry about it).
This subtree will most likely be nonrecursive. Depending on the type of a,
the result of ({. a) is either a new semirecursive block or the address of
the nonrecursive subtree of a.  These are joined by ; into a semirecursive
result, which again can be traversed quickly.  The key point is that no
block on the stack can ever appear inside a nonrecursive block and thus
cannot cause trouble when the stack is freed.



This design needs a flag NONRECURSIVE, but it does not need the ONSTACK
flag.

Henry

----------
From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 30 April 2016 at 16:43
To: jeng...@jsoftware.com


I think this is the scheme I was grasping for. The recursive flag (I
called it AFREC) is just a better name for the stack flag--in fact
objects are returned to the stack after being pulled off in EPILOG, so
that name is wrong. The major change is realizing that ra and fa (copy
and delete copy) should work differently (recursively) on objects with
the flag set.

It seems to work, but there are problems with the end of jtxdefn, which
has its own variant of EPILOG (jtxdefn is the only such function, as I
have verified by checking for uses of tpush outside m.c). This epilog
frees the symbol table for local variables, but it seems this is done
incorrectly as it causes invalid reads later on. It shouldn't take too
long to fix.

And you're right about the V flags versus A flags. The perils of untyped
data...

Marshall

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 30 April 2016 at 16:57
To: jeng...@jsoftware.com


That sounds promising.  I think we're in agreement.

Does EPILOG(z) put anything back onto the stack except for z?  [I'm just
trying to make sure I keep up with how it works (but trying to avoid
looking through the code).]  If it does, do you have to do anything
different when z is eventually freed? Not that there's a problem, since you
have the recursive flag to tell you what to do.

I guess I'd better understand this code if I'm going to talk about it.  I
would look in PROLOG, EPILOG, and where else to see how the stack is used?


If we wanted to save 8 bytes in the header, we could move those A flags
into the low bits of the use count.  Not now, but someday...


If we couldn't deal with untyped data, we wouldn't be working in J, would
we?

Henry

----------
From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 30 April 2016 at 17:22
To: jeng...@jsoftware.com


EPILOG(z) pushes z onto the stack (with tpush). Currently, this is
recursive, so it pushes all of z's descendents as well. In the new
paradigm, it isn't, and only z is added.

PROLOG and EPILOG are very simple:
#define PROLOG          I _ttop=jt->tbase+jt->ttop
#define EPILOG(z)       R gc(z,_ttop)
A jtgc (J jt,A w,I old){ra(w); tpop(old); R tpush(w);}

where ra copies w and tpush(w) adds it to the stack. tpop(old) pops
(with fr, which is not recursive) all the items after index old from the
stack. In the new version I have replace tpush with a different function
that derecursivinates the argument, then pushes it.

Since I've written out most of it, I will probably make my comments on
memory management into a complete guide and post it somewhere soon.

Marshall

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 1 May 2016 at 07:33
To: jeng...@jsoftware.com


I think this works, but I believe my lemma from last post is flawed (more
below).  First, some advice _de profundis_.

I urge you to write out your comments right now, before writing any (more)
code, as a way to prove the design before coding.  This should take the
form of a short paper, included as commentary in the code, in which you
describe the principles of the design, the actions that can be performed,
and the assertions that can be made about the state of the system at
various times.  Then, give a passably rigorous proof of the assertions.

This is a good way to design, and I have the scars to prove it.  Many times
I have had to tear up a design because the attempt to formally prove
correctness turned up flaws.  Not only will you save coding time, you will
learn where you need to spend your testing effort.

Of course, all who come after you will be grateful for the commentary.  But
you yourself will get your greatest benefit from it the earlier you write
it.  Post it here by all means when it's done, but makes sure it gets into
the code.


Now, about those memory blocks.  Here's my current understanding.

All allocated blocks bear the curse of Adam: burdened with Original Sin,
the time of their doom is marked on the stack at the moment of their
birth.  Their use-count is 1, but that is borrowed time.  After their
threescore and ten, they return, dust to dust, to the memory pool.

All, that is, except one elect from each generation.  This virtuous block,
z, is selected to carry the faith to the next generation.  The use-counts
of z are incremented, expunging the Original Sin, and z survives the
reckoning.  But only temporarily: z is then put back onto the stack,
reinstating the Original Sin, and z's added life is possibly for just one
generation, possibly for more, according to the will of the Lord.

When z is put back onto the stack, it is as a born-again nonrecursive
block, in which the top block of z answers for all its descendants.

Here is the question: are we sure z is up to the task?  I thought last time
I had proved Yes, that z could not contain blocks that are on the stack
slated for destruction.  But now I see that that is false: you could have
this scenario:

Call level 1: PROLOG; create block A; pass block A into call level 2
Call level 2: PROLOG; create block B, an indirect block that points to A;
EPILOG(B)

The stack now points to A and B; B is marked nonrecursive, but what about A?

I think the answer is that it has to be OK, because Call level 2 must have
incremented the use-count of A when it incorporated A into B.  That is, if
it didn't the system wouldn't work to begin with, so we don't have to be
concerned with the details of how that happens.

This suggests a corollary concerning the nonrecursive flag: it indicates
'this block has been through EPILOG and had its use-counts incremented.'
That inoculates it against having itself OR ANY COMPONENT destroyed before
its time.

This scenario imposes a requirement on the derecursivinative procedure: it
must not make any attempt to collect use-counts and forward them to higher
levels.  One might think it clever to notice that all the use-counts of the
children of a node are 2, and pull one use-count into the parent.  It
_would_ be clever, except that it would leave the child exposed to
destruction from the stack or destruction of a name.

[You should include the discussion of the previous paragraph, including the
issues, decisions, and alternatives, in your design document.  It will help
readers greatly to understand the issues involved.  Also include a proof of
the OR ANY COMPONENT statement two paragraphs back.]

Henry

----------
From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 5 May 2016 at 12:02
To: jeng...@jsoftware.com


The problems I thought were due to the not-EPILOG in jtxdefn were
actually a result of a change I made to debug: reducing PLIM (and PLIML,
its logarithm) in m.c so that more objects would be allocated on the C
stack rather than J's pool. This makes more memory leaks detectable with
a tool like valgrind, since leaked objects in the pool are still seen as
reachable. Unfortunately and for reasons I don't understand, it also
introduces a number of bugs.

The new memory scheme now passes a lot of tests, but fails a few.
Continuing investigation.

Marshall

----------
From: *Marshall Lochbaum* <mwlochb...@gmail.com>
Date: 5 May 2016 at 13:52
To: jeng...@jsoftware.com


Here's the formal(ish) proof that managing memory derecursivinatively
doesn't break anything that wasn't already broken. There are two
assumptions about current the J behavior, which are collected at the
end.

Definition:
A referrer is either a J object or the stack. An object references
another object if that object would be visited by a call to traverse
(with a non-recursive function as its argument). The stack is a list of
pointers; it references each object it points to. A referrer may
reference an object multiple times if it is visited multiple times by
traverse or appears multiple times in the stack. The list of objects it
refers to are its referents.

Code specification:
Every object is created with AFREC set and reference count zero. Calling
EPILOG(z) copies z, then pops from the stack, then calls derec(z,0).
Calling derec(w,n), where w's reference count is initially m, decrements
w's reference count by n. Then, if w has AFREC set, it calls
derec(z,m-1) on each object referred to by z and removes the flag.
There is no other way to remove the AFREC flag, and no way to add the
AFREC flag at all.

derec(z,0) is equivalent to derec1(z), which uses more traversals:
derec1(z) does nothing if z does not have AFREC set, and otherwise
decrements the counts of each of z's descendants by one less than z's
reference count, then recurses.

Invariant 1:
At all times except during calls to derec, an object z that refers to an
object marked AFREC is also marked AFREC. Equivalently, descendants of
non-recursive objects are never recursive.
Assume: Descendants are added only to objects not marked AFREC.
Proof: When z is created, it has no descendants, so the property holds.
Three events can potentially falsify the invariant:
- A new descendant is added to z. If z does not have AFREC set, then
  descendants cannot be added to it, by the assumption. If it does have
  AFREC set, the invariant holds regardless.
- The AFREC flag is added to a descendant of z. This is impossible.
- The AFREC flag is removed from z--that is, derec is called on z, and
  AFREC is set for z. In this case, derec removes the flag and recurses.
  But then the flag will be removed for each descendant as well, and the
  invariant still holds.

Invariant 2:
The reference count of an object is the sum over all referrers (with
multiplicity) of either 1, if the referrer is the stack or an object not
marked AFREC, or that object's reference count, if the referrer is an
object marked AFREC.
Except during the execution of memory functions, this invariant holds
- At all times for objects not marked AFREC, and
- When EPILOG(z) is called for objects z marked AFREC and their
  descendants.
We assume the invariant in the second case. If not, the old system was
broken.
We now show that derec1(z), equivalent to the main work of EPILOG
derec(z,0), does not falsify the invariant. We omit the proof that other
memory operations (ra, fa, tpop) satisfy it, as it is simple in each
case.
- If AFREC is not set, then derec1(z) does nothing by definition.
- If AFREC is set, then derec1(z) removes the AFREC flag from z, which
  decreases the corresponding terms in the reference formula from m to
  1, where m is the reference count of z. It also decrements the counts
  of z's descendants, to the same effect. Note that the formula counts
  referrers with multiplicity, and traverse does the same. Then derec1
  recurses, which maintains our invariant proveded the rest of the
  function does as well.


So, our assumptions about J's code base are:

1. Descendants are only added to objects marked AFREC.
This could be violated by some things like in-place amend. If it is, it
will break invariant 1, which is easy to test for in derec by doing a
full recurse. The fix is to make sure each addition calls derec on the
added objects.

2. When EPILOG(z) is called, z and its descendants obey the reference
   count formula.
Mostly, this depends on J working already. The only way the changes can
break it is if a function does strange memory stuff (not fa or ra) to an
object not marked AFREC. But that's not really out of the question.

There are about 40 failing tests right now, and they're probably due to
specific functions that break one of the invariants. Hopefully there are
a lot fewer than 40 functions causing these failures.

Marshall

----------
From: *Henry Rich* <henryhr...@gmail.com>
Date: 10 May 2016 at 16:28
To: jeng...@jsoftware.com


Just starting to look at this.

Creating a block with AFREC set and reference count 0 may be trouble.  I
see in places where the code checks for AC(w)>1 to indicate that a block
has been put into a boxed structure.  That would not work in the new system.

My suggestion: put AFREC in the low bit of AC, and go through all
occurrences of AC, replacing them with a macro to extract the right value
from AC-plus-flag.

Henry
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to