From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 29 April 2016 at 09:51 To: jeng...@jsoftware.com
I am currently working on changing J's current reference counting scheme to a much more efficient and standard method. J stores a reference count in every object (AC(w)) indicating the number of times it is referenced. The count is incremented when a copy is made and decremented when one is deleted. Once the reference count drops to zero, J knows the object can be freed, and does so. Currently, J uses an unusual recursive procedure for reference counts. If a boxed array (or verb with arguments, or other object with children) is copied, then the reference counts of all its children (items in boxes) are incremented, and so on recursively. This makes copies of composite objects take a long time. The typical procedure is to treat the parent/child relationship as a single reference. Thus the reference counts of children are incremented when they are added to the parent, and decremented when the parent is deleted, and otherwise not changed. To copy a composite object, simply increment its reference count, and to delete, decrement it. I am currently trying to convince the J engine to use this procedure. The hangup is another idiosyncratic J feature, which in contrast to the weird reference count scheme is actually useful. J maintains a stack of local nouns, whose function (but not implementation) is equivalent to that of C's stack. Any time an object is created, a pointer to it is pushed to the stack. Thus J functions can ignore all the reference count arithmetic (which from experience I know is quite painful) while using objects. Instead, they call PROLOG, which saves the location of the beginning of the stack, before doing anything, and EPILOG(z), where z is the value to be returned, after everything. EPILOG copies z, deletes everything added to the stack since PROLOG was called, and returns z. A complication arises if z is a composite object. In this case we want to avoid deleting any of z's children, despite the fact that they may be on the stack. In the current paradigm this is automatic: on copying z, all of its children are also copied, and deleting the copies on the stack leaves those copies. But if incrementing the reference count is not recursive, this doesn't happen, and the children are deleted, leaving z with dangling pointers. In a sense, the problem arises because when verbs connect z to its children, they don't bother to inform the children. The reference from z to its children cannot be reflected in their reference counts. So the correct fix might be to increment the reference counts as they are added to z. This is not feasible: the places where children are added to objects just look like assignments, and are impossible to detect automatically. There is a workaround: EPILOG should do the pointer-incrementing job for us. So rather than a simple copy of z, it actually goes through all of z's children and increments pointers. Specifically, it should increment the descendents which are children of objects on the stack. This requires an extra flag: AFSTK, which indicates whether an object is on the stack. I have implemented all the above, and am still getting bugs. I think that I have made all the necessary architecture changes, and that the remainder is just a few spots that relied in some way on the old system, but it's impossible to be sure how much work there is left to do. I will update on future progress. Marshall ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 29 April 2016 at 12:40 To: jeng...@jsoftware.com Your workaround seems sound. Let me know what you find the problems are. To see if I understand the situation, would this work: keep a long long counter of PROLOGs, giving each PROLOG a unique number; store the current number in the block header when a block is created. Then at EPILOG, increment use-counts in the children of Z that have the stored value matching the current number. This would add 8 bytes to the header, so I don't like it, but it would keep you from having to match children-of-z with children-of-the-noun-stack. Henry ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 29 April 2016 at 17:18 To: jeng...@jsoftware.com Is one way to think of this problem that the use-counts in z need to be the old recursive style, so that they can be freed by the epilogue? If so, a solution would be to traverse z, converting its use-counts to recursive style (and incrementing - I guess that's what you referred to as 'copying'); decrement them in recursive style; then traverse again, converting back to nonrecursive style. Would that work? If so, a more elegant solution would be to have two use counts: a count that has yet to be propagated to descendants, and a count that has been propagated. The use-count routines would handle these correctly. EPILOG would traverse to push unpropagated use-count to descendants, and then free the blocks as before. The conversion back to nonrecursive style would not be required, because both styles would coexist. Implementation: I have my eye on the AF() field, which seems to have just 4 bits assigned - is that right? Reserve 6 bits for flags, and use the upper bits for the unpropagated usecount. At the same time, make the corresponding change to AC(), leaving 6 bits of flags and the rest propagated usecount. Henry On 4/29/2016 12:51 PM, Marshall Lochbaum wrote: ---------- From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 29 April 2016 at 21:56 To: jeng...@jsoftware.com I'm not sure whether the first solution would work. Converting between the two styles is simple enough--recursively add or remove the reference count of the parent from all of its descendents--but the overall procedure is kind of complicated, and I still don't know exactly how the stack is used all of the time. I don't think that nouns are ever copied while on the stack, just used without regard for their reference count. So the second method is the same as mine, but mine combines the two counts. I assume you mean AFLAG for the field. There are 17 extra flags, defined only for verbs, further on in jtype.h ("type V flag values"). These end at 2^24, leaving the top 32 bits untouched. But given how quickly we seem to be proposing new flags, it may not be a good idea to use those. Marshall ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 30 April 2016 at 04:34 To: jeng...@jsoftware.com [I thought the V flags were for the flag field of the V struct, not the A struct.] I have better ideas about this problem now. First, though, let me say this. When there were still problems after you put in the nonrecursive use-counts, and you suspected some code that relied on the old methods: that's almost a fatal problem, isn't it? Because we can't hope to find all the old code, and would never want to release until we were sure we had. We need a design that guarantees compatibility. Here's what you have learned: execution freely allocates blocks, putting them on the stack, and combines them promiscuously, producing objects, one of which becomes the result z for use in EPILOG(z). The objects produced by execution all have the old recursive use-counts, and thus must be freed using the recursive method. Yet we would very much like objects to use nonrecursive use-counts. This suggests the following design: Blocks are either nonrecursive or semirecursive and flagged to indicate which kind they are. The use-count of a nonrecursive block has not fully been included in the use-counts of its descendants, and need not be, while the use-count of a semirecursive block has been propagated to descendants. The descendants of a nonrecursive block are all nonrecursive. The descendants of a semirecursive block can be either type. Incrementing/decrementing the use count involves traversing the tree, but the traversal can be terminated when a nonrecursive block is encountered. Nonrecursive blocks are created ONLY by EPILOG(z). After the stack has been freed, the result z is traversed to transform it from semirecursive to nonrecursive. Important lemma (please check!): No block pointed to by the stack will ever be contained in a nonrecursive block. Proof: All blocks are allocated as semirecursive, and execution only combines these into larger semirecursive aggregates. (replacing-in-place is not implemented for indirect data types - yet). That's the design. It's saying that before EPILOG, anything goes. Whatever the Old Code produces is OK. At EPILOG() time, the surviving z is then put into efficient nonrecursive form, after the stack has been taken care of. Examples: J sentence: a (; <@(+/)"1) b <@(+/)"1 will run on b to produce its result, in semirecursive form. EPILOG will turn this into the nonrecursive result, call it c. a starts out nonrecursive (if it is an indirect type), because it is the result of some earlier EPILOG. a ; c will join the two nonrecursive values into a semirecursive result. But traversing that result during EPILOG will be fast, because the traversal may stop after one level of recursion. J sentence: ({. a) ; 2 {:: b (2 {:: b) selects the subtree of b (and presumably increments its use count, but that's in the Old Code and we don't have to worry about it). This subtree will most likely be nonrecursive. Depending on the type of a, the result of ({. a) is either a new semirecursive block or the address of the nonrecursive subtree of a. These are joined by ; into a semirecursive result, which again can be traversed quickly. The key point is that no block on the stack can ever appear inside a nonrecursive block and thus cannot cause trouble when the stack is freed. This design needs a flag NONRECURSIVE, but it does not need the ONSTACK flag. Henry ---------- From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 30 April 2016 at 16:43 To: jeng...@jsoftware.com I think this is the scheme I was grasping for. The recursive flag (I called it AFREC) is just a better name for the stack flag--in fact objects are returned to the stack after being pulled off in EPILOG, so that name is wrong. The major change is realizing that ra and fa (copy and delete copy) should work differently (recursively) on objects with the flag set. It seems to work, but there are problems with the end of jtxdefn, which has its own variant of EPILOG (jtxdefn is the only such function, as I have verified by checking for uses of tpush outside m.c). This epilog frees the symbol table for local variables, but it seems this is done incorrectly as it causes invalid reads later on. It shouldn't take too long to fix. And you're right about the V flags versus A flags. The perils of untyped data... Marshall ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 30 April 2016 at 16:57 To: jeng...@jsoftware.com That sounds promising. I think we're in agreement. Does EPILOG(z) put anything back onto the stack except for z? [I'm just trying to make sure I keep up with how it works (but trying to avoid looking through the code).] If it does, do you have to do anything different when z is eventually freed? Not that there's a problem, since you have the recursive flag to tell you what to do. I guess I'd better understand this code if I'm going to talk about it. I would look in PROLOG, EPILOG, and where else to see how the stack is used? If we wanted to save 8 bytes in the header, we could move those A flags into the low bits of the use count. Not now, but someday... If we couldn't deal with untyped data, we wouldn't be working in J, would we? Henry ---------- From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 30 April 2016 at 17:22 To: jeng...@jsoftware.com EPILOG(z) pushes z onto the stack (with tpush). Currently, this is recursive, so it pushes all of z's descendents as well. In the new paradigm, it isn't, and only z is added. PROLOG and EPILOG are very simple: #define PROLOG I _ttop=jt->tbase+jt->ttop #define EPILOG(z) R gc(z,_ttop) A jtgc (J jt,A w,I old){ra(w); tpop(old); R tpush(w);} where ra copies w and tpush(w) adds it to the stack. tpop(old) pops (with fr, which is not recursive) all the items after index old from the stack. In the new version I have replace tpush with a different function that derecursivinates the argument, then pushes it. Since I've written out most of it, I will probably make my comments on memory management into a complete guide and post it somewhere soon. Marshall ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 1 May 2016 at 07:33 To: jeng...@jsoftware.com I think this works, but I believe my lemma from last post is flawed (more below). First, some advice _de profundis_. I urge you to write out your comments right now, before writing any (more) code, as a way to prove the design before coding. This should take the form of a short paper, included as commentary in the code, in which you describe the principles of the design, the actions that can be performed, and the assertions that can be made about the state of the system at various times. Then, give a passably rigorous proof of the assertions. This is a good way to design, and I have the scars to prove it. Many times I have had to tear up a design because the attempt to formally prove correctness turned up flaws. Not only will you save coding time, you will learn where you need to spend your testing effort. Of course, all who come after you will be grateful for the commentary. But you yourself will get your greatest benefit from it the earlier you write it. Post it here by all means when it's done, but makes sure it gets into the code. Now, about those memory blocks. Here's my current understanding. All allocated blocks bear the curse of Adam: burdened with Original Sin, the time of their doom is marked on the stack at the moment of their birth. Their use-count is 1, but that is borrowed time. After their threescore and ten, they return, dust to dust, to the memory pool. All, that is, except one elect from each generation. This virtuous block, z, is selected to carry the faith to the next generation. The use-counts of z are incremented, expunging the Original Sin, and z survives the reckoning. But only temporarily: z is then put back onto the stack, reinstating the Original Sin, and z's added life is possibly for just one generation, possibly for more, according to the will of the Lord. When z is put back onto the stack, it is as a born-again nonrecursive block, in which the top block of z answers for all its descendants. Here is the question: are we sure z is up to the task? I thought last time I had proved Yes, that z could not contain blocks that are on the stack slated for destruction. But now I see that that is false: you could have this scenario: Call level 1: PROLOG; create block A; pass block A into call level 2 Call level 2: PROLOG; create block B, an indirect block that points to A; EPILOG(B) The stack now points to A and B; B is marked nonrecursive, but what about A? I think the answer is that it has to be OK, because Call level 2 must have incremented the use-count of A when it incorporated A into B. That is, if it didn't the system wouldn't work to begin with, so we don't have to be concerned with the details of how that happens. This suggests a corollary concerning the nonrecursive flag: it indicates 'this block has been through EPILOG and had its use-counts incremented.' That inoculates it against having itself OR ANY COMPONENT destroyed before its time. This scenario imposes a requirement on the derecursivinative procedure: it must not make any attempt to collect use-counts and forward them to higher levels. One might think it clever to notice that all the use-counts of the children of a node are 2, and pull one use-count into the parent. It _would_ be clever, except that it would leave the child exposed to destruction from the stack or destruction of a name. [You should include the discussion of the previous paragraph, including the issues, decisions, and alternatives, in your design document. It will help readers greatly to understand the issues involved. Also include a proof of the OR ANY COMPONENT statement two paragraphs back.] Henry ---------- From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 5 May 2016 at 12:02 To: jeng...@jsoftware.com The problems I thought were due to the not-EPILOG in jtxdefn were actually a result of a change I made to debug: reducing PLIM (and PLIML, its logarithm) in m.c so that more objects would be allocated on the C stack rather than J's pool. This makes more memory leaks detectable with a tool like valgrind, since leaked objects in the pool are still seen as reachable. Unfortunately and for reasons I don't understand, it also introduces a number of bugs. The new memory scheme now passes a lot of tests, but fails a few. Continuing investigation. Marshall ---------- From: *Marshall Lochbaum* <mwlochb...@gmail.com> Date: 5 May 2016 at 13:52 To: jeng...@jsoftware.com Here's the formal(ish) proof that managing memory derecursivinatively doesn't break anything that wasn't already broken. There are two assumptions about current the J behavior, which are collected at the end. Definition: A referrer is either a J object or the stack. An object references another object if that object would be visited by a call to traverse (with a non-recursive function as its argument). The stack is a list of pointers; it references each object it points to. A referrer may reference an object multiple times if it is visited multiple times by traverse or appears multiple times in the stack. The list of objects it refers to are its referents. Code specification: Every object is created with AFREC set and reference count zero. Calling EPILOG(z) copies z, then pops from the stack, then calls derec(z,0). Calling derec(w,n), where w's reference count is initially m, decrements w's reference count by n. Then, if w has AFREC set, it calls derec(z,m-1) on each object referred to by z and removes the flag. There is no other way to remove the AFREC flag, and no way to add the AFREC flag at all. derec(z,0) is equivalent to derec1(z), which uses more traversals: derec1(z) does nothing if z does not have AFREC set, and otherwise decrements the counts of each of z's descendants by one less than z's reference count, then recurses. Invariant 1: At all times except during calls to derec, an object z that refers to an object marked AFREC is also marked AFREC. Equivalently, descendants of non-recursive objects are never recursive. Assume: Descendants are added only to objects not marked AFREC. Proof: When z is created, it has no descendants, so the property holds. Three events can potentially falsify the invariant: - A new descendant is added to z. If z does not have AFREC set, then descendants cannot be added to it, by the assumption. If it does have AFREC set, the invariant holds regardless. - The AFREC flag is added to a descendant of z. This is impossible. - The AFREC flag is removed from z--that is, derec is called on z, and AFREC is set for z. In this case, derec removes the flag and recurses. But then the flag will be removed for each descendant as well, and the invariant still holds. Invariant 2: The reference count of an object is the sum over all referrers (with multiplicity) of either 1, if the referrer is the stack or an object not marked AFREC, or that object's reference count, if the referrer is an object marked AFREC. Except during the execution of memory functions, this invariant holds - At all times for objects not marked AFREC, and - When EPILOG(z) is called for objects z marked AFREC and their descendants. We assume the invariant in the second case. If not, the old system was broken. We now show that derec1(z), equivalent to the main work of EPILOG derec(z,0), does not falsify the invariant. We omit the proof that other memory operations (ra, fa, tpop) satisfy it, as it is simple in each case. - If AFREC is not set, then derec1(z) does nothing by definition. - If AFREC is set, then derec1(z) removes the AFREC flag from z, which decreases the corresponding terms in the reference formula from m to 1, where m is the reference count of z. It also decrements the counts of z's descendants, to the same effect. Note that the formula counts referrers with multiplicity, and traverse does the same. Then derec1 recurses, which maintains our invariant proveded the rest of the function does as well. So, our assumptions about J's code base are: 1. Descendants are only added to objects marked AFREC. This could be violated by some things like in-place amend. If it is, it will break invariant 1, which is easy to test for in derec by doing a full recurse. The fix is to make sure each addition calls derec on the added objects. 2. When EPILOG(z) is called, z and its descendants obey the reference count formula. Mostly, this depends on J working already. The only way the changes can break it is if a function does strange memory stuff (not fa or ra) to an object not marked AFREC. But that's not really out of the question. There are about 40 failing tests right now, and they're probably due to specific functions that break one of the invariants. Hopefully there are a lot fewer than 40 functions causing these failures. Marshall ---------- From: *Henry Rich* <henryhr...@gmail.com> Date: 10 May 2016 at 16:28 To: jeng...@jsoftware.com Just starting to look at this. Creating a block with AFREC set and reference count 0 may be trouble. I see in places where the code checks for AC(w)>1 to indicate that a block has been put into a boxed structure. That would not work in the new system. My suggestion: put AFREC in the low bit of AC, and go through all occurrences of AC, replacing them with a macro to extract the right value from AC-plus-flag. Henry ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm