I am curious how payoff is measured (I see the estimates; I mean, are the payoffs decreased memory use? Increased speed? For either of those, do we have decent benchmarks? Are they increased code stability? Opportunities for better documentation? Something else?)
I have seen many projects adopt "improved efficiency" measures which instead made the system slower and more fragile. I hope we can avoid that mistake here. (I also hope I am not sticking my nose somewhere where it causes problems.)

Thanks,

-- Raul

On Wed, May 11, 2016 at 9:52 PM, chris burke <[email protected]> wrote:
> ---------- Forwarded message ----------
> From: Henry Rich <[email protected]>
> Date: 25 April 2016 at 17:35
> Subject: Initial proposals for work
> To: [email protected]
>
> This is a list of things that I have in mind to work on. These are
> mostly features that will improve performance, rather than new
> functionality. That's because I don't think I'm ready to write, say,
> new special code for f"_1@:(g/\.)@:(<@[ ,~ <"_1@]). I would like us to
> discuss what new special code we need, or other work, so that I and
> others can work on them as we feel up to it.
>
> I developed this list after a look with Marshall at where time is
> spent in a large J task (the task was a LaTeX file conversion). I took
> aim at the spots that spent lots of time, while keeping in mind what I
> know about the JE and other opportunities for improvement. The #1
> time-user was the parser, but reference counts, quite surprisingly,
> took almost as much time.
>
> For calibration purposes, I would rate my recent parser rewrite as
> (Effort: Moderate, Risk: Small, Payoff: Large)
>
> 1. Miscellaneous recoding
> Some high-use functions can be recoded for better speed. I know of
> bp() and will look at others.
>
> Effort: Small, Risk: Small, Payoff: Moderate
>
> 2. Reference Counts
> Every data block has a reference count, aka use count. In a boxed
> array, each box has its own use count, including subboxes of boxes.
> When a boxed array is assigned, its use counts are incremented by a
> recursive traversal through the tree. This can be quite time-consuming.
> Marshall had the idea that since all you really need to know is
> whether the use count is equal to 1, the recursive traversal is needed
> only when the use count is decremented to 0 or incremented to 1. He
> ran this by Roger, who agrees (as do I). Marshall is implementing this.
>
> Effort: Small, Risk: Small, Payoff: Large
>
> 3. Reference Headers
> As long as Marshall is in the area, we are going to create a new
> header type, called a *reference header*, that contains a shape but no
> data; the data is in a different header (or in a mapped file).
> Reference headers point to the data header, which holds the data use
> count that is shared by all headers for a block.
>
> Effort: Small, Risk: Small, Payoff: Nil
>
> 4. Use reference headers for reshaping
> Verbs that merely reshape the data include (, y), (,: y), (,. y),
> (x $ y), (,/ y), (x ($,) y), and maybe others. These verbs will no
> longer need to copy the data: they merely create a reference header
> and point to the data.
>
> Effort: Moderate (only because of the number of verbs involved; it's a
> small effort to make (, y) work, and then it's just a matter of
> copying the change to the other verbs), Risk: Small, Payoff: Large
>
> 5. Allow reference headers to change size as well as shape
> There is no reason that the reference headers have to refer to exactly
> the same data, as long as the common data is not disturbed. Thus,
> (x , y) can see if the allocation buffer for x has room for added
> elements, and if so, append y to x in place without copying x. Other
> verbs that could operate this way include (}. y), (}: y), (x {. y),
> (x }. y). (a {. b }. y) would run in small constant time regardless of
> the size of y.
>
> Effort: Moderate, Risk: Moderate, Payoff: Large
>
> 6. Clean up unneeded special code
> Much special code that implements forms such as (f@,) will no longer
> be needed, since the reference header does the work. For example,
> (f , y) will perform like (f@, y).
> The special code can be removed for simplicity.
>
> Effort: Moderate, Risk: Moderate, Payoff: Nil
> The real payoff is in code quality. This might be a project left to
> new JCDs as a way to get introduced to the code.
>
> 7. Revamp in-place operations
> Operations are performed in place only on certain brittle forms
> recognized early in analysis (I haven't found where yet). This leads
> to errors such as
>    name =: name ,"0 'abc'
> which is erroneously treated as an in-place operation, and
>    name =: name m}~ values
> which is not detected as in-place.
>
> It also misses numerous opportunities for in-place operation, such as
> (m}) when used in a tacit verb, or in w m} x + y.
>
> I have a design for how to solve this. The parser will detect nouns
> that can be modified in place - they will include intermediate results
> that are not assigned to a name, or names that are reassigned to the
> same name. Verbs that understand in-place operation are flagged.
>
> Effort: Moderate, Risk: Moderate, Payoff: Moderate
>
> 8. Enhance m} to allow in-place update of boxed arrays.
>
> Effort: Unsized, Risk: Unsized, Payoff: Large when it's needed
>
> 9. Reduce memory allocations
> Every call to the parser allocates a stack: perhaps they can be
> combined into a large stack. Every numeric constant allocates a data
> area: perhaps we should use the preallocated ones in num[], and add
> more preallocated ones for the short lists that are often the right
> operands to " . (i. y) always allocates a buffer: couldn't we have
> just one buffer, say 100 long, and preallocated reference headers
> pointing to it?
>
> Effort: Small, Risk: Small, Payoff: Small
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
