---------- Forwarded message ---------- From: Henry Rich <henryhr...@gmail.com> Date: 25 April 2016 at 17:35 Subject: Initial proposals for work To: jeng...@jsoftware.com
This is a list of things that I have in mind to work on. These are mostly features that will improve performance, rather than new functionality. That's because I don't think I'm ready to write, say, new special code for f"_1@:(g/\.)@:(<@[ ,~ <"_1@]). I would like us to discuss what new special code we need, or other work, so that I and others can work on them as we feel up to it. I developed this list after a look with Marshall at where time is spent in a large J task (the task was a LaTeX file conversion). I took aim at the spots that spent lots of time, while keeping in mind what I know about the JE and other opportunities for improvement. The #1 time-user was the parser, but reference counts, quite surprisingly, took almost as much time. For calibration purposes, I would rate my recent parser rewrite as (Effort: Moderate, Risk: Small, Payoff: Large) 1. Miscellaneous recoding Some high-use functions can be recoded for better speed. I know of bp() and will look at others. Effort: Small, Risk: Small, Payoff: Moderate 2. Reference Counts Every data block has a reference count, aka use count. In a boxed array, each box has its own use count, including subboxes of boxes. When a boxed array is assigned, its use-counts are incremented by a recursive traversal through the tree. This can be quite time-consuming. Marshall had the idea that since all you really need to know is whether the use-count is equal to 1, the recursive traversal is needed only when the use count is decremented to 0 or incremented to 1. He ran this by Roger, who agrees (as do I). Marshall is implementing this. Effort: Small, Risk: Small, Payoff: Large 3. Reference Headers As long as Marshall is in the area, we are going to create a new header type, called a *reference header*, that contains a shape but no data; the data is in a different header (or in a mapped file). Reference headers point to the data header, which holds the data use count that is shared by all headers for a block. Effort: Small, Risk: Small, Payoff: Nil 4. Use reference headers for reshaping Verbs that merely reshape the data include (, y), (,: y), (,. y), (x $ y), (,/ y), (x ($,) y), and maybe others. These verbs will no longer need to copy the data: they merely create a reference header and point to the data. Effort: Moderate (only because of the number of verbs involved; it's a small effort to make (, y) work, and then it's just a matter of copying the change to other verbs), Risk: Small, Payoff: Large 5. Allow reference headers to change size as well as shape There is no reason that the reference headers have to refer to exactly the same data, as long as the common data is not disturbed. Thus, (x , y) can see if the allocation buffer for x has room for added elements, as if so, append y to x in-place without copying x. Other verbs that could operate this way include (}. y), (}: y), (x {. y), (x }. y). (a {. b }. y) would run in small constant time regardless of the size of y. Effort: Moderate, Risk: Moderate, Payoff: Large 6. Clean up unneeded special code Much special code that implements forms such as (f@,) will no longer be needed, since the reference header does the work. For example, (f , y) will perform like (f@, y). The special code can be removed for simplicity. Effort: Moderate, Risk: Moderate, Payoff: Nil The real payoff is in code quality. This might be a project left to new JCDs as a way to get introduced to the code. 7. Revamp in-place operations Operations are performed in-place only on certain brittle forms recognized early in analysis (I haven't found where yet). This leads to errors such as name =: name ,"0 'abc' which is erroneously treated as an in-place operation, and name =: name m}~ values which is not detected as in-place. It also misses numerous opportunities for in-place operation, such as (m}) when used in a tacit verb, or in w m} x + y. I have a design for how to solve this. The parser will detect nouns that can be modified in place - they will include intermediate results that are not assigned to a name, or names that are reassigned to the same name. Verbs that understand in-place operations are flagged. Effort: Moderate, Risk: Moderate, Payoff: Moderate 8. Enhance m} to allow in-place update to boxed arrays. Effort: Unsized, Risk: Unsized, Payoff: Large when it's needed 9. Reduce memory allocations Every call to the parser allocates a stack: perhaps they can be combined into a large stack. Every numeric constant allocates a data area: perhaps we should use the preallocated ones in num[], and add more preallocated ones for the short lists that are often the right operands to " . (i. y) always allocates a buffer: couldn't we have just one buffer, say 100 long, and preallocated reference headers pointing to it? Effort: Small, Risk: Small, Payoff: Small ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm