---------- Forwarded message ----------
From: Henry Rich <henryhr...@gmail.com>
Date: 25 April 2016 at 17:35
Subject: Initial proposals for work
To: jeng...@jsoftware.com


This is a list of things that I have in mind to work on.  These are mostly
features that will improve performance, rather than new functionality.
That's because I don't think I'm ready to write, say, new special code for
f"_1@:(g/\.)@:(<@[ ,~ <"_1@]).  I would like us to discuss what new special
code we need, or other work, so that I and others can work on them as we
feel up to it.

I developed this list after a look with Marshall at where time is spent in
a large J task (the task was a LaTeX file conversion).  I took aim at the
spots that spent lots of time, while keeping in mind what I know about the
JE and other opportunities for improvement.  The #1 time-user was the
parser, but reference counts, quite surprisingly, took almost as much time.

For calibration purposes, I would rate my recent parser rewrite as (Effort:
Moderate, Risk: Small, Payoff: Large)



1. Miscellaneous recoding
Some high-use functions can be recoded for better speed.  I know of bp()
and will look at others.

Effort: Small, Risk: Small, Payoff: Moderate

2. Reference Counts
Every data block has a reference count, aka use count.  In a boxed array,
each box has its own use count, including subboxes of boxes.  When a boxed
array is assigned, its use-counts are incremented by a recursive traversal
through the tree.  This can be quite time-consuming.  Marshall had the idea
that since all you really need to know is whether the use-count is equal to
1, the recursive traversal is needed only when the use count is decremented
to 0 or incremented to 1.  He ran this by Roger, who agrees (as do I).
Marshall is implementing this.

Effort: Small, Risk: Small, Payoff: Large

3. Reference Headers
As long as Marshall is in the area, we are going to create a new header
type, called a *reference header*, that contains a shape but no data; the
data is in a different header (or in a mapped file).  Reference headers
point to the data header, which holds the data use count that is shared by
all headers for a block.

Effort: Small, Risk: Small, Payoff: Nil

4. Use reference headers for reshaping
Verbs that merely reshape the data include (, y), (,: y), (,. y), (x $ y),
(,/ y), (x ($,) y), and maybe others.  These verbs will no longer need to
copy the data: they merely create a reference header and point to the data.

Effort: Moderate (only because of the number of verbs involved; it's a
small effort to make (, y) work, and then it's just a matter of copying the
change to other verbs), Risk: Small, Payoff: Large

5. Allow reference headers to change size as well as shape
There is no reason that the reference headers have to refer to exactly the
same data, as long as the common data is not disturbed.  Thus, (x , y) can
see if the allocation buffer for x has room for added elements, as if so,
append y to x in-place without copying x.  Other verbs that could operate
this way include (}. y), (}: y), (x {. y), (x }. y).  (a {. b }. y) would
run in small constant time regardless of the size of y.

Effort: Moderate, Risk: Moderate, Payoff: Large

6. Clean up unneeded special code
Much special code that implements forms such as (f@,) will no longer be
needed, since the reference header does the work.  For example, (f , y)
will perform like (f@, y).  The special code can be removed for simplicity.

Effort: Moderate, Risk: Moderate, Payoff: Nil
The real payoff is in code quality.  This might be a project left to new
JCDs as a way to get introduced to the code.

7. Revamp in-place operations
Operations are performed in-place only on certain brittle forms recognized
early in analysis (I haven't found where yet).  This leads to errors such as
name =: name ,"0 'abc'
which is erroneously treated as an in-place operation, and
name =: name m}~ values
which is not detected as in-place.

It also misses numerous opportunities for in-place operation, such as (m})
when used in a tacit verb, or in w m} x + y.

I have a design for how to solve this.  The parser will detect nouns that
can be modified in place - they will include intermediate results that are
not assigned to a name, or names that are reassigned to the same name.
Verbs that understand in-place operations are flagged.

Effort: Moderate, Risk: Moderate, Payoff: Moderate

8. Enhance m} to allow in-place update to boxed arrays.

Effort: Unsized, Risk: Unsized, Payoff: Large when it's needed

9. Reduce memory allocations
Every call to the parser allocates a stack: perhaps they can be combined
into a large stack.  Every numeric constant allocates a data area: perhaps
we should use the preallocated ones in num[], and add more preallocated
ones for the short lists that are often the right operands to " .  (i. y)
always allocates a buffer: couldn't we have just one buffer, say 100 long,
and preallocated reference headers pointing to it?

Effort: Small, Risk: Small, Payoff: Small
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to