I am curious how payoff is measured. (I see the estimates; what I mean
is: are the payoffs decreased memory use? increased speed? if either of
those, do we have decent benchmarks? are they increased code stability?
opportunities for better documentation? something else?)

I have seen many projects adopt "improved efficiency" measures which
instead made the system slower and more fragile. I hope we can avoid
that mistake here.

(I also hope I am not sticking my nose in somewhere it causes problems.)

Thanks,

-- 
Raul


On Wed, May 11, 2016 at 9:52 PM, chris burke <[email protected]> wrote:
> ---------- Forwarded message ----------
> From: Henry Rich <[email protected]>
> Date: 25 April 2016 at 17:35
> Subject: Initial proposals for work
> To: [email protected]
>
>
> This is a list of things that I have in mind to work on.  These are mostly
> features that will improve performance, rather than new functionality.
> That's because I don't think I'm ready to write, say, new special code for
> f"_1@:(g/\.)@:(<@[ ,~ <"_1@]).  I would like us to discuss what new special
> code we need, or other work, so that I and others can work on them as we
> feel up to it.
>
> I developed this list after a look with Marshall at where time is spent in
> a large J task (the task was a LaTeX file conversion).  I took aim at the
> spots that spent lots of time, while keeping in mind what I know about the
> JE and other opportunities for improvement.  The #1 time-user was the
> parser, but reference counts, quite surprisingly, took almost as much time.
>
> For calibration purposes, I would rate my recent parser rewrite as (Effort:
> Moderate, Risk: Small, Payoff: Large)
>
>
> 1. Miscellaneous recoding
> Some high-use functions can be recoded for better speed.  I know of bp()
> and will look at others.
>
> Effort: Small, Risk: Small, Payoff: Moderate
>
> 2. Reference Counts
> Every data block has a reference count, aka use count.  In a boxed array,
> each box has its own use count, including subboxes of boxes.  When a boxed
> array is assigned, its use-counts are incremented by a recursive traversal
> through the tree.  This can be quite time-consuming.  Marshall had the idea
> that since all you really need to know is whether the use-count is equal to
> 1, the recursive traversal is needed only when the use count is decremented
> to 0 or incremented to 1.  He ran this by Roger, who agrees (as do I).
> Marshall is implementing this.
>
> Effort: Small, Risk: Small, Payoff: Large
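The scheme above can be sketched in miniature (Python purely as illustration;
the names here are invented, and the JE's actual headers are C structs):

```python
class Block:
    """A data block: a use count plus any boxed sub-blocks."""
    def __init__(self, boxes=()):
        self.use = 1            # allocated with one owner
        self.boxes = list(boxes)
        self.freed = False

def retain(b):
    # Assignment bumps only the top-level count: no recursive walk.
    b.use += 1

def release(b):
    # The recursive traversal happens only when a count reaches 0.
    b.use -= 1
    if b.use == 0:
        b.freed = True
        for sub in b.boxes:
            release(sub)

# Taking a second reference to a boxed array costs O(1), not a tree
# walk; subboxes are visited only when the last reference goes away.
leaf = Block()
top = Block([leaf])
retain(top)                     # second reference: constant time
release(top)
assert not leaf.freed           # one live reference remains
release(top)
assert top.freed and leaf.freed
```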
>
> 3. Reference Headers
> As long as Marshall is in the area, we are going to create a new header
> type, called a *reference header*, that contains a shape but no data; the
> data is in a different header (or in a mapped file).  Reference headers
> point to the data header, which holds the data use count that is shared by
> all headers for a block.
>
> Effort: Small, Risk: Small, Payoff: Nil
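A minimal model of the split described above (hypothetical names; the real
headers are C structs in the JE):

```python
class DataHeader:
    """Holds the data and the one use count shared by all views."""
    def __init__(self, data):
        self.use = 0
        self.data = data

class RefHeader:
    """A shape plus a pointer to the data header; no data of its own."""
    def __init__(self, shape, dh):
        self.shape = shape
        self.data_header = dh
        dh.use += 1

dh = DataHeader([1, 2, 3, 4, 5, 6])
a = RefHeader((6,), dh)          # the list 1 2 3 4 5 6
b = RefHeader((2, 3), dh)        # a 2x3 view of the same elements
assert a.data_header is b.data_header   # one copy of the data
assert dh.use == 2                      # count shared by all headers
```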
>
> 4. Use reference headers for reshaping
> Verbs that merely reshape the data include (, y), (,: y), (,. y), (x $ y),
> (,/ y), (x ($,) y), and maybe others.  These verbs will no longer need to
> copy the data: they merely create a reference header and point to the data.
>
> Effort: Moderate (only because of the number of verbs involved; it's a
> small effort to make (, y) work, and then it's just a matter of copying the
> change to other verbs), Risk: Small, Payoff: Large
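Under that design, a reshaping verb such as (x $ y) reduces to allocating one
small header. A self-contained toy version (invented names, Python for
illustration only):

```python
class DataHeader:
    def __init__(self, data):
        self.use, self.data = 0, data

class RefHeader:
    def __init__(self, shape, dh):
        self.shape, self.dh = shape, dh
        dh.use += 1

def reshape(shape, y):
    # x $ y: a new header with the new shape, pointing at y's data.
    # No element is copied, so the cost is constant in the size of y.
    return RefHeader(shape, y.dh)

y = RefHeader((6,), DataHeader([0, 1, 2, 3, 4, 5]))
z = reshape((3, 2), y)
assert z.dh is y.dh and z.dh.data is y.dh.data   # zero-copy
assert z.shape == (3, 2)
```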
>
> 5. Allow reference headers to change size as well as shape
> There is no reason that the reference headers have to refer to exactly the
> same data, as long as the common data is not disturbed.  Thus, (x , y) can
> check whether the allocation buffer for x has room for added elements and,
> if so, append y to x in-place without copying x.  Other verbs that could operate
> this way include (}. y), (}: y), (x {. y), (x }. y).  (a {. b }. y) would
> run in small constant time regardless of the size of y.
>
> Effort: Moderate, Risk: Moderate, Payoff: Large
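A sketch of that append path (capacities and names invented for
illustration; the real test would also require x to be the sole reference):

```python
class Buf:
    """A block whose allocation may be larger than its current items."""
    def __init__(self, items, capacity):
        self.items = list(items)
        self.capacity = capacity

def append(x, ys):
    # x , y: if x's allocation has room, extend x in place;
    # otherwise fall back to copying into a fresh allocation.
    if len(x.items) + len(ys) <= x.capacity:
        x.items.extend(ys)
        return x                       # same block, no copy of x
    return Buf(x.items + list(ys), len(x.items) + len(ys))

x = Buf([1, 2, 3], capacity=8)
r = append(x, [4, 5])
assert r is x and r.items == [1, 2, 3, 4, 5]   # grew in place
r2 = append(x, [6, 7, 8, 9])
assert r2 is not x                             # no room: copied
```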
>
> 6. Clean up unneeded special code
> Much special code that implements forms such as (f@,) will no longer be
> needed, since the reference header does the work.  For example, (f , y)
> will perform like (f@, y).  The special code can be removed for simplicity.
>
> Effort: Moderate, Risk: Moderate, Payoff: Nil
> The real payoff is in code quality.  This might be a project left to new
> JCDs as a way to get introduced to the code.
>
> 7. Revamp in-place operations
> Operations are performed in-place only on certain brittle forms recognized
> early in analysis (I haven't found where yet).  This leads to errors such as
> name =: name ,"0 'abc'
> which is erroneously treated as an in-place operation, and
> name =: name m}~ values
> which is not detected as in-place.
>
> It also misses numerous opportunities for in-place operation, such as (m})
> when used in a tacit verb, or in w m} x + y.
>
> I have a design for how to solve this.  The parser will detect nouns that
> can be modified in place - they will include intermediate results that are
> not assigned to a name, or names that are reassigned to the same name.
> Verbs that understand in-place operations are flagged.
>
> Effort: Moderate, Risk: Moderate, Payoff: Moderate
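The detection rule described above is simple enough to state as a predicate
(a toy restatement; the names are invented, and the real check lives in the
parser):

```python
def inplaceable(value_name, assign_target):
    # A noun may be overwritten in place if it is an anonymous
    # intermediate result (no name), or if it is the very name
    # about to be reassigned.
    return value_name is None or value_name == assign_target

assert inplaceable(None, 'name')         # unnamed intermediate result
assert inplaceable('name', 'name')       # name =: name , ...
assert not inplaceable('other', 'name')  # other is still live
```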
>
> 8. Enhance m} to allow in-place update to boxed arrays.
>
> Effort: Unsized, Risk: Unsized, Payoff: Large when it's needed
>
> 9. Reduce memory allocations
> Every call to the parser allocates a stack: perhaps they can be combined
> into a large stack.  Every numeric constant allocates a data area: perhaps
> we should use the preallocated ones in num[], and add more preallocated
> ones for the short lists that are often the right operands to " .  (i. y)
> always allocates a buffer: couldn't we have just one buffer, say 100 long,
> and preallocated reference headers pointing to it?
>
> Effort: Small, Risk: Small, Payoff: Small
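For the (i. y) case, the idea is one shared read-only buffer plus cheap views
of its prefixes (the size and names here are invented; a real version would
hand back a reference header rather than a slice):

```python
IOTA = tuple(range(100))    # one preallocated, immutable buffer

def iota(n):
    # i. y for small y: serve a prefix of the shared buffer instead of
    # allocating the integers afresh.  Large y still allocates.
    if n <= len(IOTA):
        return IOTA[:n]
    return tuple(range(n))

assert iota(5) == (0, 1, 2, 3, 4)
assert iota(150) == tuple(range(150))
```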
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm