[Jprogramming] memoization good and bad for performance, can you flush it?

Nick Simicich Tue, 18 Oct 2011 05:45:06 -0700

I see that J allowed one to do automatic memoization by coding a M. in the
header.  I have used it a couple times.  I was working on euler 66, which is
the one where you have to solve for the lowest solution for non-square
values of D between 1 and 1000, and you pick the largest x and the answer is
the D associated with the largest minimum x. (ignoring the implicit solution
where x=1 and y = 0).


x^2 - Dy^2 = 1

So I dug around on the net and found that these are called Pell Equations,
and they are solved by deriving the generalized periodic continued fraction
for the square root of D.  Mathworld had a page that listed a specific
scheme for doing the job, there were a bunch of recursive piecewise
functions that were easily restatable in J.

http://mathworld.wolfram.com/PellEquation.html


I wrote the program, and when it ran successfully the first time, it took 15
minutes to run to completion (bugs were mostly related to my failure to use
extended arithmetic in all needed places).  Because the functions that
calculate

Then I started working on the program.  Remember that there is a one minute
guideline in Euler. I also noted that many people had reported very fast
times for this problem using approximately the same scheme.  First I made a
new version with a bunch of stuff stripped out.  I accepted order of
magnitude results as good enough.  I stopped calculating q (y) and checking
the equation.  Then I tried reordering statements, changing if.s to
select.s.

I got the run time down to 1.5 minutes, but, after one small change, I
forgot to reload the code and my next run took six minutes.

I began to suspect that the programs were running slower and slower the
longer that they were run and I began to suspect the memoization.

Now, make no mistake, these functions need memoization.  Without
memoization, running d from 1 to 100 takes many minutes.  I started a test a
couple minutes ago, and it has not completed as I write this, while with
memoization, I was able to get this same stuff to run from 1 to 1000 in 1.5
minutes.

But by reloading my code (which I assumes clears memoization) every 100
ticks of the main loop, I could get the code to finish in a couple seconds
for the fast version and under 15 seconds for the slow version.  I actually
stuck a load'file' to reload the file the scripts were from in the main
loop, to run every hundred ticks of the main loop.  In fact, the code runs
slightly faster when I reload it from the file every 50 ticks of the main
loop. I assume that reading from the O/S is very expensive.  But it is a win
relative to searching 1000 entries in the memoization table 1000 times.

If I could, I'd clear the memoization every cycle or two of the main loop.
 x is d, passed to every routine, and y is the level of the function, they
recur over each other down to where most of them originate as 1, 0 or the
floor of the square root of D.  This is an example of three of the routines,
they somehow work together to calculate the a term, which is used to
calculate r.  r is one less than the first term of a which is double the
value of the zero term and r, possibly transformed, tells you which p and q
(lower case, not these) terms to use for the minimum values for x and y.  As
you can see, it calls itself at level zero, which just takes the floor of
the square root of the left term, and two other routines, that end up
calling themselves and each other and a again, descending in value of the
right argument.  Lots of calls to calculate, say, a_10 since you actually
need (for this function) to calculate a i.r without knowing r in advance.
 That actually makes good use of memoization (I presume)

e66a =: dyad define M.
if. y = 0 do. x: <.@%: x return. end.
x: <. ((x e66a 0)+(x e66P y))% (x e66Q y)
)
e66P =: dyad define M.
if. y = 0 do. x: 0 return. end. if. y = 1 do. x e66a 0 return. end.
x: ((x e66a <:y)*(x e66Q <:y))-(x e66P <:y)
)
e66Q =: dyad define M.
if. y = 0 do. x: 1 return. end. if. y = 1 do. x: x - *: x e66a 0 return.
end.
x: (x-*: x e66P y)%(x e66Q <: y)
)

The "fast" version uses selects, no significant difference.  The point is
that once I am done making calls about 5 e66a"0 i.10 I will never make
another call with 5 as the left argument. and it seems that the function
calls slow way way way down.  First time from 1-1000, the "fast" version ran
in 1.5 minutes (the unmemoized version has not finished >:i.100).  The
second time it ran in six minutes.  Just to make myself happy, I stuck a
"load" statement in my main routine to be run every 100 times through the
loop.  The "fast version" ran in 6 seconds.  The slow version that used
extended arithmetic for everything, that I had never seen run in under three
minutes over the space 1<:x<:1000 ran in 12 seconds.  What I was saving by
avoiding arithmetic was another thousand trips through the memo lists.

Of the 1.5 minutes my routine took to run, it seems that 96% of the time I
was looking through memoized stuff that I never wanted to see again.  The
only possible thing I can imagine that the load of the same code is doing is
getting the compiler to clear the memoization tables.

Calculating a is so expensive that it makes a noticable difference if I
calculate the terms in a while loop, stopping on the exact term that is the
last one needed, rather than calculating them 3 or 5 at a time.  But it is
not the properly memoized calculation of a that it so expensive, it is the
poor memo lookup.


I have manually memoized functions many times, manually, mostly in Perl and
D.  Both of those languages have hashes - you can make the arguments into a
key, somehow, and build a hash that is indexed by the tree.  The tree lookup
slows down the entry to the function, but not too much, and furthermore, if
you put a thousand times as much into the tree, it only slows the lookup a
little.  I don't see that here - it is so expensive to load the tree that it
is by far cheaper to reread everything from disk.

I am beginning to suspect that the M. implementation does not store things
into trees or hashes or anything efficient for lookup, it stores them into
maybe linked lists that are searched sequentially, or maybe some other
scheme is used that slows the lookup to a crawl when you get more than a few
thousand items in the list.  The highest value of r was in the 70's, about
half had a value of 10, so let's say that a was called about 20 times per
tick.  This means that every 100 main loop ticks, I added 2000 entries to
the memoization table - and it started to noticeably slow down, while by the
time I had added 10000 entries to the table, it was crawling, and at about
100000, it was slower than running the original code the first time.

See, when I run the same function twice in a row, without looking anything
up, it would seem to me that if things are completely memoized, that every
possible value I am looking up is in the memo tables.   But the reality is
that the function runs slower.  Six minutes vs. 1.5 minutes.

I have not looked in the source, but I am gonna guess that the memoization
is a linear lookup in an unordered list, or if it is using any sort of
indexing, it gets overwhelmed by a few thousand input records, and it
reverts to some sort of unordered overflow area that has to be searched.

So I guess I would like to know if there is any work going on in the
memoization area, and if it would be possible to add code so that the
memoization would be cleared?  Like, a foreign conjunction, or some trick?
Can I define any function and clear memoization? (I tested that, does not
work, even when the function you define also uses memoization.)  These
functions really need memoization, the routine has not yet finished the
space over 1<:x<:100 without memoization, it has been an hour now that
process explorer reports that J has been using one of my CPUs.  I killed it
before it finished.  But then memoization is the biggest performance hit to
my finished code..


-- 
Of course I can ride in the carpool lane, officer.  Jesus is my constant
companion.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

[Jprogramming] memoization good and bad for performance, can you flush it?

Reply via email to