Re: [Jprogramming] Programming Digest, Vol 73, Issue 45

Andrew Shepp Tue, 18 Oct 2011 09:32:30 -0700

On Tue, Oct 18, 2011 at 12:24, <[email protected]> wrote:


> Send Programming mailing list submissions to
>        [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://jsoftware.com/cgi-bin/mailman/listinfo/programming
> or, via email, send a message with subject or body 'help' to
>        [email protected]
>
> You can reach the person managing the list at
>        [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Programming digest..."
>
>
> Today's Topics:
>
>   1. memoization good and bad for performance, can you flush it?
>      (Nick Simicich)
>   2. Re: memoization good and bad for performance, can you flush
>      it? (Raul Miller)
>   3. Re: Repeated application of a verb (Raul Miller)
>   4. Exact surds (David Vaughan)
>   5. Re: Exact surds (Raul Miller)
>   6. Re: Exact surds (David Vaughan)
>   7. Re: Exact surds (Raul Miller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 18 Oct 2011 08:43:47 -0400
> From: Nick Simicich <[email protected]>
> Subject: [Jprogramming] memoization good and bad for performance,       can
>        you flush it?
> To: Programming forum <[email protected]>
> Message-ID:
>        <CA+2d-ZwBSDyzvm2eEpcai_DttU5rGf=wd-svyz-x6vyot1a...@mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> I see that J allowed one to do automatic memoization by coding a M. in the
> header.  I have used it a couple times.  I was working on euler 66, which
> is
> the one where you have to solve for the lowest solution for non-square
> values of D between 1 and 1000, and you pick the largest x and the answer
> is
> the D associated with the largest minimum x. (ignoring the implicit
> solution
> where x=1 and y = 0).
>
> x^2 - Dy^2 = 1
>
> So I dug around on the net and found that these are called Pell Equations,
> and they are solved by deriving the generalized periodic continued fraction
> for the square root of D.  Mathworld had a page that listed a specific
> scheme for doing the job, there were a bunch of recursive piecewise
> functions that were easily restatable in J.
>
> http://mathworld.wolfram.com/PellEquation.html
>
>
> I wrote the program, and when it ran successfully the first time, it took
> 15
> minutes to run to completion (bugs were mostly related to my failure to use
> extended arithmetic in all needed places).  Because the functions that
> calculate
>
> Then I started working on the program.  Remember that there is a one minute
> guideline in Euler. I also noted that many people had reported very fast
> times for this problem using approximately the same scheme.  First I made a
> new version with a bunch of stuff stripped out.  I accepted order of
> magnitude results as good enough.  I stopped calculating q (y) and checking
> the equation.  Then I tried reordering statements, changing if.s to
> select.s.
>
> I got the run time down to 1.5 minutes, but, after one small change, I
> forgot to reload the code and my next run took six minutes.
>
> I began to suspect that the programs were running slower and slower the
> longer that they were run and I began to suspect the memoization.
>
> Now, make no mistake, these functions need memoization.  Without
> memoization, running d from 1 to 100 takes many minutes.  I started a test
> a
> couple minutes ago, and it has not completed as I write this, while with
> memoization, I was able to get this same stuff to run from 1 to 1000 in 1.5
> minutes.
>
> But by reloading my code (which I assumes clears memoization) every 100
> ticks of the main loop, I could get the code to finish in a couple seconds
> for the fast version and under 15 seconds for the slow version.  I actually
> stuck a load'file' to reload the file the scripts were from in the main
> loop, to run every hundred ticks of the main loop.  In fact, the code runs
> slightly faster when I reload it from the file every 50 ticks of the main
> loop. I assume that reading from the O/S is very expensive.  But it is a
> win
> relative to searching 1000 entries in the memoization table 1000 times.
>
> If I could, I'd clear the memoization every cycle or two of the main loop.
>  x is d, passed to every routine, and y is the level of the function, they
> recur over each other down to where most of them originate as 1, 0 or the
> floor of the square root of D.  This is an example of three of the
> routines,
> they somehow work together to calculate the a term, which is used to
> calculate r.  r is one less than the first term of a which is double the
> value of the zero term and r, possibly transformed, tells you which p and q
> (lower case, not these) terms to use for the minimum values for x and y.
>  As
> you can see, it calls itself at level zero, which just takes the floor of
> the square root of the left term, and two other routines, that end up
> calling themselves and each other and a again, descending in value of the
> right argument.  Lots of calls to calculate, say, a_10 since you actually
> need (for this function) to calculate a i.r without knowing r in advance.
>  That actually makes good use of memoization (I presume)
>
> e66a =: dyad define M.
> if. y = 0 do. x: <.@%: x return. end.
> x: <. ((x e66a 0)+(x e66P y))% (x e66Q y)
> )
> e66P =: dyad define M.
> if. y = 0 do. x: 0 return. end. if. y = 1 do. x e66a 0 return. end.
> x: ((x e66a <:y)*(x e66Q <:y))-(x e66P <:y)
> )
> e66Q =: dyad define M.
> if. y = 0 do. x: 1 return. end. if. y = 1 do. x: x - *: x e66a 0 return.
> end.
> x: (x-*: x e66P y)%(x e66Q <: y)
> )
>
> The "fast" version uses selects, no significant difference.  The point is
> that once I am done making calls about 5 e66a"0 i.10 I will never make
> another call with 5 as the left argument. and it seems that the function
> calls slow way way way down.  First time from 1-1000, the "fast" version
> ran
> in 1.5 minutes (the unmemoized version has not finished >:i.100).  The
> second time it ran in six minutes.  Just to make myself happy, I stuck a
> "load" statement in my main routine to be run every 100 times through the
> loop.  The "fast version" ran in 6 seconds.  The slow version that used
> extended arithmetic for everything, that I had never seen run in under
> three
> minutes over the space 1<:x<:1000 ran in 12 seconds.  What I was saving by
> avoiding arithmetic was another thousand trips through the memo lists.
>
> Of the 1.5 minutes my routine took to run, it seems that 96% of the time I
> was looking through memoized stuff that I never wanted to see again.  The
> only possible thing I can imagine that the load of the same code is doing
> is
> getting the compiler to clear the memoization tables.
>
> Calculating a is so expensive that it makes a noticable difference if I
> calculate the terms in a while loop, stopping on the exact term that is the
> last one needed, rather than calculating them 3 or 5 at a time.  But it is
> not the properly memoized calculation of a that it so expensive, it is the
> poor memo lookup.
>
>
> I have manually memoized functions many times, manually, mostly in Perl and
> D.  Both of those languages have hashes - you can make the arguments into a
> key, somehow, and build a hash that is indexed by the tree.  The tree
> lookup
> slows down the entry to the function, but not too much, and furthermore, if
> you put a thousand times as much into the tree, it only slows the lookup a
> little.  I don't see that here - it is so expensive to load the tree that
> it
> is by far cheaper to reread everything from disk.
>
> I am beginning to suspect that the M. implementation does not store things
> into trees or hashes or anything efficient for lookup, it stores them into
> maybe linked lists that are searched sequentially, or maybe some other
> scheme is used that slows the lookup to a crawl when you get more than a
> few
> thousand items in the list.  The highest value of r was in the 70's, about
> half had a value of 10, so let's say that a was called about 20 times per
> tick.  This means that every 100 main loop ticks, I added 2000 entries to
> the memoization table - and it started to noticeably slow down, while by
> the
> time I had added 10000 entries to the table, it was crawling, and at about
> 100000, it was slower than running the original code the first time.
>
> See, when I run the same function twice in a row, without looking anything
> up, it would seem to me that if things are completely memoized, that every
> possible value I am looking up is in the memo tables.   But the reality is
> that the function runs slower.  Six minutes vs. 1.5 minutes.
>
> I have not looked in the source, but I am gonna guess that the memoization
> is a linear lookup in an unordered list, or if it is using any sort of
> indexing, it gets overwhelmed by a few thousand input records, and it
> reverts to some sort of unordered overflow area that has to be searched.
>
> So I guess I would like to know if there is any work going on in the
> memoization area, and if it would be possible to add code so that the
> memoization would be cleared?  Like, a foreign conjunction, or some trick?
> Can I define any function and clear memoization? (I tested that, does not
> work, even when the function you define also uses memoization.)  These
> functions really need memoization, the routine has not yet finished the
> space over 1<:x<:100 without memoization, it has been an hour now that
> process explorer reports that J has been using one of my CPUs.  I killed it
> before it finished.  But then memoization is the biggest performance hit to
> my finished code..
>
>
> --
> Of course I can ride in the carpool lane, officer.  Jesus is my constant
> companion.
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 18 Oct 2011 09:11:01 -0400
> From: Raul Miller <[email protected]>
> Subject: Re: [Jprogramming] memoization good and bad for performance,
>        can you flush it?
> To: Programming forum <[email protected]>
> Message-ID:
>        <CAD2jOU9QvcgS15cD=6a4slobkzxmzw-_cerwcq4gqv8xq46...@mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Tue, Oct 18, 2011 at 8:43 AM, Nick Simicich <[email protected]> wrote:
> > Can I define any function and clear memoization? (I tested that, does not
> > work, even when the function you define also uses memoization.)
>
> I am surprised that function redefinition does not clear the
> memoization, but maybe the memo is keyed by the function definition?
>
> Anyways, I would try updating the definition with an extraneous
> constant.  Since J does not do data flow analysis, that should be
> sufficient.
>
> In other words, if you have something like:
>   F=: f&([ 23&]) M.
>
> then changing the number should get you a fresh set of memo buckets
> (if my hypothesis about M. implementation is correct).
>
> --
> Raul
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 18 Oct 2011 10:26:33 -0400
> From: Raul Miller <[email protected]>
> Subject: Re: [Jprogramming] Repeated application of a verb
> To: Programming forum <[email protected]>
> Message-ID:
>        <CAD2jOU-5FmzTWCDMuVSuJhJ9=wrMvxwEEeO=OP+XvqXnup=u...@mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> I was asked to explain a statement I had made earlier:
>
> On Sat, Oct 8, 2011 at 8:04 AM, Raul Miller <[email protected]> wrote:
> > Depending on what specifically you are trying to do, ^: may or may not
> > be the right thing to use.
> >
> > The issue is that ^: assumes that it is dealing with a mathematical
> > function. ?If its function gives the same result twice in a row, it
> > will stop right there.
> >
> > This is how ^:(test)^:_ works something like a while loop -- when
> > (test) returns false, the inner ^: becomes an identity function and
> > the outer ^: knows that it's time to stop.
>
> This is probably easiest to clarify using examples.
>
>   <.@-:^:_]20
> 0
>
> If I repeatedly divide a number in half and find the next lower
> integer, I get 0.  I can see the intermediate results if I replace ^:_
> with ^:a:
>
>   <.@-:^:a:]20
> 20 10 5 2 1 0
>
> This works because ^: stops when it sees the same value twice in a row.
>
> Now, for contrast, let's consider:
>
>   ? bind 3^:a:]0
> 0 1 2 0 1 0 1 0
>   ? bind 3^:a:]0
> 0 1 2 1
>   ? bind 3^:a:]0
> 0
>   ? bind 3^:a:]0
> 0 2 0 2 0 2 1
>
> Here, ? bind 3 is not a function.  It arbitrarily picks a result which
> will be 0, 1 or 2.  So this mechanism stops, arbitrarily, when ? has
> given us the same value twice in a row.
>
> Or, here's a different example:
>
>   require'misc'
>      prompt bind 'type something: '^:a: ''
> type something: this is something
> type something: so is this
> type something: and this
> type something: and this
>
> this is something
> so is this
> and this
>
> Here, when I entered the same thing twice in a row, J decided that the
> "function" I was using was done -- after all, it gave the same result
> twice in a row.
>
> So, anyways, this means that the mechanism
>   F ^:test^:_
> will terminate prematurely if F is not a function and it produces the
> same result twice in a row, even when the result of test was a true
> value.
>
> If F was a function and it produced the same result twice in a row,
> this would not matter:  If F was a function and it produced the same
> result twice in a row, it would produce the same result an infinite
> number of times in a row.  So either the result is correct or the
> result is irrelevant.  But if you are looking for F to do
> "non-functional things" you might be upset if your loop terminated
> prematurely (and the simple workaround is to use some other mechanism
> to implement your "while" loop -- but you can also arrange so that F
> never produces the same result twice in a row).
>
> I hope this makes sense.
>
> --
> Raul
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 18 Oct 2011 16:30:17 +0100
> From: David Vaughan <[email protected]>
> Subject: [Jprogramming] Exact surds
> To: Programming forum <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> Can J represent surds exactly instead of reverting to floating point? It
> would be more convenient if I could do that rather than storing the squared
> version.
>
> ------------------------------
>
> Message: 5
> Date: Tue, 18 Oct 2011 11:50:15 -0400
> From: Raul Miller <[email protected]>
> Subject: Re: [Jprogramming] Exact surds
> To: Programming forum <[email protected]>
> Message-ID:
>        <cad2jou-nsgawxuqorqm9em4hrj2msb9s955qr9f1cxipnwx...@mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Tue, Oct 18, 2011 at 11:30 AM, David Vaughan
> <[email protected]> wrote:
> > Can J represent surds exactly instead of reverting to floating point? It
> would be more convenient if I could do that rather than storing the squared
> version.
>
> Yes, but by generating a verb which would generate the floating point
> value and not as a number.
>
> cubeRootOf5=: %:&5 bind 3
>
> Of course you can manipulate the resulting numbers but those results
> are still floating point:
>
>   (cubeRootOf5 ^ 3:) ''
> 5
>   5-(cubeRootOf5 ^ 3:) ''
> 8.88178e_16
>
> That said, in some cases, J can clean up floating point discrepancies
>
>   5-x:(cubeRootOf5 ^ 3:) ''
> 0
>
> but that's from direct manipulation of the values and does not care
> about delayed evaluation:
>
>   5-x:((3 %:5) ^ 3:) ''
> 0
>
> Anyways, J implements the data structures to support symbolic
> manipulation but it does not have a particularly large library of
> prebuilt symbolic manipulation routines.
>
> FYI,
>
> --
> Raul
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 18 Oct 2011 17:09:08 +0100
> From: David Vaughan <[email protected]>
> Subject: Re: [Jprogramming] Exact surds
> To: Programming forum <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> I'm hoping to write a verb to compute the continued fraction expansion of a
> square root. I've done it in Java though there are accuracy issues with
> floating points and speed issues with using exact numbers (you end up
> multiplying massive numbers in some cases). Is J equipped to deal with these
> problems or should I be looking for an alternative method entirely?
>
> On 18 Oct 2011, at 16:50, Raul Miller wrote:
>
> > On Tue, Oct 18, 2011 at 11:30 AM, David Vaughan
> > <[email protected]> wrote:
> >> Can J represent surds exactly instead of reverting to floating point? It
> would be more convenient if I could do that rather than storing the squared
> version.
> >
> > Yes, but by generating a verb which would generate the floating point
> > value and not as a number.
> >
> > cubeRootOf5=: %:&5 bind 3
> >
> > Of course you can manipulate the resulting numbers but those results
> > are still floating point:
> >
> >   (cubeRootOf5 ^ 3:) ''
> > 5
> >   5-(cubeRootOf5 ^ 3:) ''
> > 8.88178e_16
> >
> > That said, in some cases, J can clean up floating point discrepancies
> >
> >   5-x:(cubeRootOf5 ^ 3:) ''
> > 0
> >
> > but that's from direct manipulation of the values and does not care
> > about delayed evaluation:
> >
> >   5-x:((3 %:5) ^ 3:) ''
> > 0
> >
> > Anyways, J implements the data structures to support symbolic
> > manipulation but it does not have a particularly large library of
> > prebuilt symbolic manipulation routines.
> >
> > FYI,
> >
> > --
> > Raul
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 18 Oct 2011 12:23:11 -0400
> From: Raul Miller <[email protected]>
> Subject: Re: [Jprogramming] Exact surds
> To: Programming forum <[email protected]>
> Message-ID:
>        <CAD2jOU_yZU4cn-g_ir1F6Vk0ipohQ_1o6Zm4JSFFV=edfmx...@mail.gmail.com
> >
> Content-Type: text/plain; charset=ISO-8859-1
>
> The best I can think of at the moment is using x: on the floating
> point value for an initial value and then newton's method to gain the
> extra precision you want.
>
> --
> Raul
>
> On Tue, Oct 18, 2011 at 12:09 PM, David Vaughan
> <[email protected]> wrote:
> > I'm hoping to write a verb to compute the continued fraction expansion of
> a square root. I've done it in Java though there are accuracy issues with
> floating points and speed issues with using exact numbers (you end up
> multiplying massive numbers in some cases). Is J equipped to deal with these
> problems or should I be looking for an alternative method entirely?
> >
> > On 18 Oct 2011, at 16:50, Raul Miller wrote:
> >
> >> On Tue, Oct 18, 2011 at 11:30 AM, David Vaughan
> >> <[email protected]> wrote:
> >>> Can J represent surds exactly instead of reverting to floating point?
> It would be more convenient if I could do that rather than storing the
> squared version.
> >>
> >> Yes, but by generating a verb which would generate the floating point
> >> value and not as a number.
> >>
> >> cubeRootOf5=: %:&5 bind 3
> >>
> >> Of course you can manipulate the resulting numbers but those results
> >> are still floating point:
> >>
> >> ? (cubeRootOf5 ^ 3:) ''
> >> 5
> >> ? 5-(cubeRootOf5 ^ 3:) ''
> >> 8.88178e_16
> >>
> >> That said, in some cases, J can clean up floating point discrepancies
> >>
> >> ? 5-x:(cubeRootOf5 ^ 3:) ''
> >> 0
> >>
> >> but that's from direct manipulation of the values and does not care
> >> about delayed evaluation:
> >>
> >> ? 5-x:((3 %:5) ^ 3:) ''
> >> 0
> >>
> >> Anyways, J implements the data structures to support symbolic
> >> manipulation but it does not have a particularly large library of
> >> prebuilt symbolic manipulation routines.
> >>
> >> FYI,
> >>
> >> --
> >> Raul
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>
>
> ------------------------------
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
> End of Programming Digest, Vol 73, Issue 45
> *******************************************
>



-- 
C. Andrew Shepp
Shepp & Associates LLC
www.linkedin.com/in/andrewshepp

Mail and attachments are private communications only
voice +1 314 3243139
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Programming Digest, Vol 73, Issue 45

Reply via email to