Re: Partially Memoized Functions

2002-12-12 Thread James Mastros
On 12/10/2002 5:46 PM, Smylers wrote:

OK.  There was something on MJD's QOTW recently where using the current
Perl 5 Memoize module slowed code down -- that gave me the impression
that caching had the potential.

It does.  In fact, all caching has that potential.  Specificly, if the 
time to look up somthing in the cache is greater then the time to 
recompute it, caching is a loose.  Additionaly, if the hit rate 
(probablity; 0..1) times the time to recompute the datum is less then 
the time to look up the result in the cache, it's a loss for that datum.

MJD has a set of presentation slides on this at 
http://perl.plover.com/yak/memoize-quant/ -- "Quantitative Analysis of 
Memoization".

	-=- James Mastros

PS -- This is getting offtopic, even for p6l.



Re: Partially Memoized Functions

2002-12-10 Thread Damian Conway
Adam Turoff wrote:


 sub days_in_month(Str $month, Int $year)
 {
   $month = lc $month;
   if $month eq 'feb'
   {
 my sub feb_days (Int $year) is cached {
 my $leap = $year % 4 == 0
 && ($year % 100 != 0 || $year % 400 == 0);
 return $leap ? 29 : 28;
 }
 return feb_days($year);
   }

   else
   {
 # Simple look-up, so caching would be counter-productive:
 return %days{$month};  # %days was declared above (honest)
   }
 }



I don't think that works correctly.  This will create a new cached
sub each time $month eq 'feb'?


Nope. It simply declares C to be a *named* subroutine (i.e.
*not* a closure) that happens to be lexically scoped to inside
C.

In other words, it works exactly as if I had left off the C, except
(with the C) it avoids polluting a global namespace with a subroutine
that's not relevant anywhere else.



That'll generate a lot of cached subs


Nope. It's a named subroutine. So "thar ken be onla one" ;-)



values will be calculated each time $month eq 'feb, and none
of the values will ever be returned from any of those caches.


Nope.



Schwern's approach of factoring out days_in_feb into a cached sub
is the same basic idea, and doesn't have this issue.


That's true. But my version doesn't have the issue either, and also
elegantly illustrates why we're adding lexical subroutines to Perl 6. :-)

Damian





Re: Partially Memoized Functions

2002-12-10 Thread Michael G Schwern
On Tue, Dec 10, 2002 at 10:46:07PM -, Smylers wrote:
> > The example above is a classic example of premature optimization.
> > There's nothing which ways the cache would be counter-productive for
> > simple calculations since the caching logic is nearly as simple.
> 
> OK.  There was something on MJD's QOTW recently where using the current
> Perl 5 Memoize module slowed code down -- that gave me the impression
> that caching had the potential.

Applying memoization to really simple calcuations, such as the number of
days in a month, will probably wind up slower than just doing the original
calcuation.  This is also an example of premature optimization. :)

For something as simple as your example, the cost of entering and exiting
the subroutine is probably more expensive than the code to actually do the
work.  In that case you likely want to inline the code which flies off into
a whole other class of optimizations...


> > However, if you *really* want to do partial caching and given that
> > this is an edge case, ...
> 
> It wasn't supposed to be claiming to be a serious function where caching
> would be useful!

I meant that the idea of partial caching is an edge case.


> > I'd say to just do it as two functions
> 
> Yeah, I thought of that (thanks for bothering to write it out) then I
> thought of having the caching be per-C and couldn't work out
> whether it'd be generally useful or not, which is why I asked here, for
> an increased sample size.
> 
> And that increased sample size indicates "not", so I'll forget about it.

I dunno, my motto is "never hurts to put it in a library".


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
MERV GRIFFIN!



Re: Partially Memoized Functions

2002-12-10 Thread Smylers
Michael G Schwern wrote:

> On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote:
> 
> > That way a function could decide to cache some return values but not
> > all of them.
> 
> The example above is a classic example of premature optimization.
> There's nothing which ways the cache would be counter-productive for
> simple calculations since the caching logic is nearly as simple.

OK.  There was something on MJD's QOTW recently where using the current
Perl 5 Memoize module slowed code down -- that gave me the impression
that caching had the potential.

> However, if you *really* want to do partial caching and given that
> this is an edge case, ...

It wasn't supposed to be claiming to be a serious function where caching
would be useful!

> I'd say to just do it as two functions

Yeah, I thought of that (thanks for bothering to write it out) then I
thought of having the caching be per-C and couldn't work out
whether it'd be generally useful or not, which is why I asked here, for
an increased sample size.

And that increased sample size indicates "not", so I'll forget about it.

> rather than making the caching semantics more involved.

(As hard as this may seem to believe, the idea actually sprang from
wanting to make caching _less_ involved.  I think my brain was going
along the lines of "I've just writing this expensive calculation to
compute a value, let's save it somewhere before returning it" so wanting
to deal with the caching at _that_ point; I hadn't thought about caching
when writing the C line some time previously and it seemed
inconvenient to have to go back there now that I was considering it.  Or
something.)

Smylers



Re: Partially Memoized Functions

2002-12-10 Thread Adam Turoff
On Mon, Dec 09, 2002 at 02:20:01PM -0800, Austin Hastings wrote:
> --- Paul Johnson <[EMAIL PROTECTED]> wrote:
> > How about the same way as one would do it now?  Presumably we won't
> > all
> > forget how to program when Perl 6 comes out.
> 
> I think you've missed the point. The original poster (Smylers) asked if
> there was a benefit to only cacheing certain values, presuming the
> remainder would be either trivial to compute or internally cached, or
> both.

I think *you've* missed the point.  

There's not enough benefit to support "caching certain values"
through linguistic warping.  That technique is possible, and it's
the kind of solution that is better suited to (1) hand-rolling a
cache management strategy, or (2) refactoring the code to work with
standard memoization.

The best solutions involve caching all of the values returned by
this function (ignoring the possible waste as insignificant), or
refactoring the code so that "all of the meaningful values" are
cached (and computed by a separate cached sub).

Z.




Re: Partially Memoized Functions

2002-12-10 Thread Adam Turoff
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote:
> --- Adam Turoff <[EMAIL PROTECTED]> wrote:
> > I think you're trying to overoptimize something here.  I can't see
> > a benefit to caching only sometimes.  If there is, then you probably
> > want to implement a more sophisticated cache management strategy
> > for your sub, not warp the language for a special case.
> 
> Ahh. This is better. How does one implement a more sophisticated cache
> management strategy?

By memoizing specific cases by hand, of course.  :-)

> That is, what is the mechanism for manipulating the run-time system
> behavior of subs?

Memoization does not have to involve manipulating runtime behavior.
However, manipulating runtime behavior is a simple, generic and
effective way to memoize random subs.

Here's an example of a memoizing only the values for 'feb'. 
Schwern's solution is simpler and easier to read though.

{
## start of a lexical scope to hide %feb_cache
my %feb_cache;

sub days_in_month(Str $month, Int $year) {
  $month = lc $month;
  if $month eq 'feb' { 
unless $feb_cache{$year} {
my $leap = $year % 4 == 0 
&& ($year % 100 != 0 || $year % 400 == 0);
$feb_cache{$year} = $leap ? 29 : 28;
}
return $feb_cache{$year};
  } else {   
return %days{$month};
  }   
}
}

Z.




Re: Partially Memoized Functions

2002-12-10 Thread Adam Turoff
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote:
> --- Adam Turoff <[EMAIL PROTECTED]> wrote:
> > It doesn't matter whether some of the values are cheap lookups
> > while other values are "complex calculations".  Once a cached sub
> > is called with a set of parameter values, the return value will
> > always be a cheap lookup in the memoized sub's cache.  
> 
> You may get some disagreement from those for whom memory is neither
> virtual nor free.

Memoization is simply the exchange of cheap memory lookups for 
complicated calculations.  If memory is more precious than a few CPU
cycles, then you shouldn't be memoizing.

Z.




Re: Partially Memoized Functions

2002-12-10 Thread Adam Turoff
On Tue, Dec 10, 2002 at 01:53:28PM +1100, Damian Conway wrote:
> And in those rare cases where you really do need partial caching, the
> simplest solution is to split the partially cached subroutine into a
> fully cached sub and an uncached sub:
> 
>   sub days_in_month(Str $month, Int $year)
>   {
> $month = lc $month;
> if $month eq 'feb'
> {
>   my sub feb_days (Int $year) is cached {
>   my $leap = $year % 4 == 0
>   && ($year % 100 != 0 || $year % 400 == 0);
>   return $leap ? 29 : 28;
>   }
>   return feb_days($year);
> }
> 
> else
> {
>   # Simple look-up, so caching would be counter-productive:
>   return %days{$month};  # %days was declared above (honest)
> }
>   }

I don't think that works correctly.  This will create a new cached
sub each time $month eq 'feb'?  That'll generate a lot of cached
subs, values will be calculated each time $month eq 'feb, and none
of the values will ever be returned from any of those caches.

Schwern's approach of factoring out days_in_feb into a cached sub
is the same basic idea, and doesn't have this issue.

Z.




Re: Partially Memoized Functions

2002-12-09 Thread Damian Conway
Smylers wrote:


I was wondering whether it'd be better to have this specified per
C rather than per C.


I doubt it. There's no performance gain from partial caching since
you have to check the cache anyway to detect that a particular result
isn't cached.

And in those rare cases where you really do need partial caching, the
simplest solution is to split the partially cached subroutine into a
fully cached sub and an uncached sub:

  sub days_in_month(Str $month, Int $year)
  {
$month = lc $month;
if $month eq 'feb'
{
  my sub feb_days (Int $year) is cached {
  my $leap = $year % 4 == 0
  && ($year % 100 != 0 || $year % 400 == 0);
  return $leap ? 29 : 28;
  }
  return feb_days($year);
}

else
{
  # Simple look-up, so caching would be counter-productive:
  return %days{$month};  # %days was declared above (honest)
}
  }

Damian





Re: Partially Memoized Functions

2002-12-09 Thread Michael G Schwern
On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote:
> Last month's discussion on memoization[*0] had the consensus that
> C is the appropriate property name, used like so:
> 
>   sub square (Num $n) is cached { ... }
> 
> I was wondering whether it'd be better to have this specified per
> C rather than per C.  That'd permit something a long the
> lines of:
> 
>   sub days_in_month(Str $month, Int $year)
>   {
> $month = lc $month;
> if $month eq 'feb'
> {
>   # Do 'complicated' calculation and cache the result for later use:
>   my $leap = $year % 4 == 0
>   && ($year % 100 != 0 || $year % 400 == 0);
>   cache_and_return $leap ? 29 : 28;
> }
> 
> else
> {
>   # Simple look-up, so caching would be counter-productive:
>   return %days{$month};  # %days was declared above (honest)
> }
>   }
> 
> That way a function could decide to cache some return values but not all
> of them.

The example above is a classic example of premature optimization.  There's
nothing which ways the cache would be counter-productive for simple
calculations since the caching logic is nearly as simple.

However, if you *really* want to do partial caching and given that this is
an edge case, I'd say to just do it as two functions rather than making the
caching semantics more involved.

   my %days = (...);
   sub days_in_month(Str $month, Int $year)
   {
 $month = lc $month;
 return days_in_feb($year) if $month eq 'feb';
 return %days{$month};
   }

   sub days_in_feb(Int $year) is cached
   {
 # Do 'complicated' calculation and cache the result for later use:
 my $leap = $year % 4 == 0
 && ($year % 100 != 0 || $year % 400 == 0);
 return $leap ? 29 : 28;
   }

Of course there's nothing which says you can't just do *both*.
cache_and_return() is merely a function.  Write it, stick it in a library,
distribute for fun and/or profit.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
If God made anything more guerrila than your breast, I hope he kept it for
your father.



Re: Partially Memoized Functions

2002-12-09 Thread Paul Johnson
On Mon, Dec 09, 2002 at 02:20:01PM -0800, Austin Hastings wrote:
> 
> --- Paul Johnson <[EMAIL PROTECTED]> wrote:
> > On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote:
> > > Ahh. This is better. How does one implement a more sophisticated
> > > cache management strategy?
> > > 
> > > That is, what is the mechanism for manipulating the run-time system
> > > behavior of subs?
> > 
> > How about the same way as one would do it now?  Presumably we won't
> > all
> > forget how to program when Perl 6 comes out.
> 
> Paul,
> 
> I think you've missed the point. The original poster (Smylers) asked if
> there was a benefit to only cacheing certain values, presuming the
> remainder would be either trivial to compute or internally cached, or
> both.
> 
> The suggestion was that a more advanced cache management strategy could
> be attached, presumably changing the behavior of the
> function-return-caching subsystem. 
> 
> I'm all in favor of that, but it's a new rock to turn over looking for
> details.
> 
> If you're proposing that the right answer is to not cache the function,
> but rather implement an internal cache, then cool - you've got a friend
> in Smylersvania. 
> 
> But if not, then presumably you know something I do not know: enlighten
> me, please?

What I'm suggesting is that instead of:

  sub days_in_month(Str $month, Int $year)
  {
$month = lc $month;
if $month eq 'feb'
{
  # Do 'complicated' calculation and cache the result for later use:
  my $leap = $year % 4 == 0
  && ($year % 100 != 0 || $year % 400 == 0);
  cache_and_return $leap ? 29 : 28;
}

else
{
  # Simple look-up, so caching would be counter-productive:
  return %days{$month};  # %days was declared above (honest)  
}
  }

We get:

  sub days_in_month(Str $month, Int $year)
  {
$month = lc $month;
if $month eq 'feb'
{
  # Do 'complicated' calculation and cache the result for later use:
  my %leap_cache is static;  # or something ...
  my $leap = $leap_cache{$year} //=
  $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0);
  return $leap ? 29 : 28;
}

else
{
  # Simple look-up, so caching would be counter-productive:
  return %days{$month};  # %days was declared above (honest)  
}
  }

OK, the cache isn't identical, but it could be programmed to be
identical, or it could be better by caching lc $month rather than
$month.

That way I can use constructs I already (will) know and do the caching
just how I want.  Of course, I can do this anyway, and maybe there are
benefits to having a partial caching system built into the language, but
I don't see it at the moment.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net



Re: Partially Memoized Functions

2002-12-09 Thread Austin Hastings

--- Paul Johnson <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote:
> > Ahh. This is better. How does one implement a more sophisticated
> > cache management strategy?
> > 
> > That is, what is the mechanism for manipulating the run-time system
> > behavior of subs?
> 
> How about the same way as one would do it now?  Presumably we won't
> all
> forget how to program when Perl 6 comes out.

Paul,

I think you've missed the point. The original poster (Smylers) asked if
there was a benefit to only cacheing certain values, presuming the
remainder would be either trivial to compute or internally cached, or
both.

The suggestion was that a more advanced cache management strategy could
be attached, presumably changing the behavior of the
function-return-caching subsystem. 

I'm all in favor of that, but it's a new rock to turn over looking for
details.

If you're proposing that the right answer is to not cache the function,
but rather implement an internal cache, then cool - you've got a friend
in Smylersvania. 

But if not, then presumably you know something I do not know: enlighten
me, please?

=Austin




Re: Partially Memoized Functions

2002-12-09 Thread Paul Johnson
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote:
> 
> --- Adam Turoff <[EMAIL PROTECTED]> wrote:
> > On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote:
> > > Anybody else like this, or are we better off leaving things as they
> > > were?
> > 
> > I think you're trying to overoptimize something here.  I can't see
> > a benefit to caching only sometimes.  If there is, then you probably
> > want to implement a more sophisticated cache management strategy
> > for your sub, not warp the language for a special case.
> 
> Ahh. This is better. How does one implement a more sophisticated cache
> management strategy?
> 
> That is, what is the mechanism for manipulating the run-time system
> behavior of subs?
> 
> sub days_in_month is cached(&other_cache_function) ?
> 
> Or what?

How about the same way as one would do it now?  Presumably we won't all
forget how to program when Perl 6 comes out.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net



Re: Partially Memoized Functions

2002-12-09 Thread Austin Hastings

--- Adam Turoff <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote:
> > Perhaps there are only some edge cases which require calculation;
> > or the function is liable to be called with many invalid input
> > values, which can quickly be determined yield C and so 
> > which don't need caching; or there is a pattern as to which sets
> > of input parameters are
> > likely to be passed multiple times so the function only bother
> > caching those.
> 
> It doesn't matter whether some of the values are cheap lookups
> while other values are "complex calculations".  Once a cached sub
> is called with a set of parameter values, the return value will
> always be a cheap lookup in the memoized sub's cache.  

You may get some disagreement from those for whom memory is neither
virtual nor free.

> > Anybody else like this, or are we better off leaving things as they
> > were?
> 
> I think you're trying to overoptimize something here.  I can't see
> a benefit to caching only sometimes.  If there is, then you probably
> want to implement a more sophisticated cache management strategy
> for your sub, not warp the language for a special case.

Ahh. This is better. How does one implement a more sophisticated cache
management strategy?

That is, what is the mechanism for manipulating the run-time system
behavior of subs?

sub days_in_month is cached(&other_cache_function) ?

Or what?

And, by the way, what happens when I change it in midstream? Obviously
the new cache begins empty, but does my old cache behave like a
closure, hanging about until the sub dies?

=Austin




Re: Partially Memoized Functions

2002-12-09 Thread Adam Turoff
On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote:
> I was wondering whether it'd be better to have this specified per
> C rather than per C.  That'd permit something a long the
> lines of:
> 
>   sub days_in_month(Str $month, Int $year)
>   {
>   
>   }
> 
> Perhaps there are only some edge cases which require calculation; or the
> function is liable to be called with many invalid input values, which
> can quickly be determined yield C and so which don't need
> caching; or there is a pattern as to which sets of input parameters are
> likely to be passed multiple times so the function only bother caching
> those.

It doesn't matter whether some of the values are cheap lookups
while other values are "complex calculations".  Once a cached sub
is called with a set of parameter values, the return value will
always be a cheap lookup in the memoized sub's cache.  

It's irrelevant if you have a different but comparable "cheap
lookup" for some values.
 
> Anybody else like this, or are we better off leaving things as they
> were?

I think you're trying to overoptimize something here.  I can't see
a benefit to caching only sometimes.  If there is, then you probably
want to implement a more sophisticated cache management strategy
for your sub, not warp the language for a special case.

Z.




Partially Memoized Functions

2002-12-09 Thread Smylers
Last month's discussion on memoization[*0] had the consensus that
C is the appropriate property name, used like so:

  sub square (Num $n) is cached { ... }

I was wondering whether it'd be better to have this specified per
C rather than per C.  That'd permit something a long the
lines of:

  sub days_in_month(Str $month, Int $year)
  {
$month = lc $month;
if $month eq 'feb'
{
  # Do 'complicated' calculation and cache the result for later use:
  my $leap = $year % 4 == 0
  && ($year % 100 != 0 || $year % 400 == 0);
  cache_and_return $leap ? 29 : 28;
}

else
{
  # Simple look-up, so caching would be counter-productive:
  return %days{$month};  # %days was declared above (honest)
}
  }

That way a function could decide to cache some return values but not all
of them.

Perhaps there are only some edge cases which require calculation; or the
function is liable to be called with many invalid input values, which
can quickly be determined yield C and so which don't need
caching; or there is a pattern as to which sets of input parameters are
likely to be passed multiple times so the function only bother caching
those.

This would be more flexible than applying the caching to an entire sub.
The downside would be that somebody who wants to cache all their values
but has an algorithm that naturally leads to having many exit points
would have to remember to specify the caching on each of them -- that
may make this not worth bothering with.

Obviously C is a stupid placeholder name; were this to
be implemented a better syntax would be needed.  Slapping C
on the end of the C statement makes no sense, since it's not a
property of the thing being returned but of the input data to the
function.

Anybody else like this, or are we better off leaving things as they
were?

  [*0]  See the thread round about:

http://groups.google.co.uk/groups?hl=en&threadm=3DCF8085.70902%40conway.org

Smylers