Re: Partially Memoized Functions
On 12/10/2002 5:46 PM, Smylers wrote: OK. There was something on MJD's QOTW recently where using the current Perl 5 Memoize module slowed code down -- that gave me the impression that caching had the potential. It does. In fact, all caching has that potential. Specificly, if the time to look up somthing in the cache is greater then the time to recompute it, caching is a loose. Additionaly, if the hit rate (probablity; 0..1) times the time to recompute the datum is less then the time to look up the result in the cache, it's a loss for that datum. MJD has a set of presentation slides on this at http://perl.plover.com/yak/memoize-quant/ -- "Quantitative Analysis of Memoization". -=- James Mastros PS -- This is getting offtopic, even for p6l.
Re: Partially Memoized Functions
Adam Turoff wrote: sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { my sub feb_days (Int $year) is cached { my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); return $leap ? 29 : 28; } return feb_days($year); } else { # Simple look-up, so caching would be counter-productive: return %days{$month}; # %days was declared above (honest) } } I don't think that works correctly. This will create a new cached sub each time $month eq 'feb'? Nope. It simply declares C to be a *named* subroutine (i.e. *not* a closure) that happens to be lexically scoped to inside C. In other words, it works exactly as if I had left off the C, except (with the C) it avoids polluting a global namespace with a subroutine that's not relevant anywhere else. That'll generate a lot of cached subs Nope. It's a named subroutine. So "thar ken be onla one" ;-) values will be calculated each time $month eq 'feb, and none of the values will ever be returned from any of those caches. Nope. Schwern's approach of factoring out days_in_feb into a cached sub is the same basic idea, and doesn't have this issue. That's true. But my version doesn't have the issue either, and also elegantly illustrates why we're adding lexical subroutines to Perl 6. :-) Damian
Re: Partially Memoized Functions
On Tue, Dec 10, 2002 at 10:46:07PM -, Smylers wrote: > > The example above is a classic example of premature optimization. > > There's nothing which ways the cache would be counter-productive for > > simple calculations since the caching logic is nearly as simple. > > OK. There was something on MJD's QOTW recently where using the current > Perl 5 Memoize module slowed code down -- that gave me the impression > that caching had the potential. Applying memoization to really simple calcuations, such as the number of days in a month, will probably wind up slower than just doing the original calcuation. This is also an example of premature optimization. :) For something as simple as your example, the cost of entering and exiting the subroutine is probably more expensive than the code to actually do the work. In that case you likely want to inline the code which flies off into a whole other class of optimizations... > > However, if you *really* want to do partial caching and given that > > this is an edge case, ... > > It wasn't supposed to be claiming to be a serious function where caching > would be useful! I meant that the idea of partial caching is an edge case. > > I'd say to just do it as two functions > > Yeah, I thought of that (thanks for bothering to write it out) then I > thought of having the caching be per-C and couldn't work out > whether it'd be generally useful or not, which is why I asked here, for > an increased sample size. > > And that increased sample size indicates "not", so I'll forget about it. I dunno, my motto is "never hurts to put it in a library". -- Michael G. Schwern <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/ Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One MERV GRIFFIN!
Re: Partially Memoized Functions
Michael G Schwern wrote: > On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote: > > > That way a function could decide to cache some return values but not > > all of them. > > The example above is a classic example of premature optimization. > There's nothing which ways the cache would be counter-productive for > simple calculations since the caching logic is nearly as simple. OK. There was something on MJD's QOTW recently where using the current Perl 5 Memoize module slowed code down -- that gave me the impression that caching had the potential. > However, if you *really* want to do partial caching and given that > this is an edge case, ... It wasn't supposed to be claiming to be a serious function where caching would be useful! > I'd say to just do it as two functions Yeah, I thought of that (thanks for bothering to write it out) then I thought of having the caching be per-C and couldn't work out whether it'd be generally useful or not, which is why I asked here, for an increased sample size. And that increased sample size indicates "not", so I'll forget about it. > rather than making the caching semantics more involved. (As hard as this may seem to believe, the idea actually sprang from wanting to make caching _less_ involved. I think my brain was going along the lines of "I've just writing this expensive calculation to compute a value, let's save it somewhere before returning it" so wanting to deal with the caching at _that_ point; I hadn't thought about caching when writing the C line some time previously and it seemed inconvenient to have to go back there now that I was considering it. Or something.) Smylers
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 02:20:01PM -0800, Austin Hastings wrote: > --- Paul Johnson <[EMAIL PROTECTED]> wrote: > > How about the same way as one would do it now? Presumably we won't > > all > > forget how to program when Perl 6 comes out. > > I think you've missed the point. The original poster (Smylers) asked if > there was a benefit to only cacheing certain values, presuming the > remainder would be either trivial to compute or internally cached, or > both. I think *you've* missed the point. There's not enough benefit to support "caching certain values" through linguistic warping. That technique is possible, and it's the kind of solution that is better suited to (1) hand-rolling a cache management strategy, or (2) refactoring the code to work with standard memoization. The best solutions involve caching all of the values returned by this function (ignoring the possible waste as insignificant), or refactoring the code so that "all of the meaningful values" are cached (and computed by a separate cached sub). Z.
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote: > --- Adam Turoff <[EMAIL PROTECTED]> wrote: > > I think you're trying to overoptimize something here. I can't see > > a benefit to caching only sometimes. If there is, then you probably > > want to implement a more sophisticated cache management strategy > > for your sub, not warp the language for a special case. > > Ahh. This is better. How does one implement a more sophisticated cache > management strategy? By memoizing specific cases by hand, of course. :-) > That is, what is the mechanism for manipulating the run-time system > behavior of subs? Memoization does not have to involve manipulating runtime behavior. However, manipulating runtime behavior is a simple, generic and effective way to memoize random subs. Here's an example of a memoizing only the values for 'feb'. Schwern's solution is simpler and easier to read though. { ## start of a lexical scope to hide %feb_cache my %feb_cache; sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { unless $feb_cache{$year} { my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); $feb_cache{$year} = $leap ? 29 : 28; } return $feb_cache{$year}; } else { return %days{$month}; } } } Z.
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote: > --- Adam Turoff <[EMAIL PROTECTED]> wrote: > > It doesn't matter whether some of the values are cheap lookups > > while other values are "complex calculations". Once a cached sub > > is called with a set of parameter values, the return value will > > always be a cheap lookup in the memoized sub's cache. > > You may get some disagreement from those for whom memory is neither > virtual nor free. Memoization is simply the exchange of cheap memory lookups for complicated calculations. If memory is more precious than a few CPU cycles, then you shouldn't be memoizing. Z.
Re: Partially Memoized Functions
On Tue, Dec 10, 2002 at 01:53:28PM +1100, Damian Conway wrote: > And in those rare cases where you really do need partial caching, the > simplest solution is to split the partially cached subroutine into a > fully cached sub and an uncached sub: > > sub days_in_month(Str $month, Int $year) > { > $month = lc $month; > if $month eq 'feb' > { > my sub feb_days (Int $year) is cached { > my $leap = $year % 4 == 0 > && ($year % 100 != 0 || $year % 400 == 0); > return $leap ? 29 : 28; > } > return feb_days($year); > } > > else > { > # Simple look-up, so caching would be counter-productive: > return %days{$month}; # %days was declared above (honest) > } > } I don't think that works correctly. This will create a new cached sub each time $month eq 'feb'? That'll generate a lot of cached subs, values will be calculated each time $month eq 'feb, and none of the values will ever be returned from any of those caches. Schwern's approach of factoring out days_in_feb into a cached sub is the same basic idea, and doesn't have this issue. Z.
Re: Partially Memoized Functions
Smylers wrote: I was wondering whether it'd be better to have this specified per C rather than per C. I doubt it. There's no performance gain from partial caching since you have to check the cache anyway to detect that a particular result isn't cached. And in those rare cases where you really do need partial caching, the simplest solution is to split the partially cached subroutine into a fully cached sub and an uncached sub: sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { my sub feb_days (Int $year) is cached { my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); return $leap ? 29 : 28; } return feb_days($year); } else { # Simple look-up, so caching would be counter-productive: return %days{$month}; # %days was declared above (honest) } } Damian
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote: > Last month's discussion on memoization[*0] had the consensus that > C is the appropriate property name, used like so: > > sub square (Num $n) is cached { ... } > > I was wondering whether it'd be better to have this specified per > C rather than per C. That'd permit something a long the > lines of: > > sub days_in_month(Str $month, Int $year) > { > $month = lc $month; > if $month eq 'feb' > { > # Do 'complicated' calculation and cache the result for later use: > my $leap = $year % 4 == 0 > && ($year % 100 != 0 || $year % 400 == 0); > cache_and_return $leap ? 29 : 28; > } > > else > { > # Simple look-up, so caching would be counter-productive: > return %days{$month}; # %days was declared above (honest) > } > } > > That way a function could decide to cache some return values but not all > of them. The example above is a classic example of premature optimization. There's nothing which ways the cache would be counter-productive for simple calculations since the caching logic is nearly as simple. However, if you *really* want to do partial caching and given that this is an edge case, I'd say to just do it as two functions rather than making the caching semantics more involved. my %days = (...); sub days_in_month(Str $month, Int $year) { $month = lc $month; return days_in_feb($year) if $month eq 'feb'; return %days{$month}; } sub days_in_feb(Int $year) is cached { # Do 'complicated' calculation and cache the result for later use: my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); return $leap ? 29 : 28; } Of course there's nothing which says you can't just do *both*. cache_and_return() is merely a function. Write it, stick it in a library, distribute for fun and/or profit. -- Michael G. Schwern <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/ Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One If God made anything more guerrila than your breast, I hope he kept it for your father.
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 02:20:01PM -0800, Austin Hastings wrote: > > --- Paul Johnson <[EMAIL PROTECTED]> wrote: > > On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote: > > > Ahh. This is better. How does one implement a more sophisticated > > > cache management strategy? > > > > > > That is, what is the mechanism for manipulating the run-time system > > > behavior of subs? > > > > How about the same way as one would do it now? Presumably we won't > > all > > forget how to program when Perl 6 comes out. > > Paul, > > I think you've missed the point. The original poster (Smylers) asked if > there was a benefit to only cacheing certain values, presuming the > remainder would be either trivial to compute or internally cached, or > both. > > The suggestion was that a more advanced cache management strategy could > be attached, presumably changing the behavior of the > function-return-caching subsystem. > > I'm all in favor of that, but it's a new rock to turn over looking for > details. > > If you're proposing that the right answer is to not cache the function, > but rather implement an internal cache, then cool - you've got a friend > in Smylersvania. > > But if not, then presumably you know something I do not know: enlighten > me, please? What I'm suggesting is that instead of: sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { # Do 'complicated' calculation and cache the result for later use: my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); cache_and_return $leap ? 29 : 28; } else { # Simple look-up, so caching would be counter-productive: return %days{$month}; # %days was declared above (honest) } } We get: sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { # Do 'complicated' calculation and cache the result for later use: my %leap_cache is static; # or something ... my $leap = $leap_cache{$year} //= $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); return $leap ? 29 : 28; } else { # Simple look-up, so caching would be counter-productive: return %days{$month}; # %days was declared above (honest) } } OK, the cache isn't identical, but it could be programmed to be identical, or it could be better by caching lc $month rather than $month. That way I can use constructs I already (will) know and do the caching just how I want. Of course, I can do this anyway, and maybe there are benefits to having a partial caching system built into the language, but I don't see it at the moment. -- Paul Johnson - [EMAIL PROTECTED] http://www.pjcj.net
Re: Partially Memoized Functions
--- Paul Johnson <[EMAIL PROTECTED]> wrote: > On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote: > > Ahh. This is better. How does one implement a more sophisticated > > cache management strategy? > > > > That is, what is the mechanism for manipulating the run-time system > > behavior of subs? > > How about the same way as one would do it now? Presumably we won't > all > forget how to program when Perl 6 comes out. Paul, I think you've missed the point. The original poster (Smylers) asked if there was a benefit to only cacheing certain values, presuming the remainder would be either trivial to compute or internally cached, or both. The suggestion was that a more advanced cache management strategy could be attached, presumably changing the behavior of the function-return-caching subsystem. I'm all in favor of that, but it's a new rock to turn over looking for details. If you're proposing that the right answer is to not cache the function, but rather implement an internal cache, then cool - you've got a friend in Smylersvania. But if not, then presumably you know something I do not know: enlighten me, please? =Austin
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 01:58:11PM -0800, Austin Hastings wrote: > > --- Adam Turoff <[EMAIL PROTECTED]> wrote: > > On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote: > > > Anybody else like this, or are we better off leaving things as they > > > were? > > > > I think you're trying to overoptimize something here. I can't see > > a benefit to caching only sometimes. If there is, then you probably > > want to implement a more sophisticated cache management strategy > > for your sub, not warp the language for a special case. > > Ahh. This is better. How does one implement a more sophisticated cache > management strategy? > > That is, what is the mechanism for manipulating the run-time system > behavior of subs? > > sub days_in_month is cached(&other_cache_function) ? > > Or what? How about the same way as one would do it now? Presumably we won't all forget how to program when Perl 6 comes out. -- Paul Johnson - [EMAIL PROTECTED] http://www.pjcj.net
Re: Partially Memoized Functions
--- Adam Turoff <[EMAIL PROTECTED]> wrote: > On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote: > > Perhaps there are only some edge cases which require calculation; > > or the function is liable to be called with many invalid input > > values, which can quickly be determined yield C and so > > which don't need caching; or there is a pattern as to which sets > > of input parameters are > > likely to be passed multiple times so the function only bother > > caching those. > > It doesn't matter whether some of the values are cheap lookups > while other values are "complex calculations". Once a cached sub > is called with a set of parameter values, the return value will > always be a cheap lookup in the memoized sub's cache. You may get some disagreement from those for whom memory is neither virtual nor free. > > Anybody else like this, or are we better off leaving things as they > > were? > > I think you're trying to overoptimize something here. I can't see > a benefit to caching only sometimes. If there is, then you probably > want to implement a more sophisticated cache management strategy > for your sub, not warp the language for a special case. Ahh. This is better. How does one implement a more sophisticated cache management strategy? That is, what is the mechanism for manipulating the run-time system behavior of subs? sub days_in_month is cached(&other_cache_function) ? Or what? And, by the way, what happens when I change it in midstream? Obviously the new cache begins empty, but does my old cache behave like a closure, hanging about until the sub dies? =Austin
Re: Partially Memoized Functions
On Mon, Dec 09, 2002 at 08:36:20PM -, Smylers wrote: > I was wondering whether it'd be better to have this specified per > C rather than per C. That'd permit something a long the > lines of: > > sub days_in_month(Str $month, Int $year) > { > > } > > Perhaps there are only some edge cases which require calculation; or the > function is liable to be called with many invalid input values, which > can quickly be determined yield C and so which don't need > caching; or there is a pattern as to which sets of input parameters are > likely to be passed multiple times so the function only bother caching > those. It doesn't matter whether some of the values are cheap lookups while other values are "complex calculations". Once a cached sub is called with a set of parameter values, the return value will always be a cheap lookup in the memoized sub's cache. It's irrelevant if you have a different but comparable "cheap lookup" for some values. > Anybody else like this, or are we better off leaving things as they > were? I think you're trying to overoptimize something here. I can't see a benefit to caching only sometimes. If there is, then you probably want to implement a more sophisticated cache management strategy for your sub, not warp the language for a special case. Z.
Partially Memoized Functions
Last month's discussion on memoization[*0] had the consensus that C is the appropriate property name, used like so: sub square (Num $n) is cached { ... } I was wondering whether it'd be better to have this specified per C rather than per C. That'd permit something a long the lines of: sub days_in_month(Str $month, Int $year) { $month = lc $month; if $month eq 'feb' { # Do 'complicated' calculation and cache the result for later use: my $leap = $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0); cache_and_return $leap ? 29 : 28; } else { # Simple look-up, so caching would be counter-productive: return %days{$month}; # %days was declared above (honest) } } That way a function could decide to cache some return values but not all of them. Perhaps there are only some edge cases which require calculation; or the function is liable to be called with many invalid input values, which can quickly be determined yield C and so which don't need caching; or there is a pattern as to which sets of input parameters are likely to be passed multiple times so the function only bother caching those. This would be more flexible than applying the caching to an entire sub. The downside would be that somebody who wants to cache all their values but has an algorithm that naturally leads to having many exit points would have to remember to specify the caching on each of them -- that may make this not worth bothering with. Obviously C is a stupid placeholder name; were this to be implemented a better syntax would be needed. Slapping C on the end of the C statement makes no sense, since it's not a property of the thing being returned but of the input data to the function. Anybody else like this, or are we better off leaving things as they were? [*0] See the thread round about: http://groups.google.co.uk/groups?hl=en&threadm=3DCF8085.70902%40conway.org Smylers