Re: Schwartzian Transform
Dan Sugalski [EMAIL PROTECTED] writes: At 09:26 AM 3/27/2001 -0800, Peter Buckingham wrote: Dan Sugalski wrote: At 09:50 PM 3/26/2001 -0500, James Mastros wrote: [..] I'd think /perl/ should complain if your comparison function isn't idempotent (if warnings on, of course). If nothing else, it's probably an indicator that you should be using that schwartz thang. If you figure out how, tell me. I'd like to make arrangements to be there for your Nobel acceptance speech. :) (This is, alas, a variant on the halting problem, unless you're proposing perl do the memoizing *and* still call the functions, and complain if one doesn't match the other) not wanting to collect my nobel prize just yet, but... could you not try a simple test (not guaranteed to be 100% accurate though), by copying the first data element and apply it twice and then check to see that the result of applying it once is the same as applying it twice. Feels a little too magic to me, and awfully fragile. I'm not comfortable doing that, though arguments to the contrary are welcome. There are 3 separate notions of idempotency at work here; I'm making up their names. 1. THE MATHEMATICAL NOTION: f(f(x)) == f(x) for all x We don't care about that *at all* when we're sorting (we're sorting the x's in order of their f(x)'s, with not an f(f(x)) in sight!). I'm not going to talk about this idempotency, I just mention it because of the incorrect analogy from linear algebra that has been floating around. 2. "STRONG" IDEMPOTENCY: (PURE FUNCTIONS) Saying C$a=f(x); $a=f(x) leaves the program in the same state as just C$a=f(x). This is achieved by avoiding assignment, not using any routines with state (I/O, PRNGs, ...) and the like. 3. "WEAK" IDEMPOTENCY: ("PURE RESULT"?) After C$a=f(x); more_stuff(); $b=f(x) we have C$a==$b. But the state of the program may have been changed. Examples include any f() that does caching (or I/O for other purposes), logging, or profiling. If f() is memoized then it is weakly idempotent but not strongly idempotent, since calling f(x) for the first time may change the memoized data. If f() is strongly idempotent then it is of course weakly idempotent also. (3) is enough for the sorting examples, I think. Unfortunately it's also a lot harder to test syntactically than is (2). So I'd like to suggest a cop-out, similar to the "let a module do the work" argument. MJD has a module Memoize, which is very easy to use. You can only memoize a pure function (in one of the above 2 senses; you can always memoize (2), and you can do (3) if the semantics of it are "good enough" for what you want to do). So you can say use Memoize; # ... memoize 'f'; @sorted = sort { my_compare(f($a),f($b)) } @unsorted to get a lot of the effect of the S word. Yes, I know it's not the transform (never better and often worse), since Memoize was designed with other things in mind. But it's probably a "good enough" solution, and has very low brain overhead. -- Ariel Scolnicov|"GCAAGAATTGAACTGTAG"| [EMAIL PROTECTED] Compugen Ltd. |Tel: +972-2-5713025 (Jerusalem) \ We recycle all our Hz 72 Pinhas Rosen St.|Tel: +972-3-7658117 (Main office)`- Tel-Aviv 69512, ISRAEL |Fax: +972-3-7658555http://3w.compugen.co.il/~ariels
Re: Schwartzian Transform
So you can say use Memoize; # ... memoize 'f'; @sorted = sort { my_compare(f($a),f($b)) } @unsorted to get a lot of the effect of the S word. Yes, and of course the inline version of this technique is also common: @sorted = sort { my $ac = $cache{$a} ||= f($a); my $bc = $cache{$b} ||= f($b); my_compare($ac,$bc); } @unsorted; Joseph Hall calls this the 'Orcish Maneuver'. However (I don't know who suggested this, but:) I'd think /perl/ should complain if your comparison function isn't idempotent (if warnings on, of course). If nothing else, it's probably an indicator that you should be using that schwartz thang. I have to agree with whoever followed up that this is a really dumb idea. It reminds me of the time I was teaching the regex class at TPC3, and I explained how the /o in /$foo/o represents a promise to Perl that $foo will never change, so Perl can skip the operation of checking to see if it has changed every time the match is performed. Then there was a question from someone in the audience, asking if Perl would emit a warning if $foo changed. On the other side of the argument, however, I should mention that I've planned for a long time to write a Sort::Test module which *would* check to make sure the comparator function behaved properly, and would report problems. When you use the module, it would make all your sorts run really slowly, but you would get a warning if your comparator was bad. Idempotency is not the important thing here. The *important* property that the comparator needs, and the one that bad comparators usually lack is if my_compare(a,b) 0, and my_compare(b,c) 0, then it should also be the case that my_compare(a,c) 0 for all keys a, b, and c. Sort::Test would run a quadratic sort such as a bubble sort, and make sure that this essential condition held true. Note in particular that if the comparator has the form { my_compare(f(a),f(b)) }, then it does not matter if f() is idempotent; what really matters is that my_compare should have the property above. I had also planned to have optional checks: use Sort::Test 'self'; (Make sure that my_compare(a,a) == 0 for all a) use Sort::Test 'twice'; (Make sure that my_compare(a,b) == my_compare(a,b) for all a,b) This last is essentially the idempotency restriction again. The reason I've never implemented this module is that in perl 5, sort() cannot be overridden, so the usefulness seemed low; you would have to rewrite your source code to use it. I hope this limitation is fixed in perl 6, because it would be a cool hack. Finally, another argument in the opposite direction yet again. It has always seemed to me that this 'inconsistent sort comparator' thing is a tempest in a teapot. In the past it has gotten a lot of attention because some system libraries have a qsort() function that dumps core if the comparator is inconsistent. To me, this obviously indicates a defective implementation of qsort(). If the sort function dumps core or otherwise detects an inconsistent comparator, it is obviously functioning suboptimally. An optimal sort will not notice that the comparator is inconsistent, because the only you can find out that the comparator is returning inconsistent results is if you call it in a situation where you already know what the result should be, and it returns a different result. An optimal sort function will not call the comparator if it already knows what the result should be! For example, consider the property from above: if my_compare(a,b) 0, and my_compare(b,c) 0, then my_compare(a,c) 0. If the qsort() already knows that ab and bc, what is it doing calling my_compare(a,c)? It should be able to figure that out without the call! The fact that it dumps core if ca means that it has called my_compare() when it didn't need to. Similarly, a qsort that dumps core if my_compare(a,a) != 0 has obviously wasted time by calling my_compare() at all in this case; it should *assume* that my_compare will return 0 here. So perhaps the best answer to the whole discussion is that if qsort() dumps core when given an inconsistent comparator, you should replace it with a better qsort.
Re: Schwartzian Transform
On Wed, Mar 28, 2001 at 09:13:01AM -0500, Mark-Jason Dominus wrote: So you can say use Memoize; # ... memoize 'f'; @sorted = sort { my_compare(f($a),f($b)) } @unsorted to get a lot of the effect of the S word. Yes, and of course the inline version of this technique is also common: @sorted = sort { my $ac = $cache{$a} ||= f($a); my $bc = $cache{$b} ||= f($b); my_compare($ac,$bc); } @unsorted; Joseph Hall calls this the 'Orcish Maneuver'. That does have the (slight) disadvantage that if f(x) returns a false value then f(x) may be called multiple times. Of course this can be fixed by using exists. It also has the overhead of computing the hash value of a and b on every itteration Personally I have found the fastest sort tends to be my @f = map { f($_) } @unsorted; @sorted = @unsorted[ sort { $f[$a] cmp $f[$b] } 0..$#f ]; Graham.
Re: Schwartzian Transform
On Wed, Mar 28, 2001 at 09:38:59AM -0500, John Porter wrote: Mark-Jason Dominus wrote: I have to agree with whoever followed up that this is a really dumb idea. Yahbut... (See last paragraph, below). OK, I'm agreeing with MJD on this one, and it was my idea. There is no easy way to check for statelessness in all cases. In some cases you can -- sin($a) = cos($b) is obviously stateless (if $a and $b aren't magic), because it is composed only of stateless primitives. This runs afoul of the halting problem real quick. Or so I'm told, and I belive it. If the comparison (or key extraction) function is not idempotent, that is a much worse situation than simply one of degraded performance. It means the result of sort() won't be (or at least will not be guaranteed to be) sorted! My intuition says that it will either be sorted, for a simple def of sorted ($s[i] = $s[i+1]), or will take infinite time. (The "take infinite time" is, I assume, the one that made things dump core.) if my_compare(a,b) 0, and my_compare(b,c) 0, then it should also be the case that my_compare(a,c) 0 Hm. We could call that "relative idempotency", I suppose. I'd go with "transitive", since this is a property of the comparator, not the extractor. If you seperate the comparator and the extractor(s), then the comparator must be transitive, and the extractors must be internaly stateless. But, aaui, it is not a question of the comparison function being idempotent, but the key extraction function being so. aaui? Hm. Let's define some terms like "idempotency". These are also my (current) ideas for sub attributes. Stateless: The function neither depends on state, nor modifies it. This makes it a pure (IE mathematical) function. f(a) == f(a), and there is no side-effect. sin() is stateless. This means the function is memonizeable. Internaly stateless: f(a) == f(a), but there might be sideefects that do not effect the return value. printf is internaly stateless (ignoring linenoize vars). This is the essencial property of key extractors. Note that all stateless functions are internaly stateless. Transitive: if my_compare(a,b) 0, and my_compare(b,c) 0, then it should also be the case that my_compare(a,c) 0 I can't define it better then that. (Though there's more to it then that). Note that only the sign of the answer is gaurnteed, so it doesn't even have to be internaly stateless -- but it probably doesn't make sense for it not to be. Give the braindead no head. You might want to change that to "heed". -=- James Mastros -- The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand wrapt in awe, is as good as dead. -=- Albert Einstein AIM: theorbtwo homepage: http://www.rtweb.net/theorb/
Re: Schwartzian Transform
James Mastros wrote: This runs afoul of the halting problem real quick. That would be taking the entirely wrong approach. All you'd need to do is check the return values from multiple calls with the same arguments. As long as they appear idempotent, that's all you care about. My intuition says that it will either be sorted, for a simple def of sorted ($s[i] = $s[i+1]), or will take infinite time. (The "take infinite time" is, I assume, the one that made things dump core.) Well, your intuition is not serving you well. You're mistaken on both counts. Hm. We could call that "relative idempotency", I suppose. I'd go with "transitive", since this is a property of the comparator, not the extractor. It's relative, because while the return values doen't need to be exactly the same each time, they only need to have the same signum(difference()). If you seperate the comparator and the extractor(s), then the comparator must be transitive, and the extractors must be internaly stateless. Must be? I think not. Give the braindead no head. You might want to change that to "heed". I definitely do not. The whole reason I put it in my .sig is because I like the way it sounds. If you find it offensive, well, I'll have a new one soon. -- John Porter
Schwartzian transforms
While all the discussion around Schwartzian transforms is interesting, the single ultimate question is: "Can perl automatically optimize away function and tie calls inside a sort function, and under what circumstances?" It doesn't really matter if the functions inside the sort function are idempotent--what matters is whether it's OK for us to go and memoize the things (or whatever else we might choose to do) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Schwartzian transforms
On Wed, Mar 28, 2001 at 11:11:20AM -0500, Dan Sugalski wrote: "Can perl automatically optimize away function and tie calls inside a sort function, and under what circumstances?" Agreed. It doesn't really matter if the functions inside the sort function are idempotent--what matters is whether it's OK for us to go and memoize the things (or whatever else we might choose to do) As far as I can see, this, in essence, gives a few basic cases: 1) The sort function is ill-defined. 2) The sort function is stateless. 3) The sort function is simply internaly stateless. 4) The function is well-defined, but not stateless whatsoever. In case 1, The sort won't work anyway, so we can ignore this case. I'm of the opinion that we should consider 3 to be Just Plain Silly and not worth worring about overmuch. (These would be functions that increment a counter every time they are accessed, for example.) I think that the difference between 43 dosn't matter. We only have things in 4 and not 3 that vary in abs(), but not sign. We're left with 12, and for 1, the sort won't work anyway. So long as we consider 2 Just Plain Silly, we're OK memonizing. -=- James Mastros -- The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand wrapt in awe, is as good as dead. -=- Albert Einstein AIM: theorbtwo homepage: http://www.rtweb.net/theorb/
Re: Schwartzian transforms
At 11:22 AM 3/28/2001 -0500, John Porter wrote: Dan Sugalski wrote: It doesn't really matter if the functions inside the sort function are idempotent--what matters is whether it's OK for us to go and memoize the things (or whatever else we might choose to do) Exactly, that's what I've been trying to say. And that's why I propose the :constant/:function/:pure/:stateless attribute, so that perl only has to trust the programmer to say which functions can be memoized. I'm actually considering whether we even need to care what the programmer's said. If we can just flat-out say "We may optimize your sort function, and we make no guarantees as to the number of times tied data is fetched or subs inside the sort sub are called" then life becomes much easier. Of course, we may not be able to say that, in which case hints of any sort are a Good Thing. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Schwartzian transforms
James Mastros wrote: I'm of the opinion that we should consider 3 to be Just Plain Silly and not worth worring about overmuch. AFAICT, you're worrying about everything overmuch. It suffices, I believe, to put the following contract on the sort() function: If the user-supplied comparison function is not relative-idempotent, then the results of sort() are not guaranteed to be sorted. If the user-supplied key extraction function is tagged with :function/:pure (or whatever), then perl is free to optimize the operation of sort() by memoizing the results of calls to that function. -- John Porter
Re: Schwartzian transforms
On Wednesday 28 March 2001 11:47, Dan Sugalski wrote: At 11:22 AM 3/28/2001 -0500, John Porter wrote: Dan Sugalski wrote: It doesn't really matter if the functions inside the sort function are idempotent--what matters is whether it's OK for us to go and memoize the things (or whatever else we might choose to do) Exactly, that's what I've been trying to say. And that's why I propose the :constant/:function/:pure/:stateless attribute, so that perl only has to trust the programmer to say which functions can be memoized. I'm actually considering whether we even need to care what the programmer's said. If we can just flat-out say "We may optimize your sort function, and we make no guarantees as to the number of times tied data is fetched or subs inside the sort sub are called" then life becomes much easier. But you can't. A complex sort can currently by simplified, if desired. To invert the behavior (simplification first), you'd still need a way to GWBrecomplexify/GWB it, for the folks who need a fetch every time. Of course, we may not be able to say that, in which case hints of any sort are a Good Thing. Yes. One way or t'other. -- Bryan C. Warnock [EMAIL PROTECTED]
Re: Schwartzian transforms
Dan Sugalski wrote: ... subs inside the sort sub are called" then life becomes much easier. Easier for perl. Don't we want to make life easier for the programmer? I mean, in general, it would be nice if there were a way to have perl memoize for us, rather than have to arrange it ourself. It could benefit a lot of situations besides sorting. -- John Porter
Re: Schwartzian transforms
On Wed, Mar 28, 2001 at 11:59:19AM -0500, John Porter wrote: I mean, in general, it would be nice if there were a way to have perl memoize for us, rather than have to arrange it ourself. Again with the over-specific solutions! What you seem to want is for (for instance) sub foo :memoize # Colon Rule conformance { # THING } to automagically memoize the subroutine. That's the specific solution. The general solution is "allow people to register user-defined behaviour triggered by certain attributes", hence: use attribute::sub memoize = sub { my ($coderef, $argument) = @_; exists $cache{$coderef}{$argument} ? $cache{$coderef}{$argument} : $cache{$coderef}{$argument} = $coderef-($argument); } -- An ASCII character walks into a bar and orders a double. "Having a bad day?" asks the barman. "Yeah, I have a parity error," replies the ASCII character. The barman says, "Yeah, I thought you looked a bit off." -- from Skud
Re: Schwartzian transforms
Since I'm supposed to be summarizing this thread for Simon's weekly write-up, let me make sure I have the four(?) basic suggestions/stances. 1) There are too many variations/problems/issues to bother having Perl try to handle them all. If folks want an optimized sort, they should continue to use the ST, or roll something similar. 2) Perl should create some form of special syntax explicitly for doing an ST on data. (Other than the special syntax of the ST, of course.) 3) Perl should provide a general memoization mechanism, usable outside sort, but that could be used to get ST-like behavior from a regular sort routine. sort { f'($a) cmp f''($b) | ... | f``($a) cmp f`($b) } @list; or sort { $a-f' cmp $b-f'' | ... | $a-f`` cmp $b-f` } @list; Each value in list would have the results for f() cached for subsequent comparisons within the sort. This would eliminate the need for the ST. 4) Should should grok a sort as an ST. sort { f'($a) cmp f''($b) | ... | f``($a) cmp f`($b) } @list; or sort { $a-f' cmp $b-f'' | ... | $a-f`` cmp $b-f` } @list; Perl should see this and think aha! map { $_-[0] } sort { $a-[1] cmp $b-[2] | ... | $a-[-2] cmp $b-[-1] } map { [$_, f'($_), f''($_), ... , f``($_), f`($_)] } @list; Did I grossly miss anyone's position? On Wednesday 28 March 2001 15:02, Dan Sugalski wrote: At 11:59 AM 3/28/2001 -0500, John Porter wrote: Dan Sugalski wrote: ... subs inside the sort sub are called" then life becomes much easier. Easier for perl. Don't we want to make life easier for the programmer? I mean, in general, it would be nice if there were a way to have perl memoize for us, rather than have to arrange it ourself. It could benefit a lot of situations besides sorting. I'm not talking about making it easier on perl so much as making it faster. Basically to give us the wiggle room to recognize some simple constructs like foo($a) = bar($b) or foo($a) cmp bar($b) and optimize them to a table build and sort. This would work for plain perl data structures as well, as we might potentially be doing a fair amount of data conversion through the variable vtable interface. (Not to mention the issues of data mangling for proper Unicode sorting support) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk -- Bryan C. Warnock [EMAIL PROTECTED]
Re: Schwartzian transforms
Dan Sugalski [EMAIL PROTECTED] writes: I'm actually considering whether we even need to care what the programmer's said. If we can just flat-out say "We may optimize your sort function, and we make no guarantees as to the number of times tied data is fetched or subs inside the sort sub are called" then life becomes much easier. I am strongly in favor of that approach. I see no reason to allow for weird side effects in Perl 6. (Perl 5 would be a different matter, of course.) Not only is it simpler to deal with, it's simpler to *explain*, and that's important. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
RE: Schwartzian transforms
From: Russ Allbery [mailto:[EMAIL PROTECTED]] we can just flat-out say "We may optimize your sort function" I am strongly in favor of that approach. I see no reason to allow for weird side effects in Perl 6. Let me second the motion. "Allow optimisation" should be the default. A programmer should, however, be able to say sort sub :no_memoize { $global++ ; (($a*10+$b)%3)-1 } (1..10); if they really want to. But make the programmer say "I am doing something wierd", not the other way round. Dave.
Re: Schwartzian transforms
Russ Allbery wrote: I'd really like to see a concrete example of a sane sorting function which cannot be memoized. (Issues of syntax aside; just caching the result of comparing any two pairs of data results in caching data that a sane sorting algorithm will never use again. Well, we seem to be talking about different things. I'm talking about memoizing the key extraction function. Even a *sane* sorter will call get_keys($a) multiple times for any given value of $a, in general. But I guess your question still stands: Why/when would we ever *not* want to cache the result? Any user code which depends on how many times the key extraction function was called is certainly kludgerific, and should not be condoned! :-) -- John Porter Give the braindead no head.
What can we optimize (was Re: Schwartzian transforms)
At 01:22 PM 3/28/2001 -0800, Russ Allbery wrote: Dan Sugalski [EMAIL PROTECTED] writes: I'm actually considering whether we even need to care what the programmer's said. If we can just flat-out say "We may optimize your sort function, and we make no guarantees as to the number of times tied data is fetched or subs inside the sort sub are called" then life becomes much easier. I am strongly in favor of that approach. I see no reason to allow for weird side effects in Perl 6. (Perl 5 would be a different matter, of course.) Well, weird side effects can be rather useful to exploit. Not, mind, that I can think of one in the case of sorting, bur that doesn't mean there aren't any. Not only is it simpler to deal with, it's simpler to *explain*, and that's important. True enough. This is a small subset of general optimizations. For example, this: $i = 0; foreach (1..1000) { $i++; } can be easily optimized to: $i = 1000; and most language implementations with any sort of optimizer will do this. If $i isn't actually used after that point without another assignment it might well be completely optimized away. With perl, though, this does potentially unexpected things if $i is tied. Do we still optimize it away? Do we only do it if we can tell that $i's not tied? Do we force the programmer to explicitly note which variables are potentially tied with the "my Dog $spot" syntax and assume that everything else is fair game? Can we even do that in the face of runtime requires, dos, or evals? (Or does that force a complete reevaluation of the optimized bytecode) I think I need to go fetch my brain out from behind the bookcase again... Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: What can we optimize (was Re: Schwartzian transforms)
On Wed, Mar 28, 2001 at 04:36:58PM -0500, Dan Sugalski wrote: With perl, though, this does potentially unexpected things if $i is tied. Do we still optimize it away? Do we only do it if we can tell that $i's not tied? Yep. And in non-trivial cases, the only way to do that might be for $i to be :simple. Do we force the programmer to explicitly note which variables are potentially tied with the "my Dog $spot" syntax and assume that everything else is fair game? I'd rather see that things that will never be tied or otherwise magic be marked as :simple. Code always works, and will be faster with a little effort. Can we even do that in the face of runtime requires, dos, or evals? (Or does that force a complete reevaluation of the optimized bytecode) It is a catchable error to remove a behavorial restrictor attribute such as :simple or :stateless. So let it be spoken, so let it be done. This isn't any more preverse then the "you can't assign to constants" rule. -=- James Mastros -- The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand wrapt in awe, is as good as dead. -=- Albert Einstein AIM: theorbtwo homepage: http://www.rtweb.net/theorb/
Re: What can we optimize (was Re: Schwartzian transforms)
On Wed, Mar 28, 2001 at 05:57:30PM -0500, James Mastros wrote: [A bunch of stuff] Oh, and I agree with sombody else on this thread that unless otherwise stated, the sort should always assume statelessness (and thus the ability to cache at will). If it's trivial to see that the sort function isn't stateless (IE it's a named sub that doesn't have the :stateless attribute set), then have an optional warning, because you probably don't want to be using that function, or the function should be marked :stateless. -=- James Mastros -- The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand wrapt in awe, is as good as dead. -=- Albert Einstein AIM: theorbtwo homepage: http://www.rtweb.net/theorb/
Re: Perl culture, perl readabillity
- Make readability your main objective. Readability is possibly the weakest part of Perl. - Keep your eyes on modularity. Modularity is by far the best concept where complexity could be hidden. - Don't forget usability. This is after all the point why people use Perl in the first place. It seems you are not interested in critics, so lets end this thread. O. Wyss