Re: The Sort Problem: a definitive ruling
On Friday, February 20, 2004, at 05:48 , Damian Conway wrote: Joe Gottman asked: How do you decide whether a key-extractor block returns number? Do you look at the signature, or do you simply evaluate the result of the key-extractor for each element in the unsorted list? For example, what is the result of the following code? sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); There is nothing in the signature of the key-extractor to suggest that all the keys are numbers, but as it turns out they all are. Will the sort end up being numerical or alphabetic? Whilst I'd very much like it to analyse the keys, detect that they're all numbers, and use C = Eek! Please don't even TRY to do that. It'd be creepy if the same call to sort could swap at runtime between numeric and string comparisons based upon its input. I would hope that the determination be made at compile time. Consider the poor schmuck sorting new objects in preparation for a merge sort, only to find that his new array isn't sorted the same as his old array was, even though they came back from the exact same call to sort... Blech. But if sort's arguments were specifically typed, i.e.: my @array of Int; @array = sort @array; Does this meet the key extractor returns number qualification? Gordon Henriksen [EMAIL PROTECTED]
Re: The Sort Problem: a definitive ruling
Luke Palmer writes: Uri Guttman writes: DC == Damian Conway [EMAIL PROTECTED] writes: DC @sorted = sort {-M}={$^b cmp $^a} @unsorted; but there is no comma before @unsorted. is that correct? Yes. Commas may be ommitted on either side of a block when used as an argument. That's what I thought too. But Damian gave exactly the opposite answer to Uri's question, claiming he'd made a typo and a comma would be required. So which is it -- is Luke right in saying that Damian was right in the first place? Or is Damian right in saying that his example was wrong? Smylers
Re: The Sort Problem: a definitive ruling
Smylers writes: Luke Palmer writes: Uri Guttman writes: DC == Damian Conway [EMAIL PROTECTED] writes: DC @sorted = sort {-M}={$^b cmp $^a} @unsorted; but there is no comma before @unsorted. is that correct? Yes. Commas may be ommitted on either side of a block when used as an argument. That's what I thought too. But Damian gave exactly the opposite answer to Uri's question, claiming he'd made a typo and a comma would be required. So which is it -- is Luke right in saying that Damian was right in the first place? Or is Damian right in saying that his example was wrong? I was wrong in saying that he was right. Those aren't simple blocks, as Damian said, so you need the comma. Luke Smylers
Re: The Sort Problem: a definitive ruling
Joe Gottman writes: sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); There is nothing in the signature of the key-extractor to suggest that all the keys are numbers, but as it turns out they all are. Are they? I'd been presuming that pair keys would always be strings (as for hashes in Perl 5), and that the C = operator would automatically quote a preceding word, stringifying it (as in Perl 5). So the keys above are strings, albeit ones composed only of digits. Of course that doesn't actually help with your question, since there are other data structures of which the same could be asked. Smylers
Re: The Sort Problem: a definitive ruling
Smylers writes: Joe Gottman writes: sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); There is nothing in the signature of the key-extractor to suggest that all the keys are numbers, but as it turns out they all are. Are they? I'd been presuming that pair keys would always be strings (as for hashes in Perl 5), and that the C = operator would automatically quote a preceding word, stringifying it (as in Perl 5). So the keys above are strings, albeit ones composed only of digits. Of course that doesn't actually help with your question, since there are other data structures of which the same could be asked. I think you're forgetting what language you're talking about. Those are numbers. After this statement: $x = '345'; C$x is a number. I should hope it would be treated as one during multimethod dispatch. However, I'm not saying this with authority. I'm just extrapolating. If it's not correct, I'd appreciate that someone who knows correct me. Luke
Re: The Sort Problem: a definitive ruling
Luke Palmer writes: After this statement: $x = '345'; C$x is a number. Oh. I'd been assuming that quote marks indicated strings, and that, while a string containing only digits could obviously be treated as a number (as in Perl 5), it wouldn't be one without being provoked. I should hope it would be treated as one during multimethod dispatch. What about: $x = '0345'; Is that a number? And if so which of these is the same as? $x = 345; $x = 0345; What about if the variable contains a line read from user input? As a programmer I'd expect that to be a string -- and if a user happens to type only digits then it'd be surprising to find the variable is considered to be of a different type. User input comes with a trailing line-break character, which would make it not a number. But would it suddenly become a number after Cchomping it? Or if the input stream was auto-chomping? Smylers
Re: The Sort Problem: a definitive ruling
At 2:49 PM -0700 2/20/04, Luke Palmer wrote: After this statement: $x = '345'; C$x is a number. No, it isn't. It's a string. Or, rather, it's a PerlScalar. I should hope it would be treated as one during multimethod dispatch. I should certainly hope *not*. If so, it's a bug. We ought to go add some tests to the test suite once we expose this bit of the engine. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: The Sort Problem: a definitive ruling
Smylers writes: Luke Palmer writes: After this statement: $x = '345'; C$x is a number. Oh. I'd been assuming that quote marks indicated strings, and that, while a string containing only digits could obviously be treated as a number (as in Perl 5), it wouldn't be one without being provoked. I should hope it would be treated as one during multimethod dispatch. What about: $x = '0345'; Is that a number? And if so which of these is the same as? $x = 345; $x = 0345; Well, since those are the same number, I imagine the, um, first? Don't forget that octal numbers look like 0o345. What about if the variable contains a line read from user input? As a programmer I'd expect that to be a string -- and if a user happens to type only digits then it'd be surprising to find the variable is considered to be of a different type. Yeah, that's a tough question. I'd want it to be a number if it were only digits, unless I wanted it to be a string. Since numbers and strings are polymorphic with one another, maybe it's wrong to think that we can differentiate. But Csort has to know when to use C = . Maybe you're right. In the presence of multimethod dispatch, it might be simpler just to tag something with either a num or str marker (I'm completely neglecting Parrot's implementation for this discussion), and treat it as its tag. That wouldn't change the behavior of adding two strings together, or concatenating two numbers, of course. Luke
Re: The Sort Problem: a definitive ruling
Uri wondered: DC No. C infix:= is the name of the binary C = operator. so how is that allowed there without a block? A Code object in a scalar context yields a Code reference. Damian
Re: The Sort Problem: a definitive ruling
Smylers wrote: sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); There is nothing in the signature of the key-extractor to suggest that all the keys are numbers, but as it turns out they all are. Are they? I'd been presuming that pair keys would always be strings Nope. and that the C = operator would automatically quote a preceding word, stringifying it (as in Perl 5). Yes. But numbers aren't words. C = will continue to autostringify *identifiers*, as in Perl 6, and those keys aren't identifiers. Of course, if you used the pairs to populate a hash, the *hash* will convert the non-stringific pair keys to stringific hash keys (unless the hash is defined to take non-string keys). Damian
Re: The Sort Problem: a definitive ruling
Luke wrote: I think you're forgetting what language you're talking about. Those are numbers. After this statement: $x = '345'; C$x is a number. I don't think so. C$x is, of course, a variable. And what it contains after that statement will depend on whether the variable is explicitly typed or not. If C$x is explicitly typed, the rvalue will have been converted to that type (if possible). If C$x is not explicitly typed (i.e. implicitly typed to CAny), then it will contain a string, since that's what the rvalue inherently is. I should hope it would be treated as one during multimethod dispatch. I would expect that the multiple dispatch mechanism would allow the CStr in C$x to be coerced to a CNum, if that's the parameter type it was seeking to match. And I would expect that the distance of that coercion would be 1. However, I'm not saying this with authority. I'm just extrapolating. If it's not correct, I'd appreciate that someone who knows correct me. Ditto. ;-) Damian
Re: The Sort Problem: a definitive ruling
Smylers wrote: Oh. I'd been assuming that quote marks indicated strings, and that, while a string containing only digits could obviously be treated as a number (as in Perl 5), it wouldn't be one without being provoked. Correct. What about: $x = '0345'; Is that a number? Nope. A string (unless C$X is otherwised typed). What about if the variable contains a line read from user input? As a programmer I'd expect that to be a string You'd be right (unless, of course, the variable's storage type forced a coercion during the assignment). Damian
Re: The Sort Problem
On Mon, Feb 16, 2004 at 01:04:01AM -0700, Luke Palmer wrote: : This is pattern matching more than it is type comparison. And Perl's : all about pattern matching. I'm just wondering whether it needs Itwo : pattern archetectures. I suspect it does, at least from the viewpoint of mere mortals. The regex engine is built around the notion of ordered rules, whereas the multi dispatch engine is designed to run rules in order of distance. I think we'll have done good enough if the regex engine can be used at runtime against a list of actual argument types. The regex engine is allowed to have deep, powerful nooks, after all. People are trained not to expect to understand patterns on first glance. But I a strong gut feeling that the dispatch engine needs to remain transparent to casual readers--or there will never be any casual readers of Perl 6. Of course, someone will doubtless attempt this unification, since a single use can turn Perl into another language. It's just not a language I would have bothered to learn 25 years ago... Larry
The Sort Problem: a definitive ruling
The design team discussed The Sort Problem during yesterday's teleconference. Here is Larry's decision: final, definitive, and unalterable (well...for this week at least ;-) -cut-cut-cut-cut-cut-cut Csort in Perl6 is a global multisub: multi sub *sort(Criterion @by: [EMAIL PROTECTED]) {...} multi sub *sort(Criterion $by: [EMAIL PROTECTED]) {...} multi sub *sort( : [EMAIL PROTECTED]) {...} where: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator | Pair(KeyExtractor, Comparator) ; That means that we can call Csort without a block (to sort stringifically ascending with Ccmp): # Stringifically ascending... @sorted = sort @unsorted; or with a single two-argument block/closure (to sort by whatever the specified comparator is): # Numerically ascending... @sorted = sort {$^a = $^b} @unsorted; # Namewise stringifically descending case-insensitive... @sorted = sort {lc $^b.name cmp lc $^a.name} @unsorted; # or... @sorted = sort {$^b.name cmp $^a.name} is insensitive @unsorted; # or... @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive @unsorted; # Modtimewise numerically ascending... @sorted = sort {-M $^a = -M $^b} @unsorted; # Fuzz-ifically... sub fuzzy_cmp($x, $y) returns Int; @sorted = sort fuzzy_cmp, @unsorted; or with a single one-argument block/closure (to sort according whatever the specified key extractor returns): # Numerically ascending... @sorted = sort {+ $^elem} @unsorted; @sorted = sort {+ $_} @unsorted; # Namewise stringifically descending case-insensitive... @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted; @sorted = sort {lc $^elem.name} is descending @unsorted; @sorted = sort {lc .name} is descending @unsorted; # Modtimewise numerically ascending... @sorted = sort {-M} @unsorted; # Key-ifically... sub get_key($elem) {...} @sorted = sort get_key, @unsorted; or with a single extractor/comparator pair (to sort according to the extracted key, using the specified comparator): # Modtimewise stringifically descending... @sorted = sort {-M}={$^b cmp $^a} @unsorted; # Namewise fuzz-ifically... @sorted = sort {.name}=fuzzy_cmp @unsorted; or with an array of comparators and/or key extractors and/or extractor-comparator pairs (to sort according to a cascading list of criteria): # Numerically ascending # or else namewise stringifically descending case-insensitive # or else modtimewise numerically ascending # or else namewise fuzz-ifically # or else fuzz-ifically... @sorted = sort [ {+ $^elem}, {$^b.name cmp $^a.name} is insensitive, {-M}, {.name}=fuzzy_cmp, fuzzy_cmp, ], @unsorted; If a key-extractor block returns number, then C = is used to compare those keys. Otherwise Ccmp is used. In either case, the keys extracted by the block are cached within the call to Csort, to optimize subsequent comparisons against the same element. That is, a key-extractor block is only ever called once for each element being sorted. If a key-extractor/comparator pair is specified, the key-extractor is the key of the pair and the comparator the value. The extractor is used to retreive keys, which are then passed to the comparator. The Cis descending and Cis insensitive traits on a key extractor or a comparator are detected within the call to Csort (or possibly by the compiler) and used to modify the case-sensitivity and direction of any comparison operators used for the corresponding key or in the corresponding comparator. Note that ambiguous cases like: @sorted = sort {-M}, {-M}, {-M}; @sorted = sort {$^a = $^b}, {$^a = $^b}, {$^a = $^b}; @sorted = sort [...], [...], [...]; # etc. will be dispatched according to the normal multiple dispatch semantics (which will mean that they will mean): @sorted = sort {-M} == {-M}, {-M}; @sorted = sort {$^a = $^b} == {$^a = $^b}, {$^a = $^b}; @sorted = sort [...] == [...], [...]; # etc. and so one would need to write: @sorted = sort == {-M}, {-M}, {-M}; @sorted = sort == {$^a = $^b}, {$^a = $^b}, {$^a = $^b}; @sorted = sort == [...], [...], [...]; # etc. to get Ccmp comparison on all the arguments. -cut-cut-cut-cut-cut-cut Thanks to everyone who contributed to this discussion (especially Uri). As you see, the result is sort facility that is simultaneously much more powerful, much easier-to-use in the simple cases, has the potential
Re: The Sort Problem: a definitive ruling
Damian Conway [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] type KeyExtractor ::= Code(Any) returns Any; # Modtimewise numerically ascending... @sorted = sort {-M} @unsorted; One thing I've been trying to figure out reading this: what is the signature of prefix:-M ? i.e. how does it tell the outer block that it (the outer-block) needs a parameter? There seems to be some transitive magic going on here. Could similar magic be used to have infix:= require two higher-order variables (e.g. could sort { = } @unsorted be made to work?) Dave.
Re: The Sort Problem: a definitive ruling
Dave Whipp writes: Damian Conway [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] type KeyExtractor ::= Code(Any) returns Any; # Modtimewise numerically ascending... @sorted = sort {-M} @unsorted; One thing I've been trying to figure out reading this: what is the signature of prefix:-M ? Presumably something like: sub prefix:-M (?$file = $CALLER::_) {...} i.e. how does it tell the outer block that it (the outer-block) needs a parameter? Because it operates on $_. It tells it the same way: map { .name } @objects Does. Of course, this is going to be tough on the compiler, who will have to take the C= $CALLER::_ part into account. There seems to be some transitive magic going on here. Could similar magic be used to have infix:= require two higher-order variables (e.g. could sort { = } @unsorted be made to work?) No. Although you could do such a thing with: sort infix:=, @unsorted; Luke
Re: The Sort Problem: a definitive ruling
DC == Damian Conway [EMAIL PROTECTED] writes: DC Once again the Iron Designer rises to the supreme challenge of DC the Mailinglist Stadium and expresses the true spirit of Perl DC 6!!! and the challenge for next week is slicing squid with noodles! (or cutting down the mightiest tree in the forest with a herring) good job all. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem: a definitive ruling
Dave Whipp wondered: @sorted = sort {-M} @unsorted; One thing I've been trying to figure out reading this: what is the signature of prefix:-M ? i.e. how does it tell the outer block that it (the outer-block) needs a parameter? It doesn't. As A6 explained: http://dev.perl.org/perl6/apocalypse/A06.html#Bare_subs any block that doesn't have placeholder-specified parameters but which refers (even implicitly) to $_ will automatically have the signature of ($_). That's why: @odd = grep { $_ % 2 } @nums; will still work in Perl 6. Since a bare C-M implicitly refers to $_, the surrounding block automagically gets a one-parameter signature and hence is (correctly!) interpreted as a key extractor. Don't you just love it when a plan^H^H^H^Hdesign comes together? ;-) There seems to be some transitive magic going on here. There is. Kinda. Just not the type of magic you thought. Could similar magic be used to have infix:= require two higher-order variables (e.g. could sort { = } @unsorted be made to work?) No. But this will work: sort infix:= @unsorted Damian
Re: The Sort Problem: a definitive ruling
DC == Damian Conway [EMAIL PROTECTED] writes: DC # Stringifically ascending... DC @sorted = sort @unsorted; DC or with a single two-argument block/closure (to sort by whatever the DC specified comparator is): DC # Numerically ascending... DC @sorted = sort {$^a = $^b} @unsorted; so because that has 2 placeholders, it is will match this signature: type Comparator ::= Code(Any, Any) returns Int; i have to remember that placeholders are really implied args to a code block and not just in the expression DC # Namewise stringifically descending case-insensitive... DC @sorted = sort {lc $^b.name cmp lc $^a.name} DC @unsorted; DC # or... DC @sorted = sort {$^b.name cmp $^a.name} is insensitive DC @unsorted; DC # or... DC @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive DC @unsorted; TIMTOWTDI lives on! DC # Modtimewise numerically ascending... DC @sorted = sort {-M $^a = -M $^b} @unsorted; DC # Fuzz-ifically... DC sub fuzzy_cmp($x, $y) returns Int; DC @sorted = sort fuzzy_cmp, @unsorted; ok, so that is recognizes as a compare sub due to the 2 arg sig. so does the sub must be defined/declared before the sort code is compiled? DC or with a single one-argument block/closure (to sort according DC whatever the specified key extractor returns): DC # Numerically ascending... DC @sorted = sort {+ $^elem} @unsorted; DC @sorted = sort {+ $_} @unsorted; is $^elem special? or just a regular place holder? i see $_ will be set to each record as we discussed. DC # Namewise stringifically descending case-insensitive... DC @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted; DC @sorted = sort {lc $^elem.name} is descending @unsorted; DC @sorted = sort {lc .name} is descending @unsorted; just getting my p6 chops back. .name is really $_.name so that makes sense. and $^elem is just a named placeholder for $_ as before? DC # Key-ifically... DC sub get_key($elem) {...} DC @sorted = sort get_key, @unsorted; and that is parsed as an extracter code call due to the single arg sig. again, it appears that it has to be seen before the sort code for that to work. DC or with a single extractor/comparator pair (to sort according to the DC extracted key, using the specified comparator): DC # Modtimewise stringifically descending... DC @sorted = sort {-M}={$^b cmp $^a} @unsorted; so that is a single pair of extractor/comparator. but there is no comma before @unsorted. is that correct? see below for why i ask that. DC # Namewise fuzz-ifically... DC @sorted = sort {.name}=fuzzy_cmp @unsorted; i first parsed that as being wrong and the {} should wrap the whole thing. so that is a pair again of extractor/comparator. DC or with an array of comparators and/or key extractors and/or DC extractor-comparator pairs (to sort according to a cascading list of DC criteria): DC # Numerically ascending DC # or else namewise stringifically descending case-insensitive DC # or else modtimewise numerically ascending DC # or else namewise fuzz-ifically DC # or else fuzz-ifically... DC @sorted = sort [ {+ $^elem}, DC {$^b.name cmp $^a.name} is insensitive, DC {-M}, DC {.name}=fuzzy_cmp, DC fuzzy_cmp, i see the need for commas in here as it is a list of criteria. DC ], but what about that comma? no other example seems to have one before the @unsorted stuff. DC @unsorted; DC If a key-extractor block returns number, then C = is used to DC compare those keys. Otherwise Ccmp is used. In either case, the keys DC extracted by the block are cached within the call to Csort, to DC optimize subsequent comparisons against the same element. That is, a DC key-extractor block is only ever called once for each element being DC sorted. where does the int optimizer come in? just as you had it before in the extractor code? that will need to be accessible to the optimizer if the GRT is to work correctly. i like that the key caching is defined here. we can implement it in several different ways depending on optimization hints and such. we could support the ST, GRT and orchish and select the best one for each sort. or we could have one basic sort and load the others as pragmas or modules. DC The Cis descending and Cis insensitive traits on a key extractor DC or a comparator are detected within the call to Csort (or possibly DC by the compiler) and used to modify the case-sensitivity and DC direction of any comparison operators used for the corresponding key DC or in the corresponding comparator. or by reversing the order of the
Re: The Sort Problem: a definitive ruling
Uri Guttman writes: DC == Damian Conway [EMAIL PROTECTED] writes: DC # Modtimewise numerically ascending... DC @sorted = sort {-M $^a = -M $^b} @unsorted; DC # Fuzz-ifically... DC sub fuzzy_cmp($x, $y) returns Int; DC @sorted = sort fuzzy_cmp, @unsorted; ok, so that is recognizes as a compare sub due to the 2 arg sig. so does the sub must be defined/declared before the sort code is compiled? Nope. Csort is declared as a multimethod. This works, too: $code = sub ($a, $b) { -M $a = -M $b }; @sorted = sort $code, @unsorted; DC or with a single one-argument block/closure (to sort according DC whatever the specified key extractor returns): DC # Numerically ascending... DC @sorted = sort {+ $^elem} @unsorted; DC @sorted = sort {+ $_} @unsorted; is $^elem special? or just a regular place holder? i see $_ will be set to each record as we discussed. Those two statements are exactly the same in every way. Well, except how they're writted. $^elem is indeed a regular placeholder. $_ becomes an implicit parameter when it is referred to, in the absence of placeholders or another type of signature. DC # Key-ifically... DC sub get_key($elem) {...} DC @sorted = sort get_key, @unsorted; and that is parsed as an extracter code call due to the single arg sig. again, it appears that it has to be seen before the sort code for that to work. Nope. Runtime dispatch as before. DC or with a single extractor/comparator pair (to sort according to the DC extracted key, using the specified comparator): DC # Modtimewise stringifically descending... DC @sorted = sort {-M}={$^b cmp $^a} @unsorted; so that is a single pair of extractor/comparator. but there is no comma before @unsorted. is that correct? see below for why i ask that. Yes. Commas may be ommitted on either side of a block when used as an argument. I would argue that they only be omitted on the right side, so that this is unambiguous: if some_function { ... } { ... } Which might be parsed as either: if (some_function { ... }) { ... } Or: if (some_function()) {...} {...} # Bare block DC or with an array of comparators and/or key extractors and/or DC extractor-comparator pairs (to sort according to a cascading list of DC criteria): DC # Numerically ascending DC # or else namewise stringifically descending case-insensitive DC # or else modtimewise numerically ascending DC # or else namewise fuzz-ifically DC # or else fuzz-ifically... DC @sorted = sort [ {+ $^elem}, DC {$^b.name cmp $^a.name} is insensitive, DC {-M}, DC {.name}=fuzzy_cmp, DC fuzzy_cmp, i see the need for commas in here as it is a list of criteria. DC ], but what about that comma? no other example seems to have one before the @unsorted stuff. It's not a closure, so you need a comma. DC @unsorted; DC If a key-extractor block returns number, then C = is used to DC compare those keys. Otherwise Ccmp is used. In either case, the keys DC extracted by the block are cached within the call to Csort, to DC optimize subsequent comparisons against the same element. That is, a DC key-extractor block is only ever called once for each element being DC sorted. where does the int optimizer come in? just as you had it before in the extractor code? that will need to be accessible to the optimizer if the GRT is to work correctly. If the block provably returns an int, Csort might be able to optimize for ints. Several ways to provably return an int: my $extractor = an int sub($arg) { $arg.num } @sorted = sort $extractor, @unsorted; Or with a smarter compiler: @sorted = sort { int .num } @unsorted; Or Csort might even check whether all the return values are ints and then optimize that way. No guarantees: it's not a language-level issue. i like that the key caching is defined here. Yeah. This is a language-level issue, as the blocks might have side-effects. DC Note that ambiguous cases like: DC @sorted = sort {-M}, {-M}, {-M}; DC will be dispatched according to the normal multiple dispatch semantics DC (which will mean that they will mean): DC @sorted = sort {-M} == {-M}, {-M}; DC and so one would need to write: DC @sorted = sort == {-M}, {-M}, {-M}; that clears up that one for me. this is very good overall (notwithstanding my few nits and questions). it will satisfy all sorts of sort users, even those who are out of sorts. Agreed. I'm very fond of it.. Luke
Re: The Sort Problem: a definitive ruling
Luke Palmer writes: Yes. Commas may be ommitted on either side of a block when used as an argument. I would argue that they only be omitted on the right side, so that this is unambiguous: if some_function { ... } { ... } Which might be parsed as either: if (some_function { ... }) { ... } Or: if (some_function()) {...} {...} # Bare block Silly me. That doesn't solve anything. I don't know why I thought it did. I still think that this looks weird: foo $bar { ... } $baz; But that's just preference. Luke
Re: The Sort Problem: a definitive ruling
- Original Message - From: Damian Conway [EMAIL PROTECTED] To: Perl 6 Language [EMAIL PROTECTED] Sent: Thursday, February 19, 2004 8:29 PM Subject: [perl] The Sort Problem: a definitive ruling Csort in Perl6 is a global multisub: multi sub *sort(Criterion @by: [EMAIL PROTECTED]) {...} multi sub *sort(Criterion $by: [EMAIL PROTECTED]) {...} multi sub *sort( : [EMAIL PROTECTED]) {...} where: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator | Pair(KeyExtractor, Comparator) ; snip If a key-extractor block returns number, then C = is used to compare those keys. Otherwise Ccmp is used. In either case, the keys extracted by the block are cached within the call to Csort, to optimize subsequent comparisons against the same element. That is, a key-extractor block is only ever called once for each element being sorted. How do you decide whether a key-extractor block returns number? Do you look at the signature, or do you simply evaluate the result of the key-extractor for each element in the unsorted list? For example, what is the result of the following code? sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); There is nothing in the signature of the key-extractor to suggest that all the keys are numbers, but as it turns out they all are. Will the sort end up being numerical or alphabetic? Joe Gottman
Re: The Sort Problem: a definitive ruling
Uri checked: DC @sorted = sort {$^a = $^b} @unsorted; so because that has 2 placeholders, it is will match this signature: type Comparator ::= Code(Any, Any) returns Int; Correct. i have to remember that placeholders are really implied args to a code block and not just in the expression Indeed. DC sub fuzzy_cmp($x, $y) returns Int; DC @sorted = sort fuzzy_cmp, @unsorted; ok, so that is recognizes as a compare sub due to the 2 arg sig. so does the sub must be defined/declared before the sort code is compiled? Yes. And, yes, declaration is sufficient. DC @sorted = sort {+ $^elem} @unsorted; DC @sorted = sort {+ $_} @unsorted; is $^elem special? or just a regular place holder? Regular. just getting my p6 chops back. .name is really $_.name so that makes sense. and $^elem is just a named placeholder for $_ as before? Yes and yes. DC @sorted = sort get_key, @unsorted; and that is parsed as an extracter code call due to the single arg sig. Yes. again, it appears that it has to be seen before the sort code for that to work. Correct. DC or with a single extractor/comparator pair (to sort according to the DC extracted key, using the specified comparator): DC # Modtimewise stringifically descending... DC @sorted = sort {-M}={$^b cmp $^a} @unsorted; so that is a single pair of extractor/comparator. but there is no comma before @unsorted. Typo. I'm pretty sure it would actually need a comma there. DC @sorted = sort {.name}=fuzzy_cmp @unsorted; Comma required there too. :-( DC ], but what about that comma? It's required. no other example seems to have one before the @unsorted stuff. The comma exception only applies to simple blocks. I messed up those two examples. :-( where does the int optimizer come in? When the key extractor is known to return an Int. Which would occur either when it's explicitly declared to do that, or when the compiler can intuit a block's return type from the type of value returned by the block (i.e. if the block always returns the result of a call to Cint). just as you had it before in the extractor code? Yup. so are those traits are only allowed/meaningful on comparison blocks? or will an extraction block take them Both. That's why I wrote: The Cis descending and Cis insensitive traits on a key extractor or a comparator... ^ you have examples which show the traits on either the extractor or comparator code blocks. that implies that the guts can get those flags from either and use them as needed. Yep. Inside the body of Csort you'd access them as: $by.trait{descending} $by.trait{insensitive} (unless Larry's changed the trait accessor syntax since last I looked). Damian
Re: The Sort Problem: a definitive ruling
DC == Damian Conway [EMAIL PROTECTED] writes: DC No. But this will work: DC sort infix:= @unsorted my brane hertz!! so that declares (creates?) an infix op as a code block? and since = is known to take 2 args it is parsed (or multidispatched) as a comparator block for sort? amazing how you and luke both came up with the exact same answer. p6 syntax like that is killing me slowly! uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem: a definitive ruling
JG == Joe Gottman [EMAIL PROTECTED] writes: JGHow do you decide whether a key-extractor block returns number? Do you JG look at the signature, or do you simply evaluate the result of the JG key-extractor for each element in the unsorted list? For example, what is JG the result of the following code? JG sort {$_.key} (1= 'a', 10 = 'b', 2 ='c'); JGThere is nothing in the signature of the key-extractor to suggest that JG all the keys are numbers, but as it turns out they all are. Will the sort JG end up being numerical or alphabetic? my take is that either = or cmp (my pref) would be the default comparator. if you want to force one, you need to use prefix ~ or + or int. oh, another reason to make cmp the default is BACKWARDS COMPATIBILITY! we have support now for the old simple @sorted = sort @unsorted syntax so the cmp should still be the default. hey, i am remembering p6 syntax now! but give me a week and i will forget it again :) uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem: a definitive ruling
Uri bemoaned: DC sort infix:= @unsorted my brane hertz!! so that declares (creates?) an infix op as a code block? No. C infix:= is the name of the binary C = operator. amazing how you and luke both came up with the exact same answer. Great minds... etc. ;-) p6 syntax like that is killing me slowly! No, it's gradually making your *stronger*. ;-) Damian
Re: The Sort Problem: a definitive ruling
On Fri, Feb 20, 2004 at 02:47:55PM +1100, Damian Conway wrote: : Yep. Inside the body of Csort you'd access them as: : : $by.trait{descending} : $by.trait{insensitive} : : (unless Larry's changed the trait accessor syntax since last I looked). Well, if traits are just compile-time properties, and properties are just mixed-in roles (usually enums), then it's more likely that something like: $by.Direction == descending $by.Case == insensitive would be the incantation. Or maybe if enums auto-booleanize, then you could say $by.Direction::descending $by.Case::insensitive And then $by.descending $by.insensitive might be allowed as abbreviations when unambiguous. Or maybe we require matching: $by ~~ descending $by ~~ insensitive But there is no such thing as a true property or false property. There's a Boolean role that can have the value true or false. Traits are mixed in to declared objects at compile time, and can do weird things to such objects at mixin time. Likewise there's no such thing as a descending property. There's a Direction property which defaults to ascending. And a Case property that defaults to sensitive. To do otherwise is to set ourselves up for objects that can be both true and false simultaneously. Only junctions should be allowed to do that... Larry
Re: The Sort Problem: a definitive ruling
DC == Damian Conway [EMAIL PROTECTED] writes: DC Uri bemoaned: cause you agonize me head! DC sort infix:= @unsorted my brane hertz!! so that declares (creates?) an infix op as a code block? DC No. C infix:= is the name of the binary C = operator. so how is that allowed there without a block? is it because it is the name is in a style of a sub? that makes sense to me but i want to make sure i get it. and now back to my advil addiction. :) uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
Damian Conway writes: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator Pair(KeyExtractor, Comparator) ; type Criteria ::= Criterion | Array of Criterion ; multi sub *sort(Criteria $by = {$^a cmp $^b}: [EMAIL PROTECTED]) {...} This is neat stuff. But in order for this to work, we're relying on a Ivery sophisticated type system. Reminiscent of C++ templates, actually. And while I love C++ templates for academic challenges, they don't seem quite what Perl is looking for. This is pattern matching more than it is type comparison. And Perl's all about pattern matching. I'm just wondering whether it needs Itwo pattern archetectures. If things like this are going to be happening (and thanks, Damian, for demonstrating exactly how powerful this can be), I'd like types to at least look more like regexes. Implementation difficulty has not been a concern of Perl The Language's since the beginning. Since I'm usually not good at these philosophical posts, I'll show some examples. multi sub infix:equals ( (Any) $x, ($1) $y ) { $x eq $y } multi sub infix:equals ( Any $x, Any $y ) { undef } Might be a way of nicely comparing apples to apples and rejecting everything else as unequal. I don't really like the delcaration order dependency, though. Perhaps: multi sub infix:equals ({ $x := (Any), $y := ($1) { $x eq $y } | $x := (Any), $y := (Any) { undef } }); Which is Ireally starting to look like rules (which is what I wanted). My summarizing thesis is: if the type system is going to be this rich a sublanguage, why not make it a formal one? [1] Luke [1] Keep in mind that I'm fine with it Inot becoming a sublanguage, and relying on user-written code to do this kind of fancy matching. I'm just saying if we want to make it a rich and powerful pattern matching system, it ought to look like patterns.
Re: The Sort Problem
Uri Guttman writes: LP == Luke Palmer [EMAIL PROTECTED] writes: LP Uri Guttman writes: because that would be the default comparison and the extracted key value would be stringified unless some other marker is used. most sorts are on strings so this would be a useful huffman and removal of a redundancy. LP While I like where most of this is going, I beg to differ on this point. LP I end up sorting numbers more often then strings, except when I'm LP sorting the keys of a hash. I'm always annoyed in my one liners when I LP have to write out: LP sort { $a = $b } ... LP And I'd be thrilled if it were as easy as: LP sort { +$_ } ... oh, i don't want to see = anywhere in sort code. i don't mind + or int as markers for that. you seem to actually be agreeing with me that strings are the default key type and + is needed for a numeric sort. but do you do integer or float sorts? and as i said, i don't like the asymmetry of int vs +. Usually integer. But specifying integer is usually an optimization guideline, whereas there is a much greater semantic difference between sorting strings and floats. Cint deserves to be longer as just an optimization guideline. And yes, I agree that string should be default. Luke
Re: The Sort Problem
Damian Conway wrote: Uri persisted: but int is needed otherwise? int is more commonly a sort key than float. it just feels asymetrical with one having a symbol and the other a named operator. Sorry, but that's the way the language works. The more general and usual conversion (to number, not to integer) has the shorter name. For the record, I do a lot of statistical work. On the sorts where I care about speed, I'm using floats far more often than ints. Uri's usage obviously varies from mine. Let's not hack up the language to make sort more efficient for some arguable benefit. I would entertain, however, Cfloat and Cstring comparators (and functions in core) that pretty much (if not precisely) alias C+ and C~, forcing said type. DC If you don't explicitly cast either way, Csort just DWIMs by DC looking at the actual type of the keys returned by the extractor. DC If any of them isn't a number, it defaults to Ccmp. that doesn't work for me. the GRT can't scan all keys before it decides on what comparison operator is needed. the GRT needs to merge keys as they are extracted and it needs to do the correct conversion to a byte string then and not later. you can't DWIM this away if you want the optimization. EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the return type of your block is appropriate. If you don't give the compiler that hint, you don't get the optimized sorting. When fed a number that it doesn't trust to be an int or a float, couldn't the GRT store the number with Cpack 'Nd', $_, $_? (ok, there's some prepping for sign and endian, but you get the point) Seems like a not-terrible way to handle the issue. Here's another example (all praise be to Rod!): @teams = sort [ # 1 parameter so it's a key extractor... {+ $^team-{WinPct} }, # 2 parameters so it's a comparator... { ($^a-{Conference} eq $^b-{Conference} ? ($^b-{ConfWinPct} = $^a-{ConfWinPct} || $^b-{ConfWon}= $^a-{ConfWon}|| $^a-{ConfLost} = $^b-{ConfLost}) : ($^a-{Division} eq $^b-{Division} ? ($^b-{DivWinPct} = $^a-{DivWinPct} || $^b-{DivWon}= $^a-{DivWon}|| $^a-{DivLost} = $^b-{DivLost}) : 0 ) }, # 1 parameter each so all three are key extractors... {+ $^team-{Won} } is descending, {+ $^team-{Lost} }, {+ $^team-{GamesPlayed} } is descending, ] @teams; Now that's just spiffy. Except in moving from my P5 version to your P6 version, you have to s/?/??/ and s/:/::/, IIRC. DC But you *can't* apply Cis descending to a Code reference. then how did you apply 'is insensitive'? I applied it to the *block* (i.e. a closure definition). That's not the same thing as a Code reference. what i am saying is i think that you need to go back to the drawing board to find a clean syntax to mark those flags. No, I think we have it...just put the appropriate trait on the extractor *block* when you define it. I really don't see the problem with @out = sort {lc $_}, @in; It's short, simple, unambiguous, IMO very clean. But if people want to take a tool that we're creating anyways elsewhere in the language (traits) to provide another way to do it, that's fine too. i just don't like the reverse args concept at all. it is not semantically useful to sorting. sorts care about ascending and descending, not a vs b in args. The problem is that sometimes comparator logic is suffiently complex that a single comparator needs to have a bit each way, as in Rod's football team comparator above. What my football example was meant to show is that no matter how much we abuse poor Csort in the name of progress, we need to have a way to drop it all and go back to how we did it in P5, for those truly pathological cases where nothing else works. If people don't trust this, I'll come up with something worse. As a side note, the reason I couldn't find the original sort in question is that I later scraped it in favor of a more robust 100 line sub to figure out who ranked where. But in more common case, how much it break things to have type Comparator ::= Code(Any, Any) returns Int; become type Comparator ::= Code(Any, Any) returns Int | '!' Code(Any, Any) returns Int; where the '!' sets the reverse/descending trait? So we could get things like: @out = sort {!~ $_} @in; @out = sort {!+ -M} @in; @out = sort {! complex} @in; Overall, Damian's proposal looks very good. Yeah Damian! -- Rod Adams PS -- Only pure and utter insanity would lead someone to not make string the default sort comparator, and I don't think I've heard anyone here mention
Re: The Sort Problem
Here's a proposed syntax and semantics for Csort that tries to preserve the (excellent) features of Uri's on the right track proposal whilst integrating it into the Perl 6 design without multiplying entities (especially colons!) unnecessarily. Suppose Csort's signature is: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator | Pair(KeyExtractor, Comparator) ; type Criteria ::= Criterion | Array of Criterion ; multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...} In other words, Csort takes as its (optional) first argument either a single sort criterion, or an array of criteria. Each of those sort criteria may be a block/closure, or a pair of block/closures. Each block/closure may take either one argument (in which case it's a key extractor) or two arguments (in which case it's a comparator). If a key-extractor block returns number, then C = is used to compare those keys. Otherwise Ccmp is used. In either case, the returned keys are cached to optimize subsequent comparisons against the same element. If a key-extractor/comparator pair is specified, the key-extractor is the key of the pair and the comparator the value. The extractor is used to retreive (and cache) keys, which are then passed to the comparator. Which means that (a slightly extended version of) Uri's proposed: @out = sort key( %lookup{ .{remotekey} } ), #1 key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2 key:float ( substr( 0, 10 ) ), #3 key:integer ( /foo(\d+)bar/ ), #4 key:compare( { ^$a = ^$b } )( /(\d+)$/ ), #5 key:compare( \my_compare_sub ) ( /(\d+)$/ ),#6 key:compare( { just_guess $^b, $^a } ), #7 @in; would become: @out = sort [ { ~ %lookup{ .{remotekey} } }, #1 { use locale 'somewhere'; lc .{priority} } is descending, #2 { + substr( 0, 10 ) }, #3 { int /foo(\d+)bar/ }, #4 { + m/(\d+)$/.[1] }, #5 { /(\d+)$/ } = my_compare_sub, #6 { just_guess $^b, $^a }, #7 ], @in; So to specify a key extractor we provide a block with one argument, as in examples #1 through #5. Then to specify that the key is compared with Ccmp we return a string (using unary C~ to be explicit about it) as in #1. To specify that the key is compared with C = we return a number (using unary C+ to be explicit about it) as in #3 and #5. If we think that Csort might be able to optimize integer comparisons, we can explicitly return an integer, as in #4. If we want locale-sensitive sorting, we specify that with the appropriate Cuse locale statement, as in #2. To specify a comparator, we provide a block with two arguments, as in #7. That block is always expected to return an integer. To specify a key-extractor *and* an associated comparator, we bind them together in a Pair, as in #6. The only tricky bits are how to specify a key extractor for keys that are to be sorted in descending order and/or case-insensitively. The case insensitivity could be handled by simple Clc'ing, as in #2. Descending sort order could be handled by a trait on the block/closure, as in #2. Alternatively, case-insensitivity could be specified by trait as well. Note that simple sorts would be unchanged: @sorted = sort @unsorted; @sorted = sort {$^b = $^a} @unsorted; But now one could also very neatly code cases that formerly required an Orcish or Transformed sort. For example, these: @sorted = sort {(%M{$^a}//-M $^a) = (%M{$^b}//-M $^b)} @unsorted; @sorted = map $_[1], sort {$^a[0] = $^b[0]}, map [-M,$_], @unsorted; would both become: @sorted = sort {-M} @unsorted; Damian
Re: The Sort Problem
- Original Message - From: Damian Conway [EMAIL PROTECTED] To: Larry Wall [EMAIL PROTECTED] Cc: Perl 6 Language [EMAIL PROTECTED] Sent: Sunday, February 15, 2004 5:59 PM Subject: [perl] Re: The Sort Problem Here's a proposed syntax and semantics for Csort that tries to preserve the (excellent) features of Uri's on the right track proposal whilst integrating it into the Perl 6 design without multiplying entities (especially colons!) unnecessarily. Suppose Csort's signature is: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator | Pair(KeyExtractor, Comparator) ; type Criteria ::= Criterion | Array of Criterion ; multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...} If we use this signature, won't the code sort ('foo', 'bar', 'glarch'); attempt to use the first parameter as a Criteria? Since sort has to be a multi sub anyhow, why don't we do multi sub *sort(Criteria $by : [EMAIL PROTECTED]) {...} multi sub *sort( : [EMAIL PROTECTED]) { ...} # returns sort {$^a cmp $^b} @data Joe Gottman
Re: The Sort Problem
Joe Gottman asked: If we use this signature, won't the code sort ('foo', 'bar', 'glarch'); attempt to use the first parameter as a Criteria? No. Because a string like 'foo' wouldn't match the first parameter's type. Since sort has to be a multi sub anyhow, why don't we do multi sub *sort(Criteria $by : [EMAIL PROTECTED]) {...} multi sub *sort( : [EMAIL PROTECTED]) { ...} # returns sort {$^a cmp $^b} @data We probably should (from an implementation efficiency standpoint if nothing else). Damian
Re: The Sort Problem
DC == Damian Conway [EMAIL PROTECTED] writes: DC Suppose Csort's signature is: DC type KeyExtractor ::= Code(Any) returns Any; DC type Comparator ::= Code(Any, Any) returns Int; DC type Criterion::= KeyExtractor DC | Comparator DC | Pair(KeyExtractor, Comparator) DC ; DC type Criteria ::= Criterion DC | Array of Criterion DC ; DC multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...} DC Each of those sort criteria may be a block/closure, or a pair of DC block/closures. Each block/closure may take either one argument (in DC which case it's a key extractor) or two arguments (in which case it's DC a comparator). DC If a key-extractor block returns number, then C = is used to DC compare those keys. Otherwise Ccmp is used. In either case, the DC returned keys are cached to optimize subsequent comparisons against DC the same element. i would make cmp the default as it is now. you sort strings much more often than you sort numbers. DC If a key-extractor/comparator pair is specified, the key-extractor is DC the key of the pair and the comparator the value. The extractor is DC used to retreive (and cache) keys, which are then passed to the DC comparator. DC Which means that (a slightly extended version of) Uri's proposed: DC @out = sort DCkey( %lookup{ .{remotekey} } ), #1 DCkey:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2 DCkey:float ( substr( 0, 10 ) ), #3 DCkey:integer ( /foo(\d+)bar/ ), #4 DCkey:compare( { ^$a = ^$b } )( /(\d+)$/ ), #5 DCkey:compare( \my_compare_sub ) ( /(\d+)$/ ),#6 DCkey:compare( { just_guess $^b, $^a } ), #7 DC @in; DC would become: DC @out = sort DC[ { ~ %lookup{ .{remotekey} } }, #1 if string cmp is the default, wouldn't that ~ be redundant? and i keep forgetting that ~ is now the stringify operator :) DC { use locale 'somewhere'; lc .{priority} } is descending, #2 hmmm, this needs more work IMO. requiring the coder to lc the key is moving the case insensitivity feature back into code. wouldn't 'is insensitive' be ok? how does the internal guts get the descending/insensitive flags/traits passed to it? i know it is an internals problem but i am just pondering it. DC { + substr( 0, 10 ) }, #3 DC { int /foo(\d+)bar/ }, #4 i would also expect int to be a default over float as it will be used more often. + is needed there since the regex returns a string. in the #3 case that would be an int as well. so we need a 'float' cast thingy. BTW, the only way to get a number as a key is from a structure where the field was assigned as a number/int. that may not happen a lot so the int/float cast will probably be needed here to sort correctly DC { + m/(\d+)$/.[1] }, #5 DC { /(\d+)$/ } = my_compare_sub, #6 missing a close } it seems. DC { just_guess $^b, $^a }, #7 is that a reverse order sort? why not skip the args and do this: { just_guess is descending }, #7 DC ], so the first arg to sort is either a single compare block or a anon list of them? i figure we need the [] to separate the criteria from the input data. but what about this odd case, sort [...], [...], [...] now that is stupid code but it could be trying to sort the refs by their address in string mode. or it could be a sort criteria list followed by 2 refs to input records. DC @in; DC So to specify a key extractor we provide a block with one argument, as DC in examples #1 through #5. Then to specify that the key is compared DC with Ccmp we return a string (using unary C~ to be explicit about DC it) as in #1. To specify that the key is compared with C = we DC return a number (using unary C+ to be explicit about it) as in #3 DC and #5. If we think that Csort might be able to optimize integer DC comparisons, we can explicitly return an integer, as in #4. If we want DC locale-sensitive sorting, we specify that with the appropriate Cuse locale statement, as in #2. DC To specify a comparator, we provide a block with two arguments, as in DC #7. That block is always expected to return an integer. so #7 is a call to just_guess which is passed the 2 args to compare. it must return an
Re: The Sort Problem
Uri wrote: DC If a key-extractor block returns number, then C = is used to DC compare those keys. Otherwise Ccmp is used. In either case, the DC returned keys are cached to optimize subsequent comparisons against DC the same element. i would make cmp the default as it is now. Err. That's kinda what Otherwise Ccmp is used means. ;-) DC @out = sort DC[ { ~ %lookup{ .{remotekey} } }, #1 if string cmp is the default, wouldn't that ~ be redundant? How do you know that the values of %lookup are strings? How would the optimizer know? DC { + substr( 0, 10 ) }, #3 DC { int /foo(\d+)bar/ }, #4 i would also expect int to be a default over float as it will be used more often. + is needed there since the regex returns a string. in the #3 case that would be an int as well. so we need a 'float' cast thingy. Unary C+ *is* the float cast thingy! BTW, the only way to get a number as a key is from a structure where the field was assigned as a number/int. that may not happen a lot so the int/float cast will probably be needed here to sort correctly That's the point. If you want to force numeric comparison of keys you explicitly cast each key to number using unary C+ or Cint. If you want to force stringific comparison you explicitly cast the key to string using unary C~. If you don't explicitly cast either way, Csort just DWIMs by looking at the actual type of the keys returned by the extractor. If any of them isn't a number, it defaults to Ccmp. DC { + m/(\d+)$/.[1] }, #5 DC { /(\d+)$/ } = my_compare_sub, #6 missing a close } it seems. Yup. Thanks. DC { just_guess $^b, $^a }, #7 is that a reverse order sort? why not skip the args and do this: { just_guess is descending }, #7 Because I wanted to show a plain old two-parameter block being used as a *comparator* (not a one-parameter block being used as a key extractor). so the first arg to sort is either a single compare block or a anon list of them? i figure we need the [] to separate the criteria from the input data. Yep. but what about this odd case, sort [...], [...], [...] now that is stupid code but it could be trying to sort the refs by their address in string mode. In which case we probably should have written it: sort == [...], [...], [...] or it could be a sort criteria list followed by 2 refs to input records. Only if the first array ref contains nothing but Criterion objects. DC To specify a comparator, we provide a block with two arguments, as in DC #7. That block is always expected to return an integer. so #7 is a call to just_guess which is passed the 2 args to compare. it must return an int like cmp/=. Yep. as i pointed out above, i don't see why you even need to show the ^$a and ^$b args? So the block knows it has two parameters. they will be passed into just_guess that way. let is descending handle the sort ordering. But you *can't* apply Cis descending to a Code reference. Nor are we sure that the order *is* descending. Maybe the Cjust_guess predicate is invariant to argument order and there were other reasons to pass the args in that order. Or maybe we reversed the order because we know that in Cjust_guess never returns zero, but defaults to its second argument being smaller, in which case we needed to reverse the args so that the Csort remained stable. The point is that I wanted to show a vanilla two-parameter compare block. (And, boy, am I ever sorry I whimsically reversed the args to indicate generality ;-) DC @sorted = sort {(%M{$^a}//-M $^a) = (%M{$^b}//-M $^b)} @unsorted; wow, that is UGLY! but i get it after a few hours of study. :) just the orcish maneuver but with //. i think you also mean //= there. Yup. Should indeed be //= DC @sorted = sort {-M} @unsorted; that still wants to be cached somehow as -M is expensive. It *is* cached. It's a one-parameter block. So its a key extractor. So it automagically caches the keys it extracts. so -M there is a simple key extraction on the files in @unsorted. Yup. assuming no internal caching Key extractors will always cache. @sorted = sort {%M{$_} //= -M} @unsorted; i assume //= will be optimized and -M won't be called if it is cached. also where does %M get declared and/or cleared before this? Exactly the problem. That's why key extractors aways cache. can it be done in the block: @sorted = sort {my %M ; %M{$_} //= -M} @unsorted; If you'd gone insane and particularly wanted to do it that way, you'd need something like: @sorted = sort {state %M ; %M{$_} //= -M} @unsorted; to ensure the cache persisted between calls to the key extractor. another -M problem is that it
Re: The Sort Problem
DC == Damian Conway [EMAIL PROTECTED] writes: DC Uri wrote: DC @out = sort DC [ { ~ %lookup{ .{remotekey} } }, #1 if string cmp is the default, wouldn't that ~ be redundant? DC How do you know that the values of %lookup are strings? DC How would the optimizer know? because that would be the default comparison and the extracted key value would be stringified unless some other marker is used. most sorts are on strings so this would be a useful huffman and removal of a redundancy. DC { + substr( 0, 10 ) }, #3 DC { int /foo(\d+)bar/ }, #4 i would also expect int to be a default over float as it will be used more often. + is needed there since the regex returns a string. in the #3 case that would be an int as well. so we need a 'float' cast thingy. DC Unary C+ *is* the float cast thingy! hmm, so + is float but int is needed otherwise? int is more commonly a sort key than float. it just feels asymetrical with one having a symbol and the other a named operator. DC If you want to force numeric comparison of keys you explicitly cast DC each key to number using unary C+ or Cint. If you want to force DC stringific comparison you explicitly cast the key to string using DC unary C~. or ~ is the default if nothing else is specified. it matches up with cmp being the default comparison as you agreed above. DC If you don't explicitly cast either way, Csort just DWIMs by looking DC at the actual type of the keys returned by the extractor. If any of DC them isn't a number, it defaults to Ccmp. that doesn't work for me. the GRT can't scan all keys before it decides on what comparison operator is needed. the GRT needs to merge keys as they are extracted and it needs to do the correct conversion to a byte string then and not later. you can't DWIM this away if you want the optimization. the ST can get away with it since you are still using a compare block even if it is generated internally by the sort function. DC { just_guess $^b, $^a }, #7 is that a reverse order sort? why not skip the args and do this: { just_guess is descending }, #7 DC Because I wanted to show a plain old two-parameter block being used as DC a *comparator* (not a one-parameter block being used as a key DC extractor). that seems like extra redundant code just to mark a callback vs a extract block. there should be a cleaner way to do that. maybe a null extract key in the pair? { '' = just_guess } { undef = just_guess }# will = autoquote undef there? then the records are full keys with no special extraction but there is a callback comparator. no need to declare any arguments to the callback sub here since you know it is that and not key extract code. but what about this odd case, sort [...], [...], [...] now that is stupid code but it could be trying to sort the refs by their address in string mode. DC In which case we probably should have written it: DC sort == [...], [...], [...] i did ask in another post whether == or == would fit in here. so that line forces it all to be data and my silly example has the first anon list of criteria and the rest as data. works for me so far. or it could be a sort criteria list followed by 2 refs to input records. DC Only if the first array ref contains nothing but Criterion objects. but what defines them as criteria objects? they look just like records which could be sorted as well. i don't see any syntactical markers that {} means criteria block vs a hash ref. DWIM guessing isn't good enough for me here. sort in p5 already had some issues with that IIRC. as i pointed out above, i don't see why you even need to show the ^$a and ^$b args? DC So the block knows it has two parameters. but the callback sub knows it has two params since it has to be written that way. sort always calls the code block with 2 params. they will be passed into just_guess that way. let is descending handle the sort ordering. DC But you *can't* apply Cis descending to a Code reference. then how did you apply 'is insensitive'? aside from how it is done (traits and such), we need a syntax that conveys the semantics of insensitive and descending into the sort func. it will use that info to reverse the key order to comparison subs or modify the key merging of the GRT to effect those flags. what i am saying is i think that you need to go back to the drawing board to find a clean syntax to mark those flags. note that neither the code block nor the callback needs to see them, only the sort guts ever needs to see them. so we are communicating info to the sort about this key. the code block/callback only ever sees two arguments to compare and nothing else. maybe this will clarify for you the intentions of those flags and why they have nothing to do with
Re: The Sort Problem
Uri Guttman writes: because that would be the default comparison and the extracted key value would be stringified unless some other marker is used. most sorts are on strings so this would be a useful huffman and removal of a redundancy. While I like where most of this is going, I beg to differ on this point. I end up sorting numbers more often then strings, except when I'm sorting the keys of a hash. I'm always annoyed in my one liners when I have to write out: sort { $a = $b } ... And I'd be thrilled if it were as easy as: sort { +$_ } ... Luke
Re: The Sort Problem
LP == Luke Palmer [EMAIL PROTECTED] writes: LP Uri Guttman writes: because that would be the default comparison and the extracted key value would be stringified unless some other marker is used. most sorts are on strings so this would be a useful huffman and removal of a redundancy. LP While I like where most of this is going, I beg to differ on this point. LP I end up sorting numbers more often then strings, except when I'm LP sorting the keys of a hash. I'm always annoyed in my one liners when I LP have to write out: LP sort { $a = $b } ... LP And I'd be thrilled if it were as easy as: LP sort { +$_ } ... oh, i don't want to see = anywhere in sort code. i don't mind + or int as markers for that. you seem to actually be agreeing with me that strings are the default key type and + is needed for a numeric sort. but do you do integer or float sorts? and as i said, i don't like the asymmetry of int vs +. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
Uri persisted: DC How do you know that the values of %lookup are strings? DC How would the optimizer know? because that would be the default comparison and the extracted key value would be stringified unless some other marker is used. most sorts are on strings so this would be a useful huffman and removal of a redundancy. Leaving aside your assertion about the commonest type of sort (which I don't see how you can have reliable evidence about), we're not talking about markers; these are *operators* that ensure that the keys extracted are of a particular static type. DC Unary C+ *is* the float cast thingy! hmm, so + is float No. C+ is number. but int is needed otherwise? int is more commonly a sort key than float. it just feels asymetrical with one having a symbol and the other a named operator. Sorry, but that's the way the language works. The more general and usual conversion (to number, not to integer) has the shorter name. DC If you want to force numeric comparison of keys you explicitly DC cast each key to number using unary C+ or Cint. If you want to DC force stringific comparison you explicitly cast the key to string DC using unary C~. or ~ is the default if nothing else is specified. it matches up with cmp being the default comparison as you agreed above. Okay, if you want to think of it that way, fine. DC If you don't explicitly cast either way, Csort just DWIMs by DC looking at the actual type of the keys returned by the extractor. DC If any of them isn't a number, it defaults to Ccmp. that doesn't work for me. the GRT can't scan all keys before it decides on what comparison operator is needed. the GRT needs to merge keys as they are extracted and it needs to do the correct conversion to a byte string then and not later. you can't DWIM this away if you want the optimization. EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the return type of your block is appropriate. If you don't give the compiler that hint, you don't get the optimized sorting. DC Because I wanted to show a plain old two-parameter block being used as DC a *comparator* (not a one-parameter block being used as a key DC extractor). that seems like extra redundant code just to mark a callback vs a extract block. It's just an *example*, for heavens sake. there should be a cleaner way to do that. There is. If it's just a callback that is already declared to take two parameters, and you want to pass $^a and $^b in that order, then you can just write: sort just_guess, @unsorted; or, if it's one of many criteria, you write: sort [ { -M }, { .{owner} }, just_guess, ] @unsorted; maybe a null extract key in the pair? Unnecessary. See above. or it could be a sort criteria list followed by 2 refs to input records. DC Only if the first array ref contains nothing but Criterion objects. but what defines them as criteria objects? Their *type*. Remember this: type KeyExtractor ::= Code(Any) returns Any; type Comparator ::= Code(Any, Any) returns Int; type Criterion::= KeyExtractor | Comparator Pair(KeyExtractor, Comparator) ; type Criteria ::= Criterion | Array of Criterion ; multi sub *sort(Criteria $by = {$^a cmp $^b}: [EMAIL PROTECTED]) {...} ??? The criteria have to be of a very specific type. they look just like records which could be sorted as well. The point is that an array is only a list Csort criteria if every one of its elements is a one- or two-parameter block, or a Pair of blocks. And the blocks have to have very specific signatures too. The *only* time there's going to be any ambiguity is if you wanted to sort a list of arrays of Csort criteria...not a very common occurrence, I suspect. ;-) i don't see any syntactical markers that {} means criteria block vs a hash ref. Because there aren't any. Blocks are blocks and hashes are hashes, and in Perl 6 we have a very clear set of syntactic rules that determines which is which. DWIM guessing isn't good enough for me here. This *isn't* DWIM guessing. This is block parsing and static typing. sort in p5 already had some issues with that IIRC. Only because Perl 5 has issues with hashes vs blocks. Perl 6 fixes those issues. but the callback sub knows it has two params since it has to be written that way. But it *isn't* a callback sub in my example!!! It's a call to subroutine inside a block. And the *block* is the comparator Csort sees. The parameters of the block are $^a and $^b and it's a two parameter block (and therefore a valid comparator) precisely *because* it has those two parameters. Look, FORGET about that example. It was obviously a bad example. It's caused enough pain (to me if no-one else). Here's another example (all praise be to Rod!): @teams = sort [ # 1 parameter so
Re: The Sort Problem
--- Uri Guttman [EMAIL PROTECTED] wrote: @out = sort key( %lookup{ .{remotekey} } ), key:insensitive:descending:locale( 'somewhere' )( .{priority} ), key:float ( substr( 0, 10 ) ), key:integer ( /foo(\d+)bar/ ), key:compare( { ^$a = ^$b } )( /(\d+)$/ ), key:compare( \my_compare_sub ) ( /(\d+)$/ ), @in ; ... note that the custome compare callbacks can be a block or a sub name/ref. the callback sub would be passed 2 args as usual. Off the top of my head, I can't think of a case where the compare sub would be needed unless the key was not well-ordered. Does anyone have an example of a case where the key-extraction sub approach doesn't reduce the problem to a Scalar comparison? =Austin
Re: The Sort Problem
AH == Austin Hastings [EMAIL PROTECTED] writes: AH --- Uri Guttman [EMAIL PROTECTED] wrote: @out = sort key:compare( \my_compare_sub ) ( /(\d+)$/ ), @in ; note that the custome compare callbacks can be a block or a sub name/ref. the callback sub would be passed 2 args as usual. AH Off the top of my head, I can't think of a case where the compare sub AH would be needed unless the key was not well-ordered. Does anyone have AH an example of a case where the key-extraction sub approach doesn't AH reduce the problem to a Scalar comparison? there are rare times when you need a computed comparison, say with a specialized collation sequence. if this sequence isn't supported by locale or whatever, you need to do a callback comparison. i can't think of a real world example but maybe with some odd symbolic character set or a custom one. in any case, supporting the callback is easy. it has one drawback which is that it disallows the GRT but the guts could fallback to the ST then. in both cases, key extraction is the same. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
--- Uri Guttman [EMAIL PROTECTED] wrote: AH == Austin Hastings [EMAIL PROTECTED] writes: AH --- Uri Guttman [EMAIL PROTECTED] wrote: @out = sort key:compare( \my_compare_sub ) ( /(\d+)$/ ), @in ; note that the custome compare callbacks can be a block or a sub name/ref. the callback sub would be passed 2 args as usual. AH Off the top of my head, I can't think of a case where the compare sub AH would be needed unless the key was not well-ordered. Does anyone have AH an example of a case where the key-extraction sub approach doesn't AH reduce the problem to a Scalar comparison? there are rare times when you need a computed comparison, say with a specialized collation sequence. if this sequence isn't supported by locale or whatever, you need to do a callback comparison. i can't think of a real world example but maybe with some odd symbolic character set or a custom one. in any case, supporting the callback is easy. it has one drawback which is that it disallows the GRT but the guts could fallback to the ST then. in both cases, key extraction is the same. Hmm. If this is all, then I think it's worth putting the onus on the developer to write a better key extractor. What you're saying, basically, is that the special comparator would be doing dynamic key generation. IMO, let the developer write a xxx_2_UTF32 key extractor. In all likelihood, once it's written he may find a way to use that format throughout the code. =Austin
Re: The Sort Problem
AH == Austin Hastings [EMAIL PROTECTED] writes: AH --- Uri Guttman [EMAIL PROTECTED] wrote: there are rare times when you need a computed comparison, say with a specialized collation sequence. if this sequence isn't supported by locale or whatever, you need to do a callback comparison. i can't think of a real world example but maybe with some odd symbolic character set or a custom one. in any case, supporting the callback is easy. it has one drawback which is that it disallows the GRT but the guts could fallback to the ST then. in both cases, key extraction is the same. AH Hmm. If this is all, then I think it's worth putting the onus on the AH developer to write a better key extractor. possibly. AH What you're saying, basically, is that the special comparator would AH be doing dynamic key generation. not exactly. key extraction is getting the key out of the record. there could be some munging applied at that phase as well. comparison is just a replacement for = or cmp. it could be some strange collate sequence based on specialized data. a contrived example is sorting by length but only for strings beginning with i-k, otherwise go with alphabetic sort. you would have to extract a dummy length value for the keys which don't start with i-k and the name would be the second key in all cases. but there are more complex cases where it is easier to put the logic in the comparison code instead of the key extraction part. the issue is that the comparison could have logic beyond simple comparisons. so we do need to support the compare code block or sub callback. as i said, when you use that you could (if we do it this way) lose major speedups but that is a standard penalty to pay when you need custom code in the comparison. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
Austin Hastings wrote: Off the top of my head, I can't think of a case where the compare sub would be needed unless the key was not well-ordered. Does anyone have an example of a case where the key-extraction sub approach doesn't reduce the problem to a Scalar comparison? I can't find the P5 code I used for it right off, but I remember a case when I was playing around with various football (US) stats. I was ranking teams, and had something akin to: @teams = sort {$b-{WinPct} = $a-{WinPct} || ($a-{Conference} eq $b-{Conference} ?($b-{ConfWinPct} = $a-{ConfWinPct} || $b-{ConfWon}= $a-{ConfWon} || $a-{ConfLost} = $b-{ConfLost}) :($a-{Division} eq $b-{Division} ?($b-{DivWinPct} = $a-{DivWinPct} || $b-{DivWon}= $a-{DivWon} || $a-{DivLost} = $b-{DivLost}) :0 ) ) || $b-{Won} = $a-{Won} || $a-{Lost}= $b-{Lost} || $b-{GamesPlayed} = $a-{GamesPlayed} } @teams; Creating a keycode for this situation is not a trivial task at all. So sorts do occur in the real world for which there is no straightforward way to generate a sortkey. Albeit considerably rare. Also, I think there is utility in have a compare sub supported so that: 1) porting P5 code is easier. (a minor design rationale, but it exists) 2) people used to thinking in terms of compare subs (from C, P5, and points of the programming universe) can still think that way. 3) most importantly to me, so that There's More Than One Way to Do It. -- Rod Adams
Re: The Sort Problem (was: well, The Sort Problem)
Here's my stab at a sort syntax, pulling syntax over from REs: @out == sort key:ri($_-[2]), key:s($_-[4]) == @in; Basicly, you have a list of RE syntax like Ckey values, whilch take various modifiers to say how to play with that key, and then an expr on how to generate the key given element $_. Possible modifiers: (verbose versions welcome) :rreverse/descending :n force numeric comparisons :s force string comparisons (default) :u unicode (implies :s) :i case insensitive (implies :s) :l localized collation order :x call custom compare sub (a la P5) This allows: @out = sort keys %hash; # everything gets defaulted @out = sort key:x{myfunc($a) cmp myfunc($b)}(), @in; # handy for P5 migration, but not much else @out = sort key(myfunc($_)), @in; # same as above, but much better. @out = sort key(%lookup{$_-{remotekey}}), key:ir($_-{priority}), @in; # complex in P5, easy here. Advantages: - Uses syntax idioms used elsewhere in P6. - Common cases are easy - Decent huffman coding. Disadvantages: - Do we really want things that look like REs that are not REs? - If we do this, are we setting ourselves up to create other RE-like creatures for grep, for, etc, to the point where people will start wanting to roll their own in modules? Thoughts? -- Rod
Re: The Sort Problem
Am Freitag, 13. Februar 2004 01:40 schrieb Larry Wall: On Thu, Feb 12, 2004 at 04:29:58PM -0500, Uri Guttman wrote: : again, confusing. why should the order of a binary operator mean so : much? the order of a sort key is either ascending or descending. that is : what coders want to specify. translating that to the correct operator : (cmp or =) and the correct binary order is not the same as specifying : the key sort order and key type (int, string, float). Uri is dead on with this one, guys. As I listen to this mails, I get the feeling that something like this is wanted: Key generation: @unsorted_temp = map { $k1=$_.func1('a');# ASC $k2=$_.func2('we'); # DESC [ $_, $k1, $k2 ]; } @unsorted; Now we've got an array with keys and the objects. Sorting: @sorted = sort { $a-[1] cmp $b-[1] || $b-[2] = $a-[2] || } @unsorted_temp; These things would have to be said in P6. So approx.: @sorted = @unsorted.sort( keys = [ { $_.func1('a'); }, { $_.func2('we'); } ], cmp = [ cmp, = ], order = [ asc, desc], key_generation = lazy, ); That would explain what I want. Maybe we could turn the parts around: @sorted = @unsorted.sort( 1 = [ { $_.func1('a'); }, cmp, asc], 2 = [ { $_.func2('we'); }, =, desc], ); or maybe use a hash instead of an array: @sorted = @unsorted.sort( 1 = [ key = { $_.func1('a'); }, op = cmp, order = asc], 2 = [ key = { $_.func2('we'); }, op = =, order = desc], ); If that's too verbose? I don't think so; I've stumbled often enough on $a = $b vs. $b = $a and similar, and the above just tells what should be done. Regards, Phil
Re: The Sort Problem (was: well, The Sort Problem)
At 11:52 PM -0700 2/12/04, Luke Palmer wrote: But it needs some major syntax work so it can feel more like it's a part of the language instead of a library function. Not, mind, that I think Perl's syntax needs to be changed at all to accommodate. Since everyone's well past mad here and deep into drug-induced brain damage territory... If you're *really* looking to get fancy, why not just allow the sort specification to be done with SQL? Comfortable, well-understood, already has a decade or so of stupid things welded into it (so everyone can stop trying to think up new stupid things to weld in), and there are several grammars for it so it'd be no work to speak of for me. Heck, you could even unify map, grep, and sort and, if you fed in a list of pair lists or hashes, return parts of each record^Wlist element rather than the whole thing. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: The Sort Problem (was: well, The Sort Problem)
Friday 13 February 2004 15:02, Dan Sugalski wrote: If you're *really* looking to get fancy, why not just allow the sort specification to be done with SQL? Comfortable, well-understood, already has a decade or so of stupid things welded into it [...] Heck, you could even unify map, grep, and sort [...] That would be postmodern indeed. -angel
Re: The Sort Problem
On Thu, 2004-02-12 at 18:50, Uri Guttman wrote: there are only a short list of key comparisons possible, int, string, float and maybe unicode. i separate int from float since they do have different internals in the GRT. it is one area where you do expose stuff. otherwise you could just use number. And comparing in a particular language (as Dan pointed out in his Parrot talk); comparing sets (e.g. numeric comparison, but odds and evens are separate); ordering by string length; ordering by string-to-numeric value with esoteric exceptions (software versions); etc. Comparison is essentially always numeric, but only if you reduce your inputs to numbers, and that's often not terribly efficent (e.g. you can reduce any ascii string to: chr(substr($s,0,1))+chr(substr($s,1,1))*128+... and then comparing. The only limitation is that you'll have to use Math::BigInt to do the comparison. simple. pick almost any language char set other than US ascii. many have special collating sequences. i am not anything close to a unicode expert but i have seen this issue. in fact setting the LANG (or some other) environment variable will affect many programs by changing the collating order. Ok, yes. This was Dan's example, and is EXACTLY why I think you want to use map once for key extraction and then sort's expression for comparison. that is scary. do you realize that the sort block will be called for each comparison? Yes why would it not be? How do you compare, say, IPv6 addresses if not by performing a logically-ored comparison between each octet? I certainly don't want to instantiate a Math::BigInt just to compare two 128-bit values. AS @new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1; AS @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]} map AS {[getkeys($_),$_]} @old2; where is getkeys defined? It was an example. It's whatever you like. In some cases, that might just be: sub getkeys { my $object = shift; return @{$object-keys}; } or perhaps: sub getkeys { my $string = shift; return(length($string),$string); # Order by length and then ascii order } It was just an example. sortpairs was the interesting part. how do you know what indexes to use for each comparison? what happened to $_[2]? your call to $comp is passed 2 arguments but the second example accesses 4 arguments. You missed a layer of extraction there. $comp is passed each parameter PAIR. So if your input is [1,2,3,4,moo], then you are comparing the keys 1, 2, 3 and 4 and your value is moo. AS The second example really illustrates the point that you can swap the AS direction of key order and mechanism to compare them at your whim. AS Now, you just need to call sortpairs with any array of arrays of keys AS (with trailing value). add a third key to that. quickly! ok, easy: @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3] || ACME::Compare::Thingulas-compare($_[5],$_[6])} map {[getkeys($_),$_]} @old2; Please notice that the ONLY change is in the way I call it, not in sortpairs. sortpairs still works just fine. a more straightforward api (which is close to what i will do in Sort::GRT) is (p5 code) my $sort = Sort::GRT-new( keys= [ { descending = 1, type = float, extract = '$_-{amount}, }, { extract = 'substr( $_-{date}, 0, 4)' }, ] ) ; The value of extract should be a subref, not a string (thus having closure status, and therefore access to context). I think you and I are in violent agreement. I just look at this as a problem that doesn't need solving because Schwartzian Transforms do all the work for me. You look at it as something needing encapsulation. That's cool, but sematically, we're doing the same work. -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback
Re: [perl] The Sort Problem
Joe Gottman wrote in perl.perl6.language : This is unrelated to the problem you mentioned, but there is another annoying problem with sort as it is currently defined. If you have an @array and you want to replace it with the sorted version, you have to type @array = sort @array; Besides being long-winded, this causes Perl to make an unnecessary copy of the array. It would be nice calling if sort (or reverse, or other similar functions) in void context resulted in in-place modification of the array that was input. I'd rather not change the behaviour of sort (or other built-ins that may or may not operate in place) depending on the calling context. What to do, for example, when Csort is the last statement of a sub? Besides this, a construction like @x = sort @x could be detected by a suitable optimizer and turned to (internally) an in-place sort. (It appears that Dave Mitchell is working on such an optimization for perl 5.)
Re: [perl] The Sort Problem
On Thu, 12 Feb 2004 15.40, Joe Gottman wrote: This is unrelated to the problem you mentioned, but there is another annoying problem with sort as it is currently defined. If you have an @array and you want to replace it with the sorted version, you have to type @array = sort @array; Besides being long-winded, this causes Perl to make an unnecessary copy of the array. It would be nice calling if sort (or reverse, or other similar functions) in void context resulted in in-place modification of the array that was input. I just know that this has come up before . . . I don't remember whether it was consensus[1] or just my wishful thinking that this would make code like: sub foo { my $listref = shift; # other stuff... # return sorted list. sort @$listref; } morally ambiguous[2], because that sort is being called in whatever context the function call was in, which could be miles away. I'd suggest that an in-place sort have a marginally different name, but the rubyometer is already redlined as it is. (Even more OT: I dislike the scoping rules for named sorting subroutines in Perl 5. It means you can't do questionable things like this: # A common function used all over the place. sub onewayorother { $direction * ($a = $b) } # much later ... my $direction = -1; my @arr = sort onewayorother (5,7,9); The more I look at that, the more I think it's a stupid example, but I have a vivid memory of being upset about it several years ago, when I wasn't a very good programmer. This is looking less like a whinge and more like a confession. Move along, nothing more to see here.) [1] If, by consensus, you mean Larry Said No. [2] Not semantically ambiguous, because it isn't. -- Debbie Pickett http://www.csse.monash.edu.au/~debbiep [EMAIL PROTECTED]
Re: The Sort Problem
Luke Palmer wrote: Any other ideas? How about something like this, modulo any errors in my Perl 6 syntax? sub sort(?cmp = infix:cmp, +$key, +$desc, [EMAIL PROTECTED]) { ... } I think that allows all of these: # P5: @sorted = sort @unsorted; @sorted = sort @unsorted; The simplest case is the same as the Perl 5, which seems a pleasant feature. # P5: @sorted = sort { $a = $b } @unsorted; @sorted = sort { $^a = $^b } @unsorted; # or: @sorted = sort infix:= == @unsorted; This also seems reasonable. # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute } # @unsorted # or: @sorted = map { $_-[1] } # sort { $a-[0] =? $b-[0] } # map { [ $_-foo('bar')-compute, $_ ] } # @unsorted @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted; I think my suggestion wins big here. We've only had to specify how to extract the key, and sort itself takes care of everything else. And it looks to me like this sort function has enough information about the programmer's intent for it to optimise in all sorts of exciting ways -- it should be able to do the equivalent of the GRT internally, for example. Just for kicks, this one demonstrates all the features. It's the same as before, but in descending order: @unsorted == sort infix:=, desc = 1, key = { $_.foo('bar').compute } == @sorted; What problems can anyone spot with this suggestion? -- Aaron Crane * GBdirect Ltd. http://training.gbdirect.co.uk/courses/perl/
Re: The Sort Problem
... so here is a (very rough and probably broken) syntax idea building on that: sort :key { :descend :string .foo('bar').substr( 10, 3) } :key { :int .foo('baz') } :key { :float .foo('amount') } @unsorted ; I see a kind of problem here: If the parts of the key are not fixed length but can vary you can put them in strings *only* after processing all and verifying the needed length. Example: sort :key { :descend :string .foo('bar') } :key { :int .foo('baz') } :key { :float .foo('amount') } @unsorted ; Now .foo('bar') isn't bounded with any length - so you don't know how much space to reserve. And I believe that - generating keys on every list element - storing them into a array (array of array) and - after having processed all checking the length, and - now generate the to-be-sorted-strings - sort isn't the optimal way. BTW: this requires that *all* keys are generated. In cases like - by name, - by age, - by height, - by number of toes left, - and finally sort by the social security number most of the extractions (and possibly database-queries of calculations or whatever) will not be done - at least in the current form of sort { $a-{name} cmp $b-{name} || $a-{age} = $b-{age} || ... That is to say, I very much like the syntax you propose, but I'm not sure if pre-generating *every* key-part is necessarily a speed-up. If there are expensive calculations you can always cut them short be pre-calculating them into a hash by object, and just query this in sort. Also I fear that the amount of memory necessary to sort an array of length N is not N*2 (unsorted, sorted), but more like N*3 (unsorted, keys, sorted), which could cause troubles on bigger arrays Regards, Phil
Re: The Sort Problem
On Thu, 2004-02-12 at 01:59, Uri Guttman wrote: sorry about the confusion. read the paper at: http://www.sysarch.com/perl/sort_paper.html All of which stops being a concern if you have a sort variant that works on pairs in the first place. Perl _5_ code follows: sub sortpairs(@) { my $comp = shift; my %pairs = @_; return map {$pairs{$_}} sort {$comp-($a)} keys %pairs; } @new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1; @new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2; as you can see, you can perform any key extraction you like in the map portion (which just returns key/value pairs) and sortpairs' job is just to compare the keys and return the resulting sorted values. No Perl 6 here, move along ;-) Perl 6 could contribute here by making it cheaper to construct/pass the keys, but that's about it. -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback signature.asc Description: This is a digitally signed message part
Re: The Sort Problem
On Thu, 2004-02-12 at 08:43, Aaron Sherman wrote: sub sortpairs(@) { my $comp = shift; my %pairs = @_; return map {$pairs{$_}} sort {$comp-($a)} keys %pairs; } Doh... it's early for me. That's Csort {$comp-()} with no parameter. The fact that $a and $b are dynamically scoped in Perl 5 helps out here. -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback signature.asc Description: This is a digitally signed message part
Re: [perl] The Sort Problem
At 11:40 PM -0500 2/11/04, Joe Gottman wrote: This is unrelated to the problem you mentioned, but there is another annoying problem with sort as it is currently defined. If you have an @array and you want to replace it with the sorted version, you have to type @array = sort @array; That would seem to be a place for more explicit, and specific, behaviour. Since we're going everything is an object then it's just a matter of: @array.sort_in_place; and making sure that the array role provides the appropriate method. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: The Sort Problem
On Thu, Feb 12, 2004 at 12:07:37PM +0100, Ph. Marek wrote: : ... : so here is a (very rough and probably broken) syntax idea building on : that: : : sort :key { :descend :string .foo('bar').substr( 10, 3) } : : :key { :int .foo('baz') } : :key { :float .foo('amount') } @unsorted ; : I see a kind of problem here: If the parts of the key are not fixed length but : can vary you can put them in strings *only* after processing all and : verifying the needed length. I think this is easily solved by sorting on a list of strings rather than a single string. : - generating keys on every list element : - storing them into a array (array of array) and : - after having processed all checking the length, and : - now generate the to-be-sorted-strings : - sort : : isn't the optimal way. : BTW: this requires that *all* keys are generated. : In cases like : - by name, : - by age, : - by height, : - by number of toes left, : - and finally sort by the social security number Not if you lazily add strings to the list of strings mentioned above. : That is to say, I very much like the syntax you propose, but I'm not sure if : pre-generating *every* key-part is necessarily a speed-up. : : If there are expensive calculations you can always cut them short be : pre-calculating them into a hash by object, and just query this in sort. Another approach is to declare a function that is cached, I suppose. : Also I fear that the amount of memory necessary to sort an array of length N : is not N*2 (unsorted, sorted), but more like N*3 (unsorted, keys, sorted), : which could cause troubles on bigger arrays We'll just use virtual memory. :-) Larry
Re: The Sort Problem
Aaron Crane writes: Luke Palmer wrote: Any other ideas? How about something like this, modulo any errors in my Perl 6 syntax? sub sort(?cmp = infix:cmp, +$key, +$desc, [EMAIL PROTECTED]) { ... } I think that allows all of these: # P5: @sorted = sort @unsorted; @sorted = sort @unsorted; The simplest case is the same as the Perl 5, which seems a pleasant feature. This is a feature that, IMO, we can not do without. I still do things like: print $_: %hash{$_} for sort keys %hash; All the time, and I don't want to give that up for some clearer syntax, as there is no such thing. # P5: @sorted = sort { $a = $b } @unsorted; @sorted = sort { $^a = $^b } @unsorted; # or: @sorted = sort infix:= == @unsorted; This also seems reasonable. # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute } # @unsorted # or: @sorted = map { $_-[1] } # sort { $a-[0] =? $b-[0] } # map { [ $_-foo('bar')-compute, $_ ] } # @unsorted @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted; Ok, I have to say, that's pretty good. Er, really good. I like it a lot. I think my suggestion wins big here. We've only had to specify how to extract the key, and sort itself takes care of everything else. And it looks to me like this sort function has enough information about the programmer's intent for it to optimise in all sorts of exciting ways -- it should be able to do the equivalent of the GRT internally, for example. Just for kicks, this one demonstrates all the features. It's the same as before, but in descending order: @unsorted == sort infix:=, desc = 1, key = { $_.foo('bar').compute } == @sorted; What problems can anyone spot with this suggestion? I don't like the Cdesc flag. But I can't, at the moment, think of any way around it short of: @unsorted == sort { $^b = $^a }, key = { .foo('bar').compute } == @sorted Which people have made pretty clear that they don't like. Oh, by the way, your Perl 6 syntax is immaculate. Bravo :-) Luke
Re: The Sort Problem
LP == Luke Palmer [EMAIL PROTECTED] writes: # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute } # @unsorted # or: @sorted = map { $_-[1] } # sort { $a-[0] =? $b-[0] } # map { [ $_-foo('bar')-compute, $_ ] } # @unsorted @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted; LP Ok, I have to say, that's pretty good. Er, really good. I like it a LP lot. how do you select descending order? and how do you selecte that per key? you can't provide a binary operator without also providing the order. and what about different key types? the = and cmp operators are not enough information needed to do complex sorts. collating sequences are another issue. you need to have that info on a perl key basis. I think my suggestion wins big here. We've only had to specify how to extract the key, and sort itself takes care of everything else. And it looks to me like this sort function has enough information about the programmer's intent for it to optimise in all sorts of exciting ways -- it should be able to do the equivalent of the GRT internally, for example. sort can't figure it out without you telling it things. you need more than just key extraction. Just for kicks, this one demonstrates all the features. It's the same as before, but in descending order: @unsorted == sort infix:=, desc = 1, key = { $_.foo('bar').compute } == @sorted; What problems can anyone spot with this suggestion? multiple keys with each having different comparisons and different sort orders. LP @unsorted LP == sort { $^b = $^a }, key = { .foo('bar').compute } LP == @sorted LP Which people have made pretty clear that they don't like. as i have said before. needing to have $a and $b in the correct order is bug prone and confusing to many. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
PM == Ph Marek [EMAIL PROTECTED] writes: ... so here is a (very rough and probably broken) syntax idea building on that: sort :key { :descend :string .foo('bar').substr( 10, 3) } :key { :int .foo('baz') } :key { :float .foo('amount') } @unsorted ; PM I see a kind of problem here: If the parts of the key are not PM fixed length but can vary you can put them in strings *only* after PM processing all and verifying the needed length. varying keys has no effect on this. you still build up a merged key of whatever length is needed for each record. perl5 does that with pack or .= or sprintf (all used in variants of the GRT). PM Example: PM sort :key { :descend :string .foo('bar') } PM :key { :int .foo('baz') } PM :key { :float .foo('amount') } @unsorted ; PM Now .foo('bar') isn't bounded with any length - so you don't know how much PM space to reserve. who needs to reserve space? you just let the guts do a realloc like p5 does now. PM And I believe that PM - generating keys on every list element PM - storing them into a array (array of array) and PM - after having processed all checking the length, and PM - now generate the to-be-sorted-strings PM - sort PM isn't the optimal way. PM BTW: this requires that *all* keys are generated. PM In cases like PM - by name, PM - by age, PM - by height, PM - by number of toes left, PM - and finally sort by the social security number PM most of the extractions (and possibly database-queries of calculations or PM whatever) will not be done - at least in the current form of PM sort { $a-{name} cmp $b-{name} || PM$a-{age} = $b-{age} || PM ... PM That is to say, I very much like the syntax you propose, but I'm PM not sure if pre-generating *every* key-part is necessarily a PM speed-up. see the paper for more on that. the work you do in the callback, even if you only do the first compare is O(N log N). in large sorts that will outweigh the O(N) work on extracting the keys one time. and as with all algorithms, real world cases will differ. short sorts can be faster with complex comparisons because of the initial overhead. maybe the internal code can decide to do that instead of a GRT based on the sort array size. the key extract code is the same in both cases so that would be an interesting optimization. PM If there are expensive calculations you can always cut them short PM be pre-calculating them into a hash by object, and just query this PM in sort. again, the goal is to eliminate work in the comparison. but as i said, this is an implementation problem. we should focus on the language level needs of the sort func and not how it is done internally. i shouldn't have brought up the internal design here. PM Also I fear that the amount of memory necessary to sort an array PM of length N is not N*2 (unsorted, sorted), but more like N*3 PM (unsorted, keys, sorted), which could cause troubles on bigger PM arrays as larry said, virtual memory. and there are ways to reduce the copies internally with pointers (something p5 couldn't do at p5 code level). other than copying the extracted keys, you could leave the original array alone and sort it in place by indexing it based on the sorted pointers. and since the records will all be the same PMC type (usually), you could just swap the array's pointers and not move any data around. so no extra copies would be made other than key extraction. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
Luke Palmer wrote: Aaron Crane writes: @unsorted == sort infix:=, desc = 1, key = { $_.foo('bar').compute } == @sorted; I don't like the Cdesc flag. But I can't, at the moment, think of any way around it short of: @unsorted == sort { $^b = $^a }, key = { .foo('bar').compute } == @sorted Which people have made pretty clear that they don't like. One option might be an 'rsort' function, but I think that's somewhat lacking in the taste department. Another might be as simple as @unsorted == sort == reverse == @sorted; But I can see an argument that C == reverse is quite a large code burden for something so conceptually simple. I have one other idea, but I can't decide if I like it: @unsorted == sort rinfix:cmp == @sorted; That is, rinfix: (or some other name) is like infix:, but gives you a function that reverses its arguments before actually running the operator. Perhaps it could even be implemented as a macro. -- Aaron Crane * GBdirect Ltd. http://training.gbdirect.co.uk/courses/perl/
Re: The Sort Problem
AC == Aaron Crane [EMAIL PROTECTED] writes: AC One option might be an 'rsort' function, but I think that's AC somewhat lacking in the taste department. AC Another might be as simple as AC @unsorted == sort == reverse == @sorted; again, reverse order needs to be on a per key basis. the problem we are wrangling is how to support multiple keys in a clean syntactical way. AC @unsorted == sort rinfix:cmp == @sorted; AC That is, rinfix: (or some other name) is like infix:, but gives you a AC function that reverses its arguments before actually running the operator. AC Perhaps it could even be implemented as a macro. again, confusing. why should the order of a binary operator mean so much? the order of a sort key is either ascending or descending. that is what coders want to specify. translating that to the correct operator (cmp or =) and the correct binary order is not the same as specifying the key sort order and key type (int, string, float). uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: The Sort Problem
Aaron Crane writes: I have one other idea, but I can't decide if I like it: @unsorted == sort rinfix:cmp == @sorted; That is, rinfix: (or some other name) is like infix:, but gives you a function that reverses its arguments before actually running the operator. Perhaps it could even be implemented as a macro. Like this one? macro rinfix($op) is parsed(/ \: (.*?) before: [\s(] /) { return { - $a, $b { (infix:$op).($b, $a) } } } Luke
Re: The Sort Problem
[EMAIL PROTECTED] (Aaron Crane) writes: One option might be an 'rsort' function, but I think that's somewhat lacking in the taste department. Agreed. Another might be as simple as @unsorted == sort == reverse == @sorted; @sorted == sort == @unsorted, no? ;) @unsorted == sort rinfix:cmp == @sorted; That is, rinfix: (or some other name) is like infix:, but gives you a function that reverses its arguments before actually running the operator. Perhaps it could even be implemented as a macro. I like this; it reminds me of the games you have to play with determining what do to with binary operators whose operands are overloaded in different classes. -- Do not go gentle into that good night/Rage, rage against the dying of the light
Re: The Sort Problem
Perhaps something like: @sorted = sort { infix:= map { scalar $_.foo('bar').compute } @^_ } } @data I'm not entirely sure it's readability is better than yours, though. Dave. Luke Palmer [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I've been thinking about this problem which comes up in my code a lot: @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute } @unsorted; Often the expressions on each side are even longer than that. But one thing remains: both sides are exactly the same, substitute a $^b for a $^a. I can see a couple less-than-desirable ways around this redundancy: @sorted = sort { infix:=( *($^a, $^b)».foo('bar').compute ) } @unsorted; Which doesn't work if .compute returns a list... not to mention its horrible ugliness. Another is to define a variant of sort (haven't had much practice with A6 material recently; here we go!): multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) { sort { block($^a) cmp block($^b) } @data; } @sorted = sort { .foo('bar').compute } @unsorted; Which has the disadvantage of forcing you to use Ccmp and forcing an ascending sort. Any other ideas? Is a more general solution necessary? Luke
Re: The Sort Problem
On Thu, 2004-02-12 at 15:46, Uri Guttman wrote: LP == Luke Palmer [EMAIL PROTECTED] writes: LP Ok, I have to say, that's pretty good. Er, really good. I like it a LP lot. how do you select descending order? and how do you selecte that per key? you can't provide a binary operator without also providing the order. and what about different key types? the = and cmp operators are not enough information needed to do complex sorts. collating sequences are another issue. you need to have that info on a perl key basis. You know, I have trouble typing per when I'm thinking perl too ;-) I think my example addressed all of your concerns, and did it in Perl 5. Once again, that was: sub sortpairs(@) { my $comp = shift; my %pairs = @_; return map {$pairs{$_}} sort {$comp-()} keys %pairs; } @new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1; @new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2; So, in order, your questions were: * how do you select descending order? You reverse the $a and $b in the first parameter to sortpairs * and how do you selecte that per key? Ok, I was only addressing one key. But, it's not hard to generalize the idea, and certainly Perl 6 gives you the tools to do so easily. * you can't provide a binary operator without also providing the order. Correct * and what about different key types? the = and cmp operators are not enough information needed to do complex sorts. That's fine, you can plug in a call to Alien::Headed::Babies inside the closure for all I care. * collating sequences are another issue. you need to have that info on a perl key basis. Give me an example? * multiple keys with each having different comparisons and different sort orders. Ok let's delve into it then: sub sortpairs(@){ my $comp = shift; my @elements = @_; return map {my $e=$elements[$_];$e-[$#{$e}]} sort { my @akeys = @{$elements[$a]}[0..$#{$elements[$a]}-1]; my @bkeys = @{$elements[$b]}[0..$#{$elements[$b]}-1]; for(my $i=0;$i@akeys;$i++) { my $dir = $comp-($akey[$i],$bkey[$i]); return $dir if $dir; } return 0; } 1..scalar(@elements); } @new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1; @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]} map {[getkeys($_),$_]} @old2; The second example really illustrates the point that you can swap the direction of key order and mechanism to compare them at your whim. Now, you just need to call sortpairs with any array of arrays of keys (with trailing value). -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback
Re: The Sort Problem
AS == Aaron Sherman [EMAIL PROTECTED] writes: AS On Thu, 2004-02-12 at 15:46, Uri Guttman wrote: how do you select descending order? and how do you selecte that per key? you can't provide a binary operator without also providing the order. and what about different key types? the = and cmp operators are not enough information needed to do complex sorts. collating sequences are another issue. you need to have that info on a perl key basis. AS I think my example addressed all of your concerns, and did it in Perl 5. AS Once again, that was: AS sub sortpairs(@) { AS my $comp = shift; AS my %pairs = @_; AS return map {$pairs{$_}} sort {$comp-()} keys %pairs; AS } AS @new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1; AS @new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2; AS So, in order, your questions were: AS * how do you select descending order? AS You reverse the $a and $b in the first parameter to sortpairs i find that a poor solution. the $a cmp $b isn't needed in the ascending case at all. a descending marker per key is better as it reflects the results desired and not how it gets done. the reversing of $a and $b requires the coder to understand sort comparison callbacks. it is implementation exposed. AS * and how do you selecte that per key? AS Ok, I was only addressing one key. But, it's not hard to AS generalize the idea, and certainly Perl 6 gives you the tools to AS do so easily. i haven't seen any proposals other than mine (which is bad syntax but good semantics) for this. and if you have multiple keys with each having a $a = $b in there it will be very noisy. AS * you can't provide a binary operator without also providing the AS order. AS Correct a single descending marker is simpler and better semantics. AS * and what about different key types? the = and cmp operators AS are not enough information needed to do complex sorts. AS That's fine, you can plug in a call to Alien::Headed::Babies AS inside the closure for all I care. there are only a short list of key comparisons possible, int, string, float and maybe unicode. i separate int from float since they do have different internals in the GRT. it is one area where you do expose stuff. otherwise you could just use number. AS * collating sequences are another issue. you need to have that AS info on a perl key basis. AS Give me an example? simple. pick almost any language char set other than US ascii. many have special collating sequences. i am not anything close to a unicode expert but i have seen this issue. in fact setting the LANG (or some other) environment variable will affect many programs by changing the collating order. AS * multiple keys with each having different comparisons and AS different sort orders. AS Ok let's delve into it then: AS sub sortpairs(@){ AS my $comp = shift; AS my @elements = @_; AS return map {my $e=$elements[$_];$e-[$#{$e}]} sort { AS my @akeys = @{$elements[$a]}[0..$#{$elements[$a]}-1]; why the -1? that looks like: my @akeys = @{$elements[$a]} ; pop @akeys ; pop @akeys ; AS my @bkeys = @{$elements[$b]}[0..$#{$elements[$b]}-1]; AS for(my $i=0;$i@akeys;$i++) { ASmy $dir = $comp-($akey[$i],$bkey[$i]); ASreturn $dir if $dir; AS } AS return 0; AS } 1..scalar(@elements); AS } that is scary. do you realize that the sort block will be called for each comparison? AS @new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1; AS @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]} map AS {[getkeys($_),$_]} @old2; where is getkeys defined? how do you know what indexes to use for each comparison? what happened to $_[2]? your call to $comp is passed 2 arguments but the second example accesses 4 arguments. AS The second example really illustrates the point that you can swap the AS direction of key order and mechanism to compare them at your whim. AS Now, you just need to call sortpairs with any array of arrays of keys AS (with trailing value). add a third key to that. quickly! a more straightforward api (which is close to what i will do in Sort::GRT) is (p5 code) my $sort = Sort::GRT-new( keys= [ { descending = 1, type = float, extract = '$_-{amount}, }, { extract = 'substr( $_-{date}, 0, 4)' }, ] ) ; my @sorted = $sort-do_it( @unsorted ) ; $sort-do_it_inplace( [EMAIL PROTECTED] ) ;
Re: The Sort Problem
On Thu, Feb 12, 2004 at 09:37:37PM +, Simon Cozens wrote: : [EMAIL PROTECTED] (Aaron Crane) writes: : That is, rinfix: (or some other name) is like infix:, but gives you a : function that reverses its arguments before actually running the operator. : Perhaps it could even be implemented as a macro. : : I like this; it reminds me of the games you have to play with determining : what do to with binary operators whose operands are overloaded in different : classes. Which we're trying very hard to get away from in Perl 6...which reminds me, I have to put something about that into A12... Larry
Re: The Sort Problem
On Thu, Feb 12, 2004 at 04:29:58PM -0500, Uri Guttman wrote: : again, confusing. why should the order of a binary operator mean so : much? the order of a sort key is either ascending or descending. that is : what coders want to specify. translating that to the correct operator : (cmp or =) and the correct binary order is not the same as specifying : the key sort order and key type (int, string, float). Uri is dead on with this one, guys. Larry
dynamic arguments (was: The Sort Problem)
Jonathan Lang wrote: Luke Palmer wrote: I've been thinking about this problem which comes up in my code a lot: @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute} @unsorted; Often the expressions on each side are even longer than that. But one thing remains: both sides are exactly the same, substitute a $^b for a $^a. The problem, then, isn't specific to sort; it happens any time that you find yourself needing to do the same things to multiple arguments. As such, the solution ought to be more general than just sort. You're looking for something to the effect of: @sorted = sort { parameters($^a = $^b).foo('bar').compute } That is, you need a way to factor C.foo('bar').compute out from each of the arguments that it applies to. For list arguments, this is straightforward: pipe the list arguments through a map before you pipe them into the routine. A similar approach could be used with named arguments. The problem comes when you deal with positional arguments. How about including something similar to ==, but which binds the elements of the list to the various positional parameters? For instance: @sorted = sort {infix:= args map {$_.foo('bar').compute}, $^a, $^b } @unsorted; Where @x = $a, $b, $c; routine args @x; is equivelent to routine $a, $b, $c; = Jonathan Dataweaver Lang __ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html
Re: dynamic arguments (was: The Sort Problem)
Jonathan Lang writes: How about including something similar to ==, but which binds the elements of the list to the various positional parameters? For instance: @sorted = sort {infix:= args map {$_.foo('bar').compute}, $^a, $^b } @unsorted; Where @x = $a, $b, $c; routine args @x; is equivelent to routine $a, $b, $c; We already have that. It's spelled: routine [EMAIL PROTECTED]; Or routine * == @x; Luke
Re: dynamic arguments (was: The Sort Problem)
Luke Palmer wrote: Jonathan Lang writes: How about including something similar to ==, but which binds the elements of the list to the various positional parameters? For instance: @sorted = sort {infix:= args map {$_.foo('bar').compute}, $^a, $^b } @unsorted; Where @x = $a, $b, $c; routine args @x; is equivelent to routine $a, $b, $c; We already have that. It's spelled: routine [EMAIL PROTECTED]; Or routine * == @x; Then you've got your solution: @sorted = sort {infix:= * map {$_.foo('bar').compute}, $^a, $^b } @unsorted; or @sorted = sort {($^a, $^b) == map {$_.foo('bar').compute} == infix:= * } @unsorted; = Jonathan Dataweaver Lang __ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html
Re: The Sort Problem (was: well, The Sort Problem)
Jonathan Lang writes: We already have that. It's spelled: routine [EMAIL PROTECTED]; Or routine * == @x; Then you've got your solution: @sorted = sort {infix:= * map {$_.foo('bar').compute}, $^a, $^b } @unsorted; or @sorted = sort {($^a, $^b) == map {$_.foo('bar').compute} == infix:= * } @unsorted; No, I don't think we do. Uri's given some good points in this thread about specifying what you want out of the sort, instead of some arbitrary callback interface. But it needs some major syntax work so it can feel more like it's a part of the language instead of a library function. Not, mind, that I think Perl's syntax needs to be changed at all to accommodate. I'm thinking that Csort might take a list of closures or pairs (with the value being a closure). I'm still not sure how that separates from the data to be sorted, but it's probably related to the problem of how Cfor works in declaration. The return value of each closure is dispatched into some sort of generic =, hopefully one that blows up if its arguments are of different types[1]. That way if you have a method that always returns a number, you can just leave it alone. Otherwise you're expected to prefix with the approprate contextifier. For example: sort { .foo('bar').compute }, desc = { ~.baz }, @data; # ascending by default string compares Pairs don't exactly seem fit here. Maybe we're supposed to think of these these closures as higher-level 'extractors', and therefore attach properties to them: sort { .foo('bar').compute }, { ~.baz } but descending, @data; Or even something on the value that tells = to negate whatever it returns: sort { .foo('bar').compute }, { ~.baz but swap }, @data; That's starting to expose implementation again, though. Moving a different direction now. Instead of being honest and saying that sort does each of these comparisons until one of them is nonzero, let's pretend that it sorts with respect to the first comparison and feeds the lists of equal elements into the second comparison, etc. Sort might then look like: sub sort (?extract, ?next, [EMAIL PROTECTED]) And our example turns into: sort { .foo('bar').compute } { reverse sort { ~.baz } [EMAIL PROTECTED] } [EMAIL PROTECTED]; That hurts, especially when there are more than two keys. I'd really like to do that with the piping operators instead of nested closures, but that screws with the simple case. There might be even more ways of looking at this that would result in more novel syntaxes. Brainstorming required. Luke [1] There's something to be said for a generic comparison, but not one that blows up. Mostly because certain containers want their elements to be well ordered. But we could just as easily say that elements that aren't intercompatible with the short-tempered = aren't allowed in said containers.
The Sort Problem
I've been thinking about this problem which comes up in my code a lot: @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute } @unsorted; Often the expressions on each side are even longer than that. But one thing remains: both sides are exactly the same, substitute a $^b for a $^a. I can see a couple less-than-desirable ways around this redundancy: @sorted = sort { infix:=( *($^a, $^b).foo('bar').compute ) } @unsorted; Which doesn't work if .compute returns a list... not to mention its horrible ugliness. Another is to define a variant of sort (haven't had much practice with A6 material recently; here we go!): multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) { sort { block($^a) cmp block($^b) } @data; } @sorted = sort { .foo('bar').compute } @unsorted; Which has the disadvantage of forcing you to use Ccmp and forcing an ascending sort. Any other ideas? Is a more general solution necessary? Luke
Re: The Sort Problem
Luke -- Hmmm... I haven't been practicing my Perl 6, and its been a while since the last Apocalyptic refresher, but here goes (I'll don a paper bag preemptively)... Thinking of that as the equivalent to: sort { my ($ta, $tb) = map { $_.foo('bar').compute } ($^a, $^b); $ta = $tb } @unsorted; so if you had a unary = that took a two element array (old style syntax here, sorry): sub cmp_num { my ($a, $b) = @_; $a = $b; } you could write that as: sort { cmp_num map { $_.foo('bar').compute } ($^a, $^b); } @unsorted; I'd be a bit afraid of allowing = itself to be prefix, although I admit it would be handy not to need the separate sub definition: sort { = map { $_.foo('bar').compute } ($^a, $^b); } @unsorted; The Schwartzian is the usual way, of course: map { $_-[1] } sort { $^a[0] = $^b[0] } map { [ $_.foo('bar').compute, $_ ] } @unsorted; But we really aren't being concise here. (Aside) I wonder if there will be a new Perl 6 idiom to replace this using properties, something like this: map { $_.really } sort : numerically map { $_.foo('bar').compute but really($_) } @unsorted; or: @unsorted == map { $_.foo('bar').compute but really($_) } == sort : numerically == map { $_.really }; But we *still* aren't concise. Maybe we should have a two-block sort? The second block gives the transform to apply to the elements before applying the comparison block: sort { $^a = $^b } { $_.foo('bar').compute } @unsorted; Or maybe even something like this: sort :numerically :as { $^a.foo('bar').compute } @unsorted; Regards, -- Gregor On Wed, 2004-02-11 at 19:11, Luke Palmer wrote: I've been thinking about this problem which comes up in my code a lot: @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute } @unsorted; Often the expressions on each side are even longer than that. But one thing remains: both sides are exactly the same, substitute a $^b for a $^a. I can see a couple less-than-desirable ways around this redundancy: @sorted = sort { infix:=( *($^a, $^b)».foo('bar').compute ) } @unsorted; Which doesn't work if .compute returns a list... not to mention its horrible ugliness. Another is to define a variant of sort (haven't had much practice with A6 material recently; here we go!): multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) { sort { block($^a) cmp block($^b) } @data; } @sorted = sort { .foo('bar').compute } @unsorted; Which has the disadvantage of forcing you to use Ccmp and forcing an ascending sort. Any other ideas? Is a more general solution necessary? Luke -- Gregor Purdy[EMAIL PROTECTED] Focus Research, Inc. http://www.focusresearch.com/
Re: [perl] The Sort Problem
- Original Message - From: Luke Palmer [EMAIL PROTECTED] To: Language List [EMAIL PROTECTED] Sent: Wednesday, February 11, 2004 10:11 PM Subject: [perl] The Sort Problem I've been thinking about this problem which comes up in my code a lot: @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute } @unsorted; Often the expressions on each side are even longer than that. But one thing remains: both sides are exactly the same, substitute a $^b for a $^a. I can see a couple less-than-desirable ways around this redundancy: @sorted = sort { infix:=( *($^a, $^b).foo('bar').compute ) } @unsorted; Which doesn't work if .compute returns a list... not to mention its horrible ugliness. Another is to define a variant of sort (haven't had much practice with A6 material recently; here we go!): multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) { sort { block($^a) cmp block($^b) } @data; } @sorted = sort { .foo('bar').compute } @unsorted; Which has the disadvantage of forcing you to use Ccmp and forcing an ascending sort. Any other ideas? Is a more general solution necessary? This is unrelated to the problem you mentioned, but there is another annoying problem with sort as it is currently defined. If you have an @array and you want to replace it with the sorted version, you have to type @array = sort @array; Besides being long-winded, this causes Perl to make an unnecessary copy of the array. It would be nice calling if sort (or reverse, or other similar functions) in void context resulted in in-place modification of the array that was input. Joe Gottman
Re: The Sort Problem
GNP == Gregor N Purdy [EMAIL PROTECTED] writes: GNP The Schwartzian is the usual way, of course: GNP map { $_-[1] } GNP sort { $^a[0] = $^b[0] } GNP map { [ $_.foo('bar').compute, $_ ] } GNP @unsorted; and the GRT is another. let stick my nose in here. sorting is really composed of 3 parts, extracting keys, comparing them and then getting back the records in sorted order. the p5 sort block idioms require duplicating the extract code and usually a binary operator. the GRT eliminates the binary operator as it uses the default string compare in sort but it still requires key extraction and then record extraction (similar to the first map (last executed) in the ST. GNP But we really aren't being concise here. GNP (Aside) I wonder if there will be a new Perl 6 idiom to replace this GNP using properties, something like this: GNP map { $_.really } GNP sort : numerically GNP map { $_.foo('bar').compute but really($_) } GNP @unsorted; GNP or: GNP @unsorted GNP == map { $_.foo('bar').compute but really($_) } GNP == sort : numerically GNP == map { $_.really }; GNP But we *still* aren't concise. that is the problem. the merging of the extracted key (in the ST or GRT) with the record and the extraction of the record need to be hidden as that is common code. GNP Maybe we should have a two-block sort? The second block gives the GNP transform to apply to the elements before applying the comparison GNP block: GNP sort { $^a = $^b } { $_.foo('bar').compute } @unsorted; duplicate args again. GNP Or maybe even something like this: GNP sort :numerically :as { $^a.foo('bar').compute } @unsorted; that looks best but how do you replace the :numerically comparison with say a unicode aware string compare? i have been working on a Sort::GRT in my head (finally!) and have some ideas. first, key extraction should work on $_ which is set for each element of the unsorted list so there is no need for $^a. also how can you extract multiple keys? and each key needs a ascending/descending option. and each key needs a type so it can be compared correctly. so here is a (very rough and probably broken) syntax idea building on that: sort :key { :descend :string .foo('bar').substr( 10, 3) } :key { :int .foo('baz') } :key { :float .foo('amount') } @unsorted ; sort takes a list of :key's which each has a block (or whatever) that specifies the key type, ordering and extraction code (that works on the record in $_). the extraction code is in a block in itself so you can declare lexicals and use multiple statements. the last value (in list context) is the key. this is so you can do (p5 style) /foo(\d+)bar/ and extract that digit string as the key. side note: i was pondering the problem of where two different keys have to be extracted from nearby places deep in a nested record structure. this means duplicate code will be used to get into the deep part. in my module, i was planning to allow a 'pre-extraction' piece of code to be used and the value (e.g. a ref to the deep part) stored in a lexical which can be used later in both key extractions. i don't know any way to do that with the above syntax. there has to be a larger block around all the :key's to allow something like that here. the p5 module will be wrapping all the key extract code in a block so this would work there. internally, each record has its keys extracted and they are merged using the GRT or ST or similar. in either case the record must be attached (via a pointer or appended or some other way) to the merged key. the keys are sorted and then the attached records are grabbed back and returned. sorting in place can be done too. what is good about the above syntax is that it removes all the common code, removes the binary operator (order and type is all that matters) and allows for a simple way to express multiple key extractions. it can be implemented internally in c to be far faster than any perl or parrot level code could do. the biggest hurdle i see is proper support for unicode stuff. maybe another key attribute would be the collating name? how this affects internal sorting is a question. can you generate a merged key with numbers, ascii and unicode strings that will sort with simple string compares? maybe the unicode extraction has to do a char by char conversions to (32 bit?) integers using the collate sequence and those numbers can easily be merged and sorted. note that only the extracted key text needs to be converted so that shouldn't be too expensive. so it may look like this: sort :key { :descend :string :collate('iso_some_country') .foo('bar').substr( 10, 3) } :key { :int .foo('baz') } @unsorted ; what is nice about that is that each key could have its own collate type. the :string attribute could be optional since the :collate implies a string. just trying to sort things out, uri -- Uri
Re: The Sort Problem
Uri Guttman writes: GNP == Gregor N Purdy [EMAIL PROTECTED] writes: GNP (But, the GRT doesn't apply to sorting general objects, and the GNP example has @unsorted containing objects on which we run GNP .foo('bar').compute.) there are no restrictions on what the GRT (or ST) can sort. the key is key extraction and that can be from an object or anything. if the compute method has to be called each time during the comparison, you are in deep sort voodoo. so you can just call it during the extract phase (the first executed map in the GRT or ST). if you have the object in $_, you can call a method or do whatever you need to get the key. Might you explain what the GRT is and what it is intended to do? I've been lost throughout most of your proposal. Thanks, Luke
Re: The Sort Problem
LP == Luke Palmer [EMAIL PROTECTED] writes: LP Uri Guttman writes: GNP == Gregor N Purdy [EMAIL PROTECTED] writes: GNP (But, the GRT doesn't apply to sorting general objects, and the GNP example has @unsorted containing objects on which we run GNP .foo('bar').compute.) there are no restrictions on what the GRT (or ST) can sort. the key is key extraction and that can be from an object or anything. if the compute method has to be called each time during the comparison, you are in deep sort voodoo. so you can just call it during the extract phase (the first executed map in the GRT or ST). if you have the object in $_, you can call a method or do whatever you need to get the key. LP Might you explain what the GRT is and what it is intended to do? LP I've been lost throughout most of your proposal. and now you know how i feel about some threads you delve into! my brane hertz! :) sorry about the confusion. read the paper at: http://www.sysarch.com/perl/sort_paper.html the idea is simple and very old. you extract keys from the records and merge them (with some munging) into a byte string. then you can run a straight sort with comparisons on those merged key strings and use fast string compares. the win is not doing the key extraction on each set of comparisons which is O(N log N). instead you do the extractions once per key which is O(N). it is a real world optimization as pure O() theory wouldn't care as the dominating O(N log N) is still the same in both cases. :) all we (larry rosler and myself) did was translate the idea into perl and document how to do it. one of these days i will actually create a perl5 module (started one years ago but it got shelved) to implement this and make it easy to use. i recently came up with a design breakthrough (just not being as stupid as i was a few years ago) to make it easier to do so i may get back onto it. but much of my p6 proposal was to have the sort function reflect the essence of both the GRT and ST which is to factor out key extraction in a map before the sort. my syntax examples were meant to show what is needed to pass into a full sort system and what isn't needed. perl5's sort exposed everything and required redundant code in the comparison block, confusing expressions, etc. the ST and GRT removed some of the redundant code but replaced that with the map/sort/map which has different redundant code (redundant between different instances of them. why see the map parts?). so my proposed syntax isolates the key extraction code and makes it work on $_. it also removes the comparison operator stuff and allows the coder to just say what they want in regard to sort order, key type and collate sequence. the guts will do all the common code (the maps) and generate/compile the proper key extraction code. in fact, there is no need for the GRT or ST to even be implemented at all. the key extraction code and order could be used to generate comparison stuff just like perl5 does. the ST and GRT are just implementation optimizations and don't belong in a language level discussion. hope that helps, uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org