>>>>> "DC" == Damian Conway <[EMAIL PROTECTED]> writes:
DC> Suppose C<sort>'s signature is: DC> type KeyExtractor ::= Code(Any) returns Any; DC> type Comparator ::= Code(Any, Any) returns Int; DC> type Criterion ::= KeyExtractor DC> | Comparator DC> | Pair(KeyExtractor, Comparator) DC> ; DC> type Criteria ::= Criterion DC> | Array of Criterion DC> ; DC> multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...} DC> Each of those sort criteria may be a block/closure, or a pair of DC> block/closures. Each block/closure may take either one argument (in DC> which case it's a key extractor) or two arguments (in which case it's DC> a comparator). DC> If a key-extractor block returns number, then C<< <=> >> is used to DC> compare those keys. Otherwise C<cmp> is used. In either case, the DC> returned keys are cached to optimize subsequent comparisons against DC> the same element. i would make cmp the default as it is now. you sort strings much more often than you sort numbers. DC> If a key-extractor/comparator pair is specified, the key-extractor is DC> the key of the pair and the comparator the value. The extractor is DC> used to retreive (and cache) keys, which are then passed to the DC> comparator. DC> Which means that (a slightly extended version of) Uri's proposed: DC> @out = sort DC> key( %lookup{ .{remotekey} } ), #1 DC> key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2 DC> key:float ( substr( 0, 10 ) ), #3 DC> key:integer ( /foo(\d+)bar/ ), #4 DC> key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ), #5 DC> key:compare( \&my_compare_sub ) ( /(\d+)$/ ), #6 DC> key:compare( { just_guess $^b, $^a } ), #7 DC> @in; DC> would become: DC> @out = sort DC> [ { ~ %lookup{ .{remotekey} } }, #1 if string cmp is the default, wouldn't that ~ be redundant? and i keep forgetting that ~ is now the stringify operator :) DC> { use locale 'somewhere'; lc .{priority} } is descending, #2 hmmm, this needs more work IMO. requiring the coder to lc the key is moving the case insensitivity feature back into code. wouldn't 'is insensitive' be ok? how does the internal guts get the descending/insensitive flags/traits passed to it? i know it is an internals problem but i am just pondering it. DC> { + substr( 0, 10 ) }, #3 DC> { int /foo(\d+)bar/ }, #4 i would also expect int to be a default over float as it will be used more often. + is needed there since the regex returns a string. in the #3 case that would be an int as well. so we need a 'float' cast thingy. BTW, the only way to get a number as a key is from a structure where the field was assigned as a number/int. that may not happen a lot so the int/float cast will probably be needed here to sort correctly DC> { + m/(\d+)$/.[1] }, #5 DC> { /(\d+)$/ } => &my_compare_sub, #6 missing a close } it seems. DC> { just_guess $^b, $^a }, #7 is that a reverse order sort? why not skip the args and do this: { &just_guess is descending }, #7 DC> ], so the first arg to sort is either a single compare block or a anon list of them? i figure we need the [] to separate the criteria from the input data. but what about this odd case, sort [...], [...], [...] now that is stupid code but it could be trying to sort the refs by their address in string mode. or it could be a sort criteria list followed by 2 refs to input records. DC> @in; DC> So to specify a key extractor we provide a block with one argument, as DC> in examples #1 through #5. Then to specify that the key is compared DC> with C<cmp> we return a string (using unary C<~> to be explicit about DC> it) as in #1. To specify that the key is compared with C<< <=> >> we DC> return a number (using unary C<+> to be explicit about it) as in #3 DC> and #5. If we think that C<sort> might be able to optimize integer DC> comparisons, we can explicitly return an integer, as in #4. If we want DC> locale-sensitive sorting, we specify that with the appropriate C<use locale> statement, as in #2. DC> To specify a comparator, we provide a block with two arguments, as in DC> #7. That block is always expected to return an integer. so #7 is a call to just_guess which is passed the 2 args to compare. it must return an int like cmp/<=>. as i pointed out above, i don't see why you even need to show the ^$a and ^$b args? they will be passed into just_guess that way. let is descending handle the sort ordering. DC> To specify a key-extractor *and* an associated comparator, we bind DC> them together in a Pair, as in #6. that works for me. DC> The only tricky bits are how to specify a key extractor for keys that DC> are to be sorted in descending order and/or case-insensitively. The DC> case insensitivity could be handled by simple C<lc>'ing, as in DC> #2. Descending sort order could be handled by a trait on the DC> block/closure, as in #2. Alternatively, case-insensitivity could be DC> specified by trait as well. ok, i like a trait better than coding in lc. better semantics and clearer to read. the is descending trait could just cause the sort callbacks to be called with reversed arguments and so the callback code never has to worry about that. DC> Note that simple sorts would be unchanged: DC> @sorted = sort @unsorted; DC> @sorted = sort {$^b <=> $^a} @unsorted; DC> But now one could also very neatly code cases that formerly required DC> an Orcish or Transformed sort. For example, these: DC> @sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted; wow, that is UGLY! but i get it after a few hours of study. :) just the orcish maneuver but with //. i think you also mean //= there. DC> @sorted = map $_[1], sort {$^a[0] <=> $^b[0]}, map [-M,$_], DC> @unsorted; the familiar ST. DC> would both become: DC> @sorted = sort {-M} @unsorted; that still wants to be cached somehow as -M is expensive. but that is an internal thing and the ST/GRT both can handle it. so -M there is a simple key extraction on the files in @unsorted. assuming no internal caching would this be correct? @sorted = sort {%M{$_} //= -M} @unsorted; i assume //= will be optimized and -M won't be called if it is cached. also where does %M get declared and/or cleared before this? can it be done in the block: @sorted = sort {my %M ; %M{$_} //= -M} @unsorted; another -M problem is that it currently returns a float so that must be marked/cast as a float. @sorted = sort {float -M} @unsorted; that cast causes a <=> compare (since i would rather see cmp be the default) and also can let the internal GRT properly merge a float key instead of an int. it should work without it (not sure how) but the float marker is important to optimization and a proper comparison. maybe the fact that the compiler knows -M returns a float can be used to mark it internally and the explicit float isn't needed here. but data from a user record will need to be marked as float as the compiler can't tell. anyhow, i am glad i invoked your name and summoned you into this thread. :) uri -- Uri Guttman ------ [EMAIL PROTECTED] -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org