Re: The Sort Problem

Damian Conway Sun, 15 Feb 2004 15:00:19 -0800

Here's a proposed syntax and semantics for C<sort> that tries to preserve the (excellent) features of Uri's "on the right track" proposal whilst integrating it into the Perl 6 design without multiplying entities (especially colons!) unnecessarily.

Suppose C<sort>'s signature is:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator ::= Code(Any, Any) returns Int;

    type Criterion    ::= KeyExtractor
                        | Comparator
                        | Pair(KeyExtractor, Comparator)
                        ;

    type Criteria     ::= Criterion
                        | Array of Criterion
                        ;

multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...}

In other words, C<sort> takes as its (optional) first argument either a single sort criterion, or an array of criteria.

Each of those sort criteria may be a block/closure, or a pair of block/closures. Each block/closure may take either one argument (in which case it's a key extractor) or two arguments (in which case it's a comparator).

If a key-extractor block returns number, then C<< <=> >> is used to compare those keys. Otherwise C<cmp> is used. In either case, the returned keys are cached to optimize subsequent comparisons against the same element.

If a key-extractor/comparator pair is specified, the key-extractor is the key of the pair and the comparator the value. The extractor is used to retreive (and cache) keys, which are then passed to the comparator.

Which means that (a slightly extended version of) Uri's proposed:

   @out = sort
           key( %lookup{ .{remotekey} } ),                                  #1
           key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
           key:float ( substr( 0, 10 ) ),                                   #3
           key:integer ( /foo(\d+)bar/ ),                                   #4
           key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ),                      #5
           key:compare( \&my_compare_sub ) ( /(\d+)$/ ),                    #6
           key:compare( { just_guess $^b, $^a } ),                          #7
                @in;

would become:
                
   @out = sort
           [ { ~ %lookup{ .{remotekey} } },                                 #1
             { use locale 'somewhere'; lc .{priority} } is descending,      #2
             { + substr( 0, 10 ) },                                         #3
             { int /foo(\d+)bar/ },                                         #4
             { + m/(\d+)$/.[1] },                                           #5
             { /(\d+)$/ } => &my_compare_sub,                               #6
             { just_guess $^b, $^a },                                       #7
           ],
                @in;

So to specify a key extractor we provide a block with one argument, as in examples #1 through #5. Then to specify that the key is compared with C<cmp> we return a string (using unary C<~> to be explicit about it) as in #1. To specify that the key is compared with C<< <=> >> we return a number (using unary C<+> to be explicit about it) as in #3 and #5. If we think that C<sort> might be able to optimize integer comparisons, we can explicitly return an integer, as in #4. If we want locale-sensitive sorting, we specify that with the appropriate C<use locale> statement, as in #2.

To specify a comparator, we provide a block with two arguments, as in #7. That block is always expected to return an integer.

To specify a key-extractor *and* an associated comparator, we bind them together in a Pair, as in #6.

The only tricky bits are how to specify a key extractor for keys that are to be sorted in descending order and/or case-insensitively. The case insensitivity could be handled by simple C<lc>'ing, as in #2. Descending sort order could be handled by a trait on the block/closure, as in #2. Alternatively, case-insensitivity could be specified by trait as well.

Note that simple sorts would be unchanged:

    @sorted = sort @unsorted;
    @sorted = sort {$^b <=> $^a} @unsorted;

But now one could also very neatly code cases that formerly required an Orcish or Transformed sort. For example, these:

@sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;

@sorted = map $_[1], sort {$^a[0] <=> $^b[0]}, map [-M,$_], @unsorted;

would both become:

@sorted = sort {-M} @unsorted;

Damian

Re: The Sort Problem

Reply via email to