subject:"The Sort Problem"

Re: The Sort Problem: a definitive ruling

2004-02-21 Thread Gordon Henriksen

On Friday, February 20, 2004, at 05:48 , Damian Conway wrote:

Joe Gottman asked:

   How do you decide whether a key-extractor block returns number?  Do 
you look at the signature,  or do you simply evaluate the result of 
the key-extractor for each element in the unsorted list?  For example, 
what is the result of the following code?
  sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');
   There is nothing in the signature of the key-extractor to suggest 
that all the keys are numbers, but as it turns out they all are.  Will 
the sort end up being numerical or alphabetic?
Whilst I'd very much like it to analyse the keys, detect that they're 
all numbers, and use C = 
Eek! Please don't even TRY to do that. It'd be creepy if the same call 
to sort could swap at runtime between numeric and string comparisons 
based upon its input. I would hope that the determination be made at 
compile time.

Consider the poor schmuck sorting new objects in preparation for a merge 
sort, only to find that his new array isn't sorted the same as his old 
array was, even though they came back from the exact same call to 
sort... Blech.

But if sort's arguments were specifically typed, i.e.:

my @array of Int;
@array = sort @array;
Does this meet the key extractor returns number qualification?



Gordon Henriksen
[EMAIL PROTECTED]

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Smylers

Luke Palmer writes:

 Uri Guttman writes:
 
   DC == Damian Conway [EMAIL PROTECTED] writes:
 
DC  @sorted = sort {-M}={$^b cmp $^a} @unsorted;
  
  but there is no comma before @unsorted. is that correct?
 
 Yes.  Commas may be ommitted on either side of a block when used as an
 argument.

That's what I thought too.  But Damian gave exactly the opposite answer
to Uri's question, claiming he'd made a typo and a comma would be
required.

So which is it -- is Luke right in saying that Damian was right in the
first place?  Or is Damian right in saying that his example was wrong?

Smylers

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Luke Palmer

Smylers writes:
 Luke Palmer writes:
 
  Uri Guttman writes:
  
DC == Damian Conway [EMAIL PROTECTED] writes:
  
 DC  @sorted = sort {-M}={$^b cmp $^a} @unsorted;
   
   but there is no comma before @unsorted. is that correct?
  
  Yes.  Commas may be ommitted on either side of a block when used as an
  argument.
 
 That's what I thought too.  But Damian gave exactly the opposite answer
 to Uri's question, claiming he'd made a typo and a comma would be
 required.
 
 So which is it -- is Luke right in saying that Damian was right in the
 first place?  Or is Damian right in saying that his example was wrong?

I was wrong in saying that he was right.  Those aren't simple blocks, as
Damian said, so you need the comma.

Luke

 Smylers

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Smylers

Joe Gottman writes:

   sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');
 
 There is nothing in the signature of the key-extractor to suggest that
 all the keys are numbers, but as it turns out they all are.

Are they?  I'd been presuming that pair keys would always be strings (as
for hashes in Perl 5), and that the C =  operator would
automatically quote a preceding word, stringifying it (as in Perl 5).
So the keys above are strings, albeit ones composed only of digits.

Of course that doesn't actually help with your question, since there are
other data structures of which the same could be asked.

Smylers

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Luke Palmer

Smylers writes:
 Joe Gottman writes:
 
sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');
  
  There is nothing in the signature of the key-extractor to suggest that
  all the keys are numbers, but as it turns out they all are.
 
 Are they?  I'd been presuming that pair keys would always be strings (as
 for hashes in Perl 5), and that the C =  operator would
 automatically quote a preceding word, stringifying it (as in Perl 5).
 So the keys above are strings, albeit ones composed only of digits.
 
 Of course that doesn't actually help with your question, since there are
 other data structures of which the same could be asked.

I think you're forgetting what language you're talking about.  Those are
numbers.  After this statement:

$x = '345';

C$x is a number.  I should hope it would be treated as one during
multimethod dispatch.

However, I'm not saying this with authority.  I'm just extrapolating.
If it's not correct, I'd appreciate that someone who knows correct me.

Luke

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Smylers

Luke Palmer writes:

 After this statement:
 
 $x = '345';
 
 C$x is a number.

Oh.  I'd been assuming that quote marks indicated strings, and that,
while a string containing only digits could obviously be treated as a
number (as in Perl 5), it wouldn't be one without being provoked.

 I should hope it would be treated as one during multimethod dispatch.

What about:

  $x = '0345';

Is that a number?  And if so which of these is the same as?

  $x = 345;
  $x = 0345;

What about if the variable contains a line read from user input?  As a
programmer I'd expect that to be a string -- and if a user happens to
type only digits then it'd be surprising to find the variable is
considered to be of a different type.

User input comes with a trailing line-break character, which would make
it not a number.  But would it suddenly become a number after
Cchomping it?  Or if the input stream was auto-chomping?

Smylers

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Dan Sugalski

At 2:49 PM -0700 2/20/04, Luke Palmer wrote:
After this statement:

$x = '345';

C$x is a number.
No, it isn't. It's a string. Or, rather, it's a PerlScalar.

I should hope it would be treated as one during
multimethod dispatch.
I should certainly hope *not*. If so, it's a bug. We ought to go add 
some tests to the test suite once we expose this bit of the engine.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Luke Palmer

Smylers writes:
 Luke Palmer writes:
 
  After this statement:
  
  $x = '345';
  
  C$x is a number.
 
 Oh.  I'd been assuming that quote marks indicated strings, and that,
 while a string containing only digits could obviously be treated as a
 number (as in Perl 5), it wouldn't be one without being provoked.
 
  I should hope it would be treated as one during multimethod dispatch.
 
 What about:
 
   $x = '0345';
 
 Is that a number?  And if so which of these is the same as?
 
   $x = 345;
   $x = 0345;

Well, since those are the same number, I imagine the, um, first?

Don't forget that octal numbers look like 0o345.

 What about if the variable contains a line read from user input?  As a
 programmer I'd expect that to be a string -- and if a user happens to
 type only digits then it'd be surprising to find the variable is
 considered to be of a different type.

Yeah, that's a tough question.  I'd want it to be a number if it were
only digits, unless I wanted it to be a string.  Since numbers and
strings are polymorphic with one another, maybe it's wrong to think that
we can differentiate.

But Csort has to know when to use C = .

Maybe you're right.  In the presence of multimethod dispatch, it might
be simpler just to tag something with either a num or str marker (I'm
completely neglecting Parrot's implementation for this discussion), and
treat it as its tag.  That wouldn't change the behavior of adding two
strings together, or concatenating two numbers, of course.

Luke

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Damian Conway

Uri wondered:

  DC No. C infix:=  is the name of the binary C =  operator.

so how is that allowed there without a block?
A Code object in a scalar context yields a Code reference.

Damian

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Damian Conway

Smylers wrote:

 sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');

There is nothing in the signature of the key-extractor to suggest that
all the keys are numbers, but as it turns out they all are.
Are they?  I'd been presuming that pair keys would always be strings 
Nope.

 and that the C =  operator would
automatically quote a preceding word, stringifying it (as in Perl 5).
Yes. But numbers aren't words. C =  will continue to autostringify 
*identifiers*, as in Perl 6, and those keys aren't identifiers.

Of course, if you used the pairs to populate a hash, the *hash* will convert 
the non-stringific pair keys to stringific hash keys (unless the hash is 
defined to take non-string keys).

Damian

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Damian Conway

Luke wrote:

I think you're forgetting what language you're talking about.  Those are
numbers.  After this statement:
$x = '345';

C$x is a number.  
I don't think so. C$x is, of course, a variable. And what it contains after 
that statement will depend on whether the variable is explicitly typed or not.
If C$x is explicitly typed, the rvalue will have been converted to that type 
(if possible). If C$x is not explicitly typed (i.e. implicitly typed to 
CAny), then it will contain a string, since that's what the rvalue 
inherently is.

I should hope it would be treated as one during
multimethod dispatch.
I would expect that the multiple dispatch mechanism would allow the CStr in 
C$x to be coerced to a CNum, if that's the parameter type it was seeking 
to match. And I would expect that the distance of that coercion would be 1.


However, I'm not saying this with authority.  I'm just extrapolating.
If it's not correct, I'd appreciate that someone who knows correct me.
Ditto. ;-)

Damian

Re: The Sort Problem: a definitive ruling

2004-02-20 Thread Damian Conway

Smylers wrote:

Oh.  I'd been assuming that quote marks indicated strings, and that,
while a string containing only digits could obviously be treated as a
number (as in Perl 5), it wouldn't be one without being provoked.
Correct.


What about:

  $x = '0345';

Is that a number?  
Nope. A string (unless C$X is otherwised typed).


What about if the variable contains a line read from user input?  As a
programmer I'd expect that to be a string
You'd be right (unless, of course, the variable's storage type forced a 
coercion during the assignment).

Damian

Re: The Sort Problem

2004-02-19 Thread Larry Wall

On Mon, Feb 16, 2004 at 01:04:01AM -0700, Luke Palmer wrote:
: This is pattern matching more than it is type comparison.  And Perl's
: all about pattern matching. I'm just wondering whether it needs Itwo
: pattern archetectures.

I suspect it does, at least from the viewpoint of mere mortals.  The
regex engine is built around the notion of ordered rules, whereas the
multi dispatch engine is designed to run rules in order of distance.
I think we'll have done good enough if the regex engine can be used at
runtime against a list of actual argument types.  The regex engine is
allowed to have deep, powerful nooks, after all.  People are trained
not to expect to understand patterns on first glance.  But I a strong
gut feeling that the dispatch engine needs to remain transparent to
casual readers--or there will never be any casual readers of Perl 6.

Of course, someone will doubtless attempt this unification, since a
single use can turn Perl into another language.  It's just not a
language I would have bothered to learn 25 years ago...

Larry

The Sort Problem: a definitive ruling

2004-02-19 Thread Damian Conway

The design team discussed The Sort Problem during yesterday's 
teleconference. Here is Larry's decision: final, definitive, and unalterable 
(well...for this week at least ;-)

-cut-cut-cut-cut-cut-cut

Csort in Perl6 is a global multisub:

multi sub *sort(Criterion @by: [EMAIL PROTECTED]) {...}
multi sub *sort(Criterion $by: [EMAIL PROTECTED]) {...}
multi sub *sort( : [EMAIL PROTECTED]) {...}
where:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator   ::= Code(Any, Any) returns Int;

type Criterion::= KeyExtractor
| Comparator
| Pair(KeyExtractor, Comparator)
;
That means that we can call Csort without a block (to sort stringifically 
ascending with Ccmp):

# Stringifically ascending...
@sorted = sort @unsorted;
or with a single two-argument block/closure (to sort by whatever the specified 
comparator is):

# Numerically ascending...
@sorted = sort {$^a = $^b} @unsorted;
# Namewise stringifically descending case-insensitive...
@sorted = sort {lc $^b.name cmp lc $^a.name}
   @unsorted;
# or...
@sorted = sort {$^b.name cmp $^a.name} is insensitive
   @unsorted;
# or...
@sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
   @unsorted;
# Modtimewise numerically ascending...
@sorted = sort {-M $^a = -M $^b} @unsorted;
# Fuzz-ifically...
sub fuzzy_cmp($x, $y) returns Int;
@sorted = sort fuzzy_cmp, @unsorted;
or with a single one-argument block/closure (to sort according whatever the 
specified key extractor returns):

# Numerically ascending...
@sorted = sort {+ $^elem} @unsorted;
@sorted = sort {+ $_} @unsorted;
# Namewise stringifically descending case-insensitive...
@sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
@sorted = sort {lc $^elem.name} is descending @unsorted;
@sorted = sort {lc .name} is descending @unsorted;
# Modtimewise numerically ascending...
@sorted = sort {-M} @unsorted;
# Key-ifically...
sub get_key($elem) {...}
@sorted = sort get_key, @unsorted;
or with a single extractor/comparator pair (to sort according to the extracted 
key, using the specified comparator):

# Modtimewise stringifically descending...
@sorted = sort {-M}={$^b cmp $^a} @unsorted;
# Namewise fuzz-ifically...
@sorted = sort {.name}=fuzzy_cmp @unsorted;
or with an array of comparators and/or key extractors and/or 
extractor-comparator pairs (to sort according to a cascading list of criteria):

# Numerically ascending
# or else namewise stringifically descending case-insensitive
# or else modtimewise numerically ascending
# or else namewise fuzz-ifically
# or else fuzz-ifically...
@sorted = sort [ {+ $^elem},
 {$^b.name cmp $^a.name} is insensitive,
 {-M},
 {.name}=fuzzy_cmp,
 fuzzy_cmp,
   ],
   @unsorted;


If a key-extractor block returns number, then C =  is used to compare 
those keys. Otherwise Ccmp is used. In either case, the keys extracted by 
the block are cached within the call to Csort, to optimize subsequent 
comparisons against the same element. That is, a key-extractor block is only 
ever called once for each element being sorted.

If a key-extractor/comparator pair is specified, the key-extractor is the key 
of the pair and the comparator the value. The extractor is used to retreive 
keys, which are then passed to the comparator.

The Cis descending and Cis insensitive traits on a key extractor or a 
comparator are detected within the call to Csort (or possibly by the 
compiler) and used to modify the case-sensitivity and direction of any 
comparison operators used for the corresponding key or in the corresponding 
comparator.

Note that ambiguous cases like:

@sorted = sort {-M}, {-M}, {-M};
@sorted = sort {$^a = $^b}, {$^a = $^b}, {$^a = $^b};
@sorted = sort [...], [...], [...];
# etc.
will be dispatched according to the normal multiple dispatch semantics
(which will mean that they will mean):
@sorted = sort {-M}  == {-M}, {-M};
@sorted = sort {$^a = $^b} == {$^a = $^b}, {$^a = $^b};
@sorted = sort [...] == [...], [...];
# etc.
and so one would need to write:

@sorted = sort == {-M}, {-M}, {-M};
@sorted = sort == {$^a = $^b}, {$^a = $^b}, {$^a = $^b};
@sorted = sort == [...], [...], [...];
# etc.
to get Ccmp comparison on all the arguments.

-cut-cut-cut-cut-cut-cut

Thanks to everyone who contributed to this discussion (especially Uri). As you 
see, the result is sort facility that is simultaneously much more powerful, 
much easier-to-use in the simple cases, has the potential

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Dave Whipp

Damian Conway [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
  type KeyExtractor ::= Code(Any) returns Any;

  # Modtimewise numerically ascending...
  @sorted = sort {-M} @unsorted;


One thing I've been trying to figure out reading this: what is the signature
of prefix:-M ? i.e. how does it tell the outer block that it (the
outer-block) needs a parameter? There seems to be some transitive magic
going on here. Could similar magic be used to have infix:= require two
higher-order variables (e.g. could sort { = } @unsorted be made to
work?)


Dave.

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Luke Palmer

Dave Whipp writes:
 Damian Conway [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
   type KeyExtractor ::= Code(Any) returns Any;
 
   # Modtimewise numerically ascending...
   @sorted = sort {-M} @unsorted;
 
 
 One thing I've been trying to figure out reading this: what is the signature
 of prefix:-M ? 

Presumably something like:

sub prefix:-M (?$file = $CALLER::_) {...}

 i.e. how does it tell the outer block that it (the outer-block) needs
 a parameter? 

Because it operates on $_.  It tells it the same way:

map { .name } @objects

Does.  Of course, this is going to be tough on the compiler, who will
have to take the C= $CALLER::_ part into account.

 There seems to be some transitive magic going on here.  Could similar
 magic be used to have infix:= require two higher-order variables
 (e.g. could sort { = } @unsorted be made to work?)

No.  Although you could do such a thing with:

sort infix:=, @unsorted;

Luke

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC Once again the Iron Designer rises to the supreme challenge of
  DC the Mailinglist Stadium and expresses the true spirit of Perl
  DC 6!!!

and the challenge for next week is slicing squid with noodles!
(or cutting down the mightiest tree in the forest with a herring)

good job all.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Damian Conway

Dave Whipp wondered:

@sorted = sort {-M} @unsorted;
One thing I've been trying to figure out reading this: what is the signature
of prefix:-M ? i.e. how does it tell the outer block that it (the
outer-block) needs a parameter?
It doesn't. As A6 explained:

	http://dev.perl.org/perl6/apocalypse/A06.html#Bare_subs

any block that doesn't have placeholder-specified parameters but which refers 
(even implicitly) to $_ will automatically have the signature of ($_).

That's why:

	@odd = grep { $_ % 2 } @nums;

will still work in Perl 6.

Since a bare C-M implicitly refers to $_, the surrounding block 
automagically gets a one-parameter signature and hence is (correctly!) 
interpreted as a key extractor.

Don't you just love it when a plan^H^H^H^Hdesign comes together? ;-)


There seems to be some transitive magic going on here. 
There is. Kinda. Just not the type of magic you thought.


Could similar magic be used to have infix:= require two
higher-order variables (e.g. could sort { = } @unsorted be made to
work?)
No. But this will work:

	sort infix:= @unsorted

Damian

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC  # Stringifically ascending...
  DC  @sorted = sort @unsorted;

  DC or with a single two-argument block/closure (to sort by whatever the
  DC specified comparator is):

  DC  # Numerically ascending...
  DC  @sorted = sort {$^a = $^b} @unsorted;

so because that has 2 placeholders, it is will match this signature:

 type Comparator   ::= Code(Any, Any) returns Int;

i have to remember that placeholders are really implied args to a code
block and not just in the expression

  DC  # Namewise stringifically descending case-insensitive...
  DC  @sorted = sort {lc $^b.name cmp lc $^a.name}
  DC @unsorted;
  DC  # or...
  DC  @sorted = sort {$^b.name cmp $^a.name} is insensitive
  DC @unsorted;
  DC  # or...
  DC  @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
  DC @unsorted;

TIMTOWTDI lives on!

  DC  # Modtimewise numerically ascending...
  DC  @sorted = sort {-M $^a = -M $^b} @unsorted;

  DC  # Fuzz-ifically...
  DC  sub fuzzy_cmp($x, $y) returns Int;
  DC  @sorted = sort fuzzy_cmp, @unsorted;

ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
the sub must be defined/declared before the sort code is compiled?

  DC or with a single one-argument block/closure (to sort according
  DC whatever the specified key extractor returns):

  DC  # Numerically ascending...
  DC  @sorted = sort {+ $^elem} @unsorted;
  DC  @sorted = sort {+ $_} @unsorted;

is $^elem special? or just a regular place holder? i see $_ will be set
to each record as we discussed.

  DC  # Namewise stringifically descending case-insensitive...
  DC  @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
  DC  @sorted = sort {lc $^elem.name} is descending @unsorted;
  DC  @sorted = sort {lc .name} is descending @unsorted;

just getting my p6 chops back. .name is really $_.name so that makes
sense. and $^elem is just a named placeholder for $_ as before?

  DC  # Key-ifically...
  DC  sub get_key($elem) {...}
  DC  @sorted = sort get_key, @unsorted;

and that is parsed as an extracter code call due to the single arg
sig. again, it appears that it has to be seen before the sort code for
that to work.

  DC or with a single extractor/comparator pair (to sort according to the
  DC extracted key, using the specified comparator):

  DC  # Modtimewise stringifically descending...
  DC  @sorted = sort {-M}={$^b cmp $^a} @unsorted;

so that is a single pair of extractor/comparator. but there is no comma
before @unsorted. is that correct? see below for why i ask that.

  DC  # Namewise fuzz-ifically...
  DC  @sorted = sort {.name}=fuzzy_cmp @unsorted;

i first parsed that as being wrong and the {} should wrap the whole
thing. so that is a pair again of extractor/comparator.

  DC or with an array of comparators and/or key extractors and/or
  DC extractor-comparator pairs (to sort according to a cascading list of
  DC criteria):

  DC  # Numerically ascending
  DC  # or else namewise stringifically descending case-insensitive
  DC  # or else modtimewise numerically ascending
  DC  # or else namewise fuzz-ifically
  DC  # or else fuzz-ifically...
  DC  @sorted = sort [ {+ $^elem},
  DC   {$^b.name cmp $^a.name} is insensitive,
  DC   {-M},
  DC   {.name}=fuzzy_cmp,
  DC   fuzzy_cmp,

i see the need for commas in here as it is a list of criteria.

  DC ],

but what about that comma? no other example seems to have one before the
@unsorted stuff.

  DC @unsorted;

  DC If a key-extractor block returns number, then C =  is used to
  DC compare those keys. Otherwise Ccmp is used. In either case, the keys
  DC extracted by the block are cached within the call to Csort, to
  DC optimize subsequent comparisons against the same element. That is, a
  DC key-extractor block is only ever called once for each element being
  DC sorted.

where does the int optimizer come in? just as you had it before in the
extractor code? that will need to be accessible to the optimizer if the
GRT is to work correctly.

i like that the key caching is defined here. we can implement it in
several different ways depending on optimization hints and such. we
could support the ST, GRT and orchish and select the best one for each
sort. or we could have one basic sort and load the others as pragmas or
modules.

  DC The Cis descending and Cis insensitive traits on a key extractor
  DC or a comparator are detected within the call to Csort (or possibly
  DC by the compiler) and used to modify the case-sensitivity and
  DC direction of any comparison operators used for the corresponding key
  DC or in the corresponding comparator.

or by reversing the order of the

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Luke Palmer

Uri Guttman writes:
  DC == Damian Conway [EMAIL PROTECTED] writes:
   DC  # Modtimewise numerically ascending...
   DC  @sorted = sort {-M $^a = -M $^b} @unsorted;
 
   DC  # Fuzz-ifically...
   DC  sub fuzzy_cmp($x, $y) returns Int;
   DC  @sorted = sort fuzzy_cmp, @unsorted;
 
 ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
 the sub must be defined/declared before the sort code is compiled?

Nope.  Csort is declared as a multimethod.  This works, too:

$code = sub ($a, $b) { -M $a = -M $b };
@sorted = sort $code, @unsorted;

   DC or with a single one-argument block/closure (to sort according
   DC whatever the specified key extractor returns):
 
   DC  # Numerically ascending...
   DC  @sorted = sort {+ $^elem} @unsorted;
   DC  @sorted = sort {+ $_} @unsorted;
 
 is $^elem special? or just a regular place holder? i see $_ will be set
 to each record as we discussed.

Those two statements are exactly the same in every way.  Well, except
how they're writted.  $^elem is indeed a regular placeholder.  $_
becomes an implicit parameter when it is referred to, in the absence of
placeholders or another type of signature.

   DC  # Key-ifically...
   DC  sub get_key($elem) {...}
   DC  @sorted = sort get_key, @unsorted;
 
 and that is parsed as an extracter code call due to the single arg
 sig. again, it appears that it has to be seen before the sort code for
 that to work.

Nope.  Runtime dispatch as before.

   DC or with a single extractor/comparator pair (to sort according to the
   DC extracted key, using the specified comparator):
 
   DC  # Modtimewise stringifically descending...
   DC  @sorted = sort {-M}={$^b cmp $^a} @unsorted;
 
 so that is a single pair of extractor/comparator. but there is no comma
 before @unsorted. is that correct? see below for why i ask that.

Yes.  Commas may be ommitted on either side of a block when used as an
argument.  I would argue that they only be omitted on the right side, so
that this is unambiguous:

if some_function { ... }  
{ ... }

Which might be parsed as either:

if (some_function { ... }) { ... }

Or:

if (some_function()) {...}
{...}  # Bare block

   DC or with an array of comparators and/or key extractors and/or
   DC extractor-comparator pairs (to sort according to a cascading list of
   DC criteria):
 
   DC  # Numerically ascending
   DC  # or else namewise stringifically descending case-insensitive
   DC  # or else modtimewise numerically ascending
   DC  # or else namewise fuzz-ifically
   DC  # or else fuzz-ifically...
   DC  @sorted = sort [ {+ $^elem},
   DC   {$^b.name cmp $^a.name} is insensitive,
   DC   {-M},
   DC   {.name}=fuzzy_cmp,
   DC   fuzzy_cmp,
 
 i see the need for commas in here as it is a list of criteria.
 
   DC ],
 
 but what about that comma? no other example seems to have one before the
 @unsorted stuff.

It's not a closure, so you need a comma.

   DC @unsorted;
 
   DC If a key-extractor block returns number, then C =  is used to
   DC compare those keys. Otherwise Ccmp is used. In either case, the keys
   DC extracted by the block are cached within the call to Csort, to
   DC optimize subsequent comparisons against the same element. That is, a
   DC key-extractor block is only ever called once for each element being
   DC sorted.
 
 where does the int optimizer come in? just as you had it before in the
 extractor code? that will need to be accessible to the optimizer if the
 GRT is to work correctly.

If the block provably returns an int, Csort might be able to optimize
for ints.  Several ways to provably return an int:

my $extractor = an int sub($arg) { $arg.num }
@sorted = sort $extractor, @unsorted;

Or with a smarter compiler:

@sorted = sort { int .num } @unsorted;

Or Csort might even check whether all the return values are ints and
then optimize that way.  No guarantees: it's not a language-level issue.

 i like that the key caching is defined here. 

Yeah.  This is a language-level issue, as the blocks might have
side-effects.

   DC Note that ambiguous cases like:
 
   DC  @sorted = sort {-M}, {-M}, {-M};
 
   DC will be dispatched according to the normal multiple dispatch semantics
   DC (which will mean that they will mean):
 
   DC  @sorted = sort {-M}  == {-M}, {-M};
 
   DC and so one would need to write:
 
   DC  @sorted = sort == {-M}, {-M}, {-M};
 
 that clears up that one for me.
 
 this is very good overall (notwithstanding my few nits and
 questions). it will satisfy all sorts of sort users, even those who are
 out of sorts.

Agreed.  I'm very fond of it..

Luke

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Luke Palmer

Luke Palmer writes:
 Yes.  Commas may be ommitted on either side of a block when used as an
 argument.  I would argue that they only be omitted on the right side, so
 that this is unambiguous:
 
 if some_function { ... }  
 { ... }
 
 Which might be parsed as either:
 
 if (some_function { ... }) { ... }
 
 Or:
 
 if (some_function()) {...}
 {...}  # Bare block

Silly me.  That doesn't solve anything.  I don't know why I thought it
did.  I still think that this looks weird:

foo $bar { ... } $baz;

But that's just preference.

Luke

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Joe Gottman

- Original Message - 
From: Damian Conway [EMAIL PROTECTED]
To: Perl 6 Language [EMAIL PROTECTED]
Sent: Thursday, February 19, 2004 8:29 PM
Subject: [perl] The Sort Problem: a definitive ruling

 Csort in Perl6 is a global multisub:

  multi sub *sort(Criterion @by: [EMAIL PROTECTED]) {...}
  multi sub *sort(Criterion $by: [EMAIL PROTECTED]) {...}
  multi sub *sort( : [EMAIL PROTECTED]) {...}

 where:

  type KeyExtractor ::= Code(Any) returns Any;

  type Comparator   ::= Code(Any, Any) returns Int;

  type Criterion::= KeyExtractor
  | Comparator
  | Pair(KeyExtractor, Comparator)
  ;
   snip

 If a key-extractor block returns number, then C =  is used to
compare
 those keys. Otherwise Ccmp is used. In either case, the keys extracted
by
 the block are cached within the call to Csort, to optimize subsequent
 comparisons against the same element. That is, a key-extractor block is
only
 ever called once for each element being sorted.

   How do you decide whether a key-extractor block returns number?  Do you
look at the signature,  or do you simply evaluate the result of the
key-extractor for each element in the unsorted list?  For example, what is
the result of the following code?

  sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');

   There is nothing in the signature of the key-extractor to suggest that
all the keys are numbers, but as it turns out they all are.  Will the sort
end up being numerical or alphabetic?

Joe Gottman

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Damian Conway

Uri checked:


  DC  @sorted = sort {$^a = $^b} @unsorted;

so because that has 2 placeholders, it is will match this signature:

 type Comparator   ::= Code(Any, Any) returns Int;
Correct.


i have to remember that placeholders are really implied args to a code
block and not just in the expression
Indeed.


  DC  sub fuzzy_cmp($x, $y) returns Int;
  DC  @sorted = sort fuzzy_cmp, @unsorted;
ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
the sub must be defined/declared before the sort code is compiled?
Yes. And, yes, declaration is sufficient.


  DC  @sorted = sort {+ $^elem} @unsorted;
  DC  @sorted = sort {+ $_} @unsorted;
is $^elem special? or just a regular place holder?
Regular.


just getting my p6 chops back. .name is really $_.name so that makes
sense. and $^elem is just a named placeholder for $_ as before?
Yes and yes.


  DC  @sorted = sort get_key, @unsorted;

and that is parsed as an extracter code call due to the single arg
sig. 
Yes.


again, it appears that it has to be seen before the sort code for
that to work.
Correct.


  DC or with a single extractor/comparator pair (to sort according to the
  DC extracted key, using the specified comparator):
  DC  # Modtimewise stringifically descending...
  DC  @sorted = sort {-M}={$^b cmp $^a} @unsorted;
so that is a single pair of extractor/comparator. but there is no comma
before @unsorted. 
Typo. I'm pretty sure it would actually need a comma there.


  DC  @sorted = sort {.name}=fuzzy_cmp @unsorted;
Comma required there too. :-(



  DC ],

but what about that comma? 
It's required.


no other example seems to have one before the @unsorted stuff.
The comma exception only applies to simple blocks. I messed up those two 
examples. :-(


where does the int optimizer come in? 
When the key extractor is known to return an Int. Which would occur either 
when it's explicitly declared to do that, or when the compiler can intuit a 
block's return type from the type of value returned by the block (i.e. if the 
block always returns the result of a call to Cint).


just as you had it before in the
extractor code? 
Yup.



so are those traits are only allowed/meaningful on comparison blocks?
or will an extraction block take them 
Both. That's why I wrote:

The Cis descending and Cis insensitive traits
on a key extractor or a comparator...
 ^
you have examples which show the
traits on either the extractor or comparator code blocks. that implies
that the guts can get those flags from either and use them as needed.
Yep. Inside the body of Csort you'd access them as:

$by.trait{descending}
$by.trait{insensitive}
(unless Larry's changed the trait accessor syntax since last I looked).

Damian

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC No. But this will work:

  DC   sort infix:= @unsorted

my brane hertz!!

so that declares (creates?) an infix op as a code block? and since =
is known to take 2 args it is parsed (or multidispatched) as a
comparator block for sort?

amazing how you and luke both came up with the exact same answer. p6
syntax like that is killing me slowly!

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Uri Guttman

 JG == Joe Gottman [EMAIL PROTECTED] writes:


  JGHow do you decide whether a key-extractor block returns number?  Do you
  JG look at the signature,  or do you simply evaluate the result of the
  JG key-extractor for each element in the unsorted list?  For example, what is
  JG the result of the following code?

  JG   sort {$_.key} (1= 'a', 10 = 'b', 2 ='c');

  JGThere is nothing in the signature of the key-extractor to suggest that
  JG all the keys are numbers, but as it turns out they all are.  Will the sort
  JG end up being numerical or alphabetic?

my take is that either = or cmp (my pref) would be the default
comparator. if you want to force one, you need to use prefix ~ or + or
int.

oh, another reason to make cmp the default is BACKWARDS COMPATIBILITY!
we have support now for the old simple @sorted = sort @unsorted syntax
so the cmp should still be the default.

hey, i am remembering p6 syntax now! but give me a week and i will
forget it again :)

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Damian Conway

Uri bemoaned:


  DC 	sort infix:= @unsorted

my brane hertz!!

so that declares (creates?) an infix op as a code block? 
No. C infix:=  is the name of the binary C =  operator.


amazing how you and luke both came up with the exact same answer.
Great minds... etc. ;-)

 p6 syntax like that is killing me slowly!

No, it's gradually making your *stronger*. ;-)

Damian

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Larry Wall

On Fri, Feb 20, 2004 at 02:47:55PM +1100, Damian Conway wrote:
: Yep. Inside the body of Csort you'd access them as:
: 
:   $by.trait{descending}
:   $by.trait{insensitive}
: 
: (unless Larry's changed the trait accessor syntax since last I looked).

Well, if traits are just compile-time properties, and properties
are just mixed-in roles (usually enums), then it's more likely that
something like:

$by.Direction == descending
$by.Case == insensitive

would be the incantation.  Or maybe if enums auto-booleanize, then
you could say

$by.Direction::descending
$by.Case::insensitive

And then

$by.descending
$by.insensitive

might be allowed as abbreviations when unambiguous.  Or maybe we
require matching:

$by ~~ descending
$by ~~ insensitive

But there is no such thing as a true property or false property.
There's a Boolean role that can have the value true or false.  Traits
are mixed in to declared objects at compile time, and can do weird
things to such objects at mixin time.

Likewise there's no such thing as a descending property.  There's
a Direction property which defaults to ascending.  And a Case property
that defaults to sensitive.

To do otherwise is to set ourselves up for objects that can be both
true and false simultaneously.  Only junctions should be allowed
to do that...

Larry

Re: The Sort Problem: a definitive ruling

2004-02-19 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC Uri bemoaned:

cause you agonize me head!

  DC sort infix:= @unsorted
   my brane hertz!!
   so that declares (creates?) an infix op as a code block?

  DC No. C infix:=  is the name of the binary C =  operator.

so how is that allowed there without a block? is it because it is the
name is in a style of a sub? that makes sense to me but i want to make
sure i get it.

and now back to my advil addiction. :)

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-16 Thread Luke Palmer

Damian Conway writes:
 type KeyExtractor ::= Code(Any) returns Any;
 
 type Comparator   ::= Code(Any, Any) returns Int;
 
 type Criterion::= KeyExtractor
 | Comparator Pair(KeyExtractor, Comparator)
 ;
 
 type Criteria ::= Criterion
 | Array of Criterion
 ;
 
 multi sub *sort(Criteria $by = {$^a cmp $^b}: [EMAIL PROTECTED]) {...}

This is neat stuff.  But in order for this to work, we're relying on a
Ivery sophisticated type system.  Reminiscent of C++ templates,
actually.  And while I love C++ templates for academic challenges, they
don't seem quite what Perl is looking for.

This is pattern matching more than it is type comparison.  And Perl's
all about pattern matching. I'm just wondering whether it needs Itwo
pattern archetectures.

If things like this are going to be happening (and thanks, Damian, for
demonstrating exactly how powerful this can be), I'd like types to at
least look more like regexes.   Implementation difficulty has not been a
concern of Perl The Language's since the beginning.

Since I'm usually not good at these philosophical posts, I'll show some
examples. 

multi sub infix:equals ( (Any) $x, ($1) $y ) { $x eq $y }

multi sub infix:equals ( Any $x, Any $y ) { undef }

Might be a way of nicely comparing apples to apples and rejecting
everything else as unequal.  I don't really like the delcaration order
dependency, though.  Perhaps:

multi sub infix:equals ({
$x := (Any), $y := ($1)   { $x eq $y }
  | $x := (Any), $y := (Any)  { undef }
});

Which is Ireally starting to look like rules (which is what I wanted).

My summarizing thesis is: if the type system is going to be this rich a
sublanguage, why not make it a formal one? [1]

Luke

[1] Keep in mind that I'm fine with it Inot becoming a sublanguage,
and relying on user-written code to do this kind of fancy matching.  I'm
just saying if we want to make it a rich and powerful pattern matching
system, it ought to look like patterns.

Re: The Sort Problem

2004-02-16 Thread Luke Palmer

Uri Guttman writes:
  LP == Luke Palmer [EMAIL PROTECTED] writes:
 
   LP Uri Guttman writes:
because that would be the default comparison and the extracted key value
would be stringified unless some other marker is used. most sorts are on
strings so this would be a useful huffman and removal of a redundancy.
 
   LP While I like where most of this is going, I beg to differ on this point.
   LP I end up sorting numbers more often then strings, except when I'm
   LP sorting the keys of a hash.  I'm always annoyed in my one liners when I
   LP have to write out:
 
   LP sort { $a = $b } ...
 
   LP And I'd be thrilled if it were as easy as:
 
   LP sort { +$_ } ...
 
 oh, i don't want to see = anywhere in sort code. i don't mind + or int
 as markers for that.  you seem to actually be agreeing with me that
 strings are the default key type and + is needed for a numeric sort. but
 do you do integer or float sorts? and as i said, i don't like the
 asymmetry of int vs +.

Usually integer.  But specifying integer is usually an optimization
guideline, whereas there is a much greater semantic difference between
sorting strings and floats.  Cint deserves to be longer as just an
optimization guideline.

And yes, I agree that string should be default. 

Luke

Re: The Sort Problem

2004-02-16 Thread Rod Adams

Damian Conway wrote:

Uri persisted:
 but int is needed otherwise? int is more commonly a sort key than 
float.
 it just feels asymetrical with one having a symbol and the other a 
named
 operator.

Sorry, but that's the way the language works. The more general and usual
conversion (to number, not to integer) has the shorter name.


For the record, I do a lot of statistical work. On the sorts where I 
care about speed, I'm using floats far more often than ints. Uri's usage 
obviously varies from mine. Let's not hack up the language to make sort 
more efficient for some arguable benefit.

I would entertain, however, Cfloat and Cstring comparators (and 
functions in core) that pretty much (if not precisely) alias C+ and 
C~, forcing said type.


   DC If you don't explicitly cast either way, Csort just DWIMs by
   DC looking at the actual type of the keys returned by the extractor.
   DC If any of them isn't a number, it defaults to Ccmp.

 that doesn't work for me. the GRT can't scan all keys before it decides
 on what comparison operator is needed. the GRT needs to merge keys as
 they are extracted and it needs to do the correct conversion to a byte
 string then and not later. you can't DWIM this away if you want the
 optimization.
EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the
return type of your block is appropriate. If you don't give the 
compiler that
hint, you don't get the optimized sorting.


When fed a number that it doesn't trust to be an int or a float, 
couldn't the GRT store the number with Cpack 'Nd', $_, $_? (ok, 
there's some prepping for sign and endian, but you get the point) Seems 
like a not-terrible way to handle the issue.



Here's another example (all praise be to Rod!):

@teams = sort [
  # 1 parameter so it's a key extractor...
  {+ $^team-{WinPct} },
  # 2 parameters so it's a comparator...
  { ($^a-{Conference} eq $^b-{Conference}
? ($^b-{ConfWinPct} = $^a-{ConfWinPct} ||
   $^b-{ConfWon}= $^a-{ConfWon}||
   $^a-{ConfLost}   = $^b-{ConfLost})
: ($^a-{Division} eq $^b-{Division}
? ($^b-{DivWinPct} = $^a-{DivWinPct} ||
   $^b-{DivWon}= $^a-{DivWon}||
   $^a-{DivLost}   = $^b-{DivLost})
: 0
)
  },
  # 1 parameter each so all three are key extractors...
  {+ $^team-{Won}  } is descending,
  {+ $^team-{Lost} },
  {+ $^team-{GamesPlayed} } is descending,
  ] @teams;
Now that's just spiffy. Except in moving from my P5 version to your P6 
version, you have to s/?/??/ and s/:/::/, IIRC.

   DC But you *can't* apply Cis descending to a Code reference.
 then how did you apply 'is insensitive'?
I applied it to the *block* (i.e. a closure definition).
That's not the same thing as a Code reference.
 what i am saying is i think that you need to go back to the drawing
 board to find a clean syntax to mark those flags.
No, I think we have it...just put the appropriate trait on the extractor
*block* when you define it.


I really don't see the problem with

@out = sort {lc $_}, @in;

It's short, simple, unambiguous, IMO very clean. But if people want to 
take a tool that we're creating anyways elsewhere in the language 
(traits) to provide another way to do it, that's fine too.

 i just don't like the reverse args concept at all. it is not
 semantically useful to sorting. sorts care about ascending and
 descending, not a vs b in args.
The problem is that sometimes comparator logic is suffiently complex that
a single comparator needs to have a bit each way, as in Rod's
football team comparator above.


What my football example was meant to show is that no matter how much we 
abuse poor Csort in the name of progress, we need to have a way to 
drop it all and go back to how we did it in P5, for those truly 
pathological cases where nothing else works. If people don't trust this, 
I'll come up with something worse.

As a side note, the reason I couldn't find the original sort in question 
is that I later scraped it in favor of a more robust 100 line sub to 
figure out who ranked where.

But in more common case, how much it break things to have

  type Comparator   ::= Code(Any, Any) returns Int;

become

  type Comparator   ::= Code(Any, Any) returns Int
| '!' Code(Any, Any) returns Int;
where the '!' sets the reverse/descending trait?

So we could get things like:

@out = sort {!~ $_} @in;
@out = sort {!+ -M} @in;
@out = sort {! complex} @in;
Overall, Damian's proposal looks very good.
Yeah Damian!
-- Rod Adams

PS -- Only pure and utter insanity would lead someone to not make string 
the default sort comparator, and I don't think I've heard anyone here 
mention

Re: The Sort Problem

2004-02-15 Thread Damian Conway

Here's a proposed syntax and semantics for Csort that tries to preserve the 
(excellent) features of Uri's on the right track proposal whilst integrating 
it into the Perl 6 design without multiplying entities (especially colons!) 
unnecessarily.

Suppose Csort's signature is:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator   ::= Code(Any, Any) returns Int;

type Criterion::= KeyExtractor
| Comparator
| Pair(KeyExtractor, Comparator)
;
type Criteria ::= Criterion
| Array of Criterion
;
multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...}

In other words, Csort takes as its (optional) first argument either a single 
sort criterion, or an array of criteria.

Each of those sort criteria may be a block/closure, or a pair of 
block/closures. Each block/closure may take either one argument (in which case 
it's a key extractor) or two arguments (in which case it's a comparator).

If a key-extractor block returns number, then C =  is used to compare 
those keys. Otherwise Ccmp is used. In either case, the returned keys are 
cached to optimize subsequent comparisons against the same element.

If a key-extractor/comparator pair is specified, the key-extractor is the key 
of the pair and the comparator the value. The extractor is used to retreive 
(and cache) keys, which are then passed to the comparator.

Which means that (a slightly extended version of) Uri's proposed:

   @out = sort
   key( %lookup{ .{remotekey} } ),  #1
   key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
   key:float ( substr( 0, 10 ) ),   #3
   key:integer ( /foo(\d+)bar/ ),   #4
   key:compare( { ^$a = ^$b } )( /(\d+)$/ ),  #5
   key:compare( \my_compare_sub ) ( /(\d+)$/ ),#6
   key:compare( { just_guess $^b, $^a } ),  #7
@in;
would become:

   @out = sort
   [ { ~ %lookup{ .{remotekey} } }, #1
 { use locale 'somewhere'; lc .{priority} } is descending,  #2
 { + substr( 0, 10 ) }, #3
 { int /foo(\d+)bar/ }, #4
 { + m/(\d+)$/.[1] },   #5
 { /(\d+)$/ } = my_compare_sub,   #6
 { just_guess $^b, $^a },   #7
   ],
@in;
So to specify a key extractor we provide a block with one argument, as in 
examples #1 through #5. Then to specify that the key is compared with Ccmp 
we return a string (using unary C~ to be explicit about it) as in #1. To 
specify that the key is compared with C =  we return a number (using 
unary C+ to be explicit about it) as in #3 and #5. If we think that Csort 
might be able to optimize integer comparisons, we can explicitly return an 
integer, as in #4. If we want locale-sensitive sorting, we specify that with 
the appropriate Cuse locale statement, as in #2.

To specify a comparator, we provide a block with two arguments, as in #7. That 
block is always expected to return an integer.

To specify a key-extractor *and* an associated comparator, we bind them 
together in a Pair, as in #6.

The only tricky bits are how to specify a key extractor for keys that are to 
be sorted in descending order and/or case-insensitively. The case 
insensitivity could be handled by simple Clc'ing, as in #2. Descending sort 
order could be handled by a trait on the block/closure, as in #2. 
Alternatively, case-insensitivity could be specified by trait as well.

Note that simple sorts would be unchanged:

@sorted = sort @unsorted;
@sorted = sort {$^b = $^a} @unsorted;
But now one could also very neatly code cases that formerly required an Orcish 
or Transformed sort. For example, these:

@sorted = sort {(%M{$^a}//-M $^a) = (%M{$^b}//-M $^b)} @unsorted;

@sorted = map $_[1], sort {$^a[0] = $^b[0]}, map [-M,$_], @unsorted;

would both become:

@sorted = sort {-M} @unsorted;

Damian

Re: The Sort Problem

2004-02-15 Thread Joe Gottman

- Original Message - 
From: Damian Conway [EMAIL PROTECTED]
To: Larry Wall [EMAIL PROTECTED]
Cc: Perl 6 Language [EMAIL PROTECTED]
Sent: Sunday, February 15, 2004 5:59 PM
Subject: [perl] Re: The Sort Problem

 Here's a proposed syntax and semantics for Csort that tries to preserve
the
 (excellent) features of Uri's on the right track proposal whilst
integrating
 it into the Perl 6 design without multiplying entities (especially
colons!)
 unnecessarily.

 Suppose Csort's signature is:

  type KeyExtractor ::= Code(Any) returns Any;

  type Comparator   ::= Code(Any, Any) returns Int;

  type Criterion::= KeyExtractor
  | Comparator
  | Pair(KeyExtractor, Comparator)
  ;

  type Criteria ::= Criterion
  | Array of Criterion
  ;

  multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...}

   If we use this signature, won't the code
sort ('foo', 'bar', 'glarch');

attempt to use the first parameter as a Criteria?  Since sort has to be a
multi sub anyhow, why don't we do

multi sub *sort(Criteria $by : [EMAIL PROTECTED]) {...}
multi sub *sort( : [EMAIL PROTECTED]) { ...} # returns sort {$^a cmp $^b} @data

Joe Gottman

Re: The Sort Problem

2004-02-15 Thread Damian Conway

Joe Gottman asked:

   If we use this signature, won't the code
sort ('foo', 'bar', 'glarch');
attempt to use the first parameter as a Criteria?
No. Because a string like 'foo' wouldn't match the first parameter's type.


Since sort has to be a multi sub anyhow, why don't we do

multi sub *sort(Criteria $by : [EMAIL PROTECTED]) {...}
multi sub *sort( : [EMAIL PROTECTED]) { ...} # returns sort {$^a cmp $^b} @data
We probably should (from an implementation efficiency standpoint if nothing else).

Damian

Re: The Sort Problem

2004-02-15 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC Suppose Csort's signature is:

  DC  type KeyExtractor ::= Code(Any) returns Any;

  DC  type Comparator   ::= Code(Any, Any) returns Int;

  DC  type Criterion::= KeyExtractor
  DC  | Comparator
  DC  | Pair(KeyExtractor, Comparator)
  DC  ;

  DC  type Criteria ::= Criterion
  DC  | Array of Criterion
  DC  ;

  DC  multi sub *sort(Criteria ?$by = {$^a cmp $^b}, [EMAIL PROTECTED]) {...}

  DC Each of those sort criteria may be a block/closure, or a pair of
  DC block/closures. Each block/closure may take either one argument (in
  DC which case it's a key extractor) or two arguments (in which case it's
  DC a comparator).

  DC If a key-extractor block returns number, then C =  is used to
  DC compare those keys. Otherwise Ccmp is used. In either case, the
  DC returned keys are cached to optimize subsequent comparisons against
  DC the same element.

i would make cmp the default as it is now. you sort strings much more
often than you sort numbers.

  DC If a key-extractor/comparator pair is specified, the key-extractor is
  DC the key of the pair and the comparator the value. The extractor is
  DC used to retreive (and cache) keys, which are then passed to the
  DC comparator.

  DC Which means that (a slightly extended version of) Uri's proposed:

  DC @out = sort
  DCkey( %lookup{ .{remotekey} } ),  #1
  DCkey:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
  DCkey:float ( substr( 0, 10 ) ),   #3
  DCkey:integer ( /foo(\d+)bar/ ),   #4
  DCkey:compare( { ^$a = ^$b } )( /(\d+)$/ ),  #5
  DCkey:compare( \my_compare_sub ) ( /(\d+)$/ ),#6
  DCkey:compare( { just_guess $^b, $^a } ),  #7
  DC  @in;

  DC would become:

  DC @out = sort
  DC[ { ~ %lookup{ .{remotekey} } }, #1

if string cmp is the default, wouldn't that ~ be redundant?
and i keep forgetting that ~ is now the stringify operator :)

  DC  { use locale 'somewhere'; lc .{priority} } is descending,  #2

hmmm, this needs more work IMO. requiring the coder to lc the key is
moving the case insensitivity feature back into code. wouldn't 'is
insensitive' be ok?

how does the internal guts get the descending/insensitive flags/traits
passed to it? i know it is an internals problem but i am just pondering
it.

  DC  { + substr( 0, 10 ) }, #3
  DC  { int /foo(\d+)bar/ }, #4

i would also expect int to be a default over float as it will be used
more often. + is needed there since the regex returns a string. in the
#3 case that would be an int as well. so we need a 'float' cast
thingy. BTW, the only way to get a number as a key is from a structure
where the field was assigned as a number/int. that may not happen a lot
so the int/float cast will probably be needed here to sort correctly


  DC  { + m/(\d+)$/.[1] },   #5
  DC  { /(\d+)$/ } = my_compare_sub,   #6

missing a close } it seems.

  DC  { just_guess $^b, $^a },   #7

is that a reverse order sort? why not skip the args and do this:

   { just_guess is descending }, #7

  DC  ],

so the first arg to sort is either a single compare block or a anon list
of them? i figure we need the [] to separate the criteria from the input
data. but what about this odd case,

sort [...], [...], [...]

now that is stupid code but it could be trying to sort the refs by their
address in string mode. or it could be a sort criteria list followed by
2 refs to input records.

  DC  @in;

  DC So to specify a key extractor we provide a block with one argument, as
  DC in examples #1 through #5. Then to specify that the key is compared
  DC with Ccmp we return a string (using unary C~ to be explicit about
  DC it) as in #1. To specify that the key is compared with C =  we
  DC return a number (using unary C+ to be explicit about it) as in #3
  DC and #5. If we think that Csort might be able to optimize integer
  DC comparisons, we can explicitly return an integer, as in #4. If we want
  DC locale-sensitive sorting, we specify that with the appropriate Cuse
  locale statement, as in #2.

  DC To specify a comparator, we provide a block with two arguments, as in
  DC #7. That block is always expected to return an integer.

so #7 is a call to just_guess which is passed the 2 args to compare. it
must return an

Re: The Sort Problem

2004-02-15 Thread Damian Conway

Uri wrote:


  DC If a key-extractor block returns number, then C =  is used to
  DC compare those keys. Otherwise Ccmp is used. In either case, the
  DC returned keys are cached to optimize subsequent comparisons against
  DC the same element.
i would make cmp the default as it is now. 
Err. That's kinda what Otherwise Ccmp is used means. ;-)



  DC @out = sort
  DC[ { ~ %lookup{ .{remotekey} } }, #1
if string cmp is the default, wouldn't that ~ be redundant?
How do you know that the values of %lookup are strings?
How would the optimizer know?


  DC  { + substr( 0, 10 ) }, #3
  DC  { int /foo(\d+)bar/ }, #4
i would also expect int to be a default over float as it will be used
more often. + is needed there since the regex returns a string. in the
#3 case that would be an int as well. so we need a 'float' cast
thingy.
Unary C+ *is* the float cast thingy!


BTW, the only way to get a number as a key is from a structure
where the field was assigned as a number/int. that may not happen a lot
so the int/float cast will probably be needed here to sort correctly
That's the point.

If you want to force numeric comparison of keys you explicitly cast each key 
to number using unary C+ or Cint. If you want to force stringific 
comparison you explicitly cast the key to string using unary C~.

If you don't explicitly cast either way, Csort just DWIMs by looking at the 
actual type of the keys returned by the extractor. If any of them isn't a 
number, it defaults to Ccmp.



  DC  { + m/(\d+)$/.[1] },   #5
  DC  { /(\d+)$/ } = my_compare_sub,   #6
missing a close } it seems.
Yup. Thanks.



  DC  { just_guess $^b, $^a },   #7

is that a reverse order sort? why not skip the args and do this:

   { just_guess is descending }, #7
Because I wanted to show a plain old two-parameter block being used as a 
*comparator* (not a one-parameter block being used as a key extractor).



so the first arg to sort is either a single compare block or a anon list
of them? i figure we need the [] to separate the criteria from the input
data. 
Yep.


but what about this odd case,

	sort [...], [...], [...]

now that is stupid code but it could be trying to sort the refs by their
address in string mode.
In which case we probably should have written it:

	sort == [...], [...], [...]


or it could be a sort criteria list followed by
2 refs to input records.
Only if the first array ref contains nothing but Criterion objects.


  DC To specify a comparator, we provide a block with two arguments, as in
  DC #7. That block is always expected to return an integer.
so #7 is a call to just_guess which is passed the 2 args to compare. it
must return an int like cmp/=. 
Yep.


as i pointed out above, i don't see why
you even need to show the ^$a and ^$b args? 
So the block knows it has two parameters.


they will be passed into just_guess that way. let is descending handle the sort ordering.
But you *can't* apply Cis descending to a Code reference.

Nor are we sure that the order *is* descending. Maybe the Cjust_guess 
predicate is invariant to argument order and there were other reasons to pass 
the args in that order. Or maybe we reversed the order because we know that in 
Cjust_guess never returns zero, but defaults to its second argument being 
smaller, in which case we needed to reverse the args so that the Csort 
remained stable.

The point is that I wanted to show a vanilla two-parameter compare block.
(And, boy, am I ever sorry I whimsically reversed the args to indicate 
generality ;-)



  DC  @sorted = sort {(%M{$^a}//-M $^a) = (%M{$^b}//-M $^b)} @unsorted;

wow, that is UGLY! but i get it after a few hours of study. :) just the
orcish maneuver but with //. i think you also mean //= there. 
Yup. Should indeed be //=



  DC  @sorted = sort {-M} @unsorted;

that still wants to be cached somehow as -M is expensive.
It *is* cached. It's a one-parameter block. So its a key extractor. So it 
automagically caches the keys it extracts.


so -M there is a
simple key extraction on the files in @unsorted. 
Yup.


assuming no internal caching 
Key extractors will always cache.


	@sorted = sort {%M{$_} //= -M} @unsorted;

i assume //= will be optimized and -M won't be called if it is cached.

also where does %M get declared and/or cleared before this? 
Exactly the problem. That's why key extractors aways cache.


can it be
done in the block:
	@sorted = sort {my %M ; %M{$_} //= -M} @unsorted;
If you'd gone insane and particularly wanted to do it that way, you'd need 
something like:

  	@sorted = sort {state %M ; %M{$_} //= -M} @unsorted;

to ensure the cache persisted between calls to the key extractor.


another -M problem is that it

Re: The Sort Problem

2004-02-15 Thread Uri Guttman

 DC == Damian Conway [EMAIL PROTECTED] writes:

  DC Uri wrote:

  DC @out = sort
  DC [ { ~ %lookup{ .{remotekey} } }, #1
   if string cmp is the default, wouldn't that ~ be redundant?

  DC How do you know that the values of %lookup are strings?
  DC How would the optimizer know?

because that would be the default comparison and the extracted key value
would be stringified unless some other marker is used. most sorts are on
strings so this would be a useful huffman and removal of a redundancy.

  DC { + substr( 0, 10 ) }, #3
  DC { int /foo(\d+)bar/ }, #4
   i would also expect int to be a default over float as it will be used
   more often. + is needed there since the regex returns a string. in the
   #3 case that would be an int as well. so we need a 'float' cast
   thingy.

  DC Unary C+ *is* the float cast thingy!

hmm, so + is float but int is needed otherwise? int is more commonly a
sort key than float. it just feels asymetrical with one having a symbol
and the other a named operator.

  DC If you want to force numeric comparison of keys you explicitly cast
  DC each key to number using unary C+ or Cint. If you want to force
  DC stringific comparison you explicitly cast the key to string using
  DC unary C~.

or ~ is the default if nothing else is specified. it matches up with cmp
being the default comparison as you agreed above.

  DC If you don't explicitly cast either way, Csort just DWIMs by looking
  DC at the actual type of the keys returned by the extractor. If any of
  DC them isn't a number, it defaults to Ccmp.

that doesn't work for me. the GRT can't scan all keys before it
decides on what comparison operator is needed. the GRT needs to merge
keys as they are extracted and it needs to do the correct conversion to
a byte string then and not later. you can't DWIM this away if you want
the optimization. the ST can get away with it since you are still using
a compare block even if it is generated internally by the sort function.


  DC { just_guess $^b, $^a },   #7
   is that a reverse order sort? why not skip the args and do this:
   { just_guess is descending },
   #7

  DC Because I wanted to show a plain old two-parameter block being used as
  DC a *comparator* (not a one-parameter block being used as a key
  DC extractor).

that seems like extra redundant code just to mark a callback vs a
extract block. there should be a cleaner way to do that. maybe a null
extract key in the pair?

{ '' = just_guess }
{ undef = just_guess }# will = autoquote undef there?

then the records are full keys with no special extraction but there is a
callback comparator. no need to declare any arguments to the callback
sub here since you know it is that and not key extract code.

   but what about this odd case,
   sort [...], [...], [...]
   now that is stupid code but it could be trying to sort the refs by
   their
   address in string mode.

  DC In which case we probably should have written it:

  DC   sort == [...], [...], [...]

i did ask in another post whether == or == would fit in here. so that
line forces it all to be data and my silly example has the first anon
list of criteria and the rest as data. works for me so far.

   or it could be a sort criteria list followed by
   2 refs to input records.

  DC Only if the first array ref contains nothing but Criterion objects.

but what defines them as criteria objects? they look just like records
which could be sorted as well. i don't see any syntactical markers that
{} means criteria block vs a hash ref. DWIM guessing isn't good enough
for me here. sort in p5 already had some issues with that IIRC.

   as i pointed out above, i don't see why
   you even need to show the ^$a and ^$b args?

  DC So the block knows it has two parameters.

but the callback sub knows it has two params since it has to be written
that way. sort always calls the code block with 2 params.

   they will be passed into just_guess that way. let is descending
   handle the sort ordering.

  DC But you *can't* apply Cis descending to a Code reference.

then how did you apply 'is insensitive'? aside from how it is done
(traits and such), we need a syntax that conveys the semantics of
insensitive and descending into the sort func. it will use that info to
reverse the key order to comparison subs or modify the key merging of
the GRT to effect those flags. what i am saying is i think that you need
to go back to the drawing board to find a clean syntax to mark those
flags. note that neither the code block nor the callback needs to see
them, only the sort guts ever needs to see them. so we are communicating
info to the sort about this key. the code block/callback only ever sees
two arguments to compare and nothing else. maybe this will clarify for
you the intentions of those flags and why they have nothing to do with

Re: The Sort Problem

2004-02-15 Thread Luke Palmer

Uri Guttman writes:
 because that would be the default comparison and the extracted key value
 would be stringified unless some other marker is used. most sorts are on
 strings so this would be a useful huffman and removal of a redundancy.

While I like where most of this is going, I beg to differ on this point.
I end up sorting numbers more often then strings, except when I'm
sorting the keys of a hash.  I'm always annoyed in my one liners when I
have to write out:

sort { $a = $b } ...

And I'd be thrilled if it were as easy as:

sort { +$_ } ...

Luke

Re: The Sort Problem

2004-02-15 Thread Uri Guttman

 LP == Luke Palmer [EMAIL PROTECTED] writes:

  LP Uri Guttman writes:
   because that would be the default comparison and the extracted key value
   would be stringified unless some other marker is used. most sorts are on
   strings so this would be a useful huffman and removal of a redundancy.

  LP While I like where most of this is going, I beg to differ on this point.
  LP I end up sorting numbers more often then strings, except when I'm
  LP sorting the keys of a hash.  I'm always annoyed in my one liners when I
  LP have to write out:

  LP sort { $a = $b } ...

  LP And I'd be thrilled if it were as easy as:

  LP sort { +$_ } ...

oh, i don't want to see = anywhere in sort code. i don't mind + or int
as markers for that.  you seem to actually be agreeing with me that
strings are the default key type and + is needed for a numeric sort. but
do you do integer or float sorts? and as i said, i don't like the
asymmetry of int vs +.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-15 Thread Damian Conway

Uri persisted:

   DC How do you know that the values of %lookup are strings? DC How
   would the optimizer know?

 because that would be the default comparison and the extracted key value
 would be stringified unless some other marker is used. most sorts are on
 strings so this would be a useful huffman and removal of a redundancy.
Leaving aside your assertion about the commonest type of sort (which I
don't see how you can have reliable evidence about), we're not talking
about markers; these are *operators* that ensure that the keys extracted
are of a particular static type.
   DC Unary C+ *is* the float cast thingy!

 hmm, so + is float
No. C+ is number.

 but int is needed otherwise? int is more commonly a sort key than float.
 it just feels asymetrical with one having a symbol and the other a named
 operator.
Sorry, but that's the way the language works. The more general and usual
conversion (to number, not to integer) has the shorter name.
   DC If you want to force numeric comparison of keys you explicitly
   DC cast each key to number using unary C+ or Cint. If you want to
   DC force stringific comparison you explicitly cast the key to string
   DC using unary C~.

 or ~ is the default if nothing else is specified. it matches up with cmp
 being the default comparison as you agreed above.
Okay, if you want to think of it that way, fine.

   DC If you don't explicitly cast either way, Csort just DWIMs by
   DC looking at the actual type of the keys returned by the extractor.
   DC If any of them isn't a number, it defaults to Ccmp.

 that doesn't work for me. the GRT can't scan all keys before it decides
 on what comparison operator is needed. the GRT needs to merge keys as
 they are extracted and it needs to do the correct conversion to a byte
 string then and not later. you can't DWIM this away if you want the
 optimization.
EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the
return type of your block is appropriate. If you don't give the compiler that
hint, you don't get the optimized sorting.
   DC Because I wanted to show a plain old two-parameter block being
   used as DC a *comparator* (not a one-parameter block being used as a
   key DC extractor).

 that seems like extra redundant code just to mark a callback vs a
 extract block.
It's just an *example*, for heavens sake.

 there should be a cleaner way to do that.

There is. If it's just a callback that is already declared to take two 
parameters, and you want to pass $^a and $^b in that order, then you can just 
write:

sort just_guess, @unsorted;

or, if it's one of many criteria, you write:

sort [ { -M }, { .{owner} }, just_guess,
 ] @unsorted;
 maybe a null extract key in the pair?

Unnecessary. See above.



or it could be a sort criteria list followed by 2 refs to input
records.

   DC Only if the first array ref contains nothing but Criterion
   objects.

 but what defines them as criteria objects?
Their *type*. Remember this:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator   ::= Code(Any, Any) returns Int;

type Criterion::= KeyExtractor
| Comparator Pair(KeyExtractor, Comparator)
;
type Criteria ::= Criterion
| Array of Criterion
;
multi sub *sort(Criteria $by = {$^a cmp $^b}: [EMAIL PROTECTED]) {...}

???

The criteria have to be of a very specific type.

 they look just like records which could be sorted as well.

The point is that an array is only a list Csort criteria if every one of its 
elements is a one- or two-parameter block, or a Pair of blocks. And the blocks 
have to have very specific signatures too.

The *only* time there's going to be any ambiguity is if you wanted to sort a 
list of arrays of Csort criteria...not a very common occurrence, I suspect. ;-)

 i don't see any syntactical markers that {} means criteria block vs a
 hash ref.
Because there aren't any. Blocks are blocks and hashes are hashes, and in Perl 
6 we have a very clear set of syntactic rules that determines which is which.

 DWIM guessing isn't good enough for me here.

This *isn't* DWIM guessing. This is block parsing and static typing.

 sort in p5 already had some issues with that IIRC.

Only because Perl 5 has issues with hashes vs blocks.
Perl 6 fixes those issues.
 but the callback sub knows it has two params since it has to be written
 that way.
But it *isn't* a callback sub in my example!!! It's a call to subroutine
inside a block. And the *block* is the comparator Csort sees. The parameters 
of the block are $^a and $^b and it's a two parameter block (and therefore a
valid comparator) precisely *because* it has those two parameters.

Look, FORGET about that example. It was obviously a bad example. It's
caused enough pain (to me if no-one else).
Here's another example (all praise be to Rod!):

@teams = sort [
  # 1 parameter so

Re: The Sort Problem

2004-02-14 Thread Austin Hastings


--- Uri Guttman [EMAIL PROTECTED] wrote:
   @out = sort
   key( %lookup{ .{remotekey} } ),
   key:insensitive:descending:locale( 'somewhere' )( .{priority} ),
   key:float ( substr( 0, 10 ) ),
   key:integer ( /foo(\d+)bar/ ),
   key:compare( { ^$a = ^$b } )( /(\d+)$/ ),
   key:compare( \my_compare_sub ) ( /(\d+)$/ ),
   @in ;
 

...

 note that the custome compare callbacks can be a block or a sub
 name/ref. the callback sub would be passed 2 args as usual.

Off the top of my head, I can't think of a case where the compare sub
would be needed unless the key was not well-ordered. Does anyone have
an example of a case where the key-extraction sub approach doesn't
reduce the problem to a Scalar comparison? 

=Austin

Re: The Sort Problem

2004-02-14 Thread Uri Guttman

 AH == Austin Hastings [EMAIL PROTECTED] writes:

  AH --- Uri Guttman [EMAIL PROTECTED] wrote:
   @out = sort
   key:compare( \my_compare_sub ) ( /(\d+)$/ ),
   @in ;

   note that the custome compare callbacks can be a block or a sub
   name/ref. the callback sub would be passed 2 args as usual.

  AH Off the top of my head, I can't think of a case where the compare sub
  AH would be needed unless the key was not well-ordered. Does anyone have
  AH an example of a case where the key-extraction sub approach doesn't
  AH reduce the problem to a Scalar comparison? 

there are rare times when you need a computed comparison, say with a
specialized collation sequence. if this sequence isn't supported by
locale or whatever, you need to do a callback comparison. i can't think
of a real world example but maybe with some odd symbolic character set
or a custom one. in any case, supporting the callback is easy. it has
one drawback which is that it disallows the GRT but the guts could
fallback to the ST then. in both cases, key extraction is the same.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-14 Thread Austin Hastings


--- Uri Guttman [EMAIL PROTECTED] wrote:
  AH == Austin Hastings [EMAIL PROTECTED] writes:
 
   AH --- Uri Guttman [EMAIL PROTECTED] wrote:
@out = sort
key:compare( \my_compare_sub ) ( /(\d+)$/ ),
@in ;
 
note that the custome compare callbacks can be a block or a sub
name/ref. the callback sub would be passed 2 args as usual.
 
   AH Off the top of my head, I can't think of a case where the
 compare sub
   AH would be needed unless the key was not well-ordered. Does
 anyone have
   AH an example of a case where the key-extraction sub approach
 doesn't
   AH reduce the problem to a Scalar comparison? 
 
 there are rare times when you need a computed comparison, say with a
 specialized collation sequence. if this sequence isn't supported by
 locale or whatever, you need to do a callback comparison. i can't
 think
 of a real world example but maybe with some odd symbolic character
 set
 or a custom one. in any case, supporting the callback is easy. it has
 one drawback which is that it disallows the GRT but the guts could
 fallback to the ST then. in both cases, key extraction is the same.

Hmm. If this is all, then I think it's worth putting the onus on the
developer to write a better key extractor.

What you're saying, basically, is that the special comparator would
be doing dynamic key generation.

IMO, let the developer write a xxx_2_UTF32 key extractor. In all
likelihood, once it's written he may find a way to use that format
throughout the code.

=Austin

Re: The Sort Problem

2004-02-14 Thread Uri Guttman

 AH == Austin Hastings [EMAIL PROTECTED] writes:

  AH --- Uri Guttman [EMAIL PROTECTED] wrote:

   there are rare times when you need a computed comparison, say with
   a specialized collation sequence. if this sequence isn't supported
   by locale or whatever, you need to do a callback comparison. i
   can't think of a real world example but maybe with some odd
   symbolic character set or a custom one. in any case, supporting the
   callback is easy. it has one drawback which is that it disallows
   the GRT but the guts could fallback to the ST then. in both cases,
   key extraction is the same.

  AH Hmm. If this is all, then I think it's worth putting the onus on the
  AH developer to write a better key extractor.

possibly.

  AH What you're saying, basically, is that the special comparator would
  AH be doing dynamic key generation.

not exactly. key extraction is getting the key out of the record. there
could be some munging applied at that phase as well. comparison is just
a replacement for = or cmp. it could be some strange collate sequence
based on specialized data. a contrived example is sorting by length but
only for strings beginning with i-k, otherwise go with alphabetic sort.
you would have to extract a dummy length value for the keys which don't
start with i-k and the name would be the second key in all cases. but
there are more complex cases where it is easier to put the logic in the
comparison code instead of the key extraction part.

the issue is that the comparison could have logic beyond simple
comparisons. so we do need to support the compare code block or sub
callback. as i said, when you use that you could (if we do it this way)
lose major speedups but that is a standard penalty to pay when you need
custom code in the comparison.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-14 Thread Rod Adams

Austin Hastings wrote:

Off the top of my head, I can't think of a case where the compare sub
would be needed unless the key was not well-ordered. Does anyone have
an example of a case where the key-extraction sub approach doesn't
reduce the problem to a Scalar comparison? 
 

I can't find the P5 code I used for it right off, but I remember a case 
when I was playing around with various football (US) stats. I was 
ranking teams, and had something akin to:

@teams = sort {$b-{WinPct} = $a-{WinPct}  ||
  ($a-{Conference} eq $b-{Conference}
   ?($b-{ConfWinPct} = $a-{ConfWinPct} ||
 $b-{ConfWon}= $a-{ConfWon}   ||
 $a-{ConfLost}   = $b-{ConfLost})
   :($a-{Division} eq $b-{Division}
 ?($b-{DivWinPct} = $a-{DivWinPct} ||
   $b-{DivWon}= $a-{DivWon}   ||
   $a-{DivLost}   = $b-{DivLost})
 :0
 )
   ) ||
  $b-{Won} = $a-{Won}  ||
  $a-{Lost}= $b-{Lost} ||
  $b-{GamesPlayed} = $a-{GamesPlayed}
 } @teams;
   

Creating a keycode for this situation is not a trivial task at all. So 
sorts do occur in the real world for which there is no straightforward 
way to generate a sortkey. Albeit considerably rare.

Also, I think there is utility in have a compare sub supported so that:
1) porting P5 code is easier. (a minor design rationale, but it exists)
2) people used to thinking in terms of compare subs (from C, P5, and 
points of the programming universe) can still think that way.
3) most importantly to me, so that There's More Than One Way to Do It.

-- Rod Adams

Re: The Sort Problem (was: well, The Sort Problem)

2004-02-13 Thread Rod Adams

Here's my stab at a sort syntax, pulling syntax over from REs:

@out
 == sort key:ri($_-[2]), key:s($_-[4])
 == @in;
Basicly, you have a list of RE syntax like Ckey values, whilch take 
various modifiers to say how to play with that key, and then an expr on 
how to generate the key given element $_.

Possible modifiers: (verbose versions welcome)
:rreverse/descending
:n   force numeric comparisons
:s  force string comparisons (default)
:u  unicode (implies :s)
:i   case insensitive (implies :s)
:l   localized collation order
:x  call custom compare sub (a la P5)
This allows:

@out = sort keys %hash; # everything gets defaulted
@out = sort key:x{myfunc($a) cmp myfunc($b)}(), @in; # handy for P5 
migration, but not much else
@out = sort key(myfunc($_)), @in; # same as above, but much better.
@out = sort key(%lookup{$_-{remotekey}}), key:ir($_-{priority}), @in; 
# complex in P5, easy here.

Advantages:
- Uses syntax idioms used elsewhere in P6.
- Common cases are easy
- Decent huffman coding.
Disadvantages:
- Do we really want things that look like REs that are not REs?
- If we do this, are we setting ourselves up to create other RE-like 
creatures for grep, for, etc,  to the point
 where people will start wanting to roll their own in modules?

Thoughts?
-- Rod

Re: The Sort Problem

2004-02-13 Thread Ph. Marek

Am Freitag, 13. Februar 2004 01:40 schrieb Larry Wall:
 On Thu, Feb 12, 2004 at 04:29:58PM -0500, Uri Guttman wrote:
 : again, confusing. why should the order of a binary operator mean so
 : much? the order of a sort key is either ascending or descending. that is
 : what coders want to specify. translating that to the correct operator
 : (cmp or =) and the correct binary order is not the same as specifying
 : the key sort order and key type (int, string, float).

 Uri is dead on with this one, guys.
As I listen to this mails, I get the feeling that something like this is 
wanted:

Key generation:
@unsorted_temp = map {
   $k1=$_.func1('a');# ASC
   $k2=$_.func2('we');  # DESC
   [ $_, $k1, $k2 ];
 } @unsorted;
Now we've got an array with keys and the objects.
Sorting:
@sorted = sort {
  $a-[1] cmp $b-[1] ||
  $b-[2] = $a-[2] ||
} @unsorted_temp;


These things would have to be said in P6.
So approx.:
@sorted = @unsorted.sort(
  keys = [ { $_.func1('a'); },
{ $_.func2('we'); } ],
  cmp = [ cmp, = ],
  order = [ asc, desc],
  key_generation = lazy,
);

That would explain what I want.
Maybe we could turn the parts around:

@sorted = @unsorted.sort(
  1 = [ { $_.func1('a'); }, cmp, asc],
  2 = [ { $_.func2('we'); }, =, desc],
);

or maybe use a hash instead of an array:

@sorted = @unsorted.sort(
  1 = [ key = { $_.func1('a'); }, op = cmp, order = asc],
  2 = [ key = { $_.func2('we'); }, op = =, order = desc],
);


If that's too verbose? I don't think so; I've stumbled often enough on $a = 
$b vs. $b = $a and similar, and the above just tells what should be done.


Regards,

Phil

Re: The Sort Problem (was: well, The Sort Problem)

2004-02-13 Thread Dan Sugalski

At 11:52 PM -0700 2/12/04, Luke Palmer wrote:
But it needs some major syntax work so it can feel more like it's a part
of the language instead of a library function.  Not, mind, that I think
Perl's syntax needs to be changed at all to accommodate.
Since everyone's well past mad here and deep into drug-induced brain 
damage territory...

If you're *really* looking to get fancy, why not just allow the sort 
specification to be done with SQL? Comfortable, well-understood, 
already has a decade or so of stupid things welded into it (so 
everyone can stop trying to think up new stupid things to weld in), 
and there are several grammars for it so it'd be no work to speak of 
for me.

Heck, you could even unify map, grep, and sort and, if you fed in a 
list of pair lists or hashes, return parts of each record^Wlist 
element rather than the whole thing.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk

Re: The Sort Problem (was: well, The Sort Problem)

2004-02-13 Thread Angel Faus


Friday 13 February 2004 15:02, Dan Sugalski wrote:

 If you're *really* looking to get fancy, why not just allow the
 sort specification to be done with SQL? Comfortable,
 well-understood, already has a decade or so of stupid things welded
 into it [...]

 Heck, you could even unify map, grep, and sort [...]


That would be postmodern indeed.

-angel

Re: The Sort Problem

2004-02-13 Thread Aaron Sherman

On Thu, 2004-02-12 at 18:50, Uri Guttman wrote:

 there are only a short list of key comparisons possible, int, string,
 float and maybe unicode. i separate int from float since they do have
 different internals in the GRT. it is one area where you do expose
 stuff. otherwise you could just use number. 

And comparing in a particular language (as Dan pointed out in his Parrot
talk); comparing sets (e.g. numeric comparison, but odds and evens are
separate); ordering by string length; ordering by string-to-numeric
value with esoteric exceptions (software versions); etc.

Comparison is essentially always numeric, but only if you reduce your
inputs to numbers, and that's often not terribly efficent (e.g. you can
reduce any ascii string to:

chr(substr($s,0,1))+chr(substr($s,1,1))*128+...

and then comparing. The only limitation is that you'll have to use
Math::BigInt to do the comparison.

 simple. pick almost any language char set other than US ascii. many have
 special collating sequences. i am not anything close to a unicode expert
 but i have seen this issue. in fact setting the LANG (or some other)
 environment variable will affect many programs by changing the collating
 order.

Ok, yes. This was Dan's example, and is EXACTLY why I think you want to
use map once for key extraction and then sort's expression for
comparison.

 that is scary. do you realize that the sort block will be called for
 each comparison?

Yes why would it not be? How do you compare, say, IPv6 addresses if
not by performing a logically-ored comparison between each octet? I
certainly don't want to instantiate a Math::BigInt just to compare two
128-bit values.

   AS @new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1;
 
   AS @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]} map
   AS {[getkeys($_),$_]} @old2;
 
 where is getkeys defined?

It was an example. It's whatever you like. In some cases, that might
just be:

  sub getkeys {
my $object = shift;
return @{$object-keys};
  }

or perhaps:

  sub getkeys {
my $string = shift;
return(length($string),$string); # Order by length and then ascii order
  }

It was just an example. sortpairs was the interesting part.

 how do you know what indexes to use for each
 comparison? what happened to $_[2]? your call to $comp is passed 2
 arguments but the second example accesses 4 arguments.

You missed a layer of extraction there. $comp is passed each parameter
PAIR.

So if your input is [1,2,3,4,moo], then you are comparing the keys 1,
2, 3 and 4 and your value is moo.

   AS The second example really illustrates the point that you can swap the
   AS direction of key order and mechanism to compare them at your whim.
 
   AS Now, you just need to call sortpairs with any array of arrays of keys
   AS (with trailing value).
 
 add a third key to that. quickly!

ok, easy:

  @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3] || 
ACME::Compare::Thingulas-compare($_[5],$_[6])} map {[getkeys($_),$_]} @old2;

Please notice that the ONLY change is in the way I call it, not in
sortpairs. sortpairs still works just fine.

 a more straightforward api (which is close to what i will do in
 Sort::GRT) is (p5 code)
 
   my $sort = Sort::GRT-new(
 
   keys= [
   { descending = 1,
 type = float,
 extract = '$_-{amount},
   },
   { extract = 'substr( $_-{date}, 0, 4)' },
   ]
   ) ;

The value of extract should be a subref, not a string (thus having
closure status, and therefore access to context).

I think you and I are in violent agreement. I just look at this as a
problem that doesn't need solving because Schwartzian Transforms do all
the work for me. You look at it as something needing encapsulation.
That's cool, but sematically, we're doing the same work.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback

Re: [perl] The Sort Problem

2004-02-12 Thread Rafael Garcia-Suarez

Joe Gottman wrote in perl.perl6.language :
This is unrelated to the problem you mentioned, but there is another
 annoying problem with sort as it is currently defined.  If you have an
 @array and you want to replace it with the sorted version, you have to type
 @array = sort @array;
 
 Besides being long-winded, this causes Perl to make an unnecessary copy of
 the array.  It would be nice calling if sort (or reverse, or other similar
 functions) in void context resulted in in-place modification of the array
 that was input.

I'd rather not change the behaviour of sort (or other built-ins that
may or may not operate in place) depending on the calling context.
What to do, for example, when Csort is the last statement of a sub?

Besides this, a construction like @x = sort @x could be detected by
a suitable optimizer and turned to (internally) an in-place sort.
(It appears that Dave Mitchell is working on such an optimization for
perl 5.)

Re: [perl] The Sort Problem

2004-02-12 Thread Deborah Pickett

On Thu, 12 Feb 2004 15.40, Joe Gottman wrote:
This is unrelated to the problem you mentioned, but there is another
 annoying problem with sort as it is currently defined.  If you have an
 @array and you want to replace it with the sorted version, you have to type
 @array = sort @array;

 Besides being long-winded, this causes Perl to make an unnecessary copy of
 the array.  It would be nice calling if sort (or reverse, or other similar
 functions) in void context resulted in in-place modification of the array
 that was input.

I just know that this has come up before . . . I don't remember whether it was 
consensus[1] or just my wishful thinking that this would make code like:

sub foo
{
  my $listref = shift;
  # other stuff...
  # return sorted list.
  sort @$listref;
}

morally ambiguous[2], because that sort is being called in whatever context 
the function call was in, which could be miles away.

I'd suggest that an in-place sort have a marginally different name, but the 
rubyometer is already redlined as it is.

(Even more OT: I dislike the scoping rules for named sorting subroutines in 
Perl 5.  It means you can't do questionable things like this:

# A common function used all over the place.
sub onewayorother { $direction * ($a = $b) }
# much later ...
my $direction = -1;
my @arr = sort onewayorother (5,7,9);

The more I look at that, the more I think it's a stupid example, but I have a 
vivid memory of being upset about it several years ago, when I wasn't a very 
good programmer.  This is looking less like a whinge and more like a 
confession.  Move along, nothing more to see here.)

[1] If, by consensus, you mean Larry Said No.
[2] Not semantically ambiguous, because it isn't.

-- 
Debbie Pickett
http://www.csse.monash.edu.au/~debbiep
[EMAIL PROTECTED]

Re: The Sort Problem

2004-02-12 Thread Aaron Crane

Luke Palmer wrote:
 Any other ideas?

How about something like this, modulo any errors in my Perl 6 syntax?

  sub sort(?cmp = infix:cmp, +$key, +$desc, [EMAIL PROTECTED]) { ... }

I think that allows all of these:

  # P5: @sorted = sort @unsorted;
  @sorted = sort @unsorted;

The simplest case is the same as the Perl 5, which seems a pleasant feature.

  # P5: @sorted = sort { $a = $b } @unsorted;
  @sorted = sort { $^a = $^b } @unsorted;  # or:
  @sorted = sort infix:= == @unsorted;

This also seems reasonable.

  # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute }
  #   @unsorted
  # or: @sorted = map { $_-[1] }
  #   sort { $a-[0] =? $b-[0] }
  #   map { [ $_-foo('bar')-compute, $_ ] }
  #   @unsorted
  @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted;

I think my suggestion wins big here.  We've only had to specify how to
extract the key, and sort itself takes care of everything else.  And it looks
to me like this sort function has enough information about the programmer's
intent for it to optimise in all sorts of exciting ways -- it should be able
to do the equivalent of the GRT internally, for example.

Just for kicks, this one demonstrates all the features.  It's the same as
before, but in descending order:

  @unsorted
== sort infix:=, desc = 1, key = { $_.foo('bar').compute }
== @sorted;

What problems can anyone spot with this suggestion?

-- 
Aaron Crane * GBdirect Ltd.
http://training.gbdirect.co.uk/courses/perl/

Re: The Sort Problem

2004-02-12 Thread Ph. Marek

 ...
 so here is a (very rough and probably broken) syntax idea building on
 that:

 sort :key { :descend :string .foo('bar').substr( 10, 3) }

  :key {  :int .foo('baz') }
  :key {  :float .foo('amount') } @unsorted ;
I see a kind of problem here: If the parts of the key are not fixed length but 
can vary you can put them in strings *only* after processing all and 
verifying the needed length.

Example:
sort :key { :descend :string .foo('bar') }
  :key {  :int .foo('baz') }
  :key {  :float .foo('amount') } @unsorted ;

Now .foo('bar') isn't bounded with any length - so you don't know how much 
space to reserve.


And I believe that 
- generating keys on every list element
- storing them into a array (array of array) and
- after having processed all checking the length, and
- now generate the to-be-sorted-strings 
- sort

isn't the optimal way.
BTW: this requires that *all* keys are generated.
In cases like
- by name,
- by age,
- by height,
- by number of toes left,
- and finally sort by the social security number

most of the extractions (and possibly database-queries of calculations or 
whatever) will not be done - at least in the current form of
sort { $a-{name} cmp $b-{name} ||
 $a-{age} = $b-{age} || 
...

That is to say, I very much like the syntax you propose, but I'm not sure if 
pre-generating *every* key-part is necessarily a speed-up.

If there are expensive calculations you can always cut them short be 
pre-calculating them into a hash by object, and just query this in sort.


Also I fear that the amount of memory necessary to sort an array of length N 
is not N*2 (unsorted, sorted), but more like N*3 (unsorted, keys, sorted), 
which could cause troubles on bigger arrays 


Regards,

Phil

Re: The Sort Problem

2004-02-12 Thread Aaron Sherman

On Thu, 2004-02-12 at 01:59, Uri Guttman wrote:

 sorry about the confusion. read the paper at:
 
   http://www.sysarch.com/perl/sort_paper.html

All of which stops being a concern if you have a sort variant that works
on pairs in the first place. Perl _5_ code follows:

sub sortpairs(@) {
  my $comp = shift;
  my %pairs = @_;
  return map {$pairs{$_}} sort {$comp-($a)} keys %pairs;
}

@new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1;
@new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2;

as you can see, you can perform any key extraction you like in the map
portion (which just returns key/value pairs) and sortpairs' job is just
to compare the keys and return the resulting sorted values.

No Perl 6 here, move along ;-)

Perl 6 could contribute here by making it cheaper to construct/pass the
keys, but that's about it.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part

Re: The Sort Problem

2004-02-12 Thread Aaron Sherman

On Thu, 2004-02-12 at 08:43, Aaron Sherman wrote:

 sub sortpairs(@) {
   my $comp = shift;
   my %pairs = @_;
   return map {$pairs{$_}} sort {$comp-($a)} keys %pairs;
 }

Doh... it's early for me. That's Csort {$comp-()} with no parameter.
The fact that $a and $b are dynamically scoped in Perl 5 helps out here.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback



signature.asc
Description: This is a digitally signed message part

Re: [perl] The Sort Problem

2004-02-12 Thread Dan Sugalski

At 11:40 PM -0500 2/11/04, Joe Gottman wrote:
   This is unrelated to the problem you mentioned, but there is another
annoying problem with sort as it is currently defined.  If you have an
@array and you want to replace it with the sorted version, you have to type
@array = sort @array;
That would seem to be a place for more explicit, and specific, 
behaviour. Since we're going everything is an object then it's just 
a matter of:

   @array.sort_in_place;

and making sure that the array role provides the appropriate method.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk

Re: The Sort Problem

2004-02-12 Thread Larry Wall

On Thu, Feb 12, 2004 at 12:07:37PM +0100, Ph. Marek wrote:
:  ...
:  so here is a (very rough and probably broken) syntax idea building on
:  that:
: 
:  sort :key { :descend :string .foo('bar').substr( 10, 3) }
: 
:   :key {  :int .foo('baz') }
:   :key {  :float .foo('amount') } @unsorted ;
: I see a kind of problem here: If the parts of the key are not fixed length but 
: can vary you can put them in strings *only* after processing all and 
: verifying the needed length.

I think this is easily solved by sorting on a list of strings rather than a
single string.

: - generating keys on every list element
: - storing them into a array (array of array) and
: - after having processed all checking the length, and
: - now generate the to-be-sorted-strings 
: - sort
: 
: isn't the optimal way.
: BTW: this requires that *all* keys are generated.
: In cases like
: - by name,
: - by age,
: - by height,
: - by number of toes left,
: - and finally sort by the social security number

Not if you lazily add strings to the list of strings mentioned above.

: That is to say, I very much like the syntax you propose, but I'm not sure if 
: pre-generating *every* key-part is necessarily a speed-up.
: 
: If there are expensive calculations you can always cut them short be 
: pre-calculating them into a hash by object, and just query this in sort.

Another approach is to declare a function that is cached, I suppose.

: Also I fear that the amount of memory necessary to sort an array of length N 
: is not N*2 (unsorted, sorted), but more like N*3 (unsorted, keys, sorted), 
: which could cause troubles on bigger arrays 

We'll just use virtual memory.  :-)

Larry

Re: The Sort Problem

2004-02-12 Thread Luke Palmer

Aaron Crane writes:
 Luke Palmer wrote:
  Any other ideas?
 
 How about something like this, modulo any errors in my Perl 6 syntax?
 
   sub sort(?cmp = infix:cmp, +$key, +$desc, [EMAIL PROTECTED]) { ... }
 
 I think that allows all of these:
 
   # P5: @sorted = sort @unsorted;
   @sorted = sort @unsorted;
 
 The simplest case is the same as the Perl 5, which seems a pleasant feature.

This is a feature that, IMO, we can not do without.  I still do things
like:

print $_: %hash{$_} for sort keys %hash;

All the time, and I don't want to give that up for some clearer
syntax, as there is no such thing.

 
   # P5: @sorted = sort { $a = $b } @unsorted;
   @sorted = sort { $^a = $^b } @unsorted;  # or:
   @sorted = sort infix:= == @unsorted;
 
 This also seems reasonable.
 
   # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute }
   #   @unsorted
   # or: @sorted = map { $_-[1] }
   #   sort { $a-[0] =? $b-[0] }
   #   map { [ $_-foo('bar')-compute, $_ ] }
   #   @unsorted
   @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted;

Ok, I have to say, that's pretty good.  Er, really good.  I like it a
lot.

 I think my suggestion wins big here.  We've only had to specify how to
 extract the key, and sort itself takes care of everything else.  And it looks
 to me like this sort function has enough information about the programmer's
 intent for it to optimise in all sorts of exciting ways -- it should be able
 to do the equivalent of the GRT internally, for example.
 
 Just for kicks, this one demonstrates all the features.  It's the same as
 before, but in descending order:
 
   @unsorted
 == sort infix:=, desc = 1, key = { $_.foo('bar').compute }
 == @sorted;
 
 What problems can anyone spot with this suggestion?

I don't like the Cdesc flag.  But I can't, at the moment, think of any
way around it short of:

@unsorted
== sort { $^b = $^a }, key = { .foo('bar').compute }
== @sorted

Which people have made pretty clear that they don't like.

Oh, by the way, your Perl 6 syntax is immaculate.  Bravo :-)

Luke

Re: The Sort Problem

2004-02-12 Thread Uri Guttman

 LP == Luke Palmer [EMAIL PROTECTED] writes:

   # P5: @sorted = sort { $a-foo('bar')-compute = $b-foo('bar')-compute }
   #   @unsorted
   # or: @sorted = map { $_-[1] }
   #   sort { $a-[0] =? $b-[0] }
   #   map { [ $_-foo('bar')-compute, $_ ] }
   #   @unsorted
   @sorted = sort infix:=, key = { $_.foo('bar').compute } == @unsorted;

  LP Ok, I have to say, that's pretty good.  Er, really good.  I like it a
  LP lot.

how do you select descending order? and how do you selecte that per key?
you can't provide a binary operator without also providing the
order. and what about different key types? the = and cmp operators are
not enough information needed to do complex sorts. collating sequences
are another issue. you need to have that info on a perl key basis. 

   I think my suggestion wins big here.  We've only had to specify how
   to extract the key, and sort itself takes care of everything else.
   And it looks to me like this sort function has enough information
   about the programmer's intent for it to optimise in all sorts of
   exciting ways -- it should be able to do the equivalent of the GRT
   internally, for example.

sort can't figure it out without you telling it things. you need more
than just key extraction.

   Just for kicks, this one demonstrates all the features.  It's the same as
   before, but in descending order:
   
   @unsorted
   == sort infix:=, desc = 1, key = { $_.foo('bar').compute }
   == @sorted;
   
   What problems can anyone spot with this suggestion?

multiple keys with each having different comparisons and different sort
orders.

  LP @unsorted
  LP == sort { $^b = $^a }, key = { .foo('bar').compute }
  LP == @sorted

  LP Which people have made pretty clear that they don't like.

as i have said before. needing to have $a and $b in the correct order is
bug prone and confusing to many.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-12 Thread Uri Guttman

 PM == Ph Marek [EMAIL PROTECTED] writes:

   ...
   so here is a (very rough and probably broken) syntax idea building on
   that:
   
   sort :key { :descend :string .foo('bar').substr( 10, 3) }
   
   :key {  :int .foo('baz') }
   :key {  :float .foo('amount') } @unsorted ;

  PM I see a kind of problem here: If the parts of the key are not
  PM fixed length but can vary you can put them in strings *only* after
  PM processing all and verifying the needed length.

varying keys has no effect on this. you still build up a merged key of
whatever length is needed for each record. perl5 does that with pack or
.= or sprintf (all used in variants of the GRT).

  PM Example:
  PM sort :key { :descend :string .foo('bar') }
  PM   :key {  :int .foo('baz') }
  PM   :key {  :float .foo('amount') } @unsorted ;

  PM Now .foo('bar') isn't bounded with any length - so you don't know how much 
  PM space to reserve.

who needs to reserve space? you just let the guts do a realloc like p5
does now.

  PM And I believe that 
  PM - generating keys on every list element
  PM - storing them into a array (array of array) and
  PM - after having processed all checking the length, and
  PM - now generate the to-be-sorted-strings 
  PM - sort

  PM isn't the optimal way.
  PM BTW: this requires that *all* keys are generated.
  PM In cases like
  PM - by name,
  PM - by age,
  PM - by height,
  PM - by number of toes left,
  PM - and finally sort by the social security number

  PM most of the extractions (and possibly database-queries of calculations or 
  PM whatever) will not be done - at least in the current form of
  PM   sort { $a-{name} cmp $b-{name} ||
  PM$a-{age} = $b-{age} || 
  PM   ...

  PM That is to say, I very much like the syntax you propose, but I'm
  PM not sure if pre-generating *every* key-part is necessarily a
  PM speed-up.

see the paper for more on that. the work you do in the callback, even if
you only do the first compare is O(N log N). in large sorts that will
outweigh the O(N) work on extracting the keys one time.

and as with all algorithms, real world cases will differ. short sorts
can be faster with complex comparisons because of the initial
overhead. maybe the internal code can decide to do that instead of a GRT
based on the sort array size. the key extract code is the same in both
cases so that would be an interesting optimization.

  PM If there are expensive calculations you can always cut them short
  PM be pre-calculating them into a hash by object, and just query this
  PM in sort.

again, the goal is to eliminate work in the comparison. but as i said,
this is an implementation problem. we should focus on the language level
needs of the sort func and not how it is done internally. i shouldn't
have brought up the internal design here.

  PM Also I fear that the amount of memory necessary to sort an array
  PM of length N is not N*2 (unsorted, sorted), but more like N*3
  PM (unsorted, keys, sorted), which could cause troubles on bigger
  PM arrays 

as larry said, virtual memory. and there are ways to reduce the copies
internally with pointers (something p5 couldn't do at p5 code level).

other than copying the extracted keys, you could leave the original
array alone and sort it in place by indexing it based on the sorted
pointers. and since the records will all be the same PMC type (usually),
you could just swap the array's pointers and not move any data
around. so no extra copies would be made other than key extraction.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-12 Thread Aaron Crane

Luke Palmer wrote:
 Aaron Crane writes:
@unsorted
  == sort infix:=, desc = 1, key = { $_.foo('bar').compute }
  == @sorted;
 
 I don't like the Cdesc flag.  But I can't, at the moment, think of any
 way around it short of:
 
 @unsorted
 == sort { $^b = $^a }, key = { .foo('bar').compute }
 == @sorted
 
 Which people have made pretty clear that they don't like.

One option might be an 'rsort' function, but I think that's somewhat lacking
in the taste department.

Another might be as simple as

  @unsorted == sort == reverse == @sorted;

But I can see an argument that C == reverse  is quite a large code
burden for something so conceptually simple.

I have one other idea, but I can't decide if I like it:

  @unsorted == sort rinfix:cmp == @sorted;

That is, rinfix: (or some other name) is like infix:, but gives you a
function that reverses its arguments before actually running the operator.
Perhaps it could even be implemented as a macro.

-- 
Aaron Crane * GBdirect Ltd.
http://training.gbdirect.co.uk/courses/perl/

Re: The Sort Problem

2004-02-12 Thread Uri Guttman

 AC == Aaron Crane [EMAIL PROTECTED] writes:

  AC One option might be an 'rsort' function, but I think that's
  AC somewhat lacking in the taste department.

  AC Another might be as simple as

  AC   @unsorted == sort == reverse == @sorted;

again, reverse order needs to be on a per key basis. the problem we are
wrangling is how to support multiple keys in a clean syntactical way.

  AC   @unsorted == sort rinfix:cmp == @sorted;

  AC That is, rinfix: (or some other name) is like infix:, but gives you a
  AC function that reverses its arguments before actually running the operator.
  AC Perhaps it could even be implemented as a macro.

again, confusing. why should the order of a binary operator mean so
much? the order of a sort key is either ascending or descending. that is
what coders want to specify. translating that to the correct operator
(cmp or =) and the correct binary order is not the same as specifying
the key sort order and key type (int, string, float).

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: The Sort Problem

2004-02-12 Thread Luke Palmer

Aaron Crane writes:
 I have one other idea, but I can't decide if I like it:
 
   @unsorted == sort rinfix:cmp == @sorted;
 
 That is, rinfix: (or some other name) is like infix:, but gives you a
 function that reverses its arguments before actually running the operator.
 Perhaps it could even be implemented as a macro.

Like this one?

macro rinfix($op)
is parsed(/ \: (.*?) before: [\s(]  /)
{
return {
- $a, $b {
(infix:$op).($b, $a)
}
}
}

Luke

Re: The Sort Problem

2004-02-12 Thread Simon Cozens

[EMAIL PROTECTED] (Aaron Crane) writes:
 One option might be an 'rsort' function, but I think that's somewhat lacking
 in the taste department.

Agreed.
 
 Another might be as simple as
 
   @unsorted == sort == reverse == @sorted;

@sorted == sort == @unsorted, no? ;)

   @unsorted == sort rinfix:cmp == @sorted;
 
 That is, rinfix: (or some other name) is like infix:, but gives you a
 function that reverses its arguments before actually running the operator.
 Perhaps it could even be implemented as a macro.

I like this; it reminds me of the games you have to play with determining
what do to with binary operators whose operands are overloaded in different
classes.

-- 
Do not go gentle into that good night/Rage, rage against the dying of the light

Re: The Sort Problem

2004-02-12 Thread Dave Whipp

Perhaps something like:

@sorted = sort { infix:= map { scalar $_.foo('bar').compute } @^_ } }
@data

I'm not entirely sure it's readability is better than yours, though.

Dave.


Luke Palmer [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 I've been thinking about this problem which comes up in my code a lot:

 @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute }
 @unsorted;

 Often the expressions on each side are even longer than that.  But one
 thing remains:  both sides are exactly the same, substitute a $^b for a
 $^a.

 I can see a couple less-than-desirable ways around this redundancy:

 @sorted = sort { infix:=( *($^a, $^b)».foo('bar').compute ) }
 @unsorted;

 Which doesn't work if .compute returns a list... not to mention its
 horrible ugliness.  Another is to define a variant of sort (haven't had
 much practice with A6 material recently; here we go!):

 multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) {
 sort { block($^a) cmp block($^b) } @data;
 }

 @sorted = sort { .foo('bar').compute } @unsorted;

 Which has the disadvantage of forcing you to use Ccmp and forcing an
 ascending sort.

 Any other ideas?  Is a more general solution necessary?

 Luke

Re: The Sort Problem

2004-02-12 Thread Aaron Sherman

On Thu, 2004-02-12 at 15:46, Uri Guttman wrote:
  LP == Luke Palmer [EMAIL PROTECTED] writes:

   LP Ok, I have to say, that's pretty good.  Er, really good.  I like it a
   LP lot.
 
 how do you select descending order? and how do you selecte that per key?
 you can't provide a binary operator without also providing the
 order. and what about different key types? the = and cmp operators are
 not enough information needed to do complex sorts. collating sequences
 are another issue. you need to have that info on a perl key basis. 

You know, I have trouble typing per when I'm thinking perl too ;-)

I think my example addressed all of your concerns, and did it in Perl 5.
Once again, that was:

sub sortpairs(@) {
  my $comp = shift;
  my %pairs = @_;
  return map {$pairs{$_}} sort {$comp-()} keys %pairs;
}

@new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1;
@new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2;

So, in order, your questions were:

  * how do you select descending order?
You reverse the $a and $b in the first parameter to sortpairs
  *  and how do you selecte that per key?
Ok, I was only addressing one key. But, it's not hard to
generalize the idea, and certainly Perl 6 gives you the tools to
do so easily.
  * you can't provide a binary operator without also providing the
order.
Correct
  * and what about different key types? the = and cmp operators
are not enough information needed to do complex sorts.
That's fine, you can plug in a call to Alien::Headed::Babies
inside the closure for all I care.
  * collating sequences are another issue. you need to have that
info on a perl key basis.
Give me an example?
  * multiple keys with each having different comparisons and
different sort orders.

Ok let's delve into it then:

sub sortpairs(@){
  my $comp = shift;
  my @elements = @_;
  return map {my $e=$elements[$_];$e-[$#{$e}]} sort {
 my @akeys = @{$elements[$a]}[0..$#{$elements[$a]}-1];
 my @bkeys = @{$elements[$b]}[0..$#{$elements[$b]}-1];
 for(my $i=0;$i@akeys;$i++) {
   my $dir = $comp-($akey[$i],$bkey[$i]);
   return $dir if $dir;
 }
 return 0;
  } 1..scalar(@elements);
}
   
 
@new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1;
@new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]}  map {[getkeys($_),$_]} 
@old2;

The second example really illustrates the point that you can swap the
direction of key order and mechanism to compare them at your whim.

Now, you just need to call sortpairs with any array of arrays of keys
(with trailing value).

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback

Re: The Sort Problem

2004-02-12 Thread Uri Guttman

 AS == Aaron Sherman [EMAIL PROTECTED] writes:

  AS On Thu, 2004-02-12 at 15:46, Uri Guttman wrote:
   

   how do you select descending order? and how do you selecte that per
   key?

   you can't provide a binary operator without also providing the
   order. and what about different key types? the = and cmp
   operators are not enough information needed to do complex
   sorts. collating sequences are another issue. you need to have that
   info on a perl key basis.

  AS I think my example addressed all of your concerns, and did it in Perl 5.
  AS Once again, that was:

  AS sub sortpairs(@) {
  AS   my $comp = shift;
  AS   my %pairs = @_;
  AS   return map {$pairs{$_}} sort {$comp-()} keys %pairs;
  AS }

  AS @new1 = sortpairs {$a = $b} map {($_-computekey,$_)} @old1;
  AS @new2 = sortpairs {$a cmp $b} map {(lc($_),uc($_))} @old2;

  AS So, in order, your questions were:

  AS   * how do you select descending order?
  AS You reverse the $a and $b in the first parameter to sortpairs

i find that a poor solution. the $a cmp $b isn't needed in the ascending
case at all. a descending marker per key is better as it reflects the
results desired and not how it gets done. the reversing of $a and $b
requires the coder to understand sort comparison callbacks. it is
implementation exposed.

  AS   *  and how do you selecte that per key?
  AS Ok, I was only addressing one key. But, it's not hard to
  AS generalize the idea, and certainly Perl 6 gives you the tools to
  AS do so easily.

i haven't seen any proposals other than mine (which is bad syntax but
good semantics) for this. and if you have multiple keys with each having
a $a = $b in there it will be very noisy.

  AS   * you can't provide a binary operator without also providing the
  AS order.
  AS   Correct

a single descending marker is simpler and better semantics.

  AS   * and what about different key types? the = and cmp operators
  AS are not enough information needed to do complex sorts.
  AS That's fine, you can plug in a call to Alien::Headed::Babies
  AS inside the closure for all I care.

there are only a short list of key comparisons possible, int, string,
float and maybe unicode. i separate int from float since they do have
different internals in the GRT. it is one area where you do expose
stuff. otherwise you could just use number. 

  AS   * collating sequences are another issue. you need to have that
  AS info on a perl key basis.
  AS Give me an example?

simple. pick almost any language char set other than US ascii. many have
special collating sequences. i am not anything close to a unicode expert
but i have seen this issue. in fact setting the LANG (or some other)
environment variable will affect many programs by changing the collating
order.


  AS   * multiple keys with each having different comparisons and
  AS different sort orders.

  AS Ok let's delve into it then:

  AS sub sortpairs(@){
  AS   my $comp = shift;
  AS   my @elements = @_;
  AS   return map {my $e=$elements[$_];$e-[$#{$e}]} sort {
  AS  my @akeys = @{$elements[$a]}[0..$#{$elements[$a]}-1];

why the -1? that looks like:

my @akeys = @{$elements[$a]} ;
pop @akeys ;
pop @akeys ;

  AS  my @bkeys = @{$elements[$b]}[0..$#{$elements[$b]}-1];
  AS  for(my $i=0;$i@akeys;$i++) {
  ASmy $dir = $comp-($akey[$i],$bkey[$i]);
  ASreturn $dir if $dir;
  AS  }
  AS  return 0;
  AS   } 1..scalar(@elements);
  AS }

that is scary. do you realize that the sort block will be called for
each comparison?

  AS @new1 = sortpairs {$_[0] = $_[1]} map {[$_,$_]} @old1;

  AS @new2 = sortpairs {$_[0] = $_[1] || $_[4] cmp $_[3]} map
  AS {[getkeys($_),$_]} @old2;

where is getkeys defined? how do you know what indexes to use for each
comparison? what happened to $_[2]? your call to $comp is passed 2
arguments but the second example accesses 4 arguments.

  AS The second example really illustrates the point that you can swap the
  AS direction of key order and mechanism to compare them at your whim.

  AS Now, you just need to call sortpairs with any array of arrays of keys
  AS (with trailing value).

add a third key to that. quickly!

a more straightforward api (which is close to what i will do in
Sort::GRT) is (p5 code)

my $sort = Sort::GRT-new(

keys= [
{ descending = 1,
  type = float,
  extract = '$_-{amount},
},
{ extract = 'substr( $_-{date}, 0, 4)' },
]
) ;

my @sorted = $sort-do_it( @unsorted ) ;

$sort-do_it_inplace( [EMAIL PROTECTED] ) ;

Re: The Sort Problem

2004-02-12 Thread Larry Wall

On Thu, Feb 12, 2004 at 09:37:37PM +, Simon Cozens wrote:
: [EMAIL PROTECTED] (Aaron Crane) writes:
:  That is, rinfix: (or some other name) is like infix:, but gives you a
:  function that reverses its arguments before actually running the operator.
:  Perhaps it could even be implemented as a macro.
: 
: I like this; it reminds me of the games you have to play with determining
: what do to with binary operators whose operands are overloaded in different
: classes.

Which we're trying very hard to get away from in Perl 6...which reminds me,
I have to put something about that into A12...

Larry

Re: The Sort Problem

2004-02-12 Thread Larry Wall

On Thu, Feb 12, 2004 at 04:29:58PM -0500, Uri Guttman wrote:
: again, confusing. why should the order of a binary operator mean so
: much? the order of a sort key is either ascending or descending. that is
: what coders want to specify. translating that to the correct operator
: (cmp or =) and the correct binary order is not the same as specifying
: the key sort order and key type (int, string, float).

Uri is dead on with this one, guys.

Larry

dynamic arguments (was: The Sort Problem)

2004-02-12 Thread Jonathan Lang

Jonathan Lang wrote:
 Luke Palmer wrote:
  I've been thinking about this problem which comes up in my code a lot:
  
 @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute}
  @unsorted;
  
  Often the expressions on each side are even longer than that.  But one
  thing remains:  both sides are exactly the same, substitute a $^b for
  a $^a.
 
 The problem, then, isn't specific to sort; it happens any time that you
 find yourself needing to do the same things to multiple arguments.  As
 such, the solution ought to be more general than just sort.  
 
 You're looking for something to the effect of:
 
   @sorted = sort { parameters($^a = $^b).foo('bar').compute }
 
 That is, you need a way to factor C.foo('bar').compute out from each
 of the arguments that it applies to.  For list arguments, this is
 straightforward: pipe the list arguments through a map before you pipe
 them into the routine.  A similar approach could be used with named
 arguments.  The problem comes when you deal with positional arguments.  

How about including something similar to ==, but which binds the elements
of the list to the various positional parameters?  For instance: 

  @sorted = sort {infix:= args map {$_.foo('bar').compute}, $^a, $^b }
@unsorted; 

Where

  @x = $a, $b, $c; 
  routine args @x; 

is equivelent to

  routine $a, $b, $c;

=
Jonathan Dataweaver Lang

__
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

Re: dynamic arguments (was: The Sort Problem)

2004-02-12 Thread Luke Palmer

Jonathan Lang writes:
 How about including something similar to ==, but which binds the elements
 of the list to the various positional parameters?  For instance: 
 
   @sorted = sort {infix:= args map {$_.foo('bar').compute}, $^a, $^b }
 @unsorted; 
 
 Where
 
   @x = $a, $b, $c; 
   routine args @x; 
 
 is equivelent to
 
   routine $a, $b, $c;

We already have that.  It's spelled:

routine [EMAIL PROTECTED];

Or

routine * == @x;

Luke

Re: dynamic arguments (was: The Sort Problem)

2004-02-12 Thread Jonathan Lang

Luke Palmer wrote:
 Jonathan Lang writes:
  How about including something similar to ==, but which binds the
  elements of the list to the various positional parameters?  For 
  instance: 
  
@sorted = sort {infix:= args map {$_.foo('bar').compute}, 
$^a, $^b } @unsorted; 
  
  Where
  
@x = $a, $b, $c; 
routine args @x; 
  
  is equivelent to
  
routine $a, $b, $c;
 
 We already have that.  It's spelled:
 
 routine [EMAIL PROTECTED];
 
 Or
 
 routine * == @x;

Then you've got your solution:

  @sorted = sort {infix:= * map {$_.foo('bar').compute}, $^a, $^b }
@unsorted; 

or

  @sorted = sort {($^a, $^b) ==
  map {$_.foo('bar').compute} ==
  infix:= * } @unsorted; 

=
Jonathan Dataweaver Lang

__
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

Re: The Sort Problem (was: well, The Sort Problem)

2004-02-12 Thread Luke Palmer

Jonathan Lang writes:
  We already have that.  It's spelled:
  
  routine [EMAIL PROTECTED];
  
  Or
  
  routine * == @x;
 
 Then you've got your solution:
 
   @sorted = sort {infix:= * map {$_.foo('bar').compute}, $^a, $^b }
 @unsorted; 
 
 or
 
   @sorted = sort {($^a, $^b) ==
   map {$_.foo('bar').compute} ==
   infix:= * } @unsorted; 

No, I don't think we do.  Uri's given some good points in this thread
about specifying what you want out of the sort, instead of some
arbitrary callback interface.

But it needs some major syntax work so it can feel more like it's a part
of the language instead of a library function.  Not, mind, that I think
Perl's syntax needs to be changed at all to accommodate.

I'm thinking that Csort might take a list of closures or pairs (with
the value being a closure).  I'm still not sure how that separates from
the data to be sorted, but it's probably related to the problem of how
Cfor works in declaration.

The return value of each closure is dispatched into some sort of generic
=, hopefully one that blows up if its arguments are of different
types[1]. That way if you have a method that always returns a number,
you can just leave it alone.  Otherwise you're expected to prefix with
the approprate contextifier.

For example:

sort { .foo('bar').compute }, desc = { ~.baz }, @data;
# ascending by default  string compares

Pairs don't exactly seem fit here.  Maybe we're supposed to think of
these these closures as higher-level 'extractors', and therefore attach
properties to them:

sort { .foo('bar').compute }, { ~.baz } but descending, @data;

Or even something on the value that tells = to negate whatever it
returns:

sort { .foo('bar').compute }, { ~.baz but swap }, @data;

That's starting to expose implementation again, though.

Moving a different direction now.  Instead of being honest and saying
that sort does each of these comparisons until one of them is nonzero,
let's pretend that it sorts with respect to the first comparison and
feeds the lists of equal elements into the second comparison, etc.

Sort might then look like:

sub sort (?extract, ?next, [EMAIL PROTECTED])

And our example turns into:

sort { .foo('bar').compute } 
 { reverse sort { ~.baz } [EMAIL PROTECTED] } 
 [EMAIL PROTECTED];

That hurts, especially when there are more than two keys.  I'd really
like to do that with the piping operators instead of nested closures,
but that screws with the simple case.

There might be even more ways of looking at this that would result in
more novel syntaxes.  Brainstorming required.

Luke

[1] There's something to be said for a generic comparison, but not one
that blows up.  Mostly because certain containers want their elements to
be well ordered.  But we could just as easily say that elements that
aren't intercompatible with the short-tempered = aren't allowed in
said containers.

The Sort Problem

2004-02-11 Thread Luke Palmer

I've been thinking about this problem which comes up in my code a lot:

@sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute }
@unsorted;

Often the expressions on each side are even longer than that.  But one
thing remains:  both sides are exactly the same, substitute a $^b for a
$^a.

I can see a couple less-than-desirable ways around this redundancy:

@sorted = sort { infix:=( *($^a, $^b).foo('bar').compute ) }
@unsorted;

Which doesn't work if .compute returns a list... not to mention its
horrible ugliness.  Another is to define a variant of sort (haven't had
much practice with A6 material recently; here we go!):

multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) {
sort { block($^a) cmp block($^b) } @data;
}

@sorted = sort { .foo('bar').compute } @unsorted;

Which has the disadvantage of forcing you to use Ccmp and forcing an
ascending sort.

Any other ideas?  Is a more general solution necessary?

Luke

Re: The Sort Problem

2004-02-11 Thread Gregor N. Purdy

Luke --

Hmmm... I haven't been practicing my Perl 6, and its been a while
since the last Apocalyptic refresher, but here goes (I'll don a paper
bag preemptively)...

Thinking of that as the equivalent to:

  sort {
my ($ta, $tb) = map { $_.foo('bar').compute } ($^a, $^b);
$ta = $tb
  } @unsorted;

so if you had a unary = that took a two element array (old style
syntax here, sorry):

  sub cmp_num { my ($a, $b) = @_; $a = $b; }

you could write that as:

  sort {
cmp_num map { $_.foo('bar').compute } ($^a, $^b);
  } @unsorted;

I'd be a bit afraid of allowing = itself to be prefix, although I
admit it would be handy not to need the separate sub definition:

  sort {
= map { $_.foo('bar').compute } ($^a, $^b);
  } @unsorted;

The Schwartzian is the usual way, of course:

  map { $_-[1] }
  sort { $^a[0] = $^b[0] }
  map { [ $_.foo('bar').compute, $_ ] }
  @unsorted;

But we really aren't being concise here.

(Aside) I wonder if there will be a new Perl 6 idiom to replace this
using properties, something like this:

  map { $_.really }
  sort : numerically
  map { $_.foo('bar').compute but really($_) }
  @unsorted;

or:

  @unsorted
== map { $_.foo('bar').compute but really($_) }
== sort : numerically
== map { $_.really };

But we *still* aren't concise.

Maybe we should have a two-block sort? The second block gives the
transform to apply to the elements before applying the comparison
block:

  sort { $^a = $^b } { $_.foo('bar').compute } @unsorted;

Or maybe even something like this:

  sort :numerically :as { $^a.foo('bar').compute } @unsorted;


Regards,

-- Gregor

On Wed, 2004-02-11 at 19:11, Luke Palmer wrote:
 I've been thinking about this problem which comes up in my code a lot:
 
 @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute }
 @unsorted;
 
 Often the expressions on each side are even longer than that.  But one
 thing remains:  both sides are exactly the same, substitute a $^b for a
 $^a.
 
 I can see a couple less-than-desirable ways around this redundancy:
 
 @sorted = sort { infix:=( *($^a, $^b)».foo('bar').compute ) }
 @unsorted;
 
 Which doesn't work if .compute returns a list... not to mention its
 horrible ugliness.  Another is to define a variant of sort (haven't had
 much practice with A6 material recently; here we go!):
 
 multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) {
 sort { block($^a) cmp block($^b) } @data;
 }
 
 @sorted = sort { .foo('bar').compute } @unsorted;
 
 Which has the disadvantage of forcing you to use Ccmp and forcing an
 ascending sort.
 
 Any other ideas?  Is a more general solution necessary?
 
 Luke
-- 
Gregor Purdy[EMAIL PROTECTED]
Focus Research, Inc.   http://www.focusresearch.com/

Re: [perl] The Sort Problem

2004-02-11 Thread Joe Gottman

- Original Message - 
From: Luke Palmer [EMAIL PROTECTED]
To: Language List [EMAIL PROTECTED]
Sent: Wednesday, February 11, 2004 10:11 PM
Subject: [perl] The Sort Problem

 I've been thinking about this problem which comes up in my code a lot:

 @sorted = sort { $^a.foo('bar').compute = $^b.foo('bar').compute }
 @unsorted;

 Often the expressions on each side are even longer than that.  But one
 thing remains:  both sides are exactly the same, substitute a $^b for a
 $^a.

 I can see a couple less-than-desirable ways around this redundancy:

 @sorted = sort { infix:=( *($^a, $^b).foo('bar').compute ) }
 @unsorted;

 Which doesn't work if .compute returns a list... not to mention its
 horrible ugliness.  Another is to define a variant of sort (haven't had
 much practice with A6 material recently; here we go!):

 multi sub sort (block($) = { $_ } : [EMAIL PROTECTED]) {
 sort { block($^a) cmp block($^b) } @data;
 }

 @sorted = sort { .foo('bar').compute } @unsorted;

 Which has the disadvantage of forcing you to use Ccmp and forcing an
 ascending sort.

 Any other ideas?  Is a more general solution necessary?

   This is unrelated to the problem you mentioned, but there is another
annoying problem with sort as it is currently defined.  If you have an
@array and you want to replace it with the sorted version, you have to type
@array = sort @array;

Besides being long-winded, this causes Perl to make an unnecessary copy of
the array.  It would be nice calling if sort (or reverse, or other similar
functions) in void context resulted in in-place modification of the array
that was input.

Joe Gottman

Re: The Sort Problem

2004-02-11 Thread Uri Guttman

 GNP == Gregor N Purdy [EMAIL PROTECTED] writes:

  GNP The Schwartzian is the usual way, of course:

  GNP   map { $_-[1] }
  GNP   sort { $^a[0] = $^b[0] }
  GNP   map { [ $_.foo('bar').compute, $_ ] }
  GNP   @unsorted;

and the GRT is another. 

let stick my nose in here. sorting is really composed of 3 parts,
extracting keys, comparing them and then getting back the records in
sorted order. the p5 sort block idioms require duplicating the extract
code and usually a binary operator. the GRT eliminates the binary
operator as it uses the default string compare in sort but it still
requires key extraction and then record extraction (similar to the first
map (last executed) in the ST.

  GNP But we really aren't being concise here.

  GNP (Aside) I wonder if there will be a new Perl 6 idiom to replace this
  GNP using properties, something like this:

  GNP   map { $_.really }
  GNP   sort : numerically
  GNP   map { $_.foo('bar').compute but really($_) }
  GNP   @unsorted;

  GNP or:

  GNP   @unsorted
  GNP == map { $_.foo('bar').compute but really($_) }
  GNP == sort : numerically
  GNP == map { $_.really };

  GNP But we *still* aren't concise.

that is the problem. the merging of the extracted key (in the ST or GRT)
with the record and the extraction of the record need to be hidden as
that is common code.

  GNP Maybe we should have a two-block sort? The second block gives the
  GNP transform to apply to the elements before applying the comparison
  GNP block:

  GNP   sort { $^a = $^b } { $_.foo('bar').compute } @unsorted;

duplicate args again.

  GNP Or maybe even something like this:

  GNP   sort :numerically :as { $^a.foo('bar').compute } @unsorted;

that looks best but how do you replace the :numerically comparison with
say a unicode aware string compare?

i have been working on a Sort::GRT in my head (finally!) and have some
ideas.

first, key extraction should work on $_ which is set for each element of
the unsorted list so there is no need for $^a. also how can you extract
multiple keys? and each key needs a ascending/descending option. and
each key needs a type so it can be compared correctly.

so here is a (very rough and probably broken) syntax idea building on
that:

sort :key { :descend :string .foo('bar').substr( 10, 3) }
 :key {  :int .foo('baz') }
 :key {  :float .foo('amount') } @unsorted ;

sort takes a list of :key's which each has a block (or whatever) that
specifies the key type, ordering and extraction code (that works on the
record in $_). the extraction code is in a block in itself so you can
declare lexicals and use multiple statements. the last value (in list
context) is the key. this is so you can do (p5 style) /foo(\d+)bar/ and
extract that digit string as the key.

side note: i was pondering the problem of where two different keys have
to be extracted from nearby places deep in a nested record
structure. this means duplicate code will be used to get into the deep
part. in my module, i was planning to allow a 'pre-extraction' piece of
code to be used and the value (e.g. a ref to the deep part) stored in a
lexical which can be used later in both key extractions. i don't know
any way to do that with the above syntax. there has to be a larger block
around all the :key's to allow something like that here. the p5 module
will be wrapping all the key extract code in a block so this would work
there.

internally, each record has its keys extracted and they are merged using
the GRT or ST or similar. in either case the record must be attached
(via a pointer or appended or some other way) to the merged key. the
keys are sorted and then the attached records are grabbed back and
returned. sorting in place can be done too.

what is good about the above syntax is that it removes all the common
code, removes the binary operator (order and type is all that matters)
and allows for a simple way to express multiple key extractions.

it can be implemented internally in c to be far faster than any perl or
parrot level code could do. the biggest hurdle i see is proper support
for unicode stuff. maybe another key attribute would be the collating
name? how this affects internal sorting is a question. can you generate
a merged key with numbers, ascii and unicode strings that will sort with
simple string compares? maybe the unicode extraction has to do a char by
char conversions to (32 bit?) integers using the collate sequence and
those numbers can easily be merged and sorted. note that only the
extracted key text needs to be converted so that shouldn't be too
expensive. so it may look like this:

sort :key { :descend :string :collate('iso_some_country') 
  .foo('bar').substr( 10, 3) }
 :key {  :int .foo('baz') } @unsorted ;

what is nice about that is that each key could have its own collate
type. the :string attribute could be optional since the :collate implies
a string.

just trying to sort things out,

uri

-- 
Uri

Re: The Sort Problem

2004-02-11 Thread Luke Palmer

Uri Guttman writes:
  GNP == Gregor N Purdy [EMAIL PROTECTED] writes:
 
   GNP (But, the GRT doesn't apply to sorting general objects, and the
   GNP example has @unsorted containing objects on which we run
   GNP .foo('bar').compute.)
 
 there are no restrictions on what the GRT (or ST) can sort. the key is
 key extraction and that can be from an object or anything. if the
 compute method has to be called each time during the comparison, you are
 in deep sort voodoo. so you can just call it during the extract phase
 (the first executed map in the GRT or ST). if you have the object in $_,
 you can call a method or do whatever you need to get the key.

Might you explain what the GRT is and what it is intended to do?  I've
been lost throughout most of your proposal. 

Thanks,
Luke

Re: The Sort Problem

2004-02-11 Thread Uri Guttman

 LP == Luke Palmer [EMAIL PROTECTED] writes:

  LP Uri Guttman writes:
GNP == Gregor N Purdy [EMAIL PROTECTED] writes:
   
  GNP (But, the GRT doesn't apply to sorting general objects, and the
  GNP example has @unsorted containing objects on which we run
  GNP .foo('bar').compute.)
   
   there are no restrictions on what the GRT (or ST) can sort. the key is
   key extraction and that can be from an object or anything. if the
   compute method has to be called each time during the comparison, you are
   in deep sort voodoo. so you can just call it during the extract phase
   (the first executed map in the GRT or ST). if you have the object in $_,
   you can call a method or do whatever you need to get the key.

  LP Might you explain what the GRT is and what it is intended to do?
  LP I've been lost throughout most of your proposal.

and now you know how i feel about some threads you delve into! my brane
hertz! :)

sorry about the confusion. read the paper at:

http://www.sysarch.com/perl/sort_paper.html

the idea is simple and very old. you extract keys from the records and
merge them (with some munging) into a byte string. then you can run a
straight sort with comparisons on those merged key strings and use fast
string compares. the win is not doing the key extraction on each set of
comparisons which is O(N log N). instead you do the extractions once per
key which is O(N). it is a real world optimization as pure O() theory
wouldn't care as the dominating O(N log N) is still the same in both
cases. :)

all we (larry rosler and myself) did was translate the idea into perl
and document how to do it. one of these days i will actually create a
perl5 module (started one years ago but it got shelved) to implement
this and make it easy to use. i recently came up with a design
breakthrough (just not being as stupid as i was a few years ago) to make
it easier to do so i may get back onto it.

but much of my p6 proposal was to have the sort function reflect the
essence of both the GRT and ST which is to factor out key extraction in
a map before the sort. my syntax examples were meant to show what is
needed to pass into a full sort system and what isn't needed. perl5's
sort exposed everything and required redundant code in the comparison
block, confusing expressions, etc. the ST and GRT removed some of the
redundant code but replaced that with the map/sort/map which has
different redundant code (redundant between different instances of
them. why see the map parts?).

so my proposed syntax isolates the key extraction code and makes it work
on $_. it also removes the comparison operator stuff and allows the
coder to just say what they want in regard to sort order, key type and
collate sequence. the guts will do all the common code (the maps) and
generate/compile the proper key extraction code. in fact, there is no
need for the GRT or ST to even be implemented at all. the key extraction
code and order could be used to generate comparison stuff just like
perl5 does. the ST and GRT are just implementation optimizations and
don't belong in a language level discussion.

hope that helps,

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

80 matches

Mail list logo