Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread Piers Cawley

"David L. Nicol" [EMAIL PROTECTED] writes:
 Piers Cawley [EMAIL PROTECTED] writes:
 [EMAIL PROTECTED] (Yitzchak Scott-Thoennes) writes:
 
$srt =~ tr/0-9a-z\xe9/a-jA-ZE/;  # uc  sort nums after letters
 
 `10' is going to sort before `2' with that rule. Having done the whole
 bitter experience thing with this, may I suggest:
 
 $srt =~ s/(\d+)/unpack("B32", pack("N",$1))/eg
 
 Which will give you nice 32 bit binary representations of your
 numbers, which have leading zeros and will sort properly via cmp.
 
 If you want a sample of the pain I had working that out, you
 should've been at my 12 step perl session at YAPC::Europe.

 Is there a perl6 sort committee yet?  After reading Cawley's
 method here, I wonder if using it we could make radix-sorts the
 default sort method.

Er... the point behind changing numbers to binary strings was
emphatically not so that they could be sorted by a Radix method, but
to ensure that numbers within text would sort correctly: qw(A1 A2 A3
A10) instead of qw(A1 A10 A2 A3)...
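
As an illustrative sketch, the effect of the substitution can be checked on the list above. The helper name binary_key is made up for this example, and the sketch assumes every number fits in 32 bits.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of digits with its zero-padded
# 32-bit binary representation so that cmp orders embedded numbers by value.
# Assumption: each number is below 2**32.
sub binary_key {
    my ($s) = @_;
    $s =~ s/(\d+)/unpack("B32", pack("N", $1))/eg;
    return $s;
}

my @items = qw(A1 A10 A2 A3);

# Plain string sort compares character by character, so A10 lands before A2.
my @plain = sort @items;

# Comparing the transformed keys puts the numbers in numeric order.
my @human = sort { binary_key($a) cmp binary_key($b) } @items;

print "@plain\n@human\n";
```

Note that this form recomputes both keys on every comparison, which is fine for short lists but wasteful for long ones.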




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread David L. Nicol

Piers Cawley wrote:
 
 "David L. Nicol" [EMAIL PROTECTED] writes:
  After reading Cawley's
  method, I wondered if using it we could make radix-sorts the
  default sort method.
 
 Er... the point behind changing numbers to binary strings was
 emphatically not so that they could be sorted by a Radix method, but
 to ensure that numbers within text would sort correctly: qw(A1 A2 A3
 A10) instead of qw(A1 A10 A2 A3)...

The rsort documentation says that radix sorts will sort ASCII
text.

My thought was that the perl6 default sort could do an implied ST on
the data, using Cawley's Substitution, and then a radix-sort, instead
of analyzing each pair of data to see if they are numeric or not using
whatever the current heuristic is.
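
A rough sketch of that idea, under stated assumptions: the keys are byte strings, every number fits in 32 bits, and the names binary_key, msd_radix_sort and radix_human_sort are invented here for illustration. This is not the rsort module's interface, just a minimal most-significant-byte-first bucket sort applied to the transformed keys.

```perl
use strict;
use warnings;

# Hypothetical key transform: the substitution quoted earlier in the thread.
sub binary_key {
    my ($s) = @_;
    $s =~ s/(\d+)/unpack("B32", pack("N", $1))/eg;
    return $s;
}

# Minimal MSD radix sort over [key, original] pairs.
# $pos is the byte position currently being bucketed on.
sub msd_radix_sort {
    my ($pos, @pairs) = @_;
    return @pairs if @pairs < 2;
    my (@ended, @bucket);
    for my $p (@pairs) {
        if (length($p->[0]) <= $pos) {
            push @ended, $p;    # key exhausted: sorts before longer keys
        }
        else {
            push @{ $bucket[ ord(substr($p->[0], $pos, 1)) ] }, $p;
        }
    }
    # Visit the buckets in byte order and sort each on the next position.
    return (@ended,
            map { msd_radix_sort($pos + 1, @$_) } grep { defined $_ } @bucket);
}

# Transform once, radix-sort on the keys, then hand back the originals.
sub radix_human_sort {
    my @pairs = map { [ binary_key($_), $_ ] } @_;
    return map { $_->[1] } msd_radix_sort(0, @pairs);
}

print join(" ", radix_human_sort(qw(A10 B2 A2 A1 A3))), "\n";
```

No pairwise comparison function is called anywhere; the ordering comes entirely from the bucket positions.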

I do not know exactly what the perl5 default sort heuristic is, except that
it tries to DWIM both numeric and string data.

Without the ST, the sort function would be

 sub PCsort {
   my $mya = $a;
   my $myb = $b;
   $mya =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
   $myb =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
   return $mya cmp $myb;
 }

With ST (and duplicate loss correction!)

   sub PCsort(@){
       my $this;
       my $trans;
       my %duplicates;
       my %doppleganger;
       # defined() so that a false item such as "0" does not end the loop early
       while (defined($trans = $this = shift)){
           $trans =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
           exists $doppleganger{$trans} and $duplicates{$trans}++;
           $doppleganger{$trans} = $this;
       };
       my @Sorted = sort {$a cmp $b} keys %doppleganger;
       my @result; # from here down could be a map{} but it would be
                   # hard to understand
       foreach $trans (@Sorted){
           do{
               push @result, $doppleganger{$trans};
           }while($duplicates{$trans}--);
       };
       @result
   };
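
For comparison, a sketch of the more usual map/sort/map spelling of an ST with the same substitution. The names binary_key and st_human_sort are hypothetical. Because each item travels alongside its key instead of being stored in a hash, duplicates and distinct items that share a key (such as A1 and A01) all survive; the relative order of such ties depends on the stability of the underlying sort.

```perl
use strict;
use warnings;

# Hypothetical key transform: the substitution used above.
# Assumption: each number is below 2**32.
sub binary_key {
    my ($s) = @_;
    $s =~ s/(\d+)/unpack("B32", pack("N", $1))/eg;
    return $s;
}

# Read bottom to top: pair each item with its key, sort the pairs by key,
# then strip the keys off again.
sub st_human_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ binary_key($_), $_ ] }
           @_;
}

print join(" ", st_human_sort(qw(A10 A2 A2 0 A1))), "\n";
```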




On another note, anyone for suppressing the use-of-uninitialized warning
on the unary incrementors?


-- 
   David Nicol 816.235.1187 [EMAIL PROTECTED]
Today in art class, draw your sword




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread Jarkko Hietaniemi

On Sat, Dec 30, 2000 at 05:31:29AM +, David L. Nicol wrote:
 Piers Cawley wrote:
  
  "David L. Nicol" [EMAIL PROTECTED] writes:
   After reading Cawley's
   method, I wondered if using it we could make radix-sorts the
   default sort method.
  
  Er... the point behind changing numbers to binary strings was
  emphatically not so that they could be sorted by a Radix method, but
  to ensure that numbers within text would sort correctly: qw(A1 A2 A3
  A10) instead of qw(A1 A10 A2 A3)...
 
 The rsort documentation says that radix sorts will sort ASCII
 text.
 
 My thought was that the perl6 default sort could do an implied ST on
 the data, using Cawley's Substitution, and then a radix-sort, instead
 of analyzing each pair of data to see if they are numeric or not using
 whatever the current heuristic is.
 
 I do not know exactly what the perl5 default sort heuristic is, except that
 it tries to DWIM both numeric and string data.

"sort heuristic"?  "DWIM both numeric and string data"?  There is
no "heuristic".  There is no "DWIM".  By default, Perl's sort() does a
string sort based on the byte values of the strings in its argument
list.  That's it.  Period.  Full stop.

If you want something else, like a numeric comparison, or, say, a
case-ignorant string comparison, or whatever, then you supply the
comparison function yourself. 
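
A small sketch of the difference, using made-up sample data:

```perl
use strict;
use warnings;

my @nums  = (10, 9, 2);
my @words = qw(apple Banana cherry);

# Default: plain string comparison, even when the items look numeric.
my @default = sort @nums;

# Numeric comparison has to be requested explicitly.
my @numeric = sort { $a <=> $b } @nums;

# Default string order puts uppercase bytes before lowercase ones.
my @bytewise = sort @words;

# Case-ignorant comparison: fold both sides before comparing.
my @caseless = sort { lc($a) cmp lc($b) } @words;

print "@default\n@numeric\n@bytewise\n@caseless\n";
```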

The sorting algorithm? Before 5.005 (I think...my memory is going)
it was the vendors' quicksort; after that, Tom Horsley's excellent
ultratuned quicksort (since vendors' quicksorts were (a) buggy (b) slow);
in 5.7 a mergesort by John Linderman was introduced.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen