Re: [Fwd: Re: [FWP] sorting text in human-order]
"David L. Nicol" [EMAIL PROTECTED] writes:

> Piers Cawley [EMAIL PROTECTED] writes:
>
> > [EMAIL PROTECTED] (Yitzchak Scott-Thoennes) writes:
> >
> > >     $srt =~ tr/0-9a-z\xe9/a-jA-ZE/; # uc sort nums after letters
> >
> > `10' is going to sort before `2' with that rule. Having done the
> > whole bitter experience thing with this, may I suggest:
> >
> >     $srt =~ s/(\d+)/unpack("B32", pack("N",$1))/eg
> >
> > Which will give you nice 32 bit binary representations of your
> > numbers, which have leading zeros and will sort properly via cmp.
> >
> > If you want a sample of the pain I had working that out, you
> > should've been at my 12 step perl session at YAPC::Europe.
>
> Is there a perl6 sort committee yet? After reading Cawley's method
> here, I wonder if using it we could make radix-sorts the default
> sort method.

Er... the point behind changing numbers to binary strings was emphatically not so that they could be sorted by a Radix method, but to ensure that numbers within text would sort correctly: qw(A1 A2 A3 A10) instead of qw(A1 A10 A2 A3)...
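For anyone following along, here is a minimal sketch of what the substitution buys you, using the same four strings as above. The helper name binary_key is made up for illustration, and the sketch assumes every embedded number fits in an unsigned 32-bit integer (which is all that pack("N") can hold).

```perl
use strict;
use warnings;

# Hypothetical helper: replace every run of digits with its 32-bit
# big-endian binary string, so that a plain string comparison orders
# the embedded numbers numerically. Assumes each number fits in 32 bits.
sub binary_key {
    my ($s) = @_;
    (my $key = $s) =~ s/(\d+)/unpack("B32", pack("N", $1))/eg;
    return $key;
}

my @items = qw(A1 A10 A2 A3);

# Plain string sort: "A10" lands before "A2".
my @plain = sort @items;

# Comparing the transformed keys: the numbers sort as numbers.
my @human = sort { binary_key($a) cmp binary_key($b) } @items;

print "@plain\n";   # A1 A10 A2 A3
print "@human\n";   # A1 A2 A3 A10
```

Every number becomes exactly 32 characters of 0s and 1s, which is why cmp can compare them position by position.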
Re: [Fwd: Re: [FWP] sorting text in human-order]
Piers Cawley wrote:

> "David L. Nicol" [EMAIL PROTECTED] writes:
>
> > After reading Cawley's method, I wondered if using it we could
> > make radix-sorts the default sort method.
>
> Er... the point behind changing numbers to binary strings was
> emphatically not so that they could be sorted by a Radix method,
> but to ensure that numbers within text would sort correctly:
> qw(A1 A2 A3 A10) instead of qw(A1 A10 A2 A3)...

The rsort documentation informs that radix-sorts will sort ascii text. My thought was that the perl6 default sort could do an implied ST on the data, using Cawley's Substitution, and then a radix-sort, instead of analyzing each pair of data to see if they are numeric or not using whatever the current heuristic is.

I do not know exactly what the perl5 default sort heuristic is, aside that it tries to DWIM both numeric and string data.

Without the ST, the sort function would be

    sub PCsort {
        my $mya = $a;
        my $myb = $b;
        $mya =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
        $myb =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
        return $mya cmp $myb;
    }

With ST (and duplicate loss correction!)

    sub PCsort(@) {
        my $this;
        my $trans;
        my %duplicates;
        my %doppleganger;
        # Loop on @_ itself, so a false element ("0" or "") does not
        # end the loop early.
        while (@_) {
            $trans = $this = shift;
            $trans =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
            exists $doppleganger{$trans} and $duplicates{$trans}++;
            $doppleganger{$trans} = $this;
        }
        my @Sorted = sort { $a cmp $b } keys %doppleganger;
        my @result;
        # from here down could be a map{} but it would be
        # hard to understand
        foreach $trans (@Sorted) {
            do {
                push @result, $doppleganger{$trans};
            } while ($duplicates{$trans}--);
        }
        @result;
    }

On another note, anyone for suppressing the use-of-uninitialized warning on the unary incrementors?

--
David Nicol 816.235.1187 [EMAIL PROTECTED]
Today in art class, draw your sword
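For comparison, the ST mentioned above is more often written as a map/sort/map pipeline. The following is only a sketch of that usual idiom, not the code from this message: the name st_sort and the sample list are made up for illustration. Because each original string travels alongside its key in a [key, original] pair, duplicates need no extra bookkeeping, and two different strings that share a key (such as A1 and A01) both survive.

```perl
use strict;
use warnings;

# Hypothetical name. Schwartzian Transform: compute each key once,
# sort the [key, original] pairs by key, then strip the keys off.
sub st_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  {
               my $key = $_;
               $key =~ s/(\d+)/unpack("B32", pack("N", $1))/eg;
               [ $key, $_ ];
           } @_;
}

# Sample data (an assumption): a false element "0", a repeated "A2",
# and "A1"/"A01", which transform to the same key.
my @sorted = st_sort(qw(A10 A2 0 A1 A2 A01));

print "@sorted\n";
```

The relative order of elements with equal keys (A1 and A01 here) depends on whether the underlying sort is stable, so the sketch makes no promise about it.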
Re: [Fwd: Re: [FWP] sorting text in human-order]
On Sat, Dec 30, 2000 at 05:31:29AM +, David L. Nicol wrote:

> Piers Cawley wrote:
>
> > "David L. Nicol" [EMAIL PROTECTED] writes:
> >
> > > After reading Cawley's method, I wondered if using it we could
> > > make radix-sorts the default sort method.
> >
> > Er... the point behind changing numbers to binary strings was
> > emphatically not so that they could be sorted by a Radix method,
> > but to ensure that numbers within text would sort correctly:
> > qw(A1 A2 A3 A10) instead of qw(A1 A10 A2 A3)...
>
> The rsort documentation informs that radix-sorts will sort ascii
> text. My thought was that the perl6 default sort could do an
> implied ST on the data, using Cawley's Substitution, and then a
> radix-sort, instead of analyzing each pair of data to see if they
> are numeric or not using whatever the current heuristic is.
>
> I do not know exactly what the perl5 default sort heuristic is,
> aside that it tries to DWIM both numeric and string data.

"sort heuristic"? "DWIM both numeric and string data"? There is no "heuristic". There is no "DWIM". Perl's sort() by default does a string sort based on the byte values of the strings in its argument list. That's it. Period. Full stop. If you want something else, like a numeric comparison, or, say, a case-ignorant string comparison, or whatever, then you supply the comparison function yourself.

The sorting algorithm? Before 5.005 (I think... my memory is going) it was the vendors' quicksort; after that, Tom Horsley's excellent ultratuned quicksort (since vendors' quicksorts were (a) buggy (b) slow); in 5.7, a mergesort by John Linderman was introduced.

--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
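To make the point about the default concrete, here is a short sketch contrasting sort() with no comparison function against the two alternatives named above. The sample lists are made up for illustration, and the sketch assumes no "use locale" is in effect.

```perl
use strict;
use warnings;

my @nums  = (10, 9, 100, 2);
my @words = qw(banana Cherry apple);

# No comparison function: the numbers are compared as strings.
my @default = sort @nums;

# Numeric order has to be asked for explicitly.
my @numeric = sort { $a <=> $b } @nums;

# The default puts uppercase before lowercase, by byte value.
my @bytes = sort @words;

# A case-ignorant comparison is likewise supplied by the caller.
my @nocase = sort { lc($a) cmp lc($b) } @words;

print "@default\n";  # 10 100 2 9
print "@numeric\n";  # 2 9 10 100
print "@bytes\n";    # Cherry apple banana
print "@nocase\n";   # apple banana Cherry
```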