Re: [Fwd: Re: [FWP] sorting text in human-order]

2001-01-02 Thread Uri Guttman

 "JSD" == Jonathan Scott Duff [EMAIL PROTECTED] writes:

  JSD On Mon, Jan 01, 2001 at 04:31:42PM -0600, Jarkko Hietaniemi wrote:
   (1) Quicksort has a weak point where it goes deep into the Quadratic Land:
   (nearly) already ordered data.  No, that is not so far-fetched a case.
   Mergesort has no similar weakpoints: its performance is in fact
   consistently N log N.
   
   (2) Mergesort is stable.  Quicksort is not.

  JSD I figured it was something like this, but wanted to know if there
  JSD were any other reasons.

  JSD I recently tracked down a bug in some software that we've been using for
  JSD years that turned out to be an implicit assumption that perl's sort
  JSD would be stable when, of course, it's not.  Had perl 5.7 come along
  JSD sooner I might never have caught the bug or been bitten by it  :-)

there was a thread the other day in c.l.p.misc about this very
issue. regardless of the stableness of perl's internal sort, you should
never assume stability unless it is specified. perl5 never specified
this and relying on 5.7 for it is foolish IMO. adding sort stability is
not hard even with an unstable sort algorithm.

on this note, i then propose that we do specify perl6's sort to be
stable. we can at this point do that and by keeping our sort algorthm
choices to stable ones, we can deliver it. i would never sepc perl5's
sort as stable due to all the older versions out there with unstable
quicksorts.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  ---  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  --  http://www.northernlight.com



Re: [Fwd: Re: [FWP] sorting text in human-order]

2001-01-01 Thread Jonathan Scott Duff

On Fri, Dec 29, 2000 at 11:47:59PM -0600, Jarkko Hietaniemi wrote:
 The sorting algorithm? Before 5.005 (I think...my memory is going)
 vendors' quicksort, after that Tom Horsley's excellent ultratuned
 quicksort (since vendors' quicksorts were (a) buggy (c) slow),
 in 5.7 mergesort by John Lindermann was introduced.

Could someone give me a pointer to the whys and wherefors of the
change from quicksort to mergesort?

thanks,

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: [Fwd: Re: [FWP] sorting text in human-order]

2001-01-01 Thread Jarkko Hietaniemi

On Mon, Jan 01, 2001 at 02:04:25PM -0600, Jonathan Scott Duff wrote:
 On Fri, Dec 29, 2000 at 11:47:59PM -0600, Jarkko Hietaniemi wrote:
  The sorting algorithm? Before 5.005 (I think...my memory is going)
  vendors' quicksort, after that Tom Horsley's excellent ultratuned
  quicksort (since vendors' quicksorts were (a) buggy (c) slow),
  in 5.7 mergesort by John Lindermann was introduced.
 
 Could someone give me a pointer to the whys and wherefors of the
 change from quicksort to mergesort?

Executive summary:

(1) Quicksort has a weak point where it goes deep into the Quadratic Land:
(nearly) already ordered data.  No, that is not so far-fetched a case.
Mergesort has no similar weakpoints: its performance is in fact
consistently N log N.

(2) Mergesort is stable.  Quicksort is not.

Some of the discussion:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-09/msg00484.html
(and the following thread)

and a real-life case:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-12/msg00241.html
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-12/msg00244.html

 thanks,
 
 -Scott
 -- 
 Jonathan Scott Duff
 [EMAIL PROTECTED]

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread Piers Cawley

"David L. Nicol" [EMAIL PROTECTED] writes:
 Piers Cawley [EMAIL PROTECTED] writes:
 [EMAIL PROTECTED] (Yitzchak Scott-Thoennes) writes:
 
$srt =~ tr/0-9a-z\xe9/a-jA-ZE/;  # uc  sort nums after letters
 
 `10' is going to sort before `2' with that rule. Having done the whole
 bitter experience thing with this, may I suggest:
 
 $srt =~ s/(\d+)/unpack("B32", pack("N",$1))/eg
 
 Which will give you nice 32 bit binary representations of your
 numbers, which have leading zeros and will sort properly via cmp.
 
 If you want a sample of the pain I had working that out, you
 should've been at my 12 step perl session at YAPC::Europe.

 Is there a perl6 sort committee yet?  AFter reading Cawley's
 method here, I wonder if using it we could make radix-sorts the
 default sort method.

Er... the point behind changing numbers to binary strings was
emphatically not so that they could be sorted by a Radix method, but
to ensure that numbers within text would sort correctly: qw(A1 A2 A3
A10) instead of qw(A1 A10 A2 A3)...




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread David L. Nicol

Piers Cawley wrote:
 
 "David L. Nicol" [EMAIL PROTECTED] writes:
  After reading Cawley's
  method, I wondered if using it we could make radix-sorts the
  default sort method.
 
 Er... the point behind changing numbers to binary strings was
 emphatically not so that they could be sorted by a Radix method, but
 to ensure that numbers within text would sort correctly: qw(A1 A2 A3
 A10) instead of qw(A1 A10 A2 A3)...

The rsort documentation informs that radix-sorts will sort ascii
text.

My thought was that the perl6 default sort could do an implied ST on
the data, using Cawley's Substitution, and then a radix-sort, instead
of analyzing each pair of data to see if they are numeric or not using
whatever the current heuristic is.

I do not know exactly what the perl5 default sort heuristic is, aside that
it tries to DWIM both numeric and string data.

Without the ST, the sort function would be

 sub PCsort {
   my $mya = $a;
   my $myb = $b;
 $mya =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
 $myb =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
  return $mya cmp $myb;
 }

With ST (and duplicate loss correction!)

   sub PCsort(@){
my $this;
my $trans;
my %duplicates;
my %doppleganger;
while ($trans = $this = shift){
$trans =~ s/(\d+)/unpack("B32", pack("N",$1))/eg;
exists $doppleganger{$trans} and $duplicates{$trans}++;
$doppleganger{$trans} = $this;
};
my @Sorted = sort {$a cmp $b} keys %doppleganger;
my @result; # from here down could be a map{} but it would be
# hard to understand
foreach $trans (@Sorted){
do{
push @result, $doppleganger{$trans};
}while($duplicates{$trans}--);
};
@result
   };




On another note, anyone for suppressing the use-of-unititalized warning
on the unary incrementors?


-- 
   David Nicol 816.235.1187 [EMAIL PROTECTED]
Today in art class, draw your sword




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-29 Thread Jarkko Hietaniemi

On Sat, Dec 30, 2000 at 05:31:29AM +, David L. Nicol wrote:
 Piers Cawley wrote:
  
  "David L. Nicol" [EMAIL PROTECTED] writes:
   After reading Cawley's
   method, I wondered if using it we could make radix-sorts the
   default sort method.
  
  Er... the point behind changing numbers to binary strings was
  emphatically not so that they could be sorted by a Radix method, but
  to ensure that numbers within text would sort correctly: qw(A1 A2 A3
  A10) instead of qw(A1 A10 A2 A3)...
 
 The rsort documentation informs that radix-sorts will sort ascii
 text.
 
 My thought was that the perl6 default sort could do an implied ST on
 the data, using Cawley's Substitution, and then a radix-sort, instead
 of analyzing each pair of data to see if they are numeric or not using
 whatever the current heuristic is.
 
 I do not know exactly what the perl5 default sort heuristic is, aside that
 it tries to DWIM both numeric and string data.

"sort heuristic"?  "DWIM both numeric and string data"?  There is
no "heuristic".  There is no "DWIM".  Perl's sort() does by default
string sort based on the byte values of the strings of its argument
list.  That's it.  Period.  Full stop.

If you want something else, like a numeric comparison, or, say, a
case-ignorant string comparison, or whatever, then you supply the
comparison function yourself. 

The sorting algorithm? Before 5.005 (I think...my memory is going)
vendors' quicksort, after that Tom Horsley's excellent ultratuned
quicksort (since vendors' quicksorts were (a) buggy (c) slow),
in 5.7 mergesort by John Lindermann was introduced.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread John Porter

Nathan Torkington wrote:
 
 By "pluggable" you mean that sort() should be overridable?

use D::Oh s s\?s.s;


-- 
John Porter

What would Gabrielle do?




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread Jarkko Hietaniemi

On Wed, Dec 27, 2000 at 06:36:56PM -0600, David L. Nicol wrote:
 
 Is there a perl6 sort committee yet?  AFter reading Cawley's
 method here, I wonder if using it we could make radix-sorts the
 default sort method.

Radix sorts are great if the data cooperates, radix sorts can really
fly in such conditions -- but this is not always the case: radix sorts
require that you have an unambiguous rule for mapping all the keys to
a flat fixed-length 'namespace'.  Take, for example your usual strings
of practically unlimited length.  You can't define a (non-lossy and
non-colliding) rule to map them to, say, fixed-length integers.

Therefore and thusly: we aim for pluggable sort algorithms.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread Dan Sugalski

At 06:36 PM 12/27/00 -0600, David L. Nicol wrote:

Is there a perl6 sort committee yet?  AFter reading Cawley's
method here, I wonder if using it we could make radix-sorts the
default sort method.

I don't see any reason to not allow this--perhaps a lexically scoped 
assignment to CORE::GLOBAL::sort_func or something of the sort. Or we could 
just provide a selection of sort functions that could be imported with a 
"use sort;" and stuck in the sort function slot:

   use sort qw(radix_sort);
   sort \radix_sort @data;

more or less...

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread John Porter

Dan Sugalski wrote:
 
use sort qw(radix_sort);
sort \radix_sort @data;

Isn't that the slot where the comparison function goes?
Maybe something more like this:

use sort::radix_sort;
sort @data; # magically uses radix_sort instead of default.


-- 
John Porter

What would Gabrielle do?




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread Jarkko Hietaniemi

On Thu, Dec 28, 2000 at 03:43:21PM -0500, John Porter wrote:
 Dan Sugalski wrote:
  
 use sort qw(radix_sort);
 sort \radix_sort @data;
 
 Isn't that the slot where the comparison function goes?
 Maybe something more like this:
 
 use sort::radix_sort;
 sort @data; # magically uses radix_sort instead of default.
 

If someone wants to play with such ideas there's Perl 5.7 which has a
new mergesort as the incore sorting algorithm, while Perl 5.6 and before
used quicksort.

 -- 
 John Porter
 
 What would Gabrielle do?

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread Dan Sugalski

At 03:43 PM 12/28/00 -0500, John Porter wrote:
Dan Sugalski wrote:
 
 use sort qw(radix_sort);
 sort \radix_sort @data;

Isn't that the slot where the comparison function goes?
Maybe something more like this:

use sort::radix_sort;
sort @data; # magically uses radix_sort instead of default.

D'oh. Yeah, you're right. A lexically scoped pragma or a setting of some 
CORE::GLOBAL sub. (Probably one and the same, though we'd need a method to 
hoist things out a few levels of lexical scope for that to work...)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread John Porter

Jarkko Hietaniemi wrote:
 
 If someone wants to play with such ideas there's Perl 5.7 which has a
 new mergesort as the incore sorting algorithm, while Perl 5.6 and before
 used quicksort.

I'm triggering on the word "incore" there...
I seem to recall someone suggested on perl6-language a while back*
(or was it perl6-internals?) that perl ought also to support efficient
sorting of large volumes of data by using disk, the way unix sort does.
Pluggable algorithms would make this possible too.
Although, I suppose you'd actually like to make it possible for perl
intrinsics to use disk buffers whenever necessary, globally.

(*If the mail archives were searchable, I'd give an actual reference.)

-- 
John Porter

What would Gabrielle do?




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-28 Thread Peter Scott

At 04:34 PM 12/28/00 -0500, John Porter wrote:
I seem to recall someone suggested on perl6-language a while back*
(or was it perl6-internals?) that perl ought also to support efficient
sorting of large volumes of data by using disk, the way unix sort does.
Pluggable algorithms would make this possible too.
Although, I suppose you'd actually like to make it possible for perl
intrinsics to use disk buffers whenever necessary, globally.

(*If the mail archives were searchable, I'd give an actual reference.)

http://www.mail-archive.com/perl6-language@perl.org/index.html

http://www.mail-archive.com/perl6-internals@perl.org/index.html

Although I cannot find an article referring to what you mention.  It does 
sound familiar though.
--
Peter Scott
Pacific Systems Design Technologies




[Fwd: Re: [FWP] sorting text in human-order]

2000-12-27 Thread David L. Nicol


Is there a perl6 sort committee yet?  AFter reading Cawley's
method here, I wonder if using it we could make radix-sorts the
default sort method.




 Original Message 
Return-Path: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 10490 invoked from network); 29 Nov 2000 15:17:02 -
Received: from cpe-024-221-169-054.ca.sprintbbd.net (HELO cfcl.com) (24.221.169.54)by 
hilbert.umkc.edu with SMTP; 29 Nov
2000 15:17:02 -
Received: (from majordom@localhost)by cfcl.com (8.9.3/8.9.3) id HAA12020;Wed, 29 Nov 
2000 07:05:10 -0800
(PST)(envelope-from [EMAIL PROTECTED])
X-Authentication-Warning: cfcl.com: majordom set sender to [EMAIL PROTECTED] 
using -f
Received: from rt158 (firewall.realtime.co.uk [194.205.218.14])by cfcl.com 
(8.9.3/8.9.3) with ESMTP id HAA12015for
[EMAIL PROTECTED]; Wed, 29 Nov 2000 07:05:04 -0800 (PST)(envelope-from 
[EMAIL PROTECTED])
Received: from rt158.private.realtime.co.uk (IDENT:[EMAIL PROTECTED] 
[127.0.0.1])by rt158 (8.11.0/8.8.7)
with ESMTP id eATExu511390;Wed, 29 Nov 2000 14:59:56 GMT
From: Piers Cawley [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [FWP] sorting text in human-order
References: [EMAIL PROTECTED]
Date: 29 Nov 2000 14:59:56 +
In-Reply-To: [EMAIL PROTECTED]'s message of "Tue, 28 Nov 2000 23:20:28 -0800"
Message-ID: [EMAIL PROTECTED]
Lines: 17
User-Agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/21.1 (GTK)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: [EMAIL PROTECTED]
Precedence: bulk
Reply-To: Piers Cawley [EMAIL PROTECTED]
X-Mozilla-Status: 9011
X-Mozilla-Status2: 
X-UIDL: 975511023.10493.hilbert.umkc.edu

[EMAIL PROTECTED] (Yitzchak Scott-Thoennes) writes:

   $srt =~ tr/0-9a-z\xe9/a-jA-ZE/;  # uc  sort nums after letters

`10' is going to sort before `2' with that rule. Having done the whole
bitter experience thing with this, may I suggest:

$srt =~ s/(\d+)/unpack("B32", pack("N",$1))/eg

Which will give you nice 32 bit binary representations of your
numbers, which have leading zeros and will sort properly via cmp. 

If you want a sample of the pain I had working that out, you should've
been at my 12 step perl session at YAPC::Europe.

-- 
Piers


 Want to unsubscribe from Fun With Perl?  Well, if you insist...
 Send email to [EMAIL PROTECTED] with message _body_
   unsubscribe




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-27 Thread John Porter

David L. Nicol wrote:
 
 Is there a perl6 sort committee yet?  AFter reading Cawley's
 method here, I wonder if using it we could make radix-sorts the
 default sort method.

Perl6 ought to support pluggable sort algorithms, just as Perl
now supports pluggable comparison functions.

-- 
John Porter

What would Gabrielle do?




Re: [Fwd: Re: [FWP] sorting text in human-order]

2000-12-27 Thread Nathan Torkington

John Porter writes:
 Perl6 ought to support pluggable sort algorithms, just as Perl
 now supports pluggable comparison functions.

By "pluggable" you mean that sort() should be overridable?

Nat