Re: Sort comparison

2012-05-01 Thread Ian Kelly
On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com wrote:

 A while back I did a sort algorithm runtime comparison for a variety of
 sorting algorithms, and then mostly sat on it.

 Recently, I got into a discussion with someone on stackoverflow about the
 running time of radix sort.

 I realize it's commonly said that radix sort is O(n*k) rather than O(n*log(n)).
 I've been making the case that in real life, frequently k == log(n).

 Anyway, here's the comparison, with code and graph:
 http://stromberg.dnsalias.org/~strombrg/sort-comparison/

It can be difficult to distinguish O(n) from O(n log n) on a log-log
plot, so I don't think that graph makes your point very well.

 Interesting, BTW, that an unstable quicksort in Cython was approaching
 timsort as a C extension module in performance, even though timsort in
 Cython wasn't that performant.  It makes one wonder if a highly optimized
 quicksort as a C extension module wouldn't outperform timsort in the
 standard library.

 Yes, I know, we've committed to keeping list_.sort() stable now, and
 quicksort normally isn't stable.

 Also, before timsort, did CPython use quicksort?  I'd be a little surprised
 if it hadn't, since people used to say quicksort was the best thing around
 in practice.

Based on a little googling, it was quicksort prior to 1.5.2, then it
was changed to a highly optimized samplesort (a quicksort variant) /
insertion sort hybrid until the current timsort was implemented in
2.3.

Cheers,
Ian


Re: Sort comparison

2012-05-01 Thread Terry Reedy

On 5/1/2012 1:25 AM, Dan Stromberg wrote:


Anyway, here's the comparison, with code and graph:
http://stromberg.dnsalias.org/~strombrg/sort-comparison/

(It's done in pure Python and Cython, with a little m4).

Interesting, BTW, that an unstable quicksort in Cython was approaching
timsort as a C extension module in performance, even though timsort in
Cython wasn't that performant.  It makes one wonder if a highly
optimized quicksort as a C extension module wouldn't outperform timsort
in the standard library.


Timsort is not only stable, but, by design, it is faster than quicksort 
for various structured data patterns, many of which are realistic in 
practice. For instance, you append k unordered items to n already sorted 
items, where k << n, so that k*log(k) is on the order of n. Timsort 
discovers the initial run of n sorted items, sorts the k items, and then 
merges. The practical time is O(n). Ditto for sorting an already sorted 
file in the reverse direction (timsort does an O(n) list reverse).
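
A quick timing sketch of the effect (a minimal illustration, not code
from the sort-comparison page; exact numbers vary by machine):

import random
from timeit import default_timer as timer

def time_sort(data):
    items = list(data)           # sort a fresh copy each time
    start = timer()
    items.sort()
    return timer() - start

n, k = 1000000, 1000
run_plus_tail = list(range(n)) + random.sample(range(n), k)
shuffled = list(run_plus_tail)
random.shuffle(shuffled)

print("n sorted + k unordered: %.4f s" % time_sort(run_plus_tail))
print("same data, shuffled:    %.4f s" % time_sort(shuffled))
# Timsort detects the initial n-item run, sorts only the k-item tail,
# and merges: roughly O(n + k*log(k)), which is O(n) when k << n.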


--
Terry Jan Reedy



Re: Sort comparison

2012-05-01 Thread Dan Stromberg
On Tue, May 1, 2012 at 12:21 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

 On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com
 wrote:
 
  A while back I did a sort algorithm runtime comparison for a variety of
  sorting algorithms, and then mostly sat on it.
 
  Recently, I got into a discussion with someone on stackoverflow about the
  running time of radix sort.
 
  I realize it's commonly said that radix sort is O(n*k) rather than O(n*log(n)).
  I've been making the case that in real life, frequently k == log(n).
 
  Anyway, here's the comparison, with code and graph:
  http://stromberg.dnsalias.org/~strombrg/sort-comparison/

 It can be difficult to distinguish O(n) from O(n log n) on a log-log
 plot, so I don't think that graph makes your point very well.

Thank you for your comment.

Do you have a suggestion for a better type of graph - or other
linearity/nonlinearity test?

I thought that on a log-log graph, a straight line was still a straight
line, but it's compressed at one end.


Re: Sort comparison

2012-05-01 Thread Dan Stromberg
On Tue, May 1, 2012 at 10:51 AM, Terry Reedy tjre...@udel.edu wrote:

 On 5/1/2012 1:25 AM, Dan Stromberg wrote:

  Anyway, here's the comparison, with code and graph:
 http://stromberg.dnsalias.org/~strombrg/sort-comparison/

 (It's done in pure Python and Cython, with a little m4).

 Interesting, BTW, that an unstable quicksort in Cython was approaching
 timsort as a C extension module in performance, even though timsort in
 Cython wasn't that performant.  It makes one wonder if a highly
 optimized quicksort as a C extension module wouldn't outperform timsort
 in the standard library.


 Timsort is not only stable, but, by design, it is faster than quicksort
 for various structured data patterns, many of which are realistic in
 practice. For instance, you append k unordered items to n already sorted
 items, where k << n, so that k*log(k) is on the order of n. Timsort
 discovers the initial run of n sorted items, sorts the k items, and then
 merges. The practical time is O(n). Ditto for sorting an already sorted
 file in the reverse direction (timsort does an O(n) list reverse).


Yeah, I know, but thanks for making sure I did.

Timsort is a quite impressive algorithm.

I suspect that Timsort's ability to sort a mostly-sorted list in
near-linear time tends to encourage sorting inside a loop, though.
Sorting inside a loop usually yields a slow algorithm overall, even with
something as nice as Timsort.
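
For example (a sketch; the variable names are mine), the usual remedy is
to maintain sorted order incrementally instead of re-sorting each
iteration:

import bisect
import random

items = [random.random() for _ in range(2000)]

# Anti-pattern: re-sorting inside the loop.  Timsort makes each sort
# near-linear on the mostly-sorted list, but n sorts of O(n) each is
# still O(n**2) overall.
slow = []
for x in items:
    slow.append(x)
    slow.sort()

# Better: keep the list sorted as you go.  bisect.insort is O(log n) to
# search plus O(n) to shift; heapq is the right tool if you only ever
# need the smallest item.
fast = []
for x in items:
    bisect.insort(fast, x)

assert slow == fast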


Re: Sort comparison

2012-05-01 Thread Ian Kelly
On Tue, May 1, 2012 at 12:00 PM, Dan Stromberg drsali...@gmail.com wrote:

 On Tue, May 1, 2012 at 12:21 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

 On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com
 wrote:
 
  A while back I did a sort algorithm runtime comparison for a variety of
  sorting algorithms, and then mostly sat on it.
 
  Recently, I got into a discussion with someone on stackoverflow about
  the
  running time of radix sort.
 
  I realize it's commonly said that radix sort is O(n*k) rather than O(n*log(n)).
  I've been making the case that in real life, frequently k == log(n).
 
  Anyway, here's the comparison, with code and graph:
  http://stromberg.dnsalias.org/~strombrg/sort-comparison/

 It can be difficult to distinguish O(n) from O(n log n) on a log-log
 plot, so I don't think that graph makes your point very well.

 Thank you for your comment.

 Do you have a suggestion for a better type of graph - or other
 linearity/nonlinearity test?

An ordinary linear-scale graph.  It will only show the highest order
of magnitude in any detail, but for determining asymptotic behavior
that's the most interesting part anyway.  O(n) should look like a
straight line, and O(n log n) should curve up with a gradually
increasing slope.  Alternatively, calculate regressions and compare
the goodness of fit.
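
For the regression idea, here's a sketch using numpy least squares (the
timings below are made up -- substitute real measurements):

import numpy as np

# Hypothetical (n, seconds) data points.
n = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
t = np.array([0.0012, 0.013, 0.16, 1.9, 22.0])

def fit_residual(basis):
    """Least-squares fit of t ~ a*basis(n) + b; return squared residual."""
    A = np.column_stack([basis(n), np.ones_like(n)])
    _, residuals, _, _ = np.linalg.lstsq(A, t, rcond=None)
    return residuals[0] if residuals.size else 0.0

print("residual for t ~ n:       %g" % fit_residual(lambda x: x))
print("residual for t ~ n log n: %g" % fit_residual(lambda x: x * np.log(x)))
# The model with the smaller residual is the better fit.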

 I thought that on a log-log graph, a straight line was still a straight
 line, but it's compressed at one end.

The key characteristic of a log-log plot is that any curve y = ax ** b
will appear as a straight line with a slope of b.  So a linear curve
will be a straight line with a slope of 1.  An O(n log n) curve will
converge to a straight line with a slope of 1 as n approaches
infinity.
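
To see why the convergence is slow: for y = n*log(n), the slope on
log-log axes is d(log y)/d(log n) = 1 + 1/ln(n), which decays toward 1
only very slowly.  A quick check of the slope between successive decades:

import math

prev_n = 10
prev_y = prev_n * math.log(prev_n)
for exp in range(2, 8):
    n = 10 ** exp
    y = n * math.log(n)
    slope = (math.log(y) - math.log(prev_y)) / (math.log(n) - math.log(prev_n))
    print("n=10**%d  slope=%.3f" % (exp, slope))
    prev_n, prev_y = n, y
# The slope starts around 1.3 and only creeps toward 1 as n grows, which
# is why O(n log n) is hard to tell from O(n) on a log-log plot.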


Re: Sort comparison

2012-05-01 Thread Dan Stromberg
On Tue, May 1, 2012 at 11:52 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

 On Tue, May 1, 2012 at 12:00 PM, Dan Stromberg drsali...@gmail.com
 wrote:
 
  On Tue, May 1, 2012 at 12:21 AM, Ian Kelly ian.g.ke...@gmail.com
 wrote:
 
  On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com
  wrote:
  
   A while back I did a sort algorithm runtime comparison for a variety
 of
   sorting algorithms, and then mostly sat on it.
  
   Recently, I got into a discussion with someone on stackoverflow about
   the
   running time of radix sort.
  
   I realize it's commonly said that radix sort is O(n*k) rather than
   O(n*log(n)).
   I've been making the case that in real life, frequently k == log(n).
  
   Anyway, here's the comparison, with code and graph:
   http://stromberg.dnsalias.org/~strombrg/sort-comparison/
 
  It can be difficult to distinguish O(n) from O(n log n) on a log-log
  plot, so I don't think that graph makes your point very well.
 
  Thank you for your comment.
 
  Do you have a suggestion for a better type of graph - or other
  linearity/nonlinearity test?

 An ordinary linear-scale graph.  It will only show the highest order
 of magnitude in any detail, but for determining asymptotic behavior
 that's the most interesting part anyway.  O(n) should look like a
 straight line, and O(n log n) should curve up with a gradually
 increasing slope.  Alternatively, calculate regressions and compare
 the goodness of fit.

  I thought that on a log-log graph, a straight line was still a straight
  line, but it's compressed at one end.

 The key characteristic of a log-log plot is that any curve y = ax ** b
 will appear as a straight line with a slope of b.  So a linear curve
  will be a straight line with a slope of 1.  An O(n log n) curve will
 converge to a straight line with a slope of 1 as n approaches
 infinity.


Thanks for the info.

I actually feel like the log-log graph and the linear-linear graph each
tell an interesting but different part of the story, so I've kept the
log-log graph and added a linear-linear graph.

I also added a straight reference line to the linear-linear graph, for
the sake of comparison.

You can see that radix sort falls between mergesort and quicksort (both
O(n*log(n)) on average), and that its curve comes up almost parallel to
mergesort's.
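
For reference, here's the flavor of radix sort I mean (an LSD sketch,
not the exact code from the page above):

def radix_sort(xs, base=256):
    """LSD radix sort for non-negative integers."""
    if not xs:
        return xs
    place = 1  # current digit's place value
    while max(xs) >= place:
        buckets = [[] for _ in range(base)]
        for x in xs:
            buckets[(x // place) % base].append(x)
        # Stable concatenation of the buckets preserves lower-digit order.
        xs = [x for bucket in buckets for x in bucket]
        place *= base
    return xs

# The number of passes k is ceil(log(max_key + 1, base)).  With n
# distinct keys, max_key >= n - 1, so k >= log(n, base): the k in
# O(n*k) grows like log(n), which is the sense in which k == log(n).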