Re: Sort comparison
On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com wrote:

> A while back I did a sort algorithm runtime comparison for a variety of
> sorting algorithms, and then mostly sat on it.
>
> Recently, I got into a discussion with someone on stackoverflow about the
> running time of radix sort. I realize it's commonly said that radix sort
> is O(n*k) rather than O(n*log(n)). I've been making the case that in
> real life, frequently k == log(n).
>
> Anyway, here's the comparison, with code and graph:
> http://stromberg.dnsalias.org/~strombrg/sort-comparison/

It can be difficult to distinguish O(n) from O(n log n) on a log-log plot, so I don't think that graph makes your point very well.

> Interesting, BTW, that an unstable quicksort in Cython was approaching
> timsort as a C extension module in performance, even though timsort in
> Cython wasn't that performant. It makes one wonder if a highly optimized
> quicksort as a C extension module wouldn't outperform timsort in the
> standard library. Yes, I know, we've committed to keeping list_.sort()
> stable now, and quicksort normally isn't stable.
>
> Also, before timsort, did CPython use quicksort? I'd be a little
> surprised if it hadn't, since people used to say quicksort was the best
> thing around in practice.

Based on a little googling, it was quicksort prior to 1.5.2, then it was changed to a highly optimized samplesort (a quicksort variant) / insertion sort hybrid until the current timsort was implemented in 2.3.

Cheers,
Ian
-- 
http://mail.python.org/mailman/listinfo/python-list
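[To make the k vs. log(n) discussion concrete, here is a minimal LSD radix sort sketch in pure Python. This is an illustration only, not the code from the linked comparison; `radix_sort` and its `base` parameter are names chosen here. The number of passes k is the digit count of the largest key, which for n distinct keys grows like log(n).]

```python
def radix_sort(nums, base=256):
    """LSD radix sort for non-negative integers (illustrative sketch)."""
    if not nums:
        return []
    biggest = max(nums)
    passes = 0
    # k = number of passes = number of base-`base` digits in the largest
    # key.  For n distinct keys packed densely, k grows like log(n) --
    # which is the "k == log(n) in real life" claim being debated above.
    while base ** passes <= biggest:
        buckets = [[] for _ in range(base)]
        for num in nums:
            # Stable distribution by the current digit.
            buckets[(num // base ** passes) % base].append(num)
        nums = [num for bucket in buckets for num in bucket]
        passes += 1
    return nums
```

[With base=256 and 32-bit keys, k is at most 4 regardless of n, which is the other side of the same argument.]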
Re: Sort comparison
On 5/1/2012 1:25 AM, Dan Stromberg wrote:

> Anyway, here's the comparison, with code and graph:
> http://stromberg.dnsalias.org/~strombrg/sort-comparison/
> (It's done in pure Python and Cython, with a little m4.)
>
> Interesting, BTW, that an unstable quicksort in Cython was approaching
> timsort as a C extension module in performance, even though timsort in
> Cython wasn't that performant. It makes one wonder if a highly optimized
> quicksort as a C extension module wouldn't outperform timsort in the
> standard library.

Timsort is not only stable but, by design, it is faster than quicksort for various structured data patterns, many of which are realistic in practice. For instance, you append k unordered items to n already sorted items, where k << n, so that k*log(k) is on the order of n. Timsort discovers the initial run of n sorted items, sorts the k items, and then merges. The practical time is O(n). Ditto for sorting an already sorted file in the reverse direction (timsort does an O(n) list reverse).

-- 
Terry Jan Reedy
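[The append-k-items scenario can be sketched directly. This is a hedged illustration of the strategy timsort applies internally, not CPython's actual C implementation; `sort_mostly_sorted` is a name invented here.]

```python
import heapq

def sort_mostly_sorted(items, n_sorted):
    """Sort a list whose first n_sorted items are already ascending.

    Mimics what timsort does automatically: reuse the existing run,
    sort only the k trailing items (k*log(k) work), then do a linear
    merge -- O(n + k*log(k)) overall, effectively O(n) when k << n.
    """
    run = items[:n_sorted]            # the pre-existing sorted run
    tail = sorted(items[n_sorted:])   # the k appended items
    return list(heapq.merge(run, tail))
```

[list.sort() gets the same effect with no helper at all, since timsort detects the initial run itself; the sketch just makes the cost breakdown visible.]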
Re: Sort comparison
On Tue, May 1, 2012 at 12:21 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

> On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg drsali...@gmail.com wrote:
>> A while back I did a sort algorithm runtime comparison for a variety of
>> sorting algorithms, and then mostly sat on it.
>>
>> Recently, I got into a discussion with someone on stackoverflow about
>> the running time of radix sort. I realize it's commonly said that radix
>> sort is O(n*k) rather than O(n*log(n)). I've been making the case that
>> in real life, frequently k == log(n).
>>
>> Anyway, here's the comparison, with code and graph:
>> http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>
> It can be difficult to distinguish O(n) from O(n log n) on a log-log
> plot, so I don't think that graph makes your point very well.

Thank you for your comment.

Do you have a suggestion for a better type of graph - or other linearity/nonlinearity test?

I thought that on a log-log graph, a straight line was still a straight line, but it's compressed at one end.
Re: Sort comparison
On Tue, May 1, 2012 at 10:51 AM, Terry Reedy tjre...@udel.edu wrote:

> On 5/1/2012 1:25 AM, Dan Stromberg wrote:
>> Anyway, here's the comparison, with code and graph:
>> http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>> (It's done in pure Python and Cython, with a little m4.)
>>
>> Interesting, BTW, that an unstable quicksort in Cython was approaching
>> timsort as a C extension module in performance, even though timsort in
>> Cython wasn't that performant. It makes one wonder if a highly
>> optimized quicksort as a C extension module wouldn't outperform timsort
>> in the standard library.
>
> Timsort is not only stable but, by design, it is faster than quicksort
> for various structured data patterns, many of which are realistic in
> practice. For instance, you append k unordered items to n already sorted
> items, where k << n, so that k*log(k) is on the order of n. Timsort
> discovers the initial run of n sorted items, sorts the k items, and then
> merges. The practical time is O(n). Ditto for sorting an already sorted
> file in the reverse direction (timsort does an O(n) list reverse).

Yeah, I know, but thanks for making sure I did. Timsort is a quite impressive algorithm.

I suspect that timsort's sort-a-mostly-sorted-list-in-near-linear-time attribute tends to encourage sorting inside a loop, though. Sorting inside a loop usually makes for a slow algorithm overall, even with something as nice as timsort.
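[The sort-inside-a-loop pattern, and the usual stdlib alternative, can be illustrated in a few lines (the stream data here is made up for the sketch):]

```python
import bisect

# A made-up stream of incoming values.
stream = [5, 1, 9, 3, 7, 2, 8, 6, 4]

# Pattern 1: re-sort the whole list every iteration.  Each individual
# sort is cheap because the list is nearly sorted, but the loop still
# pays at least a full O(n) pass per item -- O(n**2) in total.
resorted = []
for value in stream:
    resorted.append(value)
    resorted.sort()

# Pattern 2: keep the list sorted incrementally.  bisect.insort does an
# O(log n) search plus an O(n) insertion per item, with far less
# constant overhead than a sort call per iteration.
incremental = []
for value in stream:
    bisect.insort(incremental, value)

assert resorted == incremental == sorted(stream)
```

[Either way the loop is quadratic in the worst case; the real fix, where possible, is to collect everything first and sort once at the end.]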
Re: Sort comparison
On Tue, May 1, 2012 at 12:00 PM, Dan Stromberg drsali...@gmail.com wrote:

> On Tue, May 1, 2012 at 12:21 AM, Ian Kelly ian.g.ke...@gmail.com wrote:
>> It can be difficult to distinguish O(n) from O(n log n) on a log-log
>> plot, so I don't think that graph makes your point very well.
>
> Thank you for your comment.
>
> Do you have a suggestion for a better type of graph - or other
> linearity/nonlinearity test?

An ordinary linear-scale graph. It will only show the highest order of magnitude in any detail, but for determining asymptotic behavior that's the most interesting part anyway. O(n) should look like a straight line, and O(n log n) should curve up with a gradually increasing slope. Alternatively, calculate regressions and compare the goodness of fit.

> I thought that on a log-log graph, a straight line was still a straight
> line, but it's compressed at one end.

The key characteristic of a log-log plot is that any curve y = a * x**b will appear as a straight line with a slope of b. So a linear curve will be a straight line with a slope of 1. An O(n log n) curve will converge to a straight line with a slope of 1 as n approaches infinity.
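[The regression idea takes only a few lines of stdlib Python: fit the slope of log(time) against log(n) and read off the exponent. `loglog_slope` is a helper invented for this sketch, and the timing values are synthetic, not measurements from the linked comparison.]

```python
import math

def loglog_slope(ns, times):
    """Least-squares slope of log(time) vs. log(n).

    On a log-log plot a curve t = a * n**b is a straight line with
    slope b, so this estimates the exponent b directly from data.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ns = [2 ** k for k in range(10, 16)]
linear_times = [1e-7 * n for n in ns]               # synthetic O(n) data
nlogn_times = [1e-7 * n * math.log(n) for n in ns]  # synthetic O(n log n) data
# The O(n) data gives a slope of 1; the O(n log n) data gives roughly
# 1.1 over this range of n -- close enough to 1 that the two are hard
# to tell apart by eye on a log-log plot, which is Ian's point.
```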
Re: Sort comparison
On Tue, May 1, 2012 at 11:52 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

> The key characteristic of a log-log plot is that any curve y = a * x**b
> will appear as a straight line with a slope of b. So a linear curve
> will be a straight line with a slope of 1. An O(n log n) curve will
> converge to a straight line with a slope of 1 as n approaches infinity.

Thanks for the info.

I actually feel like the log-log graph and the linear-linear graph each tell an interesting but different part of the story, so I've kept the log-log graph and added a linear-linear graph.

I also added a linear line to the linear-linear graph for the sake of comparison. You can see that radix sort falls between mergesort and quicksort (both n*log(n) on average), and that radix sort comes up almost parallel to mergesort.