Re: heap sort or the wonder of abstraction
Lennart wrote Well, I'm a sucker for a benchmark so I ran all of these with hbc. I also added the smooth merge sort that comes with hbc. ... As you can see there is no clear winner, but I see no real reason to change the sort that comes with hbc to something else at this moment. You are right, I should clarify that the recommendations are ghc specific (in the next version ;-). sortHbc :: (Ord a) = [a] - [a] sortHbc [] = [] sortHbc (x:xs) = msort (upSeq xs [x]) upSeq [] xs = [reverse xs] upSeq (y:ys) xxs@(x:xs) = if x = y then upSeq ys (y:xxs) else reverse xxs : upSeq ys [y] msort [xs] = xs msort xss = msort (mergePairs xss) mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss mergePairs xss = xss merge xxs@(x:xs) yys@(y:ys) = if x = y then x:merge xs yys else y:merge xxs ys merge [] yys = yys merge xxs[] = xxs That's a first-order version of the smooth bottom-up mergesort (which I did not include in the timings because the difference to the top-down variant was not significant). `sortHbc' is probably slightly faster than `smoothMergeSort' because its first-order? NB Bottom-up mergsort was my previous favourite ;-). Your version has the slight drawback that it uses only increasing sequences. BTW, I don't think the test program does the right thing. It prints the last element of the sorted list, but there is nothing that says that computing this forces the list to be completely sorted. When I test sort routines I always do something like printing the sum of the sorted list. Hmm. I think this does not apply to the examples I gave but you are right, it could happen. cheatSort x = [ nth i x | i - [1 .. length x] ] where `nth i x' computes the ith smallest element of x. Furthermore (while I'm in a whining mode :-), taking the median of several runs is not the accepted wisdom. You should take the minimum of several runs. Fixed. Ralf
Re: heap sort or the wonder of abstraction
--167E2781446B Content-Type: text/plain; charset="us-ascii" Ralf Hinze wrote: Practitioners are probably surprised to learn that `pairingSort' is the algorithm of choice for sorting. Any objections to this recommendation? I was surprised to see that it performs so well: sorting 50.000 Int's in roughly three seconds and 100.000 Int's in roughly nine seconds is quite acceptable. I ran some similar experiments in Standard ML a few years ago. In those experiments pairingSort also performed extremely well. The only algorithm that performed better, and even then only by a small amount, was splaySort, based on splay trees[1]. However, my experiment only considered algorithms that were good choices as heaps -- I did not consider any of the mergesort variations. Ralf, could I ask you to run my code below through your experiments (I don't have easy access to anything but hugs at the moment)? According to Ralf's criteria, splaySort is A. asymptotically optimal B. stable C. smooth (In fact, it has been conjectured that splaySort is optimal with respect to any reasonable notion of "presortedness".[2]) However, I believe--although I'm positive--that splaySort is D. not lazy Ralf considered the situation where the creation phase takes O(n) time and the selection phase takes O(n log n) time, but for splaySort these are reversed. Chris -- [1] Sleator and Tarjan "Self-adjusting binary search trees" Journal of the ACM 32(3):652-686 (July '85) [2] Moffat, Eddy, and Petersson "Splaysort: Fast, Versatile, Practical" Software PE 26(7):781-797 (July '96) - --167E2781446B Content-Disposition: inline; filename="Splay.lhs" Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs" data Splay a = SEmpty | SNode (Splay a) a (Splay a) instance PriorityQueue Splay where empty = SEmpty single x = SNode SEmpty x SEmpty fromList xs = foldr insert empty xs toOrderedList t = tol t [] where tol SEmpty rest = rest tol (SNode a x b) rest = tol a (x : tol b rest) insert k t = SNode a k b where (a, b) = partition t -- elements of a = k, elements of b k partition SEmpty = (SEmpty,SEmpty) partition t@(SNode tl x tr) | x k = case tr of SEmpty - (t,SEmpty) SNode trl y trr | y k - let tl' = SNode tl x trl (lt,ge) = partition trr in (SNode tl' y lt,ge) | otherwise - let (lt,ge) = partition trl in (SNode tl x lt,SNode ge y trr) | otherwise = case tl of SEmpty - (SEmpty,t) SNode tll y tlr | y k - let (lt,ge) = partition tlr in (SNode tll y lt,SNode ge x tr) | otherwise - let tr' = SNode tlr x tr (lt,ge) = partition tll in (lt,SNode ge y tr') splaySort :: (Ord a) = [a] - [a] splaySort = toOrderedList . (fromList :: (Ord a) = [a] - Splay a) --167E2781446B--
Re: heap sort or the wonder of abstraction
Sorting is a hobby-horse of mine, so I cannot resist the temptation to elaborate on the subject. I was motivated to write this rather long reply by Carsten Kehler Holst saying `As far as I can see the difference between merge sort and heap sort as described by Jon is almost non existing'. Carsten is not quite right but he is not totally wrong either. Both sorting algorithms are based on priority queues, so it may be worthwhile to take a `data-structural look at sorting'. That's the theme of this email. There are still some open points, so any remarks, corrections, ideas etc are *welcome*. Ralf PS: Those who are interested in performance only should skip to Section 10. import List ( group ) import System ( getArgs ) 1. Introductory remarks ~~~ What makes up a good sorting algorithm? Here are some criteria: A. it should be asymptotically optimal (ie O(n log n) worst case behaviour ruling out quick sort ;-)), B. it should be stable (ie it may not change the order of equal elements), C. it should be smooth (a smooth sort has a linear execution time if the input is nearly sorted). All algorithms we are going to present are asymptotically optimal (with the notable exception of `jonsSort') and all of them are stable. Only `smoothMergeSort' has shown to be smooth (to the best of my knowledge). However, practical experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. Additional criteria one *may* consider: D. it should be lazy (ie `head . sort' has linear execution time), E. it should run faster if the input contains many equal elements. All algorithms are lazy. No algorithm explicitly addresses Criterion E. Again, experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. It is advisable to gather some test data to check the various implementations. A stable sorting algorithm should perform well on the following data. strictlyIncreasing n = [1 .. n] increasing n = interleave x x where x = strictlyIncreasing n strictlyDecreasing n = [n, n-1 .. 1] decreasing n = interleave x x where x = strictlyDecreasing n constant n= replicate n 0 The following generators produce lists containing many equal elements (provided `k n'). repIncreasing k n = take n (copy [0 .. k]) repDecreasing k n = take n (copy [k, k-1 .. 0]) oscillating k n = take n (copy ([0 .. k] ++ [k, k-1 .. 0])) Finally we have random data. random n = take n (random2Ints (2*n) (3*n)) A complete list of all generators. generators:: [Int - [Int]] generators= [ strictlyIncreasing,-- 0 increasing, strictlyDecreasing, decreasing, constant, repIncreasing 2, -- 5 repIncreasing 100, repDecreasing 2, repDecreasing 100, oscillating 2, random ] -- 10 NB We only consider lists of Int's. It may be worthwhile to repeat the benchmarks (Section 10) with data designed to make comparisons dominate, see Jon's second email. 2. Priority queues ~~ Here is the abstract data type of priority queues formulated as a Haskell class definition. data SeqView t a = Null | Cons a (t a) class PriorityQueue q where empty :: (Ord a) = q a single:: (Ord a) = a - q a insert:: (Ord a) = a - q a - q a meld :: (Ord a) = q a - q a - q a splitMin :: (Ord a) = q a - SeqView q a fromList :: (Ord a) = [a] - q a toOrderedList :: (Ord a) = q a - [a] single a = insert a empty insert a q= single a `meld` q fromList = foldm meld empty . map single toOrderedList q = case splitMin q of Null - [] Cons a q - a : toOrderedList q The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin' which usually belong to the standard repertoire. The call `splitMin q' returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a minimal element of `q' and `q1' is the remaining queue). The prototypical sorting algorithm based on priority queues looks as follows (`PQ' refers to the concrete implementation) pqSort:: (Ord a) = [a] - [a] pqSort