### Re: heap sort or the wonder of abstraction

```
Lennart wrote

Well, I'm a sucker for a benchmark so I ran all of these with hbc.
I also added the smooth merge sort that comes with hbc.
...
As you can see there is no clear winner, but I see no real reason
to change the sort that comes with hbc to something else at this
moment.

You are right, I should clarify that the recommendations are ghc
specific (in the next version ;-).

sortHbc :: (Ord a) = [a] - [a]
sortHbc [] = []
sortHbc (x:xs) = msort (upSeq xs [x])

upSeq [] xs = [reverse xs]
upSeq (y:ys) xxs@(x:xs) =
if x = y then
upSeq ys (y:xxs)
else
reverse xxs : upSeq ys [y]

msort [xs] = xs
msort xss = msort (mergePairs xss)

mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
mergePairs xss = xss

merge xxs@(x:xs) yys@(y:ys) =
if x = y then
x:merge xs yys
else
y:merge xxs ys
merge [] yys = yys
merge xxs[]  = xxs

That's a first-order version of the smooth bottom-up mergesort (which I
did not include in the timings because the difference to the top-down
variant was not significant). `sortHbc' is probably slightly faster
than `smoothMergeSort' because its first-order? NB Bottom-up mergsort
was my previous favourite ;-). Your version has the slight drawback
that it uses only increasing sequences.

BTW, I don't think the test program does the right thing.  It prints
the last element of the sorted list, but there is nothing that
says that computing this forces the list to be completely sorted.
When I test sort routines I always do something like printing the
sum of the sorted list.

Hmm. I think this does not apply to the examples I gave but you are
right, it could happen.

cheatSort x   =  [ nth i x | i - [1 .. length x] ]

where `nth i x' computes the ith smallest element of x.

Furthermore (while I'm in a whining mode :-), taking the median
of several runs is not the accepted wisdom.  You should take the
minimum of several runs.

Fixed.

Ralf

```

### Re: heap sort or the wonder of abstraction

```
--167E2781446B
Content-Type: text/plain; charset="us-ascii"

Ralf Hinze wrote:
Practitioners are probably surprised to learn that `pairingSort' is the
algorithm of choice for sorting. Any objections to this recommendation?
I was surprised to see that it performs so well: sorting 50.000 Int's
in roughly three seconds and 100.000 Int's in roughly nine seconds is
quite acceptable.

I ran some similar experiments in Standard ML a few years ago.  In those
experiments pairingSort also performed extremely well.  The only
algorithm that performed better, and even then only by a small amount,
was splaySort, based on splay trees[1].  However, my experiment
only considered algorithms that were good choices as heaps -- I
did not consider any of the mergesort variations.  Ralf, could I
ask you to run my code below through your experiments (I don't have

According to Ralf's criteria, splaySort is
A. asymptotically optimal
B. stable
C. smooth  (In fact, it has been conjectured that splaySort is
optimal with respect to any reasonable notion of
"presortedness".[2])
However, I believe--although I'm positive--that splaySort is
D. not lazy
Ralf considered the situation where the creation phase takes O(n) time
and the selection phase takes O(n log n) time, but for splaySort these
are reversed.

Chris

--

[1] Sleator and Tarjan
Journal of the ACM 32(3):652-686 (July '85)

"Splaysort: Fast, Versatile, Practical"
Software PE 26(7):781-797 (July '96)

-

--167E2781446B
Content-Disposition: inline; filename="Splay.lhs"
Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs"

data Splay a = SEmpty | SNode (Splay a) a (Splay a)

instance PriorityQueue Splay where
empty = SEmpty
single x = SNode SEmpty x SEmpty

fromList xs = foldr insert empty xs

toOrderedList t = tol t []
where tol SEmpty rest = rest
tol (SNode a x b) rest = tol a (x : tol b rest)

insert k t = SNode a k b
where
(a, b) = partition t  -- elements of a = k, elements of b  k

partition SEmpty = (SEmpty,SEmpty)
partition t@(SNode tl x tr)
| x  k =
case tr of
SEmpty - (t,SEmpty)
SNode trl y trr
| y  k -
let tl' = SNode tl x trl
(lt,ge) = partition trr
in (SNode tl' y lt,ge)
| otherwise -
let (lt,ge) = partition trl
in (SNode tl x lt,SNode ge y trr)
| otherwise =
case tl of
SEmpty - (SEmpty,t)
SNode tll y tlr
| y  k -
let (lt,ge) = partition tlr
in (SNode tll y lt,SNode ge x tr)
| otherwise -
let tr' = SNode tlr x tr
(lt,ge) = partition tll
in (lt,SNode ge y tr')

splaySort :: (Ord a) = [a] - [a]
splaySort =  toOrderedList
.  (fromList :: (Ord a) = [a] - Splay a)

--167E2781446B--

```

### Re: heap sort or the wonder of abstraction

```
Sorting is a hobby-horse of mine, so I cannot resist the temptation to
elaborate on the subject. I was motivated to write this rather long
reply by Carsten Kehler Holst saying `As far as I can see the
difference between merge sort and heap sort as described by Jon is
almost non existing'. Carsten is not quite right but he is not totally
wrong either. Both sorting algorithms are based on priority queues, so
it may be worthwhile to take a `data-structural look at sorting'.
That's the theme of this email.

There are still some open points, so any remarks, corrections, ideas
etc are *welcome*.

Ralf

PS: Those who are interested in performance only should skip to Section 10.

import List   (  group  )
import System (  getArgs  )

1. Introductory remarks
~~~

What makes up a good sorting algorithm? Here are some criteria:

A. it should be asymptotically optimal (ie O(n log n) worst case behaviour
ruling out quick sort ;-)),

B. it should be stable (ie it may not change the order of equal elements),

C. it should be smooth (a smooth sort has a linear execution time if the
input is nearly sorted).

All algorithms we are going to present are asymptotically optimal (with
the notable exception of `jonsSort') and all of them are stable. Only
`smoothMergeSort' has shown to be smooth (to the best of my
knowledge).  However, practical experiments suggest that `pairingSort'
and `jonsSort' adopt quite well to the input data.

D. it should be lazy (ie `head . sort' has linear execution time),

E. it should run faster if the input contains many equal elements.

All algorithms are lazy. No algorithm explicitly addresses Criterion E.
Again, experiments suggest that `pairingSort' and `jonsSort' adopt
quite well to the input data.

It is advisable to gather some test data to check the various
implementations. A stable sorting algorithm should perform well on the
following data.

strictlyIncreasing n  =  [1 .. n]
increasing n  =  interleave x x
where x   =  strictlyIncreasing n
strictlyDecreasing n  =  [n, n-1 .. 1]
decreasing n  =  interleave x x
where x   =  strictlyDecreasing n
constant n=  replicate n 0

The following generators produce lists containing many equal elements
(provided `k  n').

repIncreasing k n =  take n (copy [0 .. k])
repDecreasing k n =  take n (copy [k, k-1 .. 0])
oscillating k n   =  take n (copy ([0 .. k] ++ [k, k-1 .. 0]))

Finally we have random data.

random n  =  take n (random2Ints (2*n) (3*n))

A complete list of all generators.

generators:: [Int - [Int]]
generators=  [ strictlyIncreasing,-- 0
increasing,
strictlyDecreasing,
decreasing,
constant,
repIncreasing 2,   -- 5
repIncreasing 100,
repDecreasing 2,
repDecreasing 100,
oscillating 2,
random ]   -- 10

NB We only consider lists of Int's. It may be worthwhile to repeat the
benchmarks (Section 10) with data designed to make comparisons
dominate, see Jon's second email.

2. Priority queues
~~

Here is the abstract data type of priority queues formulated as a

data SeqView t a  =  Null
|  Cons a (t a)

class PriorityQueue q where
empty :: (Ord a) = q a
single:: (Ord a) = a - q a
insert:: (Ord a) = a - q a - q a
meld  :: (Ord a) = q a - q a - q a
splitMin  :: (Ord a) = q a - SeqView q a
fromList  :: (Ord a) = [a] - q a
toOrderedList :: (Ord a) = q a - [a]

single a  =  insert a empty
insert a q=  single a `meld` q
fromList  =  foldm meld empty . map single
toOrderedList q   =  case splitMin q of
Null  - []
Cons a q  - a : toOrderedList q

The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin'
which usually belong to the standard repertoire. The call `splitMin q'
returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a
minimal element of `q' and `q1' is the remaining queue).

The prototypical sorting algorithm based on priority queues looks as
follows (`PQ' refers to the concrete implementation)

pqSort:: (Ord a) = [a] - [a]
pqSort```