Re: heap sort or the wonder of abstraction

1997-10-09 Thread Ralf Hinze

Lennart wrote

 Well, I'm a sucker for a benchmark so I ran all of these with hbc.
 I also added the smooth merge sort that comes with hbc.
 As you can see there is no clear winner, but I see no real reason
 to change the sort that comes with hbc to something else at this

You are right, I should clarify that the recommendations are ghc
specific (in the next version ;-).

  sortHbc :: (Ord a) = [a] - [a]
  sortHbc [] = []
  sortHbc (x:xs) = msort (upSeq xs [x])
  upSeq [] xs = [reverse xs]
  upSeq (y:ys) xxs@(x:xs) =
  if x = y then
  upSeq ys (y:xxs)
  reverse xxs : upSeq ys [y]
  msort [xs] = xs
  msort xss = msort (mergePairs xss)
  mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
  mergePairs xss = xss
  merge xxs@(x:xs) yys@(y:ys) =
  if x = y then
  x:merge xs yys
  y:merge xxs ys
  merge [] yys = yys
  merge xxs[]  = xxs

That's a first-order version of the smooth bottom-up mergesort (which I
did not include in the timings because the difference to the top-down
variant was not significant). `sortHbc' is probably slightly faster
than `smoothMergeSort' because its first-order? NB Bottom-up mergsort
was my previous favourite ;-). Your version has the slight drawback
that it uses only increasing sequences.

 BTW, I don't think the test program does the right thing.  It prints 
 the last element of the sorted list, but there is nothing that
 says that computing this forces the list to be completely sorted.
 When I test sort routines I always do something like printing the
 sum of the sorted list.

Hmm. I think this does not apply to the examples I gave but you are
right, it could happen.

  cheatSort x   =  [ nth i x | i - [1 .. length x] ]

where `nth i x' computes the ith smallest element of x.

 Furthermore (while I'm in a whining mode :-), taking the median
 of several runs is not the accepted wisdom.  You should take the
 minimum of several runs.



Re: heap sort or the wonder of abstraction

1997-10-08 Thread Chris Okasaki

Content-Type: text/plain; charset="us-ascii"

Ralf Hinze wrote:
 Practitioners are probably surprised to learn that `pairingSort' is the
 algorithm of choice for sorting. Any objections to this recommendation?
 I was surprised to see that it performs so well: sorting 50.000 Int's
 in roughly three seconds and 100.000 Int's in roughly nine seconds is
 quite acceptable.

I ran some similar experiments in Standard ML a few years ago.  In those
experiments pairingSort also performed extremely well.  The only 
algorithm that performed better, and even then only by a small amount,
was splaySort, based on splay trees[1].  However, my experiment
only considered algorithms that were good choices as heaps -- I
did not consider any of the mergesort variations.  Ralf, could I
ask you to run my code below through your experiments (I don't have
easy access to anything but hugs at the moment)?

According to Ralf's criteria, splaySort is
  A. asymptotically optimal
  B. stable
  C. smooth  (In fact, it has been conjectured that splaySort is
  optimal with respect to any reasonable notion of
However, I believe--although I'm positive--that splaySort is
  D. not lazy
Ralf considered the situation where the creation phase takes O(n) time
and the selection phase takes O(n log n) time, but for splaySort these
are reversed.



[1] Sleator and Tarjan
"Self-adjusting binary search trees"
Journal of the ACM 32(3):652-686 (July '85)

[2] Moffat, Eddy, and Petersson
"Splaysort: Fast, Versatile, Practical"
Software PE 26(7):781-797 (July '96)


Content-Disposition: inline; filename="Splay.lhs"
Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs"

 data Splay a = SEmpty | SNode (Splay a) a (Splay a)

 instance PriorityQueue Splay where
  empty = SEmpty
  single x = SNode SEmpty x SEmpty

  fromList xs = foldr insert empty xs
  toOrderedList t = tol t []
where tol SEmpty rest = rest
  tol (SNode a x b) rest = tol a (x : tol b rest)

  insert k t = SNode a k b
  (a, b) = partition t  -- elements of a = k, elements of b  k

  partition SEmpty = (SEmpty,SEmpty)
  partition t@(SNode tl x tr)
| x  k =
case tr of
  SEmpty - (t,SEmpty)
  SNode trl y trr
| y  k -
let tl' = SNode tl x trl
(lt,ge) = partition trr
in (SNode tl' y lt,ge)
| otherwise -
let (lt,ge) = partition trl
in (SNode tl x lt,SNode ge y trr)
| otherwise =
case tl of
  SEmpty - (SEmpty,t)
  SNode tll y tlr
| y  k -
let (lt,ge) = partition tlr
in (SNode tll y lt,SNode ge x tr)
| otherwise -
let tr' = SNode tlr x tr
(lt,ge) = partition tll
in (lt,SNode ge y tr')

 splaySort :: (Ord a) = [a] - [a]
 splaySort =  toOrderedList
   .  (fromList :: (Ord a) = [a] - Splay a)


Re: heap sort or the wonder of abstraction

1997-10-08 Thread Ralf Hinze

Sorting is a hobby-horse of mine, so I cannot resist the temptation to
elaborate on the subject. I was motivated to write this rather long
reply by Carsten Kehler Holst saying `As far as I can see the
difference between merge sort and heap sort as described by Jon is
almost non existing'. Carsten is not quite right but he is not totally
wrong either. Both sorting algorithms are based on priority queues, so
it may be worthwhile to take a `data-structural look at sorting'.
That's the theme of this email.

There are still some open points, so any remarks, corrections, ideas
etc are *welcome*.


PS: Those who are interested in performance only should skip to Section 10.

 import List   (  group  )
 import System (  getArgs  )

1. Introductory remarks

What makes up a good sorting algorithm? Here are some criteria:

A. it should be asymptotically optimal (ie O(n log n) worst case behaviour
   ruling out quick sort ;-)),

B. it should be stable (ie it may not change the order of equal elements),

C. it should be smooth (a smooth sort has a linear execution time if the
   input is nearly sorted).

All algorithms we are going to present are asymptotically optimal (with
the notable exception of `jonsSort') and all of them are stable. Only
`smoothMergeSort' has shown to be smooth (to the best of my
knowledge).  However, practical experiments suggest that `pairingSort'
and `jonsSort' adopt quite well to the input data.

Additional criteria one *may* consider:

D. it should be lazy (ie `head . sort' has linear execution time),

E. it should run faster if the input contains many equal elements.

All algorithms are lazy. No algorithm explicitly addresses Criterion E.
Again, experiments suggest that `pairingSort' and `jonsSort' adopt
quite well to the input data.

It is advisable to gather some test data to check the various
implementations. A stable sorting algorithm should perform well on the
following data.

 strictlyIncreasing n  =  [1 .. n]
 increasing n  =  interleave x x
 where x   =  strictlyIncreasing n
 strictlyDecreasing n  =  [n, n-1 .. 1]
 decreasing n  =  interleave x x
 where x   =  strictlyDecreasing n
 constant n=  replicate n 0

The following generators produce lists containing many equal elements
(provided `k  n').

 repIncreasing k n =  take n (copy [0 .. k])
 repDecreasing k n =  take n (copy [k, k-1 .. 0])
 oscillating k n   =  take n (copy ([0 .. k] ++ [k, k-1 .. 0]))

Finally we have random data.

 random n  =  take n (random2Ints (2*n) (3*n))

A complete list of all generators.

 generators:: [Int - [Int]]
 generators=  [ strictlyIncreasing,-- 0
repIncreasing 2,   -- 5
repIncreasing 100,
repDecreasing 2,
repDecreasing 100,
oscillating 2,
random ]   -- 10

NB We only consider lists of Int's. It may be worthwhile to repeat the
benchmarks (Section 10) with data designed to make comparisons
dominate, see Jon's second email.

2. Priority queues

Here is the abstract data type of priority queues formulated as a
Haskell class definition.

 data SeqView t a  =  Null
   |  Cons a (t a)

 class PriorityQueue q where
 empty :: (Ord a) = q a
 single:: (Ord a) = a - q a
 insert:: (Ord a) = a - q a - q a
 meld  :: (Ord a) = q a - q a - q a
 splitMin  :: (Ord a) = q a - SeqView q a
 fromList  :: (Ord a) = [a] - q a
 toOrderedList :: (Ord a) = q a - [a]

 single a  =  insert a empty
 insert a q=  single a `meld` q
 fromList  =  foldm meld empty . map single
 toOrderedList q   =  case splitMin q of
 Null  - []
 Cons a q  - a : toOrderedList q

The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin'
which usually belong to the standard repertoire. The call `splitMin q'
returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a
minimal element of `q' and `q1' is the remaining queue).

The prototypical sorting algorithm based on priority queues looks as
follows (`PQ' refers to the concrete implementation)

  pqSort:: (Ord a) = [a] - [a]