Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-17 Thread Petr Pudlak
Hi all,

I apologize for not reacting to your posts; I was on vacation. (BTW, if you
ever come to Slovakia, I strongly recommend visiting the Mala (Lesser) Fatra
mountains. IMHO they are more beautiful than the better-known Tatra mountains.)

Thanks for your interest and the many intriguing ideas. In particular, I like
the cata-/ana-/hylomorphisms; they look like a very useful concept to learn.

I hope I'll manage to create my own version of the sorting algorithm based on
your advice. Maybe I'll also try to run some real benchmarks, if I have time.

-Petr



Re: Hylomorphisms (was: [Haskell-cafe] exercise - a completely lazy sorting algorithm)

2009-07-12 Thread Brent Yorgey
On Sun, Jul 12, 2009 at 07:01:11PM +0200, Raynor Vliegendhart wrote:
> On 7/12/09, Heinrich Apfelmus  wrote:
> > Raynor Vliegendhart wrote:
> > > On 7/9/09, Heinrich Apfelmus  wrote:
> > >> Of course, some part of the algorithm has to be recursive, but this can be
> > >> outsourced to a general recursion scheme, like the hylomorphism
> > >>
> > >>hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
> > >>hylo f g = g . fmap (hylo f g) . f
> > >>
> > >
> > > Is that definition of hylo actually usable? A few on IRC tried to use
> > > that definition for a few examples, but the examples failed to
> > > terminate or blew up the stack.
> >
> > The implementation of  quicksort  with  hylo  works fine for me, given
> > medium sized inputs like for example  quicksort (reverse [1..1000]) .
> >
> > What were the examples you tried?
> >
> 
> One of the examples I tried was:
> 
>hylo (unfoldr (\a -> Just (a,a))) head $ 42
> 
> This expression fails to terminate.
> 
> Here are two examples copumpkin tried on IRC:
> 
>  > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) length 5
>  5
> 
>  > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) sum 5
>  *** Exception: stack overflow

[] is a strange functor to use with hylo, since it is already
recursive and its only base case (the empty list) doesn't contain any
a's.  Think about the intermediate structure that

  hylo (unfoldr (\a -> Just (a,a))) head

is building up: it is a list of lists of lists of lists of lists of
lists of ..., so it's no wonder it doesn't terminate! =)

Instead, it would be more normal to use something like 

  data ListF a l = Nil | Cons a l

  head :: ListF a l -> a
  head Nil = error "FLERG"
  head (Cons a _) = a

  instance Functor (ListF a) where
    fmap _ Nil = Nil
    fmap f (Cons a l) = Cons a (f l)

Taking the fixed point of (ListF a) gives us (something isomorphic to)
the normal [a], so we can do what you were presumably trying to do
with your example:

  hylo (\a -> Cons a a) head $ 42

The intermediate structure built up by this hylo is (isomorphic to) an
infinite list of 42's, and it evaluates to '42' just fine.
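
For reference, here is a self-contained sketch (not part of the original
mail) that puts the pieces above together so they load directly in GHCi;
the accessor is renamed headL only to avoid shadowing Prelude's head:

  hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
  hylo f g = g . fmap (hylo f g) . f

  data ListF a l = Nil | Cons a l

  instance Functor (ListF a) where
    fmap _ Nil        = Nil
    fmap f (Cons a l) = Cons a (f l)

  headL :: ListF a l -> a
  headL Nil        = error "FLERG"
  headL (Cons a _) = a

  -- conceptually unfolds an infinite list of 42s, but only its head is forced:
  example :: Integer
  example = hylo (\a -> Cons a a) headL 42   -- evaluates to 42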

-Brent


Hylomorphisms (was: [Haskell-cafe] exercise - a completely lazy sorting algorithm)

2009-07-12 Thread Raynor Vliegendhart
On 7/12/09, Heinrich Apfelmus  wrote:
> Raynor Vliegendhart wrote:
> > On 7/9/09, Heinrich Apfelmus  wrote:
> >> Of course, some part of the algorithm has to be recursive, but this can be
> >> outsourced to a general recursion scheme, like the hylomorphism
> >>
> >>hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
> >>hylo f g = g . fmap (hylo f g) . f
> >>
> >
> > Is that definition of hylo actually usable? A few on IRC tried to use
> > that definition for a few examples, but the examples failed to
> > terminate or blew up the stack.
>
> The implementation of  quicksort  with  hylo  works fine for me, given
> medium sized inputs like for example  quicksort (reverse [1..1000]) .
>
> What were the examples you tried?
>

One of the examples I tried was:

   hylo (unfoldr (\a -> Just (a,a))) head $ 42

This expression fails to terminate.

Here are two examples copumpkin tried on IRC:

  > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) length 5
  5

  > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) sum 5
  *** Exception: stack overflow
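
Apfelmus mentions above that quicksort implemented with hylo works fine on
medium-sized inputs. His code is not shown in this thread; the following is
only a sketch of what such an implementation could look like, using a
binary-tree base functor:

  hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
  hylo f g = g . fmap (hylo f g) . f

  -- base functor of a binary tree
  data TreeF a t = LeafF | NodeF t a t

  instance Functor (TreeF a) where
    fmap _ LeafF         = LeafF
    fmap f (NodeF l x r) = NodeF (f l) x (f r)

  -- unfold: partition around the head as pivot
  split :: Ord a => [a] -> TreeF a [a]
  split []     = LeafF
  split (x:xs) = NodeF [y | y <- xs, y < x] x [y | y <- xs, y >= x]

  -- fold: in-order concatenation
  combine :: TreeF a [a] -> [a]
  combine LeafF         = []
  combine (NodeF l x r) = l ++ x : r

  quicksort :: Ord a => [a] -> [a]
  quicksort = hylo split combine

  -- e.g.  quicksort (reverse [1..1000])  ==>  [1..1000]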


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-12 Thread Bertram Felgenhauer
Petr Pudlak wrote:
> Would it be possible to create a lazy selection/sorting
> algorithm so that getting any element of the sorted list/array by its index
> would require just O(n) time, and getting all the elements would still be in
> O(n * log n)?

The (merge) sorting algorithm provided by Data.List has this property,
providing the first k elements in O(n + k log(n)) time. (source at [1])

As a mental model, you can think of suspended 'merge' evaluations as
internal nodes in a tree, with the two arguments as subtrees. In that
model, the algorithm turns into heap sort: It builds a balanced binary
tree with n external nodes, taking O(n) time -- this job is done by
merge_pairs -- and then repeatedly extracts the minimum element, taking
O(log(n)) time for each one.
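
For reference, a sketch of a bottom-up merge sort in that style (the same
scheme as the library code at [1], though not copied from it):

  mergeSort :: Ord a => [a] -> [a]
  mergeSort = mergeAll . map (: [])
    where
      -- repeatedly merge adjacent runs until one remains; the suspended
      -- 'merge's form a roughly balanced tree of thunks
      mergeAll []  = []
      mergeAll [t] = t
      mergeAll ts  = mergeAll (mergePairs ts)

      mergePairs (a:b:rest) = merge a b : mergePairs rest
      mergePairs ts         = ts

      merge as@(a:as') bs@(b:bs')
        | a <= b    = a : merge as' bs
        | otherwise = b : merge as bs'
      merge [] bs   = bs
      merge as []   = as

Forcing the first element forces the head of each suspended merge exactly
once (O(n) comparisons in total); each subsequent element then only redoes
work along one root-to-leaf path of that tree, i.e. O(log n) comparisons
per element.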

regards,

Bertram

[1] 
http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.html#mergesort


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Matthias Görgens
The "sorted array of bags of unsorted input" is a nice idea.  However,
you have to use the data structure in a single-threaded [1] fashion to
obtain the claimed bounds.

Here's a pure solution that uses amortization and laziness.

> import qualified Data.Sequence as S
> import Data.Sequence ((><))
> import qualified Data.Array.IArray as IA  -- needed by toArray below
> import Data.Foldable
> import Data.Monoid

Suppose we have a function that finds the median of a list and partitions
it into three sublists: those smaller than the median, those equal to the
median, and those larger than the median.  That function should run in
linear time.

> partitionOnMedian :: (Ord a) => S.Seq a -> BTreeRaw a (S.Seq a)

where the following data structure holds the sublists and some
bookkeeping information:

> data BTreeRaw a m = Leaf
>                   | Node { cmp     :: a -> Ordering
>                          , lN      :: Int
>                          , less    :: m
>                          , eq      :: S.Seq a
>                          , gN      :: Int
>                          , greater :: m
>                          }

where 'lN' and 'gN' are the lengths of 'less' and 'greater', respectively.
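
The post leaves partitionOnMedian undefined. Below is a minimal sketch (not
Matthias's code) that type-checks against the declarations above; it finds
the median by sorting, i.e. in O(n log n) rather than the linear time the
analysis assumes, and it relies on S.sort and S.filter being available in
the installed containers library:

> -- Sketch only: a naive, sort-based median instead of linear-time selection.
> partitionOnMedian s
>   | S.null s  = Leaf
>   | otherwise = Node (`compare` m) (S.length ls) ls es (S.length gs) gs
>   where m  = S.index (S.sort s) (S.length s `div` 2)
>         ls = S.filter (< m) s
>         es = S.filter (== m) s
>         gs = S.filter (> m) s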

We can make BTreeRaw a functor:

> instance Functor (BTreeRaw a) where
>   fmap _ Leaf                 = Leaf
>   fmap f (Node c lN l e gN g) = Node c lN (f l) e gN (f g)

Now using a fixed-point construction we can bootstrap a sorting
algorithm from partitionOnMedian:

> data Fix m = Fix {unfix :: (m (Fix m))}
> type BTree a = Fix (BTreeRaw a)

> treeSort :: (Ord a) => S.Seq a -> BTree a
> treeSort = Fix . helper . partitionOnMedian
>   where helper = fmap (Fix . helper . partitionOnMedian)

Now treeSort produces the thunk of a balanced binary search tree.  Of
course we can get a sorted list out of it (forcing the whole
structure):

> flatten :: BTree a -> S.Seq a
> flatten (Fix Leaf) = S.empty
> flatten (Fix (Node _ lN l e gN g)) = flatten l >< e >< flatten g

> mySort :: Ord a => S.Seq a -> S.Seq a
> mySort = flatten . treeSort
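
A quick usage sketch (not part of the original post), assuming the
definitions above, including the partitionOnMedian sketch, are loaded:

> -- ghci> mySort (S.fromList [3,1,4,1,5,9,2,6])
> -- fromList [1,1,2,3,4,5,6,9]
> -- ghci> index (treeSort (S.fromList [3,1,4,1,5,9,2,6])) 0
> -- 1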

But we can also get individual elements efficiently, forcing only a linear
number of comparisons in the worst case:

> index :: BTree a -> Int -> a
> index (Fix Leaf) _ = error "tried to get an element of Leaf"
> index (Fix (Node _ lN l e gN g)) i
>   | i < lN                   = index l i
>   | i - lN < S.length e      = S.index e (i - lN)
>   | i - lN - S.length e < gN = index g (i - lN - S.length e)
>   | otherwise                = error "index out of bounds"

Although comparisons are forced only once, no matter how often we touch the
same element of the tree, we still have to traverse the tree (in logarithmic
time) on every access.

If you want linear-time access on the first touch of an element and
constant-time access afterwards, use toArray:

> toArray :: (IA.IArray a t) => Fix (BTreeRaw t) -> a Int t
> toArray tree = IA.listArray (0, maxI) (map (index tree) [0..maxI])
>   where size (Fix Leaf)                 = 0
>         size (Fix (Node _ lN _ e gN _)) = lN + S.length e + gN
>         maxI = size tree - 1

[1] Single-Threaded in the sense of Okasaki's use of the word.


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Andrew Hunter
On Mon, Jul 6, 2009 at 4:32 PM, Matthias Görgens wrote:
>> It seems to me that you just need a selection algorithm which finds k
>> arbitrary elements in O(n * k) time. If you combine an O(n * k) selection
>> algorithm with any O(n * lg n) sort, you fulfil your time constraint.
>
> I guess we also want the list to be sorted, in O(1) extra time, after
> having selected every element.
>

I think he's suggesting something along the lines of: for the first
log n requests, use a selection algorithm.  On the (log n + 1)-th request,
just sort the whole thing.  This obviously isn't the spirit of what's
wanted, but it does in fact meet the time bounds: the first k <= log n
requests cost O(k * n) <= O(n * log n), and the one-off sort costs
O(n * log n) as well.

AHH


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Matthias Görgens
> It seems to me that you just need a selection algorithm which finds k
> arbitrary elements in O(n * k) time. If you combine an O(n * k) selection
> algorithm with any O(n * lg n) sort, you fulfil your time constraint.

I guess we also want the list to be sorted, in O(1) extra time, after
having selected every element.

Matthias.


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Mads Lindstrøm
Hi Petr,

Maybe this will give some inspiration:
http://en.wikipedia.org/wiki/Selection_algorithm

It seems to me that you just need a selection algorithm which finds k
arbitrary elements in O(n * k) time. If you combine an O(n * k) selection
algorithm with any O(n * lg n) sort, you fulfil your time constraint.
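
For illustration (not from the original mail), a quickselect-style function
along those lines; with the head as pivot it is expected linear time only on
randomly ordered input, and a median-of-medians pivot would make it
worst-case linear:

  import Data.List (partition)

  -- k-th smallest element, 0-based
  selectKth :: Ord a => Int -> [a] -> a
  selectKth _ [] = error "selectKth: index out of range"
  selectKth k (p:xs)
    | k < nLess       = selectKth k ls
    | k < nLess + nEq = p
    | otherwise       = selectKth (k - nLess - nEq) gs
    where
      (ls, rest) = partition (< p) xs
      (es, gs)   = partition (== p) rest
      nLess      = length ls
      nEq        = 1 + length es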


Regards,

Mads



Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Matthias Görgens
> If someone can translate my algorithm into a non-side-effecting one,
> I'd appreciate it, but I think that like disjoint set/union, this is
> probably inherently side-effecting.  Yes, yes, we could use functional
> arrays, but short of that I don't see a way to avoid side effects to
> take care of my amortized work.

I am just working on a side-effect-free version.


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Andrew Hunter
On Mon, Jul 6, 2009 at 12:26 PM, Petr Pudlak wrote:
> More precisely: The result of sorting an n-element list or array should be a
> structure that allows asking for the i-th element of the result (for example,
> a lazy array). Asking for k arbitrary elements should take no more than
>   O(min(n * log n, k * n))
>

I'd argue this is not really a true use of "laziness" and more just
amortized cost.  That said, yes, it can be done!

I'm going to use an imperative data structure because I hate you
all and it makes the analysis/presentation slightly easier.

To wit, we have a sorted array of unsorted bags of inputs.  For
example, (suppose we start with an array of numbers 1..10, scrambled),
we might start with:

(10,{1,5,6,8,3,4,2,7,9,10})

And after some operations, it might look like:

(3,{1,3,2});(4,{6,7,5,4});(3,{8,9,10})

And we're trying to (eventually) hit the state:

(1,{1});(1,{2});...etc.

(It's not hard to see how to in constant time maintain an array of
pointers into these bags to allow finding elements.)

Now, using a linear-time selection algorithm like median-of-medians or
Chazelle's, it's easy to see how to find the k-th smallest element in
such a data structure: find the bag containing the k-th element, apply
the selection algorithm.  Furthermore, still in linear time, we can
(while we're working) split the bag around that element we found.  So,
given the data state:

(10,{1,5,6,8,3,4,2,7,9,10})

if we're asked for the 4th smallest element, we'll update to:

(3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10})

So far so good, right?  Now we just apply one more change: after each
lookup, we walk across the list of bags, and split each in half (as if
we were asked for the median element of each bag.)  In my running
example, we'd go to:

(1,{1});(1,{2});(1,{3});(1,{4});(2,{5,6});(1,{7});(3,{8,9,10})

This is still linear time--we run a linear-time algorithm on disjoint
subsets of the input.  Now, note: after k lookups to this structure,
the largest bag is at most n/2^k elements long!  Couple this with a slight
optimization that, once the bags are small enough, overwrites the lookup
table into the bags with the correct results, and this matches your time
bounds quite nicely.

Now, I doubt it'll be fast, but that's not what you asked.  Plus, come
up with a persuasive use case for a system where you /really need/ the
4th, 27th, and 957th largest elements /now/ and none of the rest, and
I promise I'll make something practically efficient. :)

If someone can translate my algorithm into a non-side-effecting one,
I'd appreciate it, but I think that like disjoint set/union, this is
probably inherently side-effecting.  Yes, yes, we could use functional
arrays, but short of that I don't see a way to avoid side effects to
take care of my amortized work.
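
For what it's worth, a rough pure sketch of the bag bookkeeping (not
Andrew's code, and deliberately naive: the per-bag selection is done by
sorting and there is no O(1) lookup table, so it illustrates the idea but
not the claimed bounds):

  import Data.List (partition, sort)

  -- bags of still-unsorted elements, kept in rank order, tagged with sizes
  type Bags a = [(Int, [a])]

  fromListBags :: [a] -> Bags a
  fromListBags xs = [(length xs, xs)]

  -- return the element of rank i (0-based) and split its bag around it
  select :: Ord a => Int -> Bags a -> (a, Bags a)
  select _ [] = error "select: index out of bounds"
  select i ((n, bag) : rest)
    | i >= n    = let (y, rest') = select (i - n) rest
                  in  (y, (n, bag) : rest')
    | otherwise = let x = sort bag !! i    -- stand-in for linear-time selection
                  in  (x, splitBag bag x ++ rest)

  -- split one bag into (smaller, equal, larger) sub-bags, dropping empties
  splitBag :: Ord a => [a] -> a -> Bags a
  splitBag bag x = filter ((> 0) . fst)
                     [(length ls, ls), (length es, es), (length gs, gs)]
    where (ls, rest) = partition (< x) bag
          (es, gs)   = partition (== x) rest

  -- the extra step after each lookup: halve every bag at its median
  splitAllAtMedians :: Ord a => Bags a -> Bags a
  splitAllAtMedians = concatMap halve
    where halve (n, bag)
            | n <= 1    = [(n, bag)]
            | otherwise = splitBag bag (sort bag !! (n `div` 2))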

AHH


Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Matthias Görgens
Interesting problem.  I have been toying with the same problem for some time.

To solve the problem in theory, I'd concentrate on getting the number
of comparisons into the required O(n) resp. O(n log n) ranges.
Afterwards we can think about building the infrastructure to keep the
number of all operations (bookkeeping, etc.) within those bounds, too.

Anyway, I'll give a solution to the problem using a randomized
quicksort soon.  Later we can replace the randomized pivot-picking
with a deterministic linear-time median algorithm.


[Haskell-cafe] exercise - a completely lazy sorting algorithm

2009-07-06 Thread Petr Pudlak
Hi all,

about a month ago, a friend and I were discussing sorting in Haskell. We
realized a nice property of lazy merge sort (which is, AFAIK, the
implementation behind Data.List.sort): getting the first element of the
sorted list actually requires only O(n) time, not the O(n * log n) a strict
sort in an imperative language would take. And immediately we asked: would
it be possible to create a lazy selection/sorting algorithm so that getting
any element of the sorted list/array by its index would require just O(n)
time, while getting all the elements would still take O(n * log n)?
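
To illustrate that property concretely, a small sketch (not part of the
original mail):

  import Data.List (sort)

  -- Forcing only the head of the lazily sorted list does O(n) comparisons:
  -- each suspended merge produces its first element with a single comparison.
  lazyMinimum :: Ord a => [a] -> a
  lazyMinimum = head . sort

  -- e.g.  lazyMinimum (reverse [1..1000000])  ==>  1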

More precisely: The result of sorting an n-element list or array should be a
structure that allows asking for the i-th element of the result (for example,
a lazy array). Asking for k arbitrary elements should take no more than
  O(min(n * log n, k * n))

I believe that this could be done, but so far I haven't been able to implement
and demonstrate it myself. I think the solution would be a somewhat modified
lazy quicksort (using the "median of medians" algorithm if we want to be sure
that we get good pivots).

Perhaps somebody else would also like to give it a try? Or perhaps explain to
me why it's not possible?

Best regards,
Petr