Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
Hi all, I apologize that I didn't react to your posts, I was on a vacation.
(BTW, if you ever come to Slovakia, I strongly recommend visiting the Mala
(Lesser) Fatra mountains. IMHO they are more beautiful than the better-known
Tatra mountains.)

Thanks for your interest and many intriguing ideas. Especially, I like
cata-/ana-/hylo-morphisms; they look to me like a very useful concept to
learn. I hope I'll manage to create my own version of the sorting algorithm
based on your advice. Maybe I'll also try to do some real benchmarks, if I
have time.

-Petr

On Tue, Jul 07, 2009 at 02:49:08AM +0200, Matthias Görgens wrote:
> The "sorted array of bags of unsorted input" is a nice idea. However,
> you have to use the data structure in a single-threaded [1] fashion to
> obtain the claimed bounds.
>
> Here's a pure solution that uses amortization and laziness.
> [...]
> [1] Single-threaded in the sense of Okasaki's use of the word.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Hylomorphisms (was: [Haskell-cafe] exercise - a completely lazy sorting algorithm)
On Sun, Jul 12, 2009 at 07:01:11PM +0200, Raynor Vliegendhart wrote:
> On 7/12/09, Heinrich Apfelmus wrote:
> > Raynor Vliegendhart wrote:
> > > On 7/9/09, Heinrich Apfelmus wrote:
> > > > Of course, some part of the algorithm has to be recursive, but this
> > > > can be outsourced to a general recursion scheme, like the hylomorphism
> > > >
> > > >    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
> > > >    hylo f g = g . fmap (hylo f g) . f
> > > >
> > > Is that definition of hylo actually usable? A few on IRC tried to use
> > > that definition for a few examples, but the examples failed to
> > > terminate or blew up the stack.
> >
> > The implementation of quicksort with hylo works fine for me, given
> > medium sized inputs like for example quicksort (reverse [1..1000]).
> >
> > What were the examples you tried?
>
> One of the examples I tried was:
>
>    hylo (unfoldr (\a -> Just (a,a))) head $ 42
>
> This expression fails to terminate.
>
> Here are two examples copumpkin tried on IRC:
>
>    > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) length 5
>    5
>    > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) sum 5
>    *** Exception: stack overflow

[] is a strange functor to use with hylo, since it is already recursive and
its only base case (the empty list) doesn't contain any a's. Think about
the intermediate structure that hylo (unfoldr (\a -> Just (a,a))) head is
building up: it is a list of lists of lists of lists of... no wonder it
doesn't terminate! =)

Instead, it would be more normal to use something like

  data ListF a l = Nil | Cons a l

  head :: ListF a l -> a
  head Nil        = error "FLERG"
  head (Cons a _) = a

  instance Functor (ListF a) where
    fmap _ Nil        = Nil
    fmap f (Cons a l) = Cons a (f l)

Taking the fixed point of (ListF a) gives us (something isomorphic to) the
normal [a], so we can do what you were presumably trying to do with your
example:

  hylo (\a -> Cons a a) head $ 42

The intermediate structure built up by this hylo is (isomorphic to) an
infinite list of 42's, and it evaluates to '42' just fine.

-Brent

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
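The pieces above can be assembled into a small runnable module (an editor's sketch; `headF` stands in for the post's `head` to avoid clashing with Prelude.head):

```haskell
-- Non-recursive list functor: recursion lives in hylo, not in the data type.
data ListF a l = Nil | Cons a l

instance Functor (ListF a) where
  fmap _ Nil        = Nil
  fmap f (Cons a l) = Cons a (f l)

-- Generic hylomorphism: unfold with f, fold with g, never build the
-- intermediate structure in full.
hylo :: Functor f => (a -> f a) -> (f b -> b) -> a -> b
hylo f g = g . fmap (hylo f g) . f

headF :: ListF a l -> a
headF Nil        = error "headF: Nil"
headF (Cons a _) = a

main :: IO ()
main = print (hylo (\a -> Cons a a) headF (42 :: Int))
```

Because `fmap` only wraps the recursive call in a thunk and `headF` never inspects the tail, the "infinite list" is never forced and the program prints 42 immediately.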
Hylomorphisms (was: [Haskell-cafe] exercise - a completely lazy sorting algorithm)
On 7/12/09, Heinrich Apfelmus wrote:
> Raynor Vliegendhart wrote:
> > On 7/9/09, Heinrich Apfelmus wrote:
> > > Of course, some part of the algorithm has to be recursive, but this
> > > can be outsourced to a general recursion scheme, like the hylomorphism
> > >
> > >    hylo :: Functor f => (a -> f a) -> (f b -> b) -> (a -> b)
> > >    hylo f g = g . fmap (hylo f g) . f
> > >
> > Is that definition of hylo actually usable? A few on IRC tried to use
> > that definition for a few examples, but the examples failed to
> > terminate or blew up the stack.
>
> The implementation of quicksort with hylo works fine for me, given
> medium sized inputs like for example quicksort (reverse [1..1000]).
>
> What were the examples you tried?

One of the examples I tried was:

   hylo (unfoldr (\a -> Just (a,a))) head $ 42

This expression fails to terminate.

Here are two examples copumpkin tried on IRC:

   > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) length 5
   5
   > let hylo f g = g . fmap (hylo f g) . f in hylo (flip replicate 2) sum 5
   *** Exception: stack overflow

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
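Apfelmus mentions that quicksort via hylo works fine; one plausible way to write it (an editor's sketch; the names `TreeF`, `split`, `join` are mine, not from the thread) uses a non-recursive tree functor:

```haskell
-- Non-recursive binary tree functor for quicksort's call structure.
data TreeF a t = Tip | Branch t a t

instance Functor (TreeF a) where
  fmap _ Tip            = Tip
  fmap f (Branch l x r) = Branch (f l) x (f r)

hylo :: Functor f => (a -> f a) -> (f b -> b) -> a -> b
hylo f g = g . fmap (hylo f g) . f

-- Anamorphism step: partition around the head as pivot.
split :: Ord a => [a] -> TreeF a [a]
split []     = Tip
split (x:xs) = Branch (filter (< x) xs) x (filter (>= x) xs)

-- Catamorphism step: concatenate the sorted parts.
join :: TreeF a [a] -> [a]
join Tip            = []
join (Branch l x r) = l ++ [x] ++ r

quicksort :: Ord a => [a] -> [a]
quicksort = hylo split join
```

Unlike the `[]`-based examples, `TreeF a` has a base case (`Tip`) that carries no elements, so the unfold bottoms out and `quicksort (reverse [1..1000])` terminates (albeit with quadratic comparisons for that already-sorted input).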
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
Petr Pudlak wrote: > Would it be possible to create a lazy selection/sorting > algorithm so that getting any element of the sorted list/array by its index > would require just O(n) time, and getting all the elements would still be in > O(n * log n)? The (merge) sorting algorithm provided by Data.List has this property, providing the first k elements in O(n + k log(n)) time. (source at [1]) As a mental model, you can think of suspended 'merge' evaluations as internal nodes in a tree, with the two arguments as subtrees. In that model, the algorithm turns into heap sort: It builds a balanced binary tree with n external nodes, taking O(n) time -- this job is done by merge_pairs -- and then repeatedly extracts the minimum element, taking O(log(n)) time for each one. regards, Bertram [1] http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.html#mergesort ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
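Bertram's mental model can be made concrete with a simplified pairwise-merging mergesort (an editor's sketch, a stand-in for the actual Data.List source, not a copy of it):

```haskell
-- Lazy merge: each demanded element costs one comparison at this node.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Merge adjacent runs pairwise, halving the number of runs each pass.
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (xs:ys:rest) = merge xs ys : mergePairs rest
mergePairs xs           = xs

msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = go (map (: []) xs)
  where
    go [ys] = ys
    go yss  = go (mergePairs yss)
```

The suspended `merge` calls form the balanced tree of the heap-sort reading: `head (msort xs)` forces only one comparison per internal node, i.e. n - 1 comparisons, and each further element costs O(log n) more, matching the O(n + k log n) bound for the first k elements.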
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
The "sorted array of bags of unsorted input" is a nice idea. However, you
have to use the data structure in a single-threaded [1] fashion to obtain
the claimed bounds.

Here's a pure solution that uses amortization and laziness.

> import qualified Data.Sequence as S
> import qualified Data.Array.IArray as IA
> import Data.Sequence ((><))
> import Data.Foldable
> import Data.Monoid

Suppose we have a function to find the median of a list and partition it
into three sublists: smaller than the median, equal to the median, larger
than the median. That function should run in linear time.

> partitionOnMedian :: forall a. (Ord a) => S.Seq a -> BTreeRaw a (S.Seq a)

where the following data structure holds the sublists and some bookkeeping
information:

> data BTreeRaw a m = Leaf
>                   | Node { cmp     :: a -> Ordering
>                          , lN      :: Int
>                          , less    :: m
>                          , eq      :: S.Seq a
>                          , gN      :: Int
>                          , greater :: m
>                          }

where 'lN' and 'gN' are the lengths of 'less' and 'greater'.

We can make (BTreeRaw a) a functor:

> instance Functor (BTreeRaw a) where
>     fmap f Leaf = Leaf
>     fmap f (Node c lN l e gN g) = Node c lN (f l) e gN (f g)

Now, using a fixed-point construction, we can bootstrap a sorting algorithm
from partitionOnMedian:

> data Fix m = Fix {unfix :: m (Fix m)}
> type BTree a = Fix (BTreeRaw a)

> treeSort :: forall a. (Ord a) => S.Seq a -> BTree a
> treeSort = Fix . helper . partitionOnMedian
>     where helper = fmap (Fix . helper . partitionOnMedian)

Now treeSort produces the thunk of a balanced binary search tree. Of course
we can get a sorted list out of it (forcing the whole structure):

> flatten :: BTree a -> S.Seq a
> flatten (Fix Leaf) = S.empty
> flatten (Fix (Node _ lN l e gN g)) = flatten l >< e >< flatten g

> mySort = flatten . treeSort

But we can also get elements efficiently, forcing only a linear number of
comparisons in the worst case:

> index :: BTree a -> Int -> a
> index (Fix Leaf) _ = error "tried to get an element of Leaf"
> index (Fix (Node _ lN l e gN g)) i
>     | i < lN                   = index l i
>     | i - lN < S.length e      = S.index e (i - lN)
>     | i - lN - S.length e < gN = index g (i - lN - S.length e)
>     | otherwise                = error "index out of bounds"

Although we have to force the comparisons only once each time we touch the
same element in the tree, we do still have to traverse the tree (in
logarithmic time).

If you want linear-time access on the first touch of an element and
constant-time access afterwards, use toArray:

> toArray :: (IA.IArray a t) => Fix (BTreeRaw t) -> a Int t
> toArray tree = IA.listArray (0, maxI) (map (index tree) [0..maxI])
>     where size (Fix Leaf) = 0
>           size (Fix (Node _ lN _ e gN _)) = lN + S.length e + gN
>           maxI = size tree - 1

[1] Single-threaded in the sense of Okasaki's use of the word.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
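The post assumes partitionOnMedian without giving a body. A hedged sketch, for experimentation only: it finds the median by sorting, which is O(n log n) rather than the linear time the analysis relies on (a median-of-medians selection would be needed for the claimed bounds):

```haskell
import qualified Data.Sequence as S
import qualified Data.List as L
import Data.Foldable (toList)

-- Same shape as the post's BTreeRaw.
data BTreeRaw a m = Leaf
                  | Node { cmp     :: a -> Ordering
                         , lN      :: Int
                         , less    :: m
                         , eq      :: S.Seq a
                         , gN      :: Int
                         , greater :: m
                         }

partitionOnMedian :: Ord a => S.Seq a -> BTreeRaw a (S.Seq a)
partitionOnMedian s
  | S.null s  = Leaf
  | otherwise = Node (`compare` m) (S.length l) l e (S.length g) g
  where
    xs = toList s
    -- Placeholder median via sorting; NOT linear time.
    m  = L.sort xs !! (length xs `div` 2)
    l  = S.fromList (filter (< m) xs)
    e  = S.fromList (filter (== m) xs)
    g  = S.fromList (filter (> m) xs)
```

Plugged into treeSort above, this gives a working (if asymptotically slower) version of the whole construction.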
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
On Mon, Jul 6, 2009 at 4:32 PM, Matthias Görgens wrote:
>> It seems to me that you just need a selection algorithm which works in
>> O(n * k) time for k arbitrary elements. If you combine an O(n * k)
>> selection algorithm with any O(n * lg n) sort, you fulfil your time
>> constraint.
>
> I guess we also want the list to be sorted in O(1) after having
> selected every element.

I think he's suggesting something along the lines of: for the first log n
requests, use selection. On the (log n + 1)-th request, just sort the whole
thing. This obviously isn't the spirit of what's wanted, but it does in
fact meet the time bounds.

AHH

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
> It seems to me that you just need a selection algorithm which works in
> O(n * k) time for k arbitrary elements. If you combine an O(n * k)
> selection algorithm with any O(n * lg n) sort, you fulfil your time
> constraint.

I guess we also want the list to be sorted in O(1) after having selected
every element.

Matthias.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
Hi Petr,

Maybe this will give inspiration:
http://en.wikipedia.org/wiki/Selection_algorithm

It seems to me that you just need a selection algorithm which works in
O(n * k) time for k arbitrary elements. If you combine an O(n * k)
selection algorithm with any O(n * lg n) sort, you fulfil your time
constraint.

Regards,
Mads

> Hi all,
>
> about a month ago, we were discussing sorting in Haskell with a friend.
> We realized a nice property of lazy merge-sort (which is AFAIK the
> implementation of Data.List.sort): Getting the first element of the
> sorted list actually requires O(n) time, not O(n * log n) as in
> imperative languages. And immediately we asked: Would it be possible to
> create a lazy selection/sorting algorithm so that getting any element of
> the sorted list/array by its index would require just O(n) time, and
> getting all the elements would still be in O(n * log n)?
> [...]
> Best regards,
> Petr

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
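The selection algorithm Mads points to can be sketched as a simple quickselect (an editor's sketch; `select` is my name, and with the head as pivot it is expected O(n) per query but O(n^2) worst case -- the linked median-of-medians variant is what makes it worst-case linear):

```haskell
-- select k xs: the k-th smallest element of xs (0-based).
select :: Ord a => Int -> [a] -> a
select _ [] = error "select: index out of range"
select k (x:xs)
  | k < l     = select k ls                 -- answer is left of the pivot
  | k < l + e = x                           -- answer is the pivot itself
  | otherwise = select (k - l - e) gs       -- answer is right of the pivot
  where
    ls = [y | y <- xs, y < x]
    gs = [y | y <- xs, y > x]
    l  = length ls
    e  = 1 + length [y | y <- xs, y == x]   -- pivot plus its duplicates
```

Each recursive call works on one side of the partition only, which is where the expected-linear bound comes from.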
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
> If someone can translate my algorithm into a non-side-effecting one,
> I'd appreciate it, but I think that like disjoint set/union, this is
> probably inherently side-effecting. Yes, yes, we could use functional
> arrays, but short of that I don't see a way to avoid side effects to
> take care of my amortized work.

I am just working on a side-effect-free version.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
On Mon, Jul 6, 2009 at 12:26 PM, Petr Pudlak wrote:
> More precisely: The result of sorting an n-element list or array should
> be a structure that allows to ask i-th element of the result (for
> example, a lazy array). Asking k arbitrary elements should take no more
> than O(min(n * log n, k * n))

I'd argue this is not really a true use of "laziness" and more just
amortized cost. That said, yes, it can be done! I'm going to use an
imperative data structure because I hate you all and it makes the
analysis/presentation slightly easier.

To wit, we have a sorted array of unsorted bags of inputs. For example
(suppose we start with an array of the numbers 1..10, scrambled), we might
start with:

  (10,{1,5,6,8,3,4,2,7,9,10})

And after some operations, it might look like:

  (3,{1,3,2});(4,{6,7,5,4});(3,{8,9,10})

And we're trying to (eventually) hit the state:

  (1,{1});(2,{2});...etc.

(It's not hard to see how to maintain, in constant time, an array of
pointers into these bags to allow finding elements.)

Now, using a linear-time selection algorithm like median-of-medians or
Chazelle's, it's easy to see how to find the k-th smallest element in such
a data structure: find the bag containing the k-th element, and apply the
selection algorithm. Furthermore, still in linear time, we can (while we're
working) split the bag around the element we found. So, given the data
state:

  (10,{1,5,6,8,3,4,2,7,9,10})

if we're asked for the 4th smallest element, we'll update to:

  (3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10})

So far so good, right? Now we just apply one more change: after each
lookup, we walk across the list of bags and split each in half (as if we
were asked for the median element of each bag). In my running example, we'd
go to:

  (1,{1});(1,{2});(1,{3});(1,{4});(2,{5,6});(1,{7});(3,{8,9,10})

This is still linear time -- we run a linear-time algorithm on disjoint
subsets of the input. Now, note: after k lookups to this structure, the
largest bag is at most n/2^k elements long!

Coupled with a slight optimization that overwrites the lookup table into
the bags with just the correct results once the bags are small enough, this
matches your time bounds quite nicely.

Now, I doubt it'll be fast, but that's not what you asked. Plus, come up
with a persuasive use case for a system where you /really need/ the 4th,
27th, and 957th smallest elements /now/ and none of the rest, and I promise
I'll make something practically efficient. :)

If someone can translate my algorithm into a non-side-effecting one, I'd
appreciate it, but I think that, like disjoint set/union, this is probably
inherently side-effecting. Yes, yes, we could use functional arrays, but
short of that I don't see a way to avoid side effects to take care of my
amortized work.

AHH

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
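One query step of this structure can be sketched purely functionally (an editor's sketch: it rebuilds the bag list instead of mutating an array, so it does not capture the amortized behaviour, uses a sort-based placeholder instead of linear-time selection, and assumes distinct elements; it only illustrates the bag-splitting idea):

```haskell
import Data.List (partition, sort)

-- A list of (size, unsorted bag) pairs, bags in sorted order of contents.
type Bags a = [(Int, [a])]

-- query k bs: the k-th smallest element overall, plus the structure with
-- the containing bag split around that element.
query :: Ord a => Int -> Bags a -> (a, Bags a)
query _ [] = error "query: index out of range"
query k ((n, b) : rest)
  | k < n =
      let x        = sort b !! k            -- placeholder selection, O(n log n)
          (ls, ge) = partition (< x) b
          gs       = filter (/= x) ge       -- assumes no duplicates of x
          keep (m, _) = m > 0
      in (x, filter keep [(length ls, ls), (1, [x]), (length gs, gs)] ++ rest)
  | otherwise =
      let (x, rest') = query (k - n) rest
      in (x, (n, b) : rest')
```

Running `query 3 [(10, [1,5,6,8,3,4,2,7,9,10])]` reproduces the post's example step, yielding 4 and the bags (3,{1,3,2});(1,{4});(6,{5,6,8,7,9,10}).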
Re: [Haskell-cafe] exercise - a completely lazy sorting algorithm
Interesting problem. I have been toying with the same problem for some
time.

To solve the problem in theory, I'd concentrate on getting the number of
comparisons into the required O(n) resp. O(n log n) ranges. Afterwards we
can think about building the infrastructure to keep the number of all
operations (bookkeeping...) in those bounds, too.

Anyway, I'll give a solution to the problem using a randomized quicksort
soon. Later we can replace the randomized pivot-picking with a
deterministic linear-median algorithm.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] exercise - a completely lazy sorting algorithm
Hi all,

about a month ago, we were discussing sorting in Haskell with a friend. We
realized a nice property of lazy merge-sort (which is AFAIK the
implementation of Data.List.sort): Getting the first element of the sorted
list actually requires O(n) time, not O(n * log n) as in imperative
languages. And immediately we asked: Would it be possible to create a lazy
selection/sorting algorithm so that getting any element of the sorted
list/array by its index would require just O(n) time, and getting all the
elements would still be in O(n * log n)?

More precisely: The result of sorting an n-element list or array should be
a structure that allows asking for the i-th element of the result (for
example, a lazy array). Asking for k arbitrary elements should take no more
than O(min(n * log n, k * n)).

I believe that this could be done, but so far I wasn't able to implement it
and show it myself. I think the solution would be a somewhat modified lazy
quicksort (or the "median of medians algorithm" if we want to be sure that
we get good pivots).

Perhaps somebody else would also like to give it a try? Or perhaps explain
to me why it's not possible?

Best regards,
Petr

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
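The lazy-quicksort idea Petr mentions can be sketched in a few lines (an editor's sketch: with good pivots each lookup forces expected O(n) comparisons, but a bad pivot sequence degrades to O(n^2), which is exactly why the post mentions median-of-medians):

```haskell
-- Plain lazy quicksort; the result is a nested tree of thunks, and
-- indexing into it only forces the partitions along one pivot chain.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort [y | y <- xs, y < x]
               ++ [x]
               ++ qsort [y | y <- xs, y >= x]
```

For example, `qsort xs !! 0` only partitions along the chain of leftmost pivots rather than sorting the whole list, which is the behaviour the thread sets out to pin down with precise bounds.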