Re: [Haskell-cafe] Patterns for processing large but finite streams
At Fri, 1 Jul 2011 09:39:32 +0400, Eugene Kirpichov wrote: Hi, I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? Sounds like a good job for iteratees. Summing a stream of numbers is just an Iteratee. Transcoding a stream into another stream is a job for an Inum (Iteratee-enumerator) or enumeratee, depending on which package's nomenclature you use. You have three implementations to choose from: - http://hackage.haskell.org/package/iteratee(original) - http://hackage.haskell.org/package/enumerator (John Milikin's re-write) - http://hackage.haskell.org/package/iterIO (my 3rd-generation attempt) David ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Eugene Kirpichov wrote: I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? Plain old lazy lists? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. 2011/7/1 Heinrich Apfelmus apfel...@quantentunnel.de: Eugene Kirpichov wrote: I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? Plain old lazy lists? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Sure you can. runningAverage :: Int - [Double] - [Double] runningAverage n xs = let chunk = take n xs in (sum chunk / length chunk) : runningAverage (tail xs) Lazy lists are absolutely ideal for this purpose. Regards, Malcolm On 1 Jul 2011, at 07:33, Eugene Kirpichov wrote: Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. 2011/7/1 Heinrich Apfelmus apfel...@quantentunnel.de: Eugene Kirpichov wrote: I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? Plain old lazy lists? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? (looks like arrows actually - which arrow is appropriate here?) 2011/7/1 Malcolm Wallace malcolm.wall...@me.com: Sure you can. runningAverage :: Int - [Double] - [Double] runningAverage n xs = let chunk = take n xs in (sum chunk / length chunk) : runningAverage (tail xs) Lazy lists are absolutely ideal for this purpose. Regards, Malcolm On 1 Jul 2011, at 07:33, Eugene Kirpichov wrote: Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. 2011/7/1 Heinrich Apfelmus apfel...@quantentunnel.de: Eugene Kirpichov wrote: I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? Plain old lazy lists? Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
On Fri, Jul 1, 2011 at 12:21 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? Sure you can. Sum, length and mean could be calculated as left fold. If you need to calculate more that one statistic at time you can combine accumulators sum = foldl (+) 0 length = foldl (\n _ - n+1) 0 data Mean Double Int mean = foldl (\(Mean m n) x - Mean (m + (x - m) / fromIntegral (n+1)) (n+1)) (Mean 0 0) AFAIU iteratees basically use same technique. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Alexey, your definition of mean does not look like liftS2 (/) sum length - you have to manually fuse these computations. I'm asking for a formalism that does this fusion automatically (and guaranteedly). 2011/7/1 Alexey Khudyakov alexey.sklad...@gmail.com: On Fri, Jul 1, 2011 at 12:21 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? Sure you can. Sum, length and mean could be calculated as left fold. If you need to calculate more that one statistic at time you can combine accumulators sum = foldl (+) 0 length = foldl (\n _ - n+1) 0 data Mean Double Int mean = foldl (\(Mean m n) x - Mean (m + (x - m) / fromIntegral (n+1)) (n+1)) (Mean 0 0) AFAIU iteratees basically use same technique. -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
On Fri, Jul 1, 2011 at 12:54 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: Alexey, your definition of mean does not look like liftS2 (/) sum length - you have to manually fuse these computations. Well it was fused for numerical stability I'm asking for a formalism that does this fusion automatically (and guaranteedly). Joining accumulators is quite straightforward. So is joining of initial state. Just creating a joinAcc :: (acc1 - x - acc1) - (acc2 - x - acc2) - (acc1,acc2) - x - (acc1,acc2) joinAcc f1 f2 (s1,s2) x = (f1 s1 x, f2 s2 x) Still you have to handle them separately. sum' = foldl (+) 0 len = foldl (\n _ - n+1) 0 sumLen = foldl (joinAcc (+) (\n _ - n+1)) (0,0) There is more regular approach but it only works with statistics. (function which do not depend on order of elements in the sample) For every statistics monoid for its evaluation could be constructed. For example sum: newtype Sum a = Sum a instance Num a = Monoid (Sum a) where mempty = Sum 0 mappend (Sum a) (Sum b) = Sum (a+b) Composition of these monoids becomes trivial. Just use I pursued this approach in monoid-statistics[1] package. It's reasonably well documented [1] http://hackage.haskell.org/package/monoid-statistics ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Eugene Kirpichov wrote: Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? (looks like arrows actually - which arrow is appropriate here?) That's a very good point. Just to clarify for everyone: Eugene wants to write the function average almost *literally* as average xs = sum xs / length xs but he wants the functions sum and length to fuse, so that the input stream xs is *not* shared as a whole. I have thought about this problem for a while actually and have observed the following: 1) You are not looking for a representation of streams, but for a representation of *functions* on streams. The essence of a function on streams is its case analysis of the input. Hence, the simplest solution is to make the case analysis explicit: data StringTo a = CaseOf a (Char - StringTo a) -- function on a stream (here: String) interpret :: StringTo a - (String - a) interpret (CaseOf nil cons) [] = nil interpret (CaseOf nil cons) (x:xs) = interpret (cons x) xs instance Applicative StringTo where pure a = CaseOf a (const $ pure a) (CaseOf nil1 cons1) * (CaseOf nil2 cons2) = CaseOf (nil1 $ nil2) (\c - cons1 c * cons2 c) length = go 0 where go n = CaseOf n (\_ - go $! n+1) average = liftA2 (/) sum length In other words, if you reify case .. of expression , you will be able to fuse them. 2) If Haskell were to support some kind of evaluation under the lambda (partial evaluation, head normal form instead of weak head normal form), it would be unnecessary to make the case expressions implicit. Rather, the applicative instance could be written as follows instance Applicative ((-) String) where pure a = const a f * x = \cs - case cs of [] - f [] $ x [] (c:cs) - let f' cs = f (c:cs) -- partial evaluation on this x' cs = x (c:cs) in f' `partialseq` x' `partialseq` (f' * x') cs We could simply write average = liftA2 (/) sum length and everything would magically fuse. 3) John Hughes has already thought about this problem in his PhD thesis. :) (but it is not available for download on the internet, unfortunately. :( ). His solution was a SYNCHLIST primitive in conjunction with some sort of parallelism PAR. Basically, the SYNCHLIST primitive only allows simultaneous access to the input stream and the parallelism is used to make that simultaneity happen. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Thanks but I'm afraid that's still not quite what I'm looking for; guess I'll have to define my desire by my implementation - so once it's ready I'll show the result to cafe :) 2011/7/1 Alexey Khudyakov alexey.sklad...@gmail.com: On Fri, Jul 1, 2011 at 12:54 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: Alexey, your definition of mean does not look like liftS2 (/) sum length - you have to manually fuse these computations. Well it was fused for numerical stability I'm asking for a formalism that does this fusion automatically (and guaranteedly). Joining accumulators is quite straightforward. So is joining of initial state. Just creating a joinAcc :: (acc1 - x - acc1) - (acc2 - x - acc2) - (acc1,acc2) - x - (acc1,acc2) joinAcc f1 f2 (s1,s2) x = (f1 s1 x, f2 s2 x) Still you have to handle them separately. sum' = foldl (+) 0 len = foldl (\n _ - n+1) 0 sumLen = foldl (joinAcc (+) (\n _ - n+1)) (0,0) There is more regular approach but it only works with statistics. (function which do not depend on order of elements in the sample) For every statistics monoid for its evaluation could be constructed. For example sum: newtype Sum a = Sum a instance Num a = Monoid (Sum a) where mempty = Sum 0 mappend (Sum a) (Sum b) = Sum (a+b) Composition of these monoids becomes trivial. Just use I pursued this approach in monoid-statistics[1] package. It's reasonably well documented [1] http://hackage.haskell.org/package/monoid-statistics -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
From: Eugene Kirpichov ekirpic...@gmail.com Subject: [Haskell-cafe] Patterns for processing large but finite streams To: Haskell Cafe haskell-cafe@haskell.org Message-ID: banlktikdsvq2wv4wjr+qmuvksoav0kt...@mail.gmail.com Content-Type: text/plain; charset=ISO-8859-1 Hi, I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? This is exactly what iteratees are for. Specifically, enumeratees are stream transformers, iteratees are stream consumers, and enumerators are stream producers. Consider adjacent differences: Given the stream [a, b, c, d, e ], you want to produce [b-a, c-b, d-c, e-d, ...] This is a stream transformer, so you need an enumeratee. Using iteratee, there are at least two obvious ways to produce it: 1) High-level, but probably not as good performance. Use Data.Iteratee.ListLike.roll import Data.Iteratee as I import Control.Applicative diff [x,y] = y-x diff [x]= 0 diffs = convStream (map diff $ roll 2 1) 2) somewhat explicit, probably better performance import qualified Data.ListLike as LL e' iter = do h - I.head unfoldConvStream f h iter where f lastEl = do c - getChunk if LL.null c then return (lastEl, LL.empty) else do let h = LL.head c t = LL.tail c return (LL.last c, LL.cons (h-lastEl) (LL.zipWith (-) t (LL.init c))) either of these can be run by using an enumerator: *Main enumPure1Chunk [1..10] (joinI $ e stream2list) = run [1,1,1,1,1,1,1,1,1,0] *Main let e'2 = e' :: Enumeratee [Int] [Int] IO a *Main enumPure1Chunk [1..10] (joinI $ e'2 stream2list) = run [1,1,1,1,1,1,1,1,1] I should optimize 'roll', it wouldn't be hard. Summing is easy; iteratee has Data.Iteratee.ListLike.sum built-in, but you could also use a fold. enumPure1Chunk is only useful for small amounts of data, but iteratee packages provide enumerators over files, handles, etc., as well as mechanisms by which you can create your own enumerators. The e' enumeratee is really just a model; I'd probably write one specific to whichever type of stream I wanted to work with. This one assumes a cheap 'cons', for example. For producing a stream of buckets, if the times are ordered it would be simple to do with Data.Iteratee.ListLike.breakE. If the times aren't ordered, I would probably use 'group' instead to collect a set number of samples. In my view, the biggest difference between iteratee and enumerator is the stream abstraction. Iteratee provides I.Iteratee s m a where 's' is the stream type, e.g. [Int], String, ByteString, Vector Word8, etc. Although the library only processes a chunk at a time (where a chunk is a subsection of the stream), the type is that of the whole stream. Enumerator instead provides E.Iteratee s m a here, 's' is the type of the chunk. Enumerator treats the stream as having type [s]. The implementations are different too, but ideally that would be hidden from most users. Although the iteratee implementation allows you to use = and $, whereas enumerator sometimes requires you to use == and $$. I think IterIO mostly follows the same type as Enumerator, although the implementation is again quite different. John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
After the list discussion, I'm surprised nobody mentioned Max Rabkin/Conal Elliott's blog posts on folds and zips. http://squing.blogspot.com/2008/11/beautiful-folding.html http://conal.net/blog/posts/enhancing-a-zip/ They develop a formalism for zipping functions on lists. Iteratee's `zip` set of functions are somewhat similar, but not quite the same. Specifically they still require multiple traversals of the data, but only over a bounded portion of it, so they're much more efficient. Of course you could combine the above patterns with iteratees by creating functions as above, then just running them with a 'fold'. John L. From: Eugene Kirpichov ekirpic...@gmail.com Thanks but I'm afraid that's still not quite what I'm looking for; guess I'll have to define my desire by my implementation - so once it's ready I'll show the result to cafe :) 2011/7/1 Alexey Khudyakov alexey.sklad...@gmail.com: On Fri, Jul 1, 2011 at 12:54 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: Alexey, your definition of mean does not look like liftS2 (/) sum length - you have to manually fuse these computations. Well it was fused for numerical stability I'm asking for a formalism that does this fusion automatically (and guaranteedly). Joining accumulators is quite straightforward. So is joining of initial state. Just creating a joinAcc :: (acc1 - x - acc1) - (acc2 - x - acc2) - (acc1,acc2) - x - (acc1,acc2) joinAcc f1 f2 (s1,s2) x = (f1 s1 x, f2 s2 x) Still you have to handle them separately. sum' = foldl (+) 0 len ?= foldl (\n _ - n+1) 0 sumLen = foldl (joinAcc (+) (\n _ - n+1)) (0,0) There is more regular approach but it only works with statistics. (function which do not depend on order of elements in the sample) For every statistics monoid for its evaluation could be constructed. For example sum: newtype Sum a = Sum a instance Num a = Monoid (Sum a) where ? mempty = Sum 0 ? mappend (Sum a) (Sum b) = Sum (a+b) Composition of these monoids becomes trivial. Just use I pursued this approach in monoid-statistics[1] package. It's reasonably well documented ?[1] http://hackage.haskell.org/package/monoid-statistics ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Hi, You're right, reifying stream processing functions seems indeed the way to go - and that looks even more like arrows :) I thought of something like this: data SP i o = Yield [o] (Maybe (Maybe i - SP i o)) Scalar functions like sum and length are just SP's that return a single item in the output stream. sum :: (Num a) = SP a a sum = sum' 0 where sum' s = Yield [] $ Just $ maybe (Yield [s] Nothing) (sum' . (s+)) Adjacent differences would be like liftA2 (-) input laggedInput laggedInput would be like: laggedInput :: SP i i laggedInput = li Nothing where li maybePrev = Yield (maybe2list maybePrev) $ Just $ maybe empty (li . Just) Looks like this can be made into an instance of Arrow and can be composed etc. 2011/7/1 Heinrich Apfelmus apfel...@quantentunnel.de: Eugene Kirpichov wrote: Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? (looks like arrows actually - which arrow is appropriate here?) That's a very good point. Just to clarify for everyone: Eugene wants to write the function average almost *literally* as average xs = sum xs / length xs but he wants the functions sum and length to fuse, so that the input stream xs is *not* shared as a whole. I have thought about this problem for a while actually and have observed the following: 1) You are not looking for a representation of streams, but for a representation of *functions* on streams. The essence of a function on streams is its case analysis of the input. Hence, the simplest solution is to make the case analysis explicit: data StringTo a = CaseOf a (Char - StringTo a) -- function on a stream (here: String) interpret :: StringTo a - (String - a) interpret (CaseOf nil cons) [] = nil interpret (CaseOf nil cons) (x:xs) = interpret (cons x) xs instance Applicative StringTo where pure a = CaseOf a (const $ pure a) (CaseOf nil1 cons1) * (CaseOf nil2 cons2) = CaseOf (nil1 $ nil2) (\c - cons1 c * cons2 c) length = go 0 where go n = CaseOf n (\_ - go $! n+1) average = liftA2 (/) sum length In other words, if you reify case .. of expression , you will be able to fuse them. 2) If Haskell were to support some kind of evaluation under the lambda (partial evaluation, head normal form instead of weak head normal form), it would be unnecessary to make the case expressions implicit. Rather, the applicative instance could be written as follows instance Applicative ((-) String) where pure a = const a f * x = \cs - case cs of [] - f [] $ x [] (c:cs) - let f' cs = f (c:cs) -- partial evaluation on this x' cs = x (c:cs) in f' `partialseq` x' `partialseq` (f' * x') cs We could simply write average = liftA2 (/) sum length and everything would magically fuse. 3) John Hughes has already thought about this problem in his PhD thesis. :) (but it is not available for download on the internet, unfortunately. :( ). His solution was a SYNCHLIST primitive in conjunction with some sort of parallelism PAR. Basically, the SYNCHLIST primitive only allows simultaneous access to the input stream and the parallelism is used to make that simultaneity happen. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
Eugene Kirpichov ekirpic...@gmail.com writes: 2011/7/1 Heinrich Apfelmus apfel...@quantentunnel.de: Eugene Kirpichov wrote: I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: Plain old lazy lists? Heretic! :-) I generally have written a bunch of programs that do things that way, and I think it works pretty well with a couple of caveats: 1. Make sure you collect data into strict data structures. Dangerous operations are addition and anything involving Data.Map. And use foldl'. 2. If you plan on working on multiple files, extra care might be needed to close them, or you'll run out of file descriptors. As long as you avoid these pitfalls, the advantage is very clean and simple code. Plain old lazy lists do not allow me to combine multiple concurrent computations, e.g. I cannot define average from sum and length. Yes, this is clunky. I'm not aware of any good solution. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Patterns for processing large but finite streams
This sound exactly like what attribute grammars, like the system developed at Utrecht University [1], are useful for. Erik [1] http://www.cs.uu.nl/wiki/HUT/AttributeGrammarSystem On Fri, Jul 1, 2011 at 10:54, Eugene Kirpichov ekirpic...@gmail.com wrote: Alexey, your definition of mean does not look like liftS2 (/) sum length - you have to manually fuse these computations. I'm asking for a formalism that does this fusion automatically (and guaranteedly). 2011/7/1 Alexey Khudyakov alexey.sklad...@gmail.com: On Fri, Jul 1, 2011 at 12:21 PM, Eugene Kirpichov ekirpic...@gmail.com wrote: I meant the average of the whole list - given a sumS and lengthS (S for Stream), write meanS as something like liftS2 (/) sumS lengthS. Or is that possible with lazy lists too? Sure you can. Sum, length and mean could be calculated as left fold. If you need to calculate more that one statistic at time you can combine accumulators sum = foldl (+) 0 length = foldl (\n _ - n+1) 0 data Mean Double Int mean = foldl (\(Mean m n) x - Mean (m + (x - m) / fromIntegral (n+1)) (n+1)) (Mean 0 0) AFAIU iteratees basically use same technique. -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Patterns for processing large but finite streams
Hi, I'm rewriting timeplot to avoid holding the whole input in memory, and naturally a problem arises: How to represent large but finite streams and functions that process them, returning other streams or some kinds of aggregate values? Examples: * Adjacent differences of a stream of numbers * Given a stream of numbers with times, split it into buckets by time of given width and produce a stream of (bucket, 50%,75% and 90% quantiles in this bucket) * Sum a stream of numbers Is this, perhaps, what comonads are for? Or iteratees? -- Eugene Kirpichov Principal Engineer, Mirantis Inc. http://www.mirantis.com/ Editor, http://fprog.ru/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe