[Haskell-cafe] Re: Iteratee question
Valery V. Vorotyntsev wrote:
> The following pattern appears quite often in my code:
>
>     results <- map someConversion `liftM` replicateM nbytes Iter.head
>
> The meaning is: take `nbytes' from stream, apply `someConversion' to every byte and return the list of `results'.
>
> But there's more than one way to do it:
>
>     i1, i2, i3 :: Monad m => Int -> IterateeG [] Word8 m [String]
>     i1 n = map conv `liftM` replicateM n Iter.head
>     i2 n = map conv `liftM` joinI (Iter.take n stream2list)
>     i3 n = joinI $ Iter.take n $ joinI $ mapStream conv stream2list
>
> Of those i1, i2, i3 which one is better and why? Or is there another (preferable) way of applying iteratees to this task?
>
> My naïve guess is that i1 will have worse performance with big n's. It looks like `i1' is reading bytes one by one, while `i2' takes whole chunks of data... I'm not sure though.

You are correct: i2 and i3 can process a chunk of elements at a time, if an enumerator supplies it. That means an iteratee like i2 or i3 can do more work per invocation, which is always good.

Since you have to get the results as a list, you pretty much have to use stream2list. It should be noted that stream2list isn't very efficient: it returns the accumulated list only when it is done, which happens when the stream is terminated, normally or abnormally. So stream2list has a terrible latency and is useful only at the last stage of processing. I found it most useful for testing (to see the resulting stream) and for writing unit tests (to compare the produced results with the expected ones). For incremental processing, it is better to stay within Iteratees.

Although I think i2 and i3 should be close in performance (only benchmarking can tell for sure, of course), i3 is more extensible because stream2list is at the end of the chain. If further processing is required later on (or the latency imposed by stream2list becomes noticeable), the chain can easily be extended.
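To make the i1 versus i2 difference concrete, here is a minimal sketch using a toy chunked iteratee written from scratch in plain Haskell. Everything in it (Stream, Iter, headI, takeList, run) is a hypothetical stand-in, not the iteratee package's IterateeG API: headI plays the role of Iter.head and delivers one element per step, so `replicateM n headI` goes through n separate steps, while takeList plays the role of `joinI (Iter.take n stream2list)` and slices each incoming chunk with a single splitAt. The enumerator run feeds a list in fixed-size chunks and then signals EOF.

```haskell
import Control.Monad (liftM, replicateM)
import Data.Maybe (catMaybes)

-- Toy chunked stream (hypothetical; NOT the iteratee package API).
data Stream a = Chunk [a] | EOF

-- An iteratee is either finished (with unconsumed input) or waiting for input.
data Iter a r
  = Done r (Stream a)
  | Cont (Stream a -> Iter a r)

instance Functor (Iter a) where
  fmap f (Done r s) = Done (f r) s
  fmap f (Cont k)   = Cont (fmap f . k)

instance Applicative (Iter a) where
  pure r    = Done r (Chunk [])
  mf <*> mx = mf >>= \f -> fmap f mx

instance Monad (Iter a) where
  -- Hand the leftover input of the first iteratee to the second one.
  Done r s >>= f = case f r of
    Done r' _ -> Done r' s
    Cont k    -> k s
  Cont k >>= f = Cont ((>>= f) . k)

-- Stand-in for Iter.head: one element per step (Nothing at end of stream).
headI :: Iter a (Maybe a)
headI = Cont step
  where
    step (Chunk [])     = headI
    step (Chunk (x:xs)) = Done (Just x) (Chunk xs)
    step EOF            = Done Nothing EOF

-- Stand-in for `joinI (Iter.take n stream2list)`: takes what it needs
-- from each incoming chunk with one splitAt.
takeList :: Int -> Iter a [a]
takeList = go []
  where
    go acc n
      | n <= 0    = Done (concat (reverse acc)) (Chunk [])
      | otherwise = Cont step
      where
        step EOF = Done (concat (reverse acc)) EOF
        step (Chunk xs) =
          let (now, rest) = splitAt n xs
              acc'        = now : acc
          in if length now == n
               then Done (concat (reverse acc')) (Chunk rest)
               else go acc' (n - length now)

-- Enumerator: feed the list in chunks of `size` (must be > 0), then EOF.
run :: Int -> [a] -> Iter a r -> Maybe r
run _ _ (Done r _) = Just r
run _ [] (Cont k) = case k EOF of
  Done r _ -> Just r
  Cont _   -> Nothing
run size xs (Cont k) = run size rest (k (Chunk now))
  where (now, rest) = splitAt size xs

-- Hypothetical conversion, standing in for `conv` from the question.
conv :: Int -> String
conv = show

i1, i2 :: Int -> Iter Int [String]
i1 n = (map conv . catMaybes) `liftM` replicateM n headI
i2 n = map conv `liftM` takeList n
```

In GHCi, `run 2048 [1 .. 5000] (i1 3000)` and `run 2048 [1 .. 5000] (i2 3000)` both give `Just (map show [1 .. 3000])`; the results agree, and only the amount of work done per step differs.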
The advantage of the arrangement of i3 is that if some Iteratee further down the chain decides that it has had enough elements, Iter.take can quickly skip the remaining elements without the need to convert them.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
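The point about the order of stages in i3 can be modelled with an even smaller, element-at-a-time sketch. It assumes, as the description above suggests, that a take stage keeps consuming (skipping) its remaining elements after the downstream consumer has finished. Sink, mapS, takeS, collect, feed and runCounting are hypothetical names, not iteratee functions, and the conversion increments an IORef purely so that its calls can be counted.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- A consumer that has either finished or wants the next element.
data Sink a r = Finished r | NeedMore (a -> IO (Sink a r))

-- Like mapStream: converts every element it is handed, then passes it on.
mapS :: (a -> IO b) -> Sink b r -> Sink a r
mapS _ (Finished r) = Finished r
mapS f (NeedMore k) = NeedMore $ \x -> mapS f <$> (f x >>= k)

-- Like Iter.take as modelled here: consumes exactly n elements, handing
-- them to the inner sink until it finishes and skipping the rest.
takeS :: Int -> Sink a r -> Sink a (Maybe r)
takeS n s
  | n <= 0    = Finished (result s)
  | otherwise = NeedMore $ \x -> case s of
      Finished _ -> pure (takeS (n - 1) s)   -- skipped: not passed to the inner sink
      NeedMore k -> takeS (n - 1) <$> k x
  where
    result (Finished r) = Just r
    result (NeedMore _) = Nothing

-- A consumer that "has had enough" after m elements.
collect :: Int -> Sink b [b]
collect = go []
  where
    go acc m
      | m <= 0    = Finished (reverse acc)
      | otherwise = NeedMore $ \x -> pure (go (x : acc) (m - 1))

-- Enumerator: feeds elements one at a time; returns the result (if the
-- sink finished) and the unconsumed input.
feed :: Sink a r -> [a] -> IO (Maybe r, [a])
feed (Finished r) rest   = pure (Just r, rest)
feed (NeedMore _) []     = pure (Nothing, [])
feed (NeedMore k) (x:xs) = k x >>= \s -> feed s xs

-- Runs take-then-convert (i3's order) or convert-then-take,
-- returning the result and the number of conversion calls.
runCounting :: Bool -> Int -> Int -> [Int] -> IO (Maybe (Maybe [String]), Int)
runCounting takeFirst n m input = do
  calls <- newIORef (0 :: Int)
  let conv x = modifyIORef' calls (+ 1) >> pure (show x)
      sink
        | takeFirst = takeS n (mapS conv (collect m))
        | otherwise = mapS conv (takeS n (collect m))
  (r, _) <- feed sink input
  c <- readIORef calls
  pure (r, c)
```

With a take of 10 elements and a consumer that wants only 3, `runCounting True 10 3 [1 .. 20]` converts 3 elements while `runCounting False 10 3 [1 .. 20]` converts all 10; both return `Just (Just ["1","2","3"])` and consume the same 10 input elements.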
[Haskell-cafe] Re: Iteratee question
On Thu, Nov 26, 2009 at 10:17 AM, o...@okmij.org wrote:
> [...]

Thanks for clarification!

--
vvv
[Haskell-cafe] Re: Iteratee question
On Fri, Nov 27, 2009 at 4:04 AM, John Lato <jwl...@gmail.com> wrote:
> My apologies for not replying; I have been traveling and am only now working through my email. Oleg's response is much better than anything I would have written.
>
> I'd like to add one point. stream2list is very inefficient, as he mentioned, but only for large values of n. For small n it should be fine. Assuming you're using Word8 elements, small means < 4096. This is because the default chunk size when reading from a file is 2048 elements, so for any n < 4096 you have at most two concatenations when producing the stream2list result.
>
> Sincerely,
> John Lato
>
> PS: For one example of a binary data parser, please see
> http://inmachina.net/~jwlato/haskell/iter-audio/
> This is similar to the audio codec included with iteratee, but much more efficient. In particular, the functions convFunc and unroller in Sound.Iteratee.Codecs.Common are pretty highly optimized.

Wonderful! Sample code is very helpful for getting familiar with iteratees. Thank you, John.

Thanks to both of you.

--
vvv
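The "at most two concatenations" figure can be checked with a small model. It rests on two assumptions that are not taken from the iteratee source: that stream2list appends each chunk it receives to its accumulator with (++), and that the n requested elements start on a chunk boundary. collectChunks is a hypothetical name; it also counts how many elements (++) copies, since that is where the cost for large n comes from.

```haskell
-- Hypothetical model: stream2list under `take n` appends every chunk it is
-- handed to its accumulator with (++). Assumes the n elements start on a
-- chunk boundary and that chunkSize > 0.
-- Returns (collected elements, number of appends, elements copied by (++)).
collectChunks :: Int -> Int -> [a] -> ([a], Int, Int)
collectChunks chunkSize n = go [] 0 0 n
  where
    go acc appends copied remaining xs
      | remaining <= 0 || null xs = (acc, appends, copied)
      | otherwise =
          let (chunk, rest) = splitAt (min chunkSize remaining) xs
          in go (acc ++ chunk)
                (appends + 1)
                (copied + length acc)   -- (++) copies its left operand
                (remaining - length chunk)
                rest
```

Under these assumptions n = 4000 with 2048-element chunks needs two appends, while collecting 20480 elements (ten chunks) makes (++) copy 2048 * (0 + 1 + ... + 9) = 92160 elements, so the copying grows quadratically with the number of chunks.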