[Haskell-cafe] Re: Iteratee question

2009-11-26 Thread oleg

Valery V. Vorotyntsev wrote:
 The following pattern appears quite often in my code:
  results <- map someConversion `liftM` replicateM nbytes Iter.head

 The meaning is: take `nbytes' from stream, apply `someConversion' to
 every byte and return the list of `results'.
 But there's more than one way to do it:

  i1, i2, i3 :: Monad m => Int -> IterateeG [] Word8 m [String]
  i1 n = map conv `liftM` replicateM n Iter.head
  i2 n = map conv `liftM` joinI (Iter.take n stream2list)
  i3 n = joinI $ Iter.take n $ joinI $ mapStream conv stream2list

 Of those i1, i2, i3 which one is better and why? Or is there another -
 preferable - way of applying iteratees to this task?

 My naive guess is that i1 will have worse performance with big n's. It
 looks like `i1' is reading bytes one by one, while `i2' takes whole
 chunks of data... I'm not sure though.

You are correct: i2 and i3 can process a chunk of elements at a time,
if an enumerator supplies it. That means an iteratee like i2 or i3 can
do more work per invocation -- which is always good. Since you have to
get the results as a list, you pretty much have to use stream2list. It
should be noted that stream2list isn't very efficient: it returns the
accumulated list only when it is done -- which happens when the stream
is terminated, normally or abnormally. So stream2list has terrible
latency and is useful only at the last stage of processing. I found
it most useful for testing (to see the resulting stream) and for
writing unit tests (to compare the produced results with the
expected ones). For incremental processing, it is better to stay within
Iteratees.
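
The difference in work per invocation can be sketched with a toy model.
This is only an illustration under stated assumptions: the stream is a
plain list of chunks, a "step" is one invocation of the consumer, and
takeByHead and takeByChunk are hypothetical names, not functions of the
iteratee library.

```haskell
import Data.Word (Word8)

-- Toy model only: a stream is a list of chunks, and a "step" is one
-- invocation of the consumer. These are NOT the iteratee library's types.

-- One element per step, in the spirit of `replicateM n Iter.head` (i1).
takeByHead :: Int -> [[Word8]] -> ([Word8], Int)
takeByHead n = go n 0 []
  where
    go k steps acc chunks
      | k <= 0    = (reverse acc, steps)
      | otherwise = case chunks of
          []            -> (reverse acc, steps)   -- stream ended early
          [] : cs       -> go k steps acc cs      -- skip an empty chunk
          (x : xs) : cs -> go (k - 1) (steps + 1) (x : acc) (xs : cs)

-- One step per chunk, in the spirit of `Iter.take n` (i2 and i3).
takeByChunk :: Int -> [[Word8]] -> ([Word8], Int)
takeByChunk n = go n 0 []
  where
    go k steps acc chunks
      | k <= 0    = (concat (reverse acc), steps)
      | otherwise = case chunks of
          []     -> (concat (reverse acc), steps) -- stream ended early
          c : cs -> let piece = take k c          -- use as much of the chunk as needed
                    in go (k - length piece) (steps + 1) (piece : acc) cs
```

With the chunks [[1..4],[5..8],[9..12]] and n = 10, both functions return
the elements 1 to 10, but takeByHead needs 10 steps while takeByChunk
needs 3.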

Although I think i2 and i3 should be close in performance (only
benchmarking can tell for sure, of course), i3 is more extensible
because stream2list is at the end of the chain. If later on further
processing is required (or, the latency imposed by stream2list becomes
noticeable), the chain can be easily extended. The advantage of the
arrangement in i3 is that if some Iteratee further down the chain
decides that it has had enough elements, Iter.take can quickly skip
the remaining elements without the need to convert them.
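
The early-exit behaviour described above can be sketched with a small
model. Everything here is an assumption made for illustration: Iter,
firstN, mapCounting, takeE, run and i3Model are hypothetical stand-ins
that mimic the shape of the i3 chain, not the iteratee library's API, and
leftover input is dropped to keep the sketch short.

```haskell
import Data.Word (Word8)

-- Toy model only, NOT the iteratee library: an iteratee is either finished
-- or waiting for a chunk (Nothing marks end of stream).
data Iter e a
  = Done a
  | More (Maybe [e] -> Iter e a)

-- Hypothetical downstream consumer that is satisfied after `want` elements.
firstN :: Int -> Iter e [e]
firstN want = go want []
  where
    go k acc
      | k <= 0    = Done (reverse acc)
      | otherwise = More step
      where
        step Nothing  = Done (reverse acc)
        step (Just c) = let now = take k c
                        in go (k - length now) (reverse now ++ acc)

-- Like mapStream, but also reports how many elements were handed to f.
mapCounting :: (a -> b) -> Iter b r -> Iter a (Int, Iter b r)
mapCounting f = go 0
  where
    go cnt inner@(Done _) = Done (cnt, inner)
    go cnt (More k)       = More step
      where
        step Nothing  = Done (cnt, k Nothing)
        step (Just c) = go (cnt + length c) (k (Just (map f c)))

-- Like Iter.take: pass at most n elements on; once the inner iteratee is
-- done, count the rest down without passing anything on.
takeE :: Int -> Iter e r -> Iter e (Iter e r)
takeE n inner | n <= 0 = Done inner
takeE n inner@(Done _) = More skip
  where
    skip Nothing  = Done inner
    skip (Just c) = takeE (n - length (take n c)) inner
takeE n inner@(More k) = More step
  where
    step Nothing  = Done inner
    step (Just c) = let now = take n c
                    in takeE (n - length now) (k (Just now))

-- Feed chunks, then end of stream; Nothing if the iteratee never finishes.
run :: [[e]] -> Iter e a -> Maybe a
run _        (Done a) = Just a
run []       (More k) = case k Nothing of
                          Done a -> Just a
                          More _ -> Nothing
run (c : cs) (More k) = run cs (k (Just c))

-- i3-style chain: take n, convert with show, stop after `want` results.
-- Returns (elements handed to the conversion, results).
i3Model :: Int -> Int -> [[Word8]] -> Maybe (Int, [String])
i3Model n want chunks = do
  inner             <- run chunks (takeE n (mapCounting show (firstN want)))
  (converted, sink) <- run [] inner
  results           <- run [] sink
  pure (converted, results)
```

In this model, i3Model 10 3 [[1..4],[5..8],[9..12]] yields
Just (4, ["1","2","3"]): only the first chunk reaches the conversion, and
the other six elements within the take window are skipped unconverted. An
arrangement that converts after collecting would touch all ten.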
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Iteratee question

2009-11-26 Thread Valery V. Vorotyntsev
On Thu, Nov 26, 2009 at 10:17 AM,  o...@okmij.org wrote:
 [...]

Thanks for clarification!

-- 
vvv


[Haskell-cafe] Re: Iteratee question

2009-11-26 Thread Valery V. Vorotyntsev
On Fri, Nov 27, 2009 at 4:04 AM, John Lato jwl...@gmail.com wrote:
 My apologies for not replying; I have been traveling and am only now
 working through my email.

 Oleg's response is much better than anything I would have written.
 I'd like to add one point.

 stream2list is very inefficient as he mentioned, however only for
 large values of 'n'.  For small n it should be fine.  Assuming you're
 using Word8 elements, small means < 4096.  This is because the
 default chunk size reading from a file is 2048 elements, so for any n
 < 4096 you have at most two concatenations in producing the
 stream2list.

 Sincerely,
 John Lato

 PS for one example of a binary data parser, please see
 http://inmachina.net/~jwlato/haskell/iter-audio/

 This is similar to the audio codec included with iteratee, but much
 more efficient.  In particular, the functions convFunc and
 unroller in Sound.Iteratee.Codecs.Common are pretty highly
 optimized.
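
The chunk arithmetic in the quoted message can be checked with a short
sketch. It assumes reading starts on a chunk boundary and every chunk is
full. The second function is a hypothetical cost model for joining chunks
with left-nested (++); it is an assumption for illustration and is not
taken from the source of stream2list.

```haskell
-- Chunks needed to supply n elements, assuming reading starts on a chunk
-- boundary and every chunk is full (a simplification).
chunksNeeded :: Int -> Int -> Int
chunksNeeded chunkSize n = (n + chunkSize - 1) `div` chunkSize

-- Elements copied when k equal-sized chunks are joined by left-nested (++):
-- each append re-copies everything accumulated so far, so the total is
-- chunkSize * (0 + 1 + ... + (k - 1)).
-- ASSUMPTION: illustrative cost model, not stream2list's actual code.
copiedByAppends :: Int -> Int -> Int
copiedByAppends chunkSize k = chunkSize * (k * (k - 1) `div` 2)
```

With a chunk size of 2048, chunksNeeded gives at most 2 for any n below
4096 and reaches 3 at n = 4097. Under the assumed cost model the copying
work grows quadratically with the number of chunks, which is consistent
with the inefficiency showing up only for large n.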

Wonderful!  Sample code is very helpful to get familiar with iteratees.

Thank you, John. Thanks to both of you.

-- 
vvv