Re: [Haskell-cafe] compare iteratee with python's yield

2011-07-28 Thread oleg


Your question is insightful. The posted answers are
excellent. I merely wish to illustrate one, already made, point on a
concrete example.

The complete code is in the file
http://okmij.org/ftp/Haskell/Iteratee/IterDemo.hs

The file implements a series of progressively more complex examples
-- from wc to grep, -- stressing compositionality. The whole program
is assembled from independent building blocks; as our task changes, we
replace one building block with another. The building blocks,
iteratees, may be stateful. We never have to worry about writing out the
whole state of the program and properly initializing it. The overall
state is implicitly composed from the state of each building block; 
each iteratee manages its own state without leaking it.

The particular example to illustrate the state encapsulation is
grep1_word

 grep1_word word fname =
   let search = IterateeM.dropWhile (/= word)  IterateeM.head 
   runI'  = runI . en_handle (\_ - not found: ++ word) in
   print = run = enum_file fname 
 (runI = en_lines 
  ((runI' = en_words search)
   `en_pair` (count_i `en_pair` en_last)))

which implements grep -w -n -m 1 word filename. It searches for
the first occurrence of the given word in the file and returns the
line of that occurrence and the line number.  (The example also
illustrates exception handling: The exception is raised by
Iteratee.head when IterateeM.dropWhile dropped the whole stream while
looking for the given word.) A few comments: en_lines is the
enumeratee, transforming a stream of characters into a stream of
lines; en_words is the enumeratee that transforms the stream of lines
to the stream of words. The stateful iteratee count_i counts the
elements in any stream; en_last, the analogue of List.last, returns
the last seen element. The combinator en_pair pairs two iteratees to
run in parallel, processing the same stream chunk-by-chunk. The
processing stops as soon as one of the iteratees in the pair
stopped. That behavior does the trick: en_words search stops when
the word is found. The whole pair is stopped too. The other component
of the pair, (count_i `en_pair` en_last), will receive EOF and tell us
the line and the line count they have last seen.

I guess the example can be written in Python-like style as follows
(simplifying and assuming that the file is already parsed into lines):

counter = 0
for line in file_lines:
  counter = counter + 1
  if contains line word:
 print counter
 print line
 break


Suppose we wish to change our example to print not only the found
line, but a few lines before it (the option -B of grep). We have to
modify our Python-like code:
 -- add a piece of state to keep the context lines,
and initialize it
 -- add code to update the state as we read lines
 -- add the finalization code, to convert the state to the desired
result

The code will look like the following (### mark the changes):

counter = 0
lines_seen = []  ###
for line in file_lines:
  counter = counter + 1
  accumulate lines_seen line ###
  if contains line word:
 print counter
 print (finalize lines_seen) ###
 break

In contrast, to modify the iteratee code for the new task, we make the
_single_ change: replace en_last with (en_lastN 5) (for 5 lines of
context).

It should be stressed that the Python code has to be changed in three
*distinct, disconnected places*. New state has to be defined and
initialized, before the loop. The state has to be updated, somewhere
in the loop. The state has to be finalized, in the loop termination
part. When changing code, it is easy to overlook the needed
changes. For example, it is easier to accumulate the context lines in
reverse order. When printing the lines, we should not forget to
reverse them. When changes are disconnected, it is hard to reason and
test for them. With so many other things going on, it is hard to write
a proper test (for example, when testing the context accumulation we
don't care about word matching; we should test the accumulation in
isolation of whatever else we are doing in the iteration).


The iteratee en_lastN deals only with the context accumulation aspect:

 -- Return N last received elements
 -- The acc below is the list of N last elements in the reverse
 -- order
 en_lastN :: Monad m = Int - Iteratee el m [el]
 en_lastN n = ie_cont $ step []
  where
  step acc (Chunk [])  = ie_contM (step acc) 
  step acc (Chunk ls)  = ie_contM (step $! Prelude.take n $ reverse ls ++ acc)
  step acc stream  = ie_doneM (reverse acc) stream

The initialization of the state, the accumulation and the finalization are all
in the same place. The en_lastN iteratee does not care of counting,
searching, or whatever else is going on. In fact, the iteratee is pure
and deals with arbitrary elements (not necessarily lines). We can test
it using Quickcheck or HUnit. No IO is needed for the test.

Thus in the iteratee style, the overall state of the program becomes
distributed, across 

[Haskell-cafe] compare iteratee with python's yield

2011-07-01 Thread yi huang
I just read several tutorials on iteratee, i find that iteratee is similar
to python's generator, both allow streamlined data processing. For example,
i can implement enumFile and printChunks in python like this:

EOF = None
def enum_file(bufsize, filename):
with open(filename) as input:
while True:
data = input.read(bufsize)
if not data:
break
yield data
yield EOF

def print_chunks(print_empty, generator):
for chunk in generator:
if chunk==EOF:
print 'EOF'
return
if len(chunk)==0 and not print_empty:
continue
print chunk

print_chunks(True, enum_file(2, data))

But i find iteratee far more complicated than python's generator, is that
because iteratee can do something python's generator can't, or i simply need
to be more familar with functional programming style.



-- 
http://www.yi-programmer.com/blog/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] compare iteratee with python's yield

2011-07-01 Thread Ertugrul Soeylemez
yi huang yi.codepla...@gmail.com wrote:

 I just read several tutorials on iteratee, i find that iteratee is
 similar to python's generator, both allow streamlined data
 processing. For example, i can implement enumFile and printChunks in
 python like this:

 EOF = None
 def enum_file(bufsize, filename):
 with open(filename) as input:
 while True:
 data = input.read(bufsize)
 if not data:
 break
 yield data
 yield EOF

 def print_chunks(print_empty, generator):
 for chunk in generator:
 if chunk==EOF:
 print 'EOF'
 return
 if len(chunk)==0 and not print_empty:
 continue
 print chunk

 print_chunks(True, enum_file(2, data))

 But i find iteratee far more complicated than python's generator, is
 that because iteratee can do something python's generator can't, or i
 simply need to be more familar with functional programming style.

I don't know Python very well, but I suspect that its generators are
really a sort of coroutines.  Iteratees are also coroutines, but their
architecture is quite different.

The difference is that conceptually an iteratee does not know about its
input.  In Python the generator stops to wait for the iterator to
request more input:  The consumer talks to the producer.  This control
flow is turned inside out in iteratees, where the iteratee stops to wait
for the enumerator to provide more input.  The producer talks to the
consumer in iteratees.  This is a conceptual difference, so what's the
advantage?

The main advantage of iteratees, compared to generators, can be seen in
a statically typed language such as Haskell.  Let's say that instead of
printing the input lines your consumer would instead just calculate its
length and return it.  Let's call this consumer 'length'.  If you would
translate generators to Haskell, you would find that your 'length'
consumer would suddenly include a MonadIO constraint, even though it
doesn't need it.

With iteratees' inversion of control only the part which needs the
MonadIO constraint really has it.  An enumerator is really a function
from an iteratee to an iteratee.  It converts an arbitrary iteratee to
an iteratee with additional input, adding the constraints necessary to
fulfill this task.  While with the generator concept you apply a
producer to a consumer, with iteratees you apply a consumer to an
producer.

Hope that helps.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] compare iteratee with python's yield

2011-07-01 Thread Casey McCann
On Fri, Jul 1, 2011 at 6:01 AM, Ertugrul Soeylemez e...@ertes.de wrote:
 I don't know Python very well, but I suspect that its generators are
 really a sort of coroutines.  Iteratees are also coroutines, but their
 architecture is quite different.

Python generators were originally a sort of heavily restricted
coroutine mostly used to implement corecursive sequences, i.e. what we
use lazy lists for Haskell. As Python allows arbitrary side-effects,
this makes them pretty directly equivalent to using lazy I/O in
Haskell. They were later extended to be more like full coroutines,
allowing them to both generate and consume data. I imagine that
something akin to iteratees could be built on top of the
coroutine-style extended generators, but it would likely be more
reliant on the programmer not using things incorrectly and the benefit
of the whole idea is unclear in this context (for the reasons outlined
in the rest of your message).

- C.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe