Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> I just had to write some programs that crunched a lot of large files,
> both text and binary.  As I use iterators more I find myself wishing
> for some maybe-obvious enhancements:
>
> 1. File iterator for blocks of chars:
>
>     f = open('foo')
>     for block in f.iterchars(n=1024): ...
>
> iterates through 1024-character blocks from the file.  The default
> iterator, which loops through lines, is not always a good choice,
> since each line can use an unbounded amount of memory.  Default n in
> the above should be 1 char.

the simple way (letting the file object deal w/buffering issues):

    def iterchars(f, n=1):
        while True:
            x = f.read(n)
            if not x: break
            yield x

the fancy way (doing your own buffering) is left as an exercise for the
reader (one rough sketch is appended at the end of this message).  I do
agree it would be nice to have in some module.

> 2. wrapped file openers:
>
> There should be functions (either in itertools, builtins, the sys
> module, or wherever) that open a file, expose one of the above
> iterators, then close the file, i.e.:
>
>     def file_lines(filename):
>         with open(filename) as f:
>             for line in f:
>                 yield line
>
> so you can say
>
>     for line in file_lines(filename):
>         crunch(line)
>
> The current bogus idiom is to say "for line in open(filename)", but
> that does not promise to close the file once the file is exhausted
> (part of the motivation of the new "with" statement).  There should
> similarly be a "file_chars" which uses the n-chars iterator instead
> of the line iterator.

I'm +/-0 on this one vs the idioms:

    with open(filename) as f:
        for line in f:
            crunch(line)

    with open(filename, 'rb') as f:
        for block in iterchars(f):
            crunch(block)

Making two lines into one is a weak use case for a stdlib function
(though a sketch of such wrappers is appended below anyway).

> 3. itertools.ichain:
>
> yields the contents of each of a sequence of iterators, i.e.:
>
>     def ichain(seq):
>         for s in seq:
>             for t in s:
>                 yield t
>
> this is different from itertools.chain because it lazy-evaluates its
> input sequence.  Example application:
>
>     all_filenames = ['file1', 'file2', 'file3']
>     # loop through all the files, crunching all lines in each one
>     for line in ichain(file_lines(x) for x in all_filenames):
>         crunch(line)

Yes, subtle but important distinction (a small demo is appended below).

> 4. functools enhancements (Haskell-inspired):
>
> Let f be a function with 2 inputs.  Then:
>
>     a) def flip(f): return lambda x, y: f(y, x)
>     b) def lsect(x, f): return partial(f, x)
>     c) def rsect(f, x): return partial(flip(f), x)
>
> lsect and rsect allow making what Haskell calls "sections".  Example:
>
>     # sequence of all squares less than 100
>     from functools import partial
>     from itertools import count, takewhile
>     from operator import lt
>     s100 = takewhile(rsect(lt, 100), (x*x for x in count()))

Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only (a variadic sketch is appended
below).


Alex
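
Here is one possible shape for the buffered "fancy way" from point 1,
as a rough, untested sketch (the name iterblocks and the bufsize
default are made up for illustration): read big chunks, slice
fixed-size blocks out of them.

    def iterblocks(f, n=1, bufsize=65536):
        # hypothetical buffered variant: call f.read once per large
        # chunk rather than once per block
        buf = ''
        while True:
            data = f.read(bufsize)
            if not data:
                break
            buf += data
            while len(buf) >= n:
                yield buf[:n]
                buf = buf[n:]
        if buf:
            yield buf  # final short block, if any

The repeated slicing rebuilds buf on every block, which is O(len(buf))
each time; a smarter version might track an offset into buf instead.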
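
For point 2, the self-closing wrappers might look like this (a sketch
building on the iterchars above; the names file_lines and file_blocks
are just illustrative, and on Python 2.5 the future import is needed
for "with"):

    from __future__ import with_statement

    def file_lines(filename):
        # the file closes when the generator is exhausted, .close()d,
        # or garbage-collected
        with open(filename) as f:
            for line in f:
                yield line

    def file_blocks(filename, n=1024):
        with open(filename, 'rb') as f:
            for block in iterchars(f, n):
                yield block

One caveat: if a caller abandons the loop early, the "with" cleanup
only runs when the generator itself is closed or collected, which is
prompt under CPython's refcounting but not guaranteed elsewhere.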
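
On point 3, a tiny demo of the chain/ichain difference (hypothetical:
opener stands in for file_lines and just announces when it runs):

    from itertools import chain

    def ichain(seq):
        for s in seq:
            for t in s:
                yield t

    def opener(name):
        print 'opening', name
        return iter([name + ':line1', name + ':line2'])

    names = ['file1', 'file2', 'file3']

    # chain(*...) unpacks, and therefore exhausts, the outer generator
    # at call time: all three "opening" lines print immediately
    lines = chain(*(opener(n) for n in names))

    # ichain pulls from the outer generator lazily: each "opening"
    # prints only when iteration actually reaches that file
    lines = ichain(opener(n) for n in names)

(Python 2.6 later grew itertools.chain.from_iterable, which behaves
exactly like ichain here.)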
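
Finally, on the 2-argument limitation of point 4: an untested sketch of
a variadic flip that reverses however many positional arguments arrive
(keyword arguments are passed through unflipped):

    from functools import partial

    def flip(f):
        # reverse all positional arguments, not just two
        def flipped(*args, **kwds):
            return f(*args[::-1], **kwds)
        return flipped

    def lsect(x, f):
        return partial(f, x)

    def rsect(f, x):
        return partial(flip(f), x)

    # the squares example from point 4 still works:
    from itertools import count, takewhile
    from operator import lt
    s100 = list(takewhile(rsect(lt, 100), (x*x for x in count())))
    # s100 -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]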