Daniel Lescohier <daniel.lescoh...@cbs.com> added the comment: OK, I think I see where I went wrong in my perceptions of the file protocol. I thought that readlines() returned an iterator, not a list, but I see in the library reference manual on File Objects that it returns a list. I think I got confused because there is no equivalent of __iter__ for writing to streams. For input, I'm always using 'for line in file_object' (in other words, file_object.__iter__), so I had assumed that writelines was the mirror image of that, because I never use the readlines method. Then, in my mind, readlines became the mirror image of writelines, which I had assumed took an iterator, so I assumed that readlines returned an iterator. I wonder if this perception problem is common or not.
So, the StreamWriter interface matches the file protocol; readlines() and writelines() deal with lists. There shouldn't be any change to it, because it follows the protocol. Then, the example I wrote would be instead: rows = (line[:-1].split('\t') for line in in_file) projected = (keep_fields(row, 0, 3, 7) for row in rows) filtered = (row for row in projected if row[2]=='1') formatted = (u'\t'.join(row)+'\n' for row in filtered) write = out_file.write for line in formatted: write(line) I think it's correct that the file object write C code only does 1000-line chunks for sequences that have a defined length: if it has a defined length, then that implies that the data exists now, and can be concatenated and written now. Something without a defined length may be a generator with items arriving later. ---------- message_count: 6.0 -> 7.0 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue5445> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com