Caleb Hattingh wrote:
> What does ".readlines()" do differently that makes it so much slower
> than ".read().splitlines(True)"? To me, the "one obvious way to do it"
> is ".readlines()".

readlines reads at most 100 bytes at a time. I'm not sure why it does that (probably so as not to read further ahead than necessary to get a line (*)), but for gzip, that is terribly inefficient. I believe the gzip algorithms use a window size much larger than that; I'm not sure how the gzip library deals with small reads.

One interpretation would be that gzip decompresses the current block over and over again if the caller requests only 100 bytes each time. This is a pure guess; you would need to read the zlib source code to find out. Anyway, decompressing the entire file at once lets zlib operate at the highest efficiency.

Regards,
Martin

(*) Guessing further, it might be that "read a lot" fails to work well on a socket, as you would have to wait for the complete data before even returning the first line.

P.S. Contributions to improve this are welcome.

-- 
http://mail.python.org/mailman/listinfo/python-list
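
For anyone who wants to try the two approaches from this thread side by side, here is a minimal sketch. The helper names and the sample file are hypothetical, made up for illustration, and it is written for a current Python 3, so the timings you see may not match the behaviour described above. It only shows that both calls give the same list of lines for "\n"-terminated data.

```python
import gzip
import os
import tempfile


def make_sample(path, n=1000):
    """Write n newline-terminated lines to a gzip file (hypothetical test data)."""
    with gzip.open(path, "wb") as f:
        for i in range(n):
            f.write(b"line %d\n" % i)


def lines_via_readlines(path):
    """Let the gzip file object split the stream into lines itself."""
    with gzip.open(path, "rb") as f:
        return f.readlines()


def lines_via_read_splitlines(path):
    """Decompress everything in one call, then split in memory.

    splitlines(True) keeps the line endings, like readlines() does.
    """
    with gzip.open(path, "rb") as f:
        return f.read().splitlines(True)


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    sample = os.path.join(tmpdir, "sample.gz")  # hypothetical file name
    make_sample(sample)
    same = lines_via_readlines(sample) == lines_via_read_splitlines(sample)
    print("same lines:", same)
```

One caveat: on bytes, splitlines() also breaks on a bare "\r", while readlines() on a binary file breaks only on "\n", so the two results can differ for data with mixed line endings. The second variant also holds the whole decompressed file in memory at once.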