Jack Diederich <jackd...@gmail.com> added the comment:

I tried passing a size to readline() to see whether increasing the chunk
size helps (the test file was 120 MB with 700k lines).  Values from 1k to
10k all took around 30 seconds, a value of 100 took 80 seconds, and with a
value of 100k it ran for several minutes before I killed it.  The default
starts at 100 and quickly maxes out at 512, which seems to be a sweet spot
(thanks to whoever figured that out!).
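
A minimal sketch of how a timing comparison like the one above could be
reproduced.  The helper names (make_sample, time_readline) and the tiny
generated sample file are assumptions for illustration, not the original
test setup.  Note that in current Python 3, readline(size) caps the number
of bytes returned, so a line may come back in several pieces; the loop
counts a line only when a piece ends in a newline.

```python
import gzip
import os
import tempfile
import time


def make_sample(path, n_lines=2000):
    """Write a small gzip file of short text lines (hypothetical test data)."""
    with gzip.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of sample data\n" % i)


def time_readline(path, size=-1):
    """Read the file with readline(size); return (lines_seen, seconds)."""
    lines_seen = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while True:
            piece = f.readline(size)
            if not piece:
                break  # end of file
            if piece.endswith(b"\n"):
                lines_seen += 1  # a complete line has now been consumed
    return lines_seen, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.gz")
        make_sample(sample)
        for size in (-1, 100, 1024, 10240):
            count, seconds = time_readline(sample, size)
            print("size=%6d  lines=%d  %.4fs" % (size, count, seconds))
```

Absolute timings will depend heavily on the Python version and the file,
so the numbers quoted above should not be expected to reproduce exactly.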

I profiled it, and function call overhead seems to be the real killer.
30% of the time is spent in readline().  The next() function does almost
nothing, yet it consumes 1/4 of the time that readline() does.  Ditto for
read() and _unread().  Even lowly len() consumes 1/3 of the time of
readline() because it is called over 2 million times.
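
A rough sketch of how such a profile could be gathered with the standard
cProfile and pstats modules, sorted by call count so that cheap but
frequently called functions stand out.  The helper names (write_sample,
read_all_lines, profile_line_reads) and the sample file are hypothetical;
the functions that show up will differ between Python versions.

```python
import cProfile
import gzip
import io
import os
import pstats
import tempfile


def write_sample(path, n_lines=2000):
    """Write a small gzip file of short text lines (hypothetical test data)."""
    with gzip.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of sample data\n" % i)


def read_all_lines(path):
    """Read the file one readline() call at a time and count the lines."""
    count = 0
    with gzip.open(path, "rb") as f:
        while f.readline():
            count += 1
    return count


def profile_line_reads(path, top=10):
    """Profile read_all_lines and return a report sorted by call count."""
    profiler = cProfile.Profile()
    profiler.runcall(read_all_lines, path)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("calls").print_stats(top)  # show only the top entries
    return out.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.gz")
        write_sample(sample)
        print(profile_line_reads(sample))
```

Sorting by "calls" rather than by time is a deliberate choice here: it
makes per-call overhead visible even when each call is individually cheap.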

There doesn't seem to be any way to speed this up without rewriting the
whole thing as a C module.  I'm closing the bug as WONTFIX.

----------
nosy: +jackdied
resolution:  -> wont fix
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7471>
_______________________________________