Antoine Pitrou pit...@free.fr added the comment:
Marking this as a duplicate of #4565 "Rewrite the IO stack in C".
--
resolution:  -> duplicate
status: open -> closed
superseder:  -> Rewrite the IO stack in C
___
Python tracker rep...@bugs.python.org
Antoine Pitrou pit...@free.fr added the comment:
We can't solve this for 3.0.1, downgrading to critical.
--
priority: release blocker -> critical
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4561
Changes by Martin v. Löwis mar...@v.loewis.de:
--
priority: deferred blocker -> release blocker
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:
The previous implementation only returns bytes and does not translate
newlines. For this particular case, indeed, the plain old FILE* based
object is faster.
--
nosy: +amaury.forgeotdarc
___
Antoine Pitrou pit...@free.fr added the comment:
> I know that as hard as it might be for everyone to believe, there are
> a lot of people who crank lots of non-Unicode data with Python.
But cranking data implies you'll do something useful with it, and
therefore spend CPU time doing those
David M. Beazley beaz...@users.sourceforge.net added the comment:
I wish I shared your optimism about this, but I don't. Here's a short
explanation why.
The problem of I/O and the associated interface between hardware, the
operating system kernel, and user applications is one of the most
Antoine Pitrou pit...@free.fr added the comment:
I seem to recall one of the design principles of the new IO stack was to
avoid relying on the C stdlib's buffered API, which has too many
platform-dependent behaviours.
In any case, binary reading has acceptable performance in py3k (although
Antoine Pitrou pit...@free.fr added the comment:
> I don't agree that that was a worthy design goal.
I don't necessarily agree either, but it's probably too late now.
The py3k buffered IO object has additional methods (e.g. peek(),
read1()) which can be used by upper layers (text IO) and so
David M. Beazley beaz...@users.sourceforge.net added the comment:
Good luck with that. Most people who get bright ideas such as "gee,
maybe I'll write my own version of X", where X is some part of the
standard C library pertaining to I/O, end up fighting a losing battle.
Of course, I'd love
David M. Beazley beaz...@users.sourceforge.net added the comment:
I agree with Raymond. For binary reads, I'll go farther and say that
even a 10% slowdown in performance would be surprising if not
unacceptable to some people. I know that as hard as it might be for
everyone to believe,
Antoine Pitrou pit...@free.fr added the comment:
[...]
Although I agree all this is important, I'd challenge the assumption it
has its place in the buffered IO library rather than in lower-level
layers (i.e. kernel userspace unbuffered IO).
In any case, it will be difficult to undo the
Christian Heimes li...@cheimes.de added the comment:
David:
Amaury's work is going to be a part of the standard library as soon as
his work is done. I'm confident that we can reach the old speed of the
2.x file type by carefully moving code to C modules.
___
Antoine Pitrou pit...@free.fr added the comment:
I've written a small file IO benchmark, available here:
http://svn.python.org/view/sandbox/trunk/iobench/
It runs under both 2.6 and 3.x, so that we can compare speeds of
respective implementations.
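For readers without access to the sandbox, here is a rough sketch of the kind
of measurement such a benchmark makes. It is not the iobench code itself: the
function name read_throughput and the chunk sizes are assumptions chosen for
illustration, and the sketch targets Python 3 only.

```python
import io
import time


def read_throughput(stream, chunk_size):
    """Read `stream` to EOF in pieces of `chunk_size`.

    Returns (total, mb_per_second), where `total` counts bytes for a
    binary stream and characters for a text stream.
    """
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes/str means end of file
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    mb_per_s = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s


if __name__ == "__main__":
    # In-memory data keeps the sketch self-contained; a real run would
    # pass open(path, "rb") or open(path, "r") instead.
    payload = b"some sample data\n" * 20000
    for size in (1, 20, 4096):
        total, rate = read_throughput(io.BytesIO(payload), size)
        print("read %d at a time: %d bytes, %.3f MB/s" % (size, total, rate))
```

Passing a text-mode file object instead of a binary one exercises the decoding
and newline-translation layers as well, which is where the two versions differ
most.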
Antoine Pitrou pit...@free.fr added the comment:
Without Christian's patch:
[400KB.txt] read one byte/char at a time... 0.2685 MB/s (100% CPU)
[400KB.txt] read 20 bytes/chars at a time... 4.536 MB/s (98% CPU)
[400KB.txt] read one line at a time... 3.805 MB/s
Raymond Hettinger rhettin...@users.sourceforge.net added the comment:
I'm getting caught-up with the IO changes in 3.0 and am a bit confused.
The PEP says, programmers who don't want to muck about in the new I/O
world can expect that the open() factory method will produce an object
Antoine Pitrou pit...@free.fr added the comment:
Christian, by benchmarks I meant a measurement of text reading with and
without the patch.
Changes by Martin v. Löwis [EMAIL PROTECTED]:
--
priority: release blocker -> deferred blocker
Changes by Ismail Donmez [EMAIL PROTECTED]:
--
nosy: +cartman
___
Python-bugs-list mailing list
Changes by Winfried Plappert [EMAIL PROTECTED]:
--
nosy: +wplappert
New submission from Christian Heimes [EMAIL PROTECTED]:
The new io library needs some serious profiling and optimization work.
I've already fixed a severe slowdown in _fileio.FileIO's read buffer
allocation algorithm (#4533).
More profiling tests have shown a speed problem in write() files
David M. Beazley [EMAIL PROTECTED] added the comment:
I've done some profiling and the performance of reading line-by-line is
considerably worse in Python 3 than in Python 2. For example, this
code:
    for line in open("somefile.txt"):
        pass
Ran 35 times slower in Python 3.0 than Python 2.6
Christian Heimes [EMAIL PROTECTED] added the comment:
Your issue is most likely caused by #4533. Please download the latest svn
version of Python 3.0 (branches/release30_maint) and try again.
Christian Heimes [EMAIL PROTECTED] added the comment:
Here is a patch against the py3k branch that reduces the time for the
line ending detection from 0.55s to 0.22s for a 50MB file on my test
system.
--
keywords: +patch
Added file:
David M. Beazley [EMAIL PROTECTED] added the comment:
Tried this using projects/python/branches/release30-maint and using the
patch that was just attached. With a 66MB input file, here are the
results of this code fragment:
    for line in open(BIGFILE):
        pass
Python 2.6: 0.67s
Python 3.0:
David M. Beazley [EMAIL PROTECTED] added the comment:
Just as one other followup, if you change the code in the last example
to use binary mode like this:
    for line in open(BIG, "rb"):
        pass
You get the following results:
Python 2.6: 0.64s
Python 3.0: 42.26s (66 times slower)
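A minimal sketch of how timings like these could be reproduced, assuming
Python 3 and a throwaway test file. The helper names time_line_iteration and
make_sample_file are hypothetical and not part of the original report, and the
generated file is far smaller than the 66MB input used above.

```python
import os
import tempfile
import time


def time_line_iteration(path, mode="r"):
    """Return (elapsed_seconds, line_count) for iterating a file line by line."""
    start = time.perf_counter()
    count = 0
    with open(path, mode) as f:
        for _ in f:
            count += 1
    return time.perf_counter() - start, count


def make_sample_file(num_lines=100000):
    """Write a throwaway ASCII text file and return its path (made-up test data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", newline="\n") as f:
        for i in range(num_lines):
            f.write("line %d: some sample text\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        # "r" goes through the text layer (decoding, newline handling);
        # "rb" iterates over raw bytes lines.
        for mode in ("r", "rb"):
            elapsed, lines = time_line_iteration(sample, mode)
            print("mode=%r: %d lines in %.3fs" % (mode, lines, elapsed))
    finally:
        os.remove(sample)
```

Timing both modes on the same file separates the cost of the buffered binary
layer from the cost added by the text layer on top of it.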
Georg Brandl [EMAIL PROTECTED] added the comment:
David, the reading bug fix/optimization is not (yet?) on
release30-maint, only on branches/py3k.
--
nosy: +georg.brandl
David M. Beazley [EMAIL PROTECTED] added the comment:
Just checked it with branches/py3k and the performance is the same.
David M. Beazley [EMAIL PROTECTED] added the comment:
bash-3.2$ uname -a
Darwin david-beazleys-macbook.local 9.5.1 Darwin Kernel Version 9.5.1: Fri
Sep 19 16:19:24 PDT 2008; root:xnu-1228.8.30~1/RELEASE_I386 i386
bash-3.2$ ./python.exe -c "import sys; print(sys.version)"
3.1a0 (py3k:67609, Dec 6
Changes by Giampaolo Rodola' [EMAIL PROTECTED]:
--
nosy: +giampaolo.rodola
Changes by Antoine Pitrou [EMAIL PROTECTED]:
--
nosy: +pitrou
Changes by Christian Heimes [EMAIL PROTECTED]:
Removed file: http://bugs.python.org/file12248/count_linenendings.patch
Antoine Pitrou [EMAIL PROTECTED] added the comment:
I don't think this is a public API, so the function should probably be
renamed _count_lineendings.
Also, are there some benchmark numbers?
Christian Heimes [EMAIL PROTECTED] added the comment:
I'll come up with some reading benchmarks tomorrow. For now here is a
benchmark of write(). You can clearly see the excessive usage of closed,
len() and isinstance().
Added file: http://bugs.python.org/file12256/test_write.log
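A profile of this kind can be gathered with the standard cProfile and pstats
modules. The sketch below is an assumption about how such a log might be
produced, not the script behind the attached file; write_many and
profile_writes are hypothetical names. On current CPython the io module is
implemented in C, so the report will not show the same Python-level calls as
a 3.0-era run.

```python
import cProfile
import io
import os
import pstats
import tempfile


def write_many(path, count=10000):
    """Issue many small text writes so per-call overhead dominates."""
    with open(path, "w") as f:
        for _ in range(count):
            f.write("x" * 20 + "\n")


def profile_writes(path, count=10000, top=15):
    """Profile write_many() and return the report text, sorted by call count."""
    profiler = cProfile.Profile()
    profiler.enable()
    write_many(path, count)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("calls").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    fd, target = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        print(profile_writes(target))
    finally:
        os.remove(target)
```

Sorting by call count rather than cumulative time puts helpers that are
invoked once per write() at the top of the report, which is the pattern
described above.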
Changes by Christian Heimes [EMAIL PROTECTED]:
Removed file: http://bugs.python.org/file12256/test_write.log
Christian Heimes [EMAIL PROTECTED] added the comment:
Roundup doesn't display .log files as plain text files.
Added file: http://bugs.python.org/file12257/test_write.txt
Changes by Barry A. Warsaw [EMAIL PROTECTED]:
--
priority: high -> release blocker