Frank van Dijk added the comment:
Marc-Andre Lemburg added the comment:
Pointing people to io.open() as alternative to codecs.open() is a good idea,
but that doesn't make codecs.open() less useful.
The reason why codecs.open() uses binary mode is to avoid issues with
automatic newline conversion getting in the way of the file's encoding. Think
of e.g. UTF-16 encoded files that use newlines.
Disabling text mode on the underlying file handle to keep a UTF-16 code unit
like 0x010a from getting mangled works, but losing newline conversion altogether
is a high price to pay. Newline conversion should (conceptually) be done before
encoding and after decoding. io.open() does it right.
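The ordering problem can be seen with a single character. The sketch below assumes U+010A as the example code unit and little-endian UTF-16. It simulates "translate newlines after encoding" with a plain bytes.replace(), which stands in for the byte-level translation a text-mode handle performs (for example, Python 2's built-in file on Windows). It then compares that with io.open(), which translates before encoding. The helper name utf16_newline_demo is hypothetical.

```python
import io
import os
import tempfile


def utf16_newline_demo():
    # U+010A is stored as the single UTF-16 code unit 0x010A; in
    # little-endian byte order its first byte is 0x0A, an ASCII "\n".
    encoded = "\u010a".encode("utf-16-le")

    # Simulated byte-level newline translation applied *after* encoding:
    # the 0x0A byte becomes 0x0D 0x0A, leaving an odd, undecodable length.
    mangled = encoded.replace(b"\n", b"\r\n")

    # io.open() translates newlines on the text *before* encoding, so only
    # the real line break becomes "\r\n" and U+010A survives intact.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with io.open(path, "w", encoding="utf-16-le", newline="\r\n") as f:
            f.write("\u010a\n")
        with io.open(path, "rb") as f:
            raw = f.read()
    return encoded, mangled, raw


encoded, mangled, raw = utf16_newline_demo()
print(encoded)  # b'\n\x01'
print(mangled)  # b'\r\n\x01'
print(raw)      # b'\n\x01\r\x00\n\x00'
```

Decoding `mangled` as UTF-16 raises UnicodeDecodeError. Decoding `raw` gives back "\u010a\r\n".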
Note that codecs allow handling newlines on a line-by-line basis via the
.readline() keepends parameter, so issues with Windows vs. Unix can be worked
around explicitly. Since the default is to keep line ends, no data loss occurs
and application code can deal with line ends as it sees fit.
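The keepends behaviour mentioned above can be sketched with a codecs StreamReader. This sketch uses codecs.getreader() on an in-memory byte stream rather than codecs.open(), because the wrapper codecs.open() returns does not appear to forward keepends. It also calls readlines(), which takes the same keepends parameter as readline(). The helper name split_lines is hypothetical.

```python
import codecs
import io


def split_lines(raw, keepends):
    # codecs.getreader() returns the codec's StreamReader class; wrap an
    # in-memory byte stream so no real file is needed.
    reader = codecs.getreader("utf-8")(io.BytesIO(raw))
    # keepends defaults to True, so "\r\n" and "\n" come back exactly as
    # stored and the application decides how to normalize them.
    return reader.readlines(keepends=keepends)


data = b"first\r\nsecond\nthird"
print(split_lines(data, keepends=True))   # ['first\r\n', 'second\n', 'third']
print(split_lines(data, keepends=False))  # ['first', 'second', 'third']
```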
Trouble is, your average Python coder won't do exhaustive research on the pros
and cons of the various I/O options available, or of dealing with platform
differences at the application level. They'll just use the open() builtin, then
realize they need UTF-8 output or whatever, google "python write utf-8" or
browse the Unicode HOWTO, see a very familiar-looking API and assume it'll
behave just like open().
As it stands, I'm -1 on this patch, but would be +1 on mentioning io.open()
as alternative to codecs.open() with a slightly different approach to line
ends.
What would that mean concretely? Undoing the change to the Unicode HOWTO and
instead adding a remark along the lines of "The codecs.open() function does not
have the automatic newline conversion features that the builtin open() function
provides to make reading and writing text files platform independent. If you
need automatic newline conversion for the Unicode data you read and write,
consider using io.open() instead."?
I could live with that.
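The difference the proposed remark describes can be shown by writing the same text both ways and inspecting the bytes. This is a sketch under one assumption: newline="\r\n" is passed explicitly to io.open() to reproduce the Windows default on any platform, since the default newline=None writes os.linesep. The helper name write_both_ways is hypothetical.

```python
import codecs
import io
import os
import tempfile


def write_both_ways(text):
    with tempfile.TemporaryDirectory() as tmp:
        via_codecs = os.path.join(tmp, "codecs.txt")
        via_io = os.path.join(tmp, "io.txt")

        # codecs.open() switches to binary mode when an encoding is given,
        # so "\n" is encoded unchanged on every platform.
        with codecs.open(via_codecs, "w", encoding="utf-8") as f:
            f.write(text)

        # io.open() translates "\n" before encoding; newline="\r\n" forces
        # the Windows behaviour here regardless of the current platform.
        with io.open(via_io, "w", encoding="utf-8", newline="\r\n") as f:
            f.write(text)

        with io.open(via_codecs, "rb") as f:
            raw_codecs = f.read()
        with io.open(via_io, "rb") as f:
            raw_io = f.read()
    return raw_codecs, raw_io


print(write_both_ways("one\ntwo\n"))  # (b'one\ntwo\n', b'one\r\ntwo\r\n')
```

Appending via codecs.open() to a file that already uses "\r\n" is one way to end up with the mixed line endings described further down.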
I don't think it's useful to tell people:
* use codecs.open() on Python 2.4, 2.5, 2.6
* use io.open() on Python 2.7 (io is too slow on 2.6 to be a real alternative
to codecs.open())
* use open() on Python 3.4+
The Unicode HOWTO already recommends open() on all 3.x versions of the
documentation at docs.python.org.
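On Python 3, recommending io.open() and recommending open() name the same function, which is easy to check. The helper name open_is_io_open below is hypothetical.

```python
import builtins
import io


def open_is_io_open():
    # On Python 3 the builtin open() and io.open() are the same object.
    return builtins.open is io.open


print(open_is_io_open())  # True
```

On Python 2.6 and 2.7, a module can start with `from io import open` to get the Python 3 behaviour under the familiar name. On Python 3 that line is harmless. On 2.4 and 2.5 it fails immediately with an ImportError.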
If you run 2.4 or 2.5 and you're adding new Python software to your ancient
system without upgrading Python itself, the only thing that could happen is that
you'll get a clear-cut error if that new software imports io.
I can't judge how much of a problem slowness of the io module is in 2.6 or how
much 'market share' 2.6 has left, but I'll note that correctness trumps
performance. I'll also note that we're not changing any code here, nor will
there be a rush of coders racing to get their existing apps and frameworks in
line with the new decree.
All we're doing is giving average Python programmers a better chance to
discover what the drop-in replacement for open() is, or why that helpful tip
found on the interwebs left them with a subtly mangled text file that looks
really weird in Notepad and makes git complain.
codecs.open() works the same across all these Python versions.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22128
___
___
Python-bugs-list mailing list