New submission from Ryan McGuire <python....@enigmacurry.com>:

Opening a UTF-8 encoded file with Unix newlines ("\n") on Win32:
codecs.open("whatever.txt","r","utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say:

"Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing."

And yet, opening the file with an explicit binary mode resolves the situation:

codecs.open("whatever.txt","rb","utf-8").read()

This reads the file with the original newlines unmodified.

The implementation of codecs.open and the documentation are out of sync.

----------
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 91995
nosy: EnigmaCurry, georg.brandl
severity: normal
status: open
title: codecs.open on Win32 does not force binary mode
type: behavior
versions: Python 2.6, Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6788>
_______________________________________
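
For anyone who wants to check this on their own setup, here is a minimal sketch of a comparison script. It is not part of the original report: the helper names (read_with_mode, compare_modes) and the sample text are made up for illustration, and it uses a temporary file rather than "whatever.txt". It writes bare "\n" bytes to disk, then reads the file back through codecs.open with mode "r" and with mode "rb". Whether the two results differ will depend on the Python version and platform.

```python
import codecs
import os
import tempfile


def read_with_mode(path, mode):
    # Hypothetical helper: read the whole file through codecs.open
    # using the given mode string and UTF-8 decoding.
    f = codecs.open(path, mode, "utf-8")
    try:
        return f.read()
    finally:
        f.close()


def compare_modes():
    # Hypothetical helper: write raw bytes so the file on disk is known
    # to contain bare "\n" newlines, then read it back both ways.
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(b"first line\nsecond line\n")
        return read_with_mode(path, "r"), read_with_mode(path, "rb")
    finally:
        os.remove(path)


if __name__ == "__main__":
    text_mode, binary_mode = compare_modes()
    print(repr(text_mode))
    print(repr(binary_mode))
    print("identical: %r" % (text_mode == binary_mode,))
```

If codecs.open really forces binary mode as documented, the two reprs should be the same; a difference between them would reproduce the mismatch described above.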