STINNER Victor <[email protected]> added the comment:
The attached patch is a partial fix: it adds support for UTF-16-LE,
UTF-16-BE and UTF-32-LE. Some remarks about my patch:
* UTF-32-BE is not supported because I'm too lazy tonight to finish
  the patch, and because such a file begins with 0x00 0x00 whereas
  the parser doesn't like nul bytes (a BOM-sniffing sketch follows
  this list)
* I disabled the cookie check if the file starts with a BOM (the
  cookie is ignored), because the charset name is not normalized:
  if the cookie is not spelled exactly like the hardcoded charset
  name (e.g. "UTF-16LE"), the check fails.
  E.g. "utf-16le" != "UTF-16LE" :-( (see the normalization sketch
  after this list)
* compile() would require much more effort to support UTF-16-* and
  UTF-32-*, because compile() simply rejects any string containing
  a nul byte. That's because it uses functions like strlen() :-/
  That's why the unit test uses subprocess([sys.executable, ...])
  and not simply compile() (a sketch of the test follows below)
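
As an illustration of the detection order involved, here is a minimal
BOM-sniffing sketch (not the patch code itself; the function and
variable names are mine). Note that the UTF-32-LE signature starts
with the same two bytes as the UTF-16-LE one, so the 32-bit
signatures have to be tested first:

    import codecs

    _BOMS = [
        # The 32-bit BOMs must come before the 16-bit ones: the
        # UTF-32-LE signature ff fe 00 00 begins with the UTF-16-LE
        # signature ff fe.
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF8, "utf-8"),
    ]

    def sniff_bom(first_bytes):
        # Return the canonical codec name, or None if no BOM is found.
        for bom, name in _BOMS:
            if first_bytes.startswith(bom):
                return name
        return None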
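For the cookie-vs-BOM comparison, here is a sketch of the
normalization that would make the check safe: codecs.lookup()
canonicalizes the spelling of a codec name (same_encoding() is a
hypothetical helper, not something in the patch):

    import codecs

    def same_encoding(cookie_name, bom_name):
        # codecs.lookup() maps "UTF-16LE", "utf_16_le" and "utf-16le"
        # to the same codec entry, so the comparison ignores spelling.
        return codecs.lookup(cookie_name).name == codecs.lookup(bom_name).name

    # "utf-16le" != "UTF-16LE" as strings, but same codec:
    assert same_encoding("utf-16le", "UTF-16LE")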
Supporting UTF-{16,32}-{LE,BE} fully would be nice, but it requires
hacking the parser (especially the compile() builtin function) to
support nul bytes...
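
For completeness, a sketch of the subprocess-based test mentioned
above (the script content and file handling are hypothetical; the
point is that with the patch applied the child interpreter should
accept the BOM-prefixed UTF-16-LE file that compile() rejects):

    import codecs, subprocess, sys, tempfile

    source = 'print("hello")\n'
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as script:
        # BOM + UTF-16-LE body: the encoded source is full of nul bytes.
        script.write(codecs.BOM_UTF16_LE + source.encode("utf-16-le"))
    # compile() would reject the nul bytes, so run a child interpreter.
    subprocess.check_call([sys.executable, script.name])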
----------
keywords: +patch
Added file: http://bugs.python.org/file13409/tokenizer_bom.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1503789>
_______________________________________