>I added UnicodeReader and UnicodeWriter example classes to the csv module
>docs just now. They mention problems with ASCII NUL characters (which I
>vaguely remember - NUL-terminated strings are used internally, right?). Do
>NULs still present a problem? I saw nothing in the log messages that
>mentioned "ascii" or "nul" so I presume it is.

That's right - it still uses NUL-terminated strings internally, and the
various special characters (quotechar, escapechar, etc) use NUL to mean
"not specified". Fixing this would cause much upheaval.

>Here's what I added. Let me know if you think it needs any corrections,
>especially if there's a better way to word "as long as you avoid encodings
>like utf-16 that use NULs". Can that just be "as long as you avoid
>multi-byte encodings other than utf-8"?

I think only utf-8 provides the guarantees needed for this to work.
Specifically, every byte of a multi-byte character needs to have the high
bit set (otherwise a delimiter or other special character appearing within
a multi-byte character would upset the parsing), while the characters with
special meaning to the parser must still be encoded as single bytes. Note
also that none of the special characters (quotechar, delimiter, escapechar,
etc) can be a multi-byte sequence.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
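
To make the utf-8 point above concrete, here is a rough sketch (not part of the csv module; the helper name stray_bytes, the SPECIALS set and the sample string are all made up for illustration). It encodes each character on its own and collects any NUL byte, or any byte that looks like a special character, that turns up in the encoding of some other character. It is written for Python 3, where iterating over bytes yields integers.

```python
# Illustrative only: SPECIALS stands in for an assumed delimiter,
# quotechar and escapechar; it is not read from any csv dialect.
SPECIALS = b',"\\'

def stray_bytes(text, encoding):
    """Collect bytes that would confuse a byte-oriented parser: a NUL, or a
    special-character byte, appearing in the encoding of another character."""
    stray = set()
    for ch in text:
        if ch in SPECIALS.decode("ascii"):
            continue  # a genuine delimiter/quote may look like one
        for b in ch.encode(encoding):
            if b == 0 or b in SPECIALS:
                stray.add(b)
    return stray

# 'a', a real comma, CYRILLIC CAPITAL LETTER SOFT SIGN (U+042C), e-acute
sample = "a,\u042c\u00e9"

utf8_stray = stray_bytes(sample, "utf-8")       # nothing stray
utf16_stray = stray_bytes(sample, "utf-16-le")  # NUL and 0x2C (',')

# Every byte of a multi-byte utf-8 sequence has the high bit set.
high_bit_ok = all(b >= 0x80 for b in "\u042c".encode("utf-8"))
```

With utf-16-le, plain 'a' already contains a NUL byte, and U+042C encodes as the bytes 0x2C 0x04, so a comma byte appears inside a character that is not a comma. With utf-8 neither happens for this sample.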