>> Andrew McNamara wrote:
>>> There's a bunch of jobs we (CSV module maintainers) have been putting
>>> off - attached is a list (in no particular order):
>>> * unicode support (this will probably uglify the code considerably).
>>
>Martin v. Löwis wrote:
>> Can you please elaborate on that? What needs to be done, and how is
>> that going to be done? It might be possible to avoid considerable
>> uglification.

I'm not altogether sure there. The parsing state machine is all written
in C, and deals with signed chars - I expect we'll need two versions of
that (or one version that's compiled twice using pre-processor macros).
Quite a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:
>Indeed. The trick is to convert to Unicode early and to use Unicode
>literals instead of string literals in the code.

Yes, although it would be nice to retain the 8-bit versions as well.

>Note that the only real-life Unicode format in use is UTF-16
>(with BOM mark) written by Excel. Note that there's no standard
>for specifying the encoding in CSV files, so this is also the only
>feasible format.

Yes - that's part of the problem I hadn't really thought about yet - the
csv module currently interacts directly with files as iterators, but it's
clear that we'll need to decode as we go.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
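
A minimal sketch of the "decode as we go" idea, for illustration only. It
assumes a present-day Python 3, where csv.reader already accepts text, so
it does not reflect the csv module as it stood when this thread was
written. The helper name read_utf16_csv and the sample data are
hypothetical. The bytes are decoded incrementally by a text wrapper, and
the "utf-16" codec uses the BOM to choose the byte order.

```python
import csv
import io


def read_utf16_csv(raw_bytes):
    """Parse UTF-16 (with BOM) CSV bytes into a list of rows.

    Hypothetical helper: not part of the csv module.
    """
    # TextIOWrapper decodes in chunks as csv.reader pulls lines from it.
    # newline="" leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding="utf-16", newline=""
    )
    return list(csv.reader(text_stream))


# The "utf-16" encoder writes a BOM, as the Excel-style files do.
sample = "name,city\r\nLöwis,Berlin\r\n".encode("utf-16")
rows = read_utf16_csv(sample)
```

The same wrapper could sit around a real file opened in binary mode, so
the reader still sees an iterator of lines while the decoding happens
underneath it.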