There's a bunch of jobs we (CSV module maintainers) have been putting off - attached is a list (in no particular order):
* Unicode support (this will probably uglify the code considerably).

* 8 bit transparency (specifically, allow \0 characters in source
  string and as delimiters, etc).

* Reader and universal newlines don't interact well, and the reader
  doesn't honour the Dialect's lineterminator setting. All outstanding
  bug IDs (789519, 944890, 967934 and 1072404) are related to this -
  it's a difficult problem and further discussion is needed.

* Compare PEP-305 and the library reference manual to the module as
  implemented and either document the differences or correct them.

* Address or document Francis Avila's issues as mentioned in this
  posting:

    http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com

* Several blogs complain that the CSV module is no good for parsing
  strings. Suggest making it clearer in the documentation that the
  reader accepts an iterable, rather than a file, and document why an
  iterable (as opposed to a string) is necessary (multi-line records
  with embedded newlines). We could also provide an interface that
  parses a single string (or the old Object Craft interface) for those
  that really feel the need. See:

    http://radio.weblogs.com/0124960/2003/09/12.html
    http://zephyrfalcon.org/weblog/arch_d7_2003_09_06.html#e335

* Compatibility API for the old Object Craft CSV module?

    http://mechanicalcat.net/cgi-bin/log/2003/08/18

  For example: "from csv.legacy import reader" or something.

* Pure Python implementation?

* Some CSV-like formats consider a quoted field a string, and an
  unquoted field a number - consider supporting this in the Reader and
  Writer. See:

    http://radio.weblogs.com/0124960/2004/04/23.html

* Add line number and record number counters to the reader object?

* It's possible to get the csv parser to suck the whole source file
  into memory with an unmatched quote character. Need to limit the
  size of the internal buffer.
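On the string-parsing item above, here is a rough sketch of what the documentation could show. It assumes a current Python 3 interpreter (io.StringIO, str input), not the version under discussion, and parse_csv_string is a made-up helper name, not something in the csv module. The point is that the reader wants an iterable of lines because a quoted field can span several of them:

```python
import csv
import io


def parse_csv_string(text):
    """Hypothetical helper (not part of the csv module): parse CSV
    data held in a single string and return a list of rows."""
    # newline="" keeps line endings untranslated, so the reader sees
    # the newline embedded in the quoted field exactly as written.
    return list(csv.reader(io.StringIO(text, newline="")))


# The second record spans two physical lines because of the newline
# inside the quoted field.
data = 'name,comment\r\n"Smith, J","line one\nline two"\r\n'
rows = parse_csv_string(data)

# Any iterable of lines works, not only file objects.
simple_rows = list(csv.reader(["a,b", "c,d"]))
```

Splitting the string on newlines before parsing would cut the second record in half, which is why a plain per-line string interface can't handle records like this one.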
Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
already have been addressed):

* remove TODO comment at top of file--it's empty

* is CSV going to be maintained outside the python tree?
  If not, remove the 2.2 compatibility macros for:
  PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.

* inline the following functions since they are used only in one place:
  get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
  join_reset (maybe)

* rather than use PyErr_BadArgument, should you use assert?
  (first example, Dialect_set_quoting, line 218)

* is it necessary to have Dialect_methods, can you use 0 for tp_methods?

* remove commented out code (PyMem_DEL) on line 261.
  Have you used valgrind on the test to find memory overwrites/leaks?

* PyString_AsString()[0] on line 331 could return NULL in which case
  you are dereferencing a NULL pointer

* not sure why there are casts on 0 pointers on
  lines 383-393, 733-743, 1144-1154, 1164-1165

* Reader_getiter() can be removed and use PyObject_SelfIter()

* I think you need PyErr_NoMemory() before returning on line 768, 1178

* is PyString_AsString(self->dialect->lineterminator) on line 994
  guaranteed not to return NULL? If not, it could crash by
  passing to memmove.

* PyString_AsString() can return NULL on line 1048 and 1063, the result
  is passed to join_append()

* iteratable should be iterable? (line 1088)

* why doesn't csv_writerows() have a docstring? csv_writerow does

* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE

* csv_unregister_dialect, csv_get_dialect could use METH_O so you
  don't need to use PyArg_ParseTuple

* in init_csv, recommend using PyModule_AddIntConstant and
  PyModule_AddStringConstant where appropriate

Also, review comments from Jeremy Hylton, 10 Apr 2003:

I've been reviewing extension modules looking for C types that should
participate in garbage collection. I think the csv ReaderObj and
WriterObj should participate.
The ReaderObj contains a reference to input_iter that could be an
arbitrary Python object. The iterator object could well participate in
a cycle that refers to the ReaderObj. The WriterObj has a reference to
a writeline callable, which could well be a method of an object that
also points to the WriterObj.

The Dialect object appears to be safe, because the only PyObject * it
refers to should be a string. Safe until someone creates an insane
string subclass <0.4 wink>.

Also, an unrelated comment about the code: the lineterminator of the
Dialect is managed by a collection of little helper functions like
get_string, set_string, etc. This code appears to be excessively
general; since they're called only once, it seems clearer to inline the
logic directly in the get/set methods for the lineterminator.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/