RE: [Python-Dev] Re: [Csv] csv module TODO list
Raymond> Would the csv module be a good place to add a DBF reader and Raymond> writer? Not really. Raymond> I've posted a draft on ASPN. It interoperates well with the Raymond> rest of the CSV module because it also accepts/returns a list Raymond> of fieldnames and a sequence of records. Raymond> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362715 Just clean it up (do the doco thing) and check it in as dbf.py with reader and writer functions. I see your modus operandi at work: code something in Python then switch to C when nobody's looking. ;-) Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Re: [Csv] csv module TODO list
>Would the csv module be a good place to add a DBF reader and writer? I would have thought it would make sense as it's own module (in the same way that we have separate modules that present a common interface for the different databases), or am I missing something? I'd certainly like to see a DBF parser in python - reading and writing odd file formats is bread-and-butter for us contractors... 8-) -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
RE: [Python-Dev] Re: [Csv] csv module TODO list
Would the csv module be a good place to add a DBF reader and writer? Dbase's dbf file format is one of the oldest, simplest and more common database interchange formats. It can be a good alternative to CSV as a means of sharing data with pre-existing, non-python apps. On the plus side, it has a precise spec, can preserve numeric and date types, has guaranteed round-trip equivalence, and does not have weird escape rules. On the minus side, strings are limited to ASCII without NULs and the fields are fixed length. I've posted a draft on ASPN. It interoperates well with the rest of the CSV module because it also accepts/returns a list of fieldnames and a sequence of records. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362715 Raymond Hettinger ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Re: [Csv] csv module TODO list
On Wed, Jan 05, 2005 at 10:08:49PM +1100, Andrew McNamara wrote: > >Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should > >already have been addressed): > > I should apologise to Neal here for not replying to him at the time. Hey, I'm impressed you got to them. :-) I completely forgot about it. > >* rather than use PyErr_BadArgument, should you use assert? > >(first example, Dialect_set_quoting, line 218) > > You mean C assert()? I don't think I'm really following you here - > where would the type of the object be checked in a way the user could > recover from? IIRC, I meant C assert(). This goes back to a discussion a long time ago about what is the preferred way to handle invalid arguments. I doubt it's important to change. > >* I think you need PyErr_NoMemory() before returning on line 768, 1178 > > The examples I looked at in the Python core didn't do this - are you sure? > (now lines 832 and 1280). Originally, they were a plain PyObject_NEW(). Now they are a PyObject_GC_New() so it seems no further change is necessary. > >* is PyString_AsString(self->dialect->lineterminator) on line 994 > >guaranteed not to return NULL? If not, it could crash by > >passing to memmove. > >* PyString_AsString() can return NULL on line 1048 and 1063, > >the result is passed to join_append() > > Looking at the PyString_AsString implementation, it looks safe (we ensure > it's really a string elsewhere)? Ok. Then it should be fine. I spot checked lineterminator and it looked ok. > >* iteratable should be iterable? (line 1088) > > Sorry, I don't know what you're getting at here? (now line 1162). Heh, I had to read that twice myself. It was a typo (assuming I wasn't completely wrong)--an extra "at", but it doesn't exist any longer. I don't think there are any changes remaining to be done from my original code review. BTW, I always try to run valgrind before a release, especially major releases. Neal ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Re: [Csv] csv module TODO list
[Andrew McNamara] >> Also, review comments from Jeremy Hylton, 10 Apr 2003: >> >>I've been reviewing extension modules looking for C types that should >>participate in garbage collection. I think the csv ReaderObj and >>WriterObj should participate. The ReaderObj it contains a reference to >>input_iter that could be an arbitrary Python object. The iterator >>object could well participate in a cycle that refers to the ReaderObj. >>The WriterObj has a reference to a writeline callable, which could well >>be a method of an object that also points to the WriterObj. > I finally got around to looking at this, only to realise Jeremy did the > work back in Apr 2003 (thanks). One question, however - the GC doco in > the Python/C API seems to suggest to me that PyObject_GC_Track should be > called on the newly minted object prior to returning from the initialiser > (and correspondingly PyObject_GC_UnTrack should be called prior to > dismantling). This isn't being done in the module as it stands. Is the > module wrong, or is my understanding of the reference manual incorrect? The purpose of "tracking" and "untracking" is to let cyclic gc know when it (respectively) is and isn't safe to call an object's tp_traverse method. Primarily, when an object is first created at the C level, it may contain NULLs or heap trash in pointer slots, and then the object's tp_traverse could segfault if it were called while the object remained in an insane (wrt tp_traverse) state. Similarly, cleanup actions in the tp_dealloc may make a tp_traverse-sane object tp_traverse-insane, so tp_dealloc should untrack the object before that occurs. If tracking is never done, then the object effectively never participates in cyclic gc: its tp_traverse will never get called, and it will effectively act as an external root (keeping itself and everything reachable from it alive). So, yes, track it during construction, but not before all the members referenced by its tp_traverse are in a sane state. Putting the track call "at the end" of the constructor is usually best practice. tp_dealloc should untrack it then. In a debug build, that will assert-fail if the object hasn't actually been tracked. PyObject_GC_Del will untrack it for you (if it's still tracked), but it's risky to rely on that -- it's too easy to forget that Py_DECREFs on contained objects can end up executing arbitrary Python code (via __del__ and weakref callbacks, and via allowing other threads to run), which can in turn trigger a round of cyclic gc *while* your tp_dealloc is still running. So it's safest to untrack the object very early in tp_dealloc. I doubt this happens in the csv module, but an untrack/track pair should also be put around any block of method code that temporarily puts the object into a tp_traverse-insane state and that contains any C API calls that may end up triggering cyclic gc. That's very rare. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Re: [Csv] csv module TODO list
>There's a bunch of jobs we (CSV module maintainers) have been putting >off - attached is a list (in no particular order): [...] >Also, review comments from Jeremy Hylton, 10 Apr 2003: > >I've been reviewing extension modules looking for C types that should >participate in garbage collection. I think the csv ReaderObj and >WriterObj should participate. The ReaderObj it contains a reference to >input_iter that could be an arbitrary Python object. The iterator >object could well participate in a cycle that refers to the ReaderObj. >The WriterObj has a reference to a writeline callable, which could well >be a method of an object that also points to the WriterObj. I finally got around to looking at this, only to realise Jeremy did the work back in Apr 2003 (thanks). One question, however - the GC doco in the Python/C API seems to suggest to me that PyObject_GC_Track should be called on the newly minted object prior to returning from the initialiser (and correspondingly PyObject_GC_UnTrack should be called prior to dismantling). This isn't being done in the module as it stands. Is the module wrong, or is my understanding of the reference manual incorrect? -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Re: [Csv] csv module TODO list
>> * is CSV going to be maintained outside the python tree? >> If not, remove the 2.2 compatibility macros for: PyDoc_STR, >> PyDoc_STRVAR, PyMODINIT_FUNC, etc. Andrew> Does anyone thing we should continue to maintain this 2.2 Andrew> compatibility? With the release of 2.4, 2.2 has officially dropped off the radar screen, right (zero probability of a 2.2.n+1 release, though the probability was vanishingly small before). I'd say toss it. Do just that in a single checkin so someone who's interested can do a simple cvs diff to yield an initial patch file for external maintenance of that feature. >> * inline the following functions since they are used only in one >> place get_string, set_string, get_nullchar_as_None, >> set_nullchar_as_None, join_reset (maybe) Andrew> It was done that way as I felt we would be adding more getters Andrew> and setters to the dialect object in future. The only new dialect attribute I envision is an encoding attribute. >> * is it necessary to have Dialect_methods, can you use 0 for tp_methods? Andrew> I was assuming I would need to add methods at some point (in Andrew> fact, I did have methods, but removed them). Dialect objects are really just data containers, right? I don't see that they would need any methods. >> * remove commented out code (PyMem_DEL) on line 261 >> Have you used valgrind on the test to find memory overwrites/leaks? Andrew> No, valgrind wasn't used. I have it here at work. I'll try to find a few minutes to run the csv tests under valgrind's control. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
RE: [Python-Dev] Re: [Csv] csv module TODO list
Skip Montanaro wrote: > Andrew> There's a bunch of jobs we (CSV module > maintainers) have been > Andrew> putting off - attached is a list (in no particular order): > > ... > > In addition, it occurred to me this evening that there's > functionality in the csv module I don't think anybody uses. > ... > I'm also not aware that anyone really uses the Sniffer class, > though it does provide some useful functionality should you > need to analyze random CSV files. I used Sniffer quite heavily for my last contract. The client had multiple multigig csv's which needed deduplicating, but they were all from different sources and therefore in different formats. It would have cost me many more hours without the Sniffer. Please keep it. <:) Robert Brewer MIS Amor Ministries [EMAIL PROTECTED] ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Re: [Csv] csv module TODO list
>Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should >already have been addressed): I should apologise to Neal here for not replying to him at the time. Okay, going though the issues Neal raised... >* remove TODO comment at top of file--it's empty Was fixed. >* is CSV going to be maintained outside the python tree? > If not, remove the 2.2 compatibility macros for: > PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc. Does anyone thing we should continue to maintain this 2.2 compatibility? >* inline the following functions since they are used only in one place >get_string, set_string, get_nullchar_as_None, set_nullchar_as_None, >join_reset (maybe) It was done that way as I felt we would be adding more getters and setters to the dialect object in future. >* rather than use PyErr_BadArgument, should you use assert? >(first example, Dialect_set_quoting, line 218) You mean C assert()? I don't think I'm really following you here - where would the type of the object be checked in a way the user could recover from? >* is it necessary to have Dialect_methods, can you use 0 for tp_methods? I was assuming I would need to add methods at some point (in fact, I did have methods, but removed them). >* remove commented out code (PyMem_DEL) on line 261 >Have you used valgrind on the test to find memory overwrites/leaks? No, valgrind wasn't used. >* PyString_AsString()[0] on line 331 could return NULL in which case >you are dereferencing a NULL pointer Was fixed. >* note sure why there are casts on 0 pointers >lines 383-393, 733-743, 1144-1154, 1164-1165 To make it easier when the time comes to add one of those members. >* Reader_getiter() can be removed and use PyObject_SelfIter() Okay, wasn't aware of PyObject_SelfIter - will fix. >* I think you need PyErr_NoMemory() before returning on line 768, 1178 The examples I looked at in the Python core didn't do this - are you sure? (now lines 832 and 1280). >* is PyString_AsString(self->dialect->lineterminator) on line 994 >guaranteed not to return NULL? If not, it could crash by >passing to memmove. >* PyString_AsString() can return NULL on line 1048 and 1063, >the result is passed to join_append() Looking at the PyString_AsString implementation, it looks safe (we ensure it's really a string elsewhere)? >* iteratable should be iterable? (line 1088) Sorry, I don't know what you're getting at here? (now line 1162). >* why doesn't csv_writerows() have a docstring? csv_writerow does Was fixed. >* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE Was fixed. >* csv_unregister_dialect, csv_get_dialect could use METH_O >so you don't need to use PyArg_ParseTuple Was fixed. >* in init_csv, recommend using >PyModule_AddIntConstant and PyModule_AddStringConstant >where appropriate Was fixed. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Re: [Csv] csv module TODO list
>Andrew> There's a bunch of jobs we (CSV module maintainers) have been >Andrew> putting off - attached is a list (in no particular order): >... > >In addition, it occurred to me this evening that there's functionality in >the csv module I don't think anybody uses. It's very difficult to say for sure that nobody is using it once it's released to the world. >For example, you can register CSV dialects by name, then pass in the >string name instead of the dialect class. I'd be in favor of scrapping >list_dialects, register_dialect and unregister_dialect altogether. While >they are probably trivial little functions I don't think they add much if >anything to the implementation and just complicate the _csv extension >module slightly. Yes, in hindsight, they're not really necessary, although I'm sure we had some motivation for them initially. That said, they're there now, and they shouldn't require much maintenance. >I'm also not aware that anyone really uses the Sniffer class, though it >does provide some useful functionality should you need to analyze random >CSV files. The comment I get repeatedly is that they don't use it because it's "too magic/scary". That's as it should be. But if it didn't exist, then someone would be requesting we add it... 8-) -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Re: [Csv] csv module TODO list
Andrew> There's a bunch of jobs we (CSV module maintainers) have been Andrew> putting off - attached is a list (in no particular order): ... In addition, it occurred to me this evening that there's functionality in the csv module I don't think anybody uses. For example, you can register CSV dialects by name, then pass in the string name instead of the dialect class. I'd be in favor of scrapping list_dialects, register_dialect and unregister_dialect altogether. While they are probably trivial little functions I don't think they add much if anything to the implementation and just complicate the _csv extension module slightly. I'm also not aware that anyone really uses the Sniffer class, though it does provide some useful functionality should you need to analyze random CSV files. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com