RE: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-12 Thread Skip Montanaro

Raymond> Would the csv module be a good place to add a DBF reader and
Raymond> writer?

Not really.

Raymond> I've posted a draft on ASPN.  It interoperates well with the
Raymond> rest of the CSV module because it also accepts/returns a list
Raymond> of fieldnames and a sequence of records.

Raymond> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362715

Just clean it up (do the doco thing) and check it in as dbf.py with reader
and writer functions.

I see your modus operandi at work: code something in Python then switch to C
when nobody's looking. ;-)

Skip

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-11 Thread Andrew McNamara
>Would the csv module be a good place to add a DBF reader and writer?  

I would have thought it would make sense as it's own module (in the same
way that we have separate modules that present a common interface for
the different databases), or am I missing something?

I'd certainly like to see a DBF parser in python - reading and writing odd
file formats is bread-and-butter for us contractors... 8-)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


RE: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-11 Thread Raymond Hettinger
Would the csv module be a good place to add a DBF reader and writer?  

Dbase's dbf file format is one of the oldest, simplest and more common
database interchange formats.   It can be a good alternative to CSV as a
means of sharing data with pre-existing, non-python apps.

On the plus side, it has a precise spec, can preserve numeric and date
types, has guaranteed round-trip equivalence, and does not have weird
escape rules.  On the minus side, strings are limited to ASCII without
NULs and the fields are fixed length.

I've posted a draft on ASPN.  It interoperates well with the rest of the
CSV module because it also accepts/returns a list of fieldnames and a
sequence of records.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/362715



Raymond Hettinger

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Csv] csv module TODO list

2005-01-10 Thread Neal Norwitz
On Wed, Jan 05, 2005 at 10:08:49PM +1100, Andrew McNamara wrote:
> >Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
> >already have been addressed):
> 
> I should apologise to Neal here for not replying to him at the time.

Hey, I'm impressed you got to them.  :-) I completely forgot about it.

> >* rather than use PyErr_BadArgument, should you use assert?
> >(first example, Dialect_set_quoting, line 218)
> 
> You mean C assert()? I don't think I'm really following you here -
> where would the type of the object be checked in a way the user could
> recover from?

IIRC, I meant C assert().  This goes back to a discussion a long time
ago about what is the preferred way to handle invalid arguments.
I doubt it's important to change.

> >* I think you need PyErr_NoMemory() before returning on line 768, 1178
> 
> The examples I looked at in the Python core didn't do this - are you sure?
> (now lines 832 and 1280). 

Originally, they were a plain PyObject_NEW().  Now they are a
PyObject_GC_New() so it seems no further change is necessary.

> >* is PyString_AsString(self->dialect->lineterminator) on line 994
> >guaranteed not to return NULL?  If not, it could crash by
> >passing to memmove.
> >* PyString_AsString() can return NULL on line 1048 and 1063, 
> >the result is passed to join_append()
> 
> Looking at the PyString_AsString implementation, it looks safe (we ensure
> it's really a string elsewhere)?

Ok.  Then it should be fine.  I spot checked lineterminator and it
looked ok.

> >* iteratable should be iterable?  (line 1088)
> 
> Sorry, I don't know what you're getting at here? (now line 1162).

Heh, I had to read that twice myself.  It was a typo (assuming
I wasn't completely wrong)--an extra "at", but it doesn't exist
any longer.

I don't think there are any changes remaining to be done from my
original code review.

BTW, I always try to run valgrind before a release, especially
major releases.

Neal
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-07 Thread Tim Peters
[Andrew McNamara]
>> Also, review comments from Jeremy Hylton, 10 Apr 2003:
>>
>>I've been reviewing extension modules looking for C types that should
>>participate in garbage collection.  I think the csv ReaderObj and
>>WriterObj should participate.  The ReaderObj it contains a reference to
>>input_iter that could be an arbitrary Python object.  The iterator
>>object could well participate in a cycle that refers to the ReaderObj.
>>The WriterObj has a reference to a writeline callable, which could well
>>be a method of an object that also points to the WriterObj.

> I finally got around to looking at this, only to realise Jeremy did the
> work back in Apr 2003 (thanks). One question, however - the GC doco in
> the Python/C API seems to suggest to me that PyObject_GC_Track should be
> called on the newly minted object prior to returning from the initialiser
> (and correspondingly PyObject_GC_UnTrack should be called prior to
> dismantling). This isn't being done in the module as it stands. Is the
> module wrong, or is my understanding of the reference manual incorrect?

The purpose of "tracking" and "untracking" is to let cyclic gc know
when it (respectively) is and isn't safe to call an object's
tp_traverse method.  Primarily, when an object is first created at the
C level, it may contain NULLs or heap trash in pointer slots, and then
the object's tp_traverse could segfault if it were called while the
object remained in an insane (wrt tp_traverse) state.  Similarly,
cleanup actions in the tp_dealloc may make a tp_traverse-sane object
tp_traverse-insane, so tp_dealloc should untrack the object before
that occurs.

If tracking is never done, then the object effectively never
participates in cyclic gc:  its tp_traverse will never get called, and
it will effectively act as an external root (keeping itself and
everything reachable from it alive).  So, yes, track it during
construction, but not before all the members referenced by its
tp_traverse are in a sane state.  Putting the track call "at the end"
of the constructor is usually best practice.

tp_dealloc should untrack it then.  In a debug build, that will
assert-fail if the object hasn't actually been tracked. 
PyObject_GC_Del will untrack it for you (if it's still tracked), but
it's risky to rely on that --  it's too easy to forget that Py_DECREFs
on contained objects can end up executing arbitrary Python code (via
__del__ and weakref callbacks, and via allowing other threads to run),
which can in turn trigger a round of cyclic gc *while* your tp_dealloc
is still running.  So it's safest to untrack the object very early in
tp_dealloc.

I doubt this happens in the csv module, but an untrack/track pair
should also be put around any block of method code that temporarily
puts the object into a tp_traverse-insane state and that contains any
C API calls that may end up triggering cyclic gc.  That's very rare.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Csv] csv module TODO list

2005-01-06 Thread Andrew McNamara
>There's a bunch of jobs we (CSV module maintainers) have been putting
>off - attached is a list (in no particular order): 
[...]
>Also, review comments from Jeremy Hylton, 10 Apr 2003:
>
>I've been reviewing extension modules looking for C types that should
>participate in garbage collection.  I think the csv ReaderObj and
>WriterObj should participate.  The ReaderObj it contains a reference to
>input_iter that could be an arbitrary Python object.  The iterator
>object could well participate in a cycle that refers to the ReaderObj.
>The WriterObj has a reference to a writeline callable, which could well
>be a method of an object that also points to the WriterObj.

I finally got around to looking at this, only to realise Jeremy did the
work back in Apr 2003 (thanks). One question, however - the GC doco in
the Python/C API seems to suggest to me that PyObject_GC_Track should be
called on the newly minted object prior to returning from the initialiser
(and correspondingly PyObject_GC_UnTrack should be called prior to
dismantling). This isn't being done in the module as it stands. Is the
module wrong, or is my understanding of the reference manual incorrect?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Csv] csv module TODO list

2005-01-06 Thread Skip Montanaro

>> * is CSV going to be maintained outside the python tree?
>> If not, remove the 2.2 compatibility macros for: PyDoc_STR,
>> PyDoc_STRVAR, PyMODINIT_FUNC, etc.

Andrew> Does anyone thing we should continue to maintain this 2.2
Andrew> compatibility?

With the release of 2.4, 2.2 has officially dropped off the radar screen,
right (zero probability of a 2.2.n+1 release, though the probability was
vanishingly small before).  I'd say toss it.  Do just that in a single
checkin so someone who's interested can do a simple cvs diff to yield
an initial patch file for external maintenance of that feature.

>> * inline the following functions since they are used only in one
>> place get_string, set_string, get_nullchar_as_None,
>> set_nullchar_as_None, join_reset (maybe)

Andrew> It was done that way as I felt we would be adding more getters
Andrew> and setters to the dialect object in future.

The only new dialect attribute I envision is an encoding attribute.

>> * is it necessary to have Dialect_methods, can you use 0 for tp_methods?

Andrew> I was assuming I would need to add methods at some point (in
Andrew> fact, I did have methods, but removed them).

Dialect objects are really just data containers, right?  I don't see that
they would need any methods.

>> * remove commented out code (PyMem_DEL) on line 261
>> Have you used valgrind on the test to find memory overwrites/leaks?

Andrew> No, valgrind wasn't used.

I have it here at work.  I'll try to find a few minutes to run the csv tests
under valgrind's control.

Skip
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


RE: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-05 Thread Robert Brewer
Skip Montanaro wrote:
> Andrew> There's a bunch of jobs we (CSV module 
> maintainers) have been
> Andrew> putting off - attached is a list (in no particular order):
> 
> ...
> 
> In addition, it occurred to me this evening that there's 
> functionality in the csv module I don't think anybody uses.
> ...
> I'm also not aware that anyone really uses the Sniffer class,
> though it does provide some useful functionality should you
> need to analyze random CSV files.

I used Sniffer quite heavily for my last contract. The client had
multiple multigig csv's which needed deduplicating, but they were all
from different sources and therefore in different formats. It would have
cost me many more hours without the Sniffer. Please keep it. <:)


Robert Brewer
MIS
Amor Ministries
[EMAIL PROTECTED]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Csv] csv module TODO list

2005-01-05 Thread Andrew McNamara
>Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
>already have been addressed):

I should apologise to Neal here for not replying to him at the time.

Okay, going though the issues Neal raised...

>* remove TODO comment at top of file--it's empty

Was fixed.

>* is CSV going to be maintained outside the python tree?
>  If not, remove the 2.2 compatibility macros for:
> PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.

Does anyone thing we should continue to maintain this 2.2 compatibility?

>* inline the following functions since they are used only in one place
>get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
>join_reset (maybe)

It was done that way as I felt we would be adding more getters and
setters to the dialect object in future.

>* rather than use PyErr_BadArgument, should you use assert?
>(first example, Dialect_set_quoting, line 218)

You mean C assert()? I don't think I'm really following you here -
where would the type of the object be checked in a way the user could
recover from?

>* is it necessary to have Dialect_methods, can you use 0 for tp_methods?

I was assuming I would need to add methods at some point (in fact, I did
have methods, but removed them).

>* remove commented out code (PyMem_DEL) on line 261
>Have you used valgrind on the test to find memory overwrites/leaks?

No, valgrind wasn't used.

>* PyString_AsString()[0] on line 331 could return NULL in which case
>you are dereferencing a NULL pointer

Was fixed.

>* note sure why there are casts on 0 pointers
>lines 383-393, 733-743, 1144-1154, 1164-1165

To make it easier when the time comes to add one of those members.

>* Reader_getiter() can be removed and use PyObject_SelfIter()

Okay, wasn't aware of PyObject_SelfIter - will fix.

>* I think you need PyErr_NoMemory() before returning on line 768, 1178

The examples I looked at in the Python core didn't do this - are you sure?
(now lines 832 and 1280). 

>* is PyString_AsString(self->dialect->lineterminator) on line 994
>guaranteed not to return NULL?  If not, it could crash by
>passing to memmove.
>* PyString_AsString() can return NULL on line 1048 and 1063, 
>the result is passed to join_append()

Looking at the PyString_AsString implementation, it looks safe (we ensure
it's really a string elsewhere)?

>* iteratable should be iterable?  (line 1088)

Sorry, I don't know what you're getting at here? (now line 1162).

>* why doesn't csv_writerows() have a docstring?  csv_writerow does

Was fixed.

>* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE

Was fixed.

>* csv_unregister_dialect, csv_get_dialect could use METH_O 
>so you don't need to use PyArg_ParseTuple

Was fixed.

>* in init_csv, recommend using 
>PyModule_AddIntConstant and PyModule_AddStringConstant
>where appropriate

Was fixed.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Re: [Csv] csv module TODO list

2005-01-04 Thread Andrew McNamara
>Andrew> There's a bunch of jobs we (CSV module maintainers) have been
>Andrew> putting off - attached is a list (in no particular order):
>...
>
>In addition, it occurred to me this evening that there's functionality in
>the csv module I don't think anybody uses.  

It's very difficult to say for sure that nobody is using it once it's
released to the world.

>For example, you can register CSV dialects by name, then pass in the
>string name instead of the dialect class.  I'd be in favor of scrapping
>list_dialects, register_dialect and unregister_dialect altogether.  While
>they are probably trivial little functions I don't think they add much if
>anything to the implementation and just complicate the _csv extension
>module slightly.  

Yes, in hindsight, they're not really necessary, although I'm sure we
had some motivation for them initially. That said, they're there now,
and they shouldn't require much maintenance.

>I'm also not aware that anyone really uses the Sniffer class, though it
>does provide some useful functionality should you need to analyze random
>CSV files.

The comment I get repeatedly is that they don't use it because it's
"too magic/scary". That's as it should be. But if it didn't exist,
then someone would be requesting we add it... 8-)

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Csv] csv module TODO list

2005-01-04 Thread Skip Montanaro

Andrew> There's a bunch of jobs we (CSV module maintainers) have been
Andrew> putting off - attached is a list (in no particular order):

...

In addition, it occurred to me this evening that there's functionality in
the csv module I don't think anybody uses.  For example, you can register
CSV dialects by name, then pass in the string name instead of the dialect
class.  I'd be in favor of scrapping list_dialects, register_dialect and
unregister_dialect altogether.  While they are probably trivial little
functions I don't think they add much if anything to the implementation and
just complicate the _csv extension module slightly.  I'm also not aware that
anyone really uses the Sniffer class, though it does provide some useful
functionality should you need to analyze random CSV files.

Skip
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com