Re: [Python-Dev] [Csv] skipfinalspace

John Machin Mon, 20 Oct 2008 00:48:37 -0700

Tom Brown wrote:

(Continuing thread started at http://mail.python.org/pipermail/csv/2008-October/000688.html)
On Sun, Oct 19, 2008 at 16:46, Andrew McNamara <[EMAIL PROTECTED]> wrote:
     >I downloaded the 2.6 source tar ball, but is it too late for new
     >features to get into versions <3?

    Yep.

     >How would you feel about adding the following tests to
     >Lib/test/test_csv.py and getting them to pass?
     >
     >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
     >"*skipinitialspace* When True, whitespace immediately following the
     >delimiter is ignored."
     >but my tests show whitespace at the start of any field is ignored,
     >including the first field.

    I suspect (but I haven't checked) that it means "after the delimiter and
    before any quoted field" (or some variation on that).
I agree that whitespace after the delimiter and before any quoted field is skipped. Also whitespace after the start of the line and before any quoted field is skipped.
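
For anyone who wants to check the behaviour described above, here is a minimal sketch. The sample line is made up for illustration, and the results are what I would expect from the stock csv module in Python 3, not something taken from the thread's proposed tests.

```python
import csv

# Made-up sample: a space before every field, including the first,
# and a quoted field that contains the delimiter.
sample = [' a, b, "c, d"']

# Default excel dialect: skipinitialspace is False, so the leading space
# is kept and the quote character is no longer at the start of the field.
plain = next(csv.reader(sample))

# With skipinitialspace=True the space before each field is dropped,
# which lets the quoted field be recognised as quoted.
skipped = next(csv.reader(sample, skipinitialspace=True))

print(plain)
print(skipped)
```

Note that the first field loses its leading space too, even though no delimiter precedes it.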

    All of the "dialect" parameters are there to allow parsing of a specific
    common form of CSV file. Because there is no formal definition of the
    format, the module simply aims to parse (and produce the same result)
    as common applications such as Excel and Access. Changing the behaviour
    in any non-backwards compatible way is sure to get screams of anguish
    from many users. Even when the behaviour appears to be a bug, you can
    be sure people are counting on it working like that.
skipinitialspace defaults to false and by the same logic skipfinalspace should default to false to preserve compatibility with the csv module in 2.6. On the other hand, the switch to version 3 is as good a time as any to break backwards compatibility to adopt something that works better for new users.


Read Andrew's lips: They don't want "better", they want "the same as MS".

Based on my experience parsing several hundred csv files generated by many different people I think it would be nice to at least have a dialect that is excel + skipinitialspace=True + skipfinalspace=True.
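
A rough sketch of what such a dialect could look like with the csv module as it stands. skipfinalspace is only a proposal in this thread and is not a csv parameter, so it is emulated here by stripping trailing spaces after parsing. The names ExcelSkipSpace and read_trimmed are hypothetical.

```python
import csv


class ExcelSkipSpace(csv.excel):
    """Hypothetical dialect: the excel dialect plus skipinitialspace."""
    skipinitialspace = True


def read_trimmed(lines, dialect=ExcelSkipSpace):
    """Yield rows with trailing spaces removed from every field.

    Assumption: this stands in for the proposed skipfinalspace option.
    Unlike a parser-level option it also trims trailing spaces that were
    inside a quoted field.
    """
    for row in csv.reader(lines, dialect=dialect):
        yield [field.rstrip(' ') for field in row]


rows = list(read_trimmed(['a , b ,c ']))
print(rows)
```

Doing the trimming in a wrapper leaves the parser itself untouched, which sidesteps the backwards-compatibility concern raised above.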

Based on my experience extracting data from innumerable csv files (and infinite varieties thereof), spreadsheet files, and database tables, in 99.99% of cases one should automatically apply the following transformations to each text field:

   * strip leading whitespace
   * strip trailing whitespace
   * replace embedded runs of whitespace by a single space

and one needs to ensure that the definition of whitespace includes the no-break space (NBSP) character.

As this "space normalisation" is needed for all input sources, the csv module is IMHO the wrong place to put it. A string method would be a better idea.
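
Until such a method exists, the three transformations can be sketched as a small helper. The name normalise_space is hypothetical. It assumes Python 3 str objects, where split() with no arguments treats NBSP (U+00A0) as whitespace; on 2.x byte strings that would not hold.

```python
def normalise_space(text):
    """Strip leading and trailing whitespace and collapse inner runs.

    str.split() with no argument splits on any run of Unicode whitespace
    and discards empty strings at both ends, so joining the pieces with a
    single space performs all three transformations at once.
    """
    return ' '.join(text.split())


print(repr(normalise_space('  foo \xa0 bar\t\nbaz  ')))
```

It can be applied to each field of a csv row, or equally to values from a spreadsheet or a database table.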


Cheers,
John
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
