On Mon, Oct 20, 2008 at 00:48, John Machin <[EMAIL PROTECTED]> wrote:
> Tom Brown wrote:
>
>> (Continuing thread started at
>> http://mail.python.org/pipermail/csv/2008-October/000688.html)
>>
>> On Sun, Oct 19, 2008 at 16:46, Andrew McNamara <[EMAIL PROTECTED]> wrote:
>>
>> >I downloaded the 2.6 source tar ball, but is it too late for new features to
>> >get into versions <3?
>>
>> Yep.
>>
>> >How would you feel about adding the following tests to Lib/test/test_csv.py
>> >and getting them to pass?
>> >
>> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
>> >"*skipinitialspace* When True, whitespace immediately following the
>> >delimiter is ignored."
>> >but my tests show whitespace at the start of any field is ignored, including
>> >the first field.
>>
>> I suspect (but I haven't checked) that it means "after the delimiter and
>> before any quoted field" (or some variation on that).
>>
>> I agree that whitespace after the delimiter and before any quoted field is
>> skipped. Also whitespace after the start of the line and before any quoted
>> field is skipped.
>>
>> All of the "dialect" parameters are there to allow parsing of a specific
>> common form of CSV file. Because there is no formal definition of the
>> format, the module simply aims to parse (and produce the same result)
>> as common applications such as Excel and Access. Changing the behaviour
>> in any non-backwards compatible way is sure to get screams of anguish
>> from many users. Even when the behaviour appears to be a bug, you can
>> be sure people are counting on it working like that.
>>
>> skipinitialspace defaults to false and by the same logic skipfinalspace
>> should default to false to preserve compatibility with the csv module in
>> 2.6. On the other hand, the switch to version 3 is as good a time as any to
>> break backwards compatibility to adopt something that works better for new
>> users.
>
> Read Andrew's lips: They don't want "better", they want "the same as MS".

okay.
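
For reference, the skipinitialspace behaviour discussed above can be checked with a short script. This is only a sketch: the sample line is made up for illustration, and the results shown in the comments are what I would expect from the standard csv module rather than anything quoted from the thread.

```python
import csv

# Made-up sample line: leading spaces before every field, including the
# first one, plus a quoted field that follows a space.
line = ['  a,  b  , "c d"']

# Default dialect: leading spaces are kept, and a quote that does not
# start the field is treated as ordinary data.
default_row = next(csv.reader(line))

# skipinitialspace=True: spaces at the start of each field are skipped,
# so the quoted field is recognised; trailing spaces are left alone.
skipping_row = next(csv.reader(line, skipinitialspace=True))

print(default_row)   # expected: ['  a', '  b  ', ' "c d"']
print(skipping_row)  # expected: ['a', 'b  ', 'c d']
```

Note that the trailing spaces after `b` survive in both cases, which is the gap the proposed skipfinalspace option is meant to cover.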

>> Based on my experience parsing several hundred csv generated by many
>> different people I think it would be nice to at least have a dialect that is
>> excel + skipinitialspace=True + skipfinalspace=True.
>
> Based on my experience extracting data from innumerable csv files (and
> infinite varieties thereof),

Wow, that is a _lot_ of files :-P

> spreadsheet files, and database tables, in 99.99% of cases one should
> automatically apply the following transformations to each text field:
> * strip leading whitespace
> * strip trailing whitespace
> * replace embedded runs of whitespace by a single space
> and one needs to ensure that the definition of whitespace includes the
> no-break space (NBSP) character.
>
> As this "space normalisation" is needed for all input sources, the csv
> module is IMHO the wrong place to put it. A string method would be a better
> idea.

I agree that strip() and something like re.sub(r"\s+", " ", field) are handy.
If 99.99% of csv readers should be applying these fixes to every field,
perhaps there should be an easy-to-enable option to apply them. Why force
almost everyone to discover they need the transformations and wrap a line of
code around the csv reader?
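
To make the "line of code around the csv reader" concrete, here is a minimal sketch of the three transformations listed above, assuming Python 3, where `\s` on str patterns also matches the no-break space. The names `normalise_space` and `normalised_reader` are hypothetical helpers for illustration and are not part of the csv module.

```python
import csv
import io
import re

# In Python 3, \s on a str pattern matches Unicode whitespace,
# which includes the no-break space (U+00A0).
_WS_RUN = re.compile(r"\s+")


def normalise_space(field):
    """Collapse whitespace runs to one space, then strip both ends."""
    return _WS_RUN.sub(" ", field).strip()


def normalised_reader(csvfile, **fmtparams):
    """Hypothetical wrapper: yield csv.reader rows with each field normalised."""
    for row in csv.reader(csvfile, **fmtparams):
        yield [normalise_space(field) for field in row]


# Made-up sample data with leading, trailing, embedded and NBSP whitespace.
sample = io.StringIO('  name ,\xa0city\xa0\n" Ann   Lee ",  New\t York  \n')
rows = list(normalised_reader(sample))
print(rows)  # expected: [['name', 'city'], ['Ann Lee', 'New York']]
```

Because the normalisation runs after parsing, it applies equally to quoted and unquoted fields, which a dialect-level option such as skipinitialspace does not do.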

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com