On Oct 20, 2008, at 09:48, John Machin wrote:

> Based on my experience extracting data from innumerable csv files (and infinite varieties thereof), spreadsheet files, and database tables, in 99.99% of cases one should automatically apply the following transformations to each text field:
>   * strip leading whitespace
>   * strip trailing whitespace
>   * replace embedded runs of whitespace by a single space
> and one needs to ensure that the definition of whitespace includes the no-break space (NBSP) character.
>
> As this "space normalisation" is needed for all input sources, the csv module is IMHO the wrong place to put it. A string method would be a better idea.
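A minimal sketch of the three transformations listed in the quoted message, assuming Python 3 `str` values. The helper name `normalise_space` is hypothetical: it is not an existing str method or part of the csv module. NBSP (U+00A0) is written into the pattern explicitly for visibility, although in Python 3 the `\s` class for str patterns already matches it.

```python
import re

# One run of whitespace. For str patterns, \s already matches Unicode
# whitespace (NBSP included); \u00a0 is listed anyway to make the NBSP
# requirement explicit.
_WS_RUN = re.compile(r"[\s\u00a0]+")


def normalise_space(field):
    """Hypothetical helper: collapse internal whitespace runs to one space,
    then strip the leading/trailing space left over at either end."""
    return _WS_RUN.sub(" ", field).strip()
```

For Python 3 `str` this should agree with `" ".join(field.split())`, because `str.split()` with no argument also treats NBSP as whitespace. The same is not true for `bytes`: `bytes.split()` only recognises ASCII whitespace, so a `b"\xa0"` would survive and the data would need decoding first.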

Hm. It seems quite familiar, somehow...

You could certainly do the following (for each field)...

  " ".join(field.split())
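Applied to every field as rows come out of `csv.reader`, that idiom might look like the following sketch. The generator name `read_normalised_rows` and the sample data are made up for illustration; the sample deliberately includes an NBSP inside a quoted field.

```python
import csv
import io


def read_normalised_rows(f):
    """Hypothetical helper: yield csv rows with every field space-normalised."""
    for row in csv.reader(f):
        # str.split() with no argument splits on any whitespace run (NBSP
        # included for Python 3 str) and drops leading/trailing whitespace.
        yield [" ".join(field.split()) for field in row]


# Illustrative input: padded fields and an NBSP inside a quoted field.
sample = io.StringIO('  name ,  city\n" Ada\u00a0 Lovelace ",London  \n')
rows = list(read_normalised_rows(sample))
```

Here `rows` should come out as `[["name", "city"], ["Ada Lovelace", "London"]]`. With a real file, open it with `newline=""`, as the csv module documentation recommends.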

... but I seem to recall running across something that already does this? (Maybe I'm confusing it with some other issue, such as the string.capwords function versus str.title :)
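One possible reason the two come to mind together: `string.capwords()` is documented as splitting with `str.split()`, capitalising each word and rejoining with single spaces, so with the default `sep` it normalises whitespace as a side effect, while `str.title()` leaves the spacing alone. A small illustration with a made-up string:

```python
import string

s = "  they're   here now "

# capwords: str.split() + str.capitalize() + " ".join(), so leading,
# trailing and repeated whitespace are collapsed along the way.
by_capwords = string.capwords(s)  # "They're Here Now"

# str.title(): capitalises in place, keeps the original whitespace, and
# treats the apostrophe as a word boundary.
by_title = s.title()  # "  They'Re   Here Now "
```

If an explicit `sep` is passed to `string.capwords()`, it splits and joins on that separator instead, and the whitespace normalisation no longer happens.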

--
Magnus Lie Hetland
http://hetland.org


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev