Re: Comma-delimited values

Richard Gaskin Mon, 08 Mar 2010 10:55:45 -0800

Paul D. DeRocco wrote:

Add an inQuotes flag, with an initial value of false. For each item, if it
has a quote in it, toggle the inQuotes flag. Then, if inQuotes is set,
append a comma instead of a tab, to put the item back together again.

Roger that. CSV elements may contain returns within quoted portions,and escaping uses a wide range of conventions differing from program toprogram (within the Microsoft family I've seen it differ from version toversion of the same program). The flag method, while notoriously slow,is the only reliable method I've found for handling all the manyvariants of CSV.


<rant>
A plea for sanity in the software development world:

While we need to write CSV importers from time to time, please neverwrite CSV exporters.


CSV must die.

The problem with CSV is that the comma is very commonly used in data,making it a uniquely stupid choice as a delimiter.

That stupidity could be countered with consistent escaping, but no truestandard has emerged since the many decades of this productivity-abusebegan.

Making a stupid decision even more stupid, most CSV files quotenon-numeric values, as though the programmers did not realize the quotecharacter is commonly used in our language and therefore may likely bepart of the data. So using quote as an escape means that you must escapethe escape. Jeeminy who thinks up that merde?!?!

Sometimes the escape for in-data double-quotes is a double double-quote,which sometimes makes it hard to know what to do with empty values shownas "", esp. given that in some formats the empty value abuts theadjacent commas and in others, like MySQL dumps, it abuts only thetrailing comma but has a space before the leading comma. Other timesdouble-quotes are escaped with \", meaning you'll need to escape anyin-data backslashes too.

For thinking people, about the time you realize that you're escaping theescape that you've escaped to handle your data, it would occur to you togo back and check your original premise. But not so for the creators ofCSV.

As Jim Bufalini pointed out, tab-delimited or even (in fewer cases)pipe-delimited make much saner options.

For my own delimited exports I've adopted the convention used inFileMaker Pro and others, with escapes that are unlikely to be in thedata:


- records are delimited with returns
- fields delimited with tabs
- quotes are never added and are included only when they are part
  of the data
- any in-data returns are escaped with ASCII 11
- any in-data tabs escaped with ASCII 4

Simple to write, lightning-fast to parse.

When you add up all the programmer- and end-user-hours lost to dealingwith the uniquely stupid collection of mish-mashed ad-hoc formats thatis CSV, it amounts to nothing less than a crime against humanity.Several hundred if not thousands of human lifetimes have been losteither dealing with bugs related to CSV parsers, or simply waiting forthe inherently slow parsing of CSV that could have taken meremilliseconds if done in any saner format.


CSV must die.

Please help it die:  never write CSV exporters.
</rant>

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Comma-delimited values

Reply via email to