Paul D. DeRocco wrote:
Add an inQuotes flag, with an initial value of false. For each item, if it
has a quote in it, toggle the inQuotes flag. Then, if inQuotes is set,
append a comma instead of a tab, to put the item back together again.

Roger that. CSV elements may contain returns within quoted portions, and escaping uses a wide range of conventions differing from program to program (within the Microsoft family I've seen it differ from version to version of the same program). The flag method, while notoriously slow, is the only reliable method I've found for handling all the many variants of CSV.

<rant>
A plea for sanity in the software development world:

While we need to write CSV importers from time to time, please never write CSV exporters.

CSV must die.

The problem with CSV is that the comma is very commonly used in data, making it a uniquely stupid choice as a delimiter.

That stupidity could be countered with consistent escaping, but no true standard has emerged since the many decades of this productivity-abuse began.

Making a stupid decision even more stupid, most CSV files quote non-numeric values, as though the programmers did not realize the quote character is commonly used in our language and therefore may likely be part of the data. So using quote as an escape means that you must escape the escape. Jeeminy who thinks up that merde?!?!

Sometimes the escape for in-data double-quotes is a double double-quote, which sometimes makes it hard to know what to do with empty values shown as "", esp. given that in some formats the empty value abuts the adjacent commas and in others, like MySQL dumps, it abuts only the trailing comma but has a space before the leading comma. Other times double-quotes are escaped with \", meaning you'll need to escape any in-data backslashes too.

For thinking people, about the time you realize that you're escaping the escape that you've escaped to handle your data, it would occur to you to go back and check your original premise. But not so for the creators of CSV.

As Jim Bufalini pointed out, tab-delimited or even (in fewer cases) pipe-delimited make much saner options.

For my own delimited exports I've adopted the convention used in FileMaker Pro and others, with escapes that are unlikely to be in the data:

- records are delimited with returns
- fields delimited with tabs
- quotes are never added and are included only when they are part
  of the data
- any in-data returns are escaped with ASCII 11
- any in-data tabs escaped with ASCII 4

Simple to write, lightning-fast to parse.

When you add up all the programmer- and end-user-hours lost to dealing with the uniquely stupid collection of mish-mashed ad-hoc formats that is CSV, it amounts to nothing less than a crime against humanity. Several hundred if not thousands of human lifetimes have been lost either dealing with bugs related to CSV parsers, or simply waiting for the inherently slow parsing of CSV that could have taken mere milliseconds if done in any saner format.

CSV must die.

Please help it die:  never write CSV exporters.
</rant>

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Reply via email to