On Fri, Sep 25, 2009 at 10:24:15AM +1000, BareFeet scratched on the wall:

> In reality, in the thousands of CSV files I've dealt with  
> over the years, they all follow the same standard:
> 
> 1. Rows are delimited by a new line (return and/or line feed).
> 2. Columns are delimited by a comma.
> 3. "Quoted text" is treated as one value item, including any commas or  
> new lines within it.
> 4. A double quote "" is used to put a quote within quotes.
> 
> That's it.

  This is more or less the standard put forth by RFC 4180.  And if this
  is all you've encountered, you're not using very many different
  applications or you're primarily dealing with numbers and simple
  strings that don't contain quotes or commas.  CSV works very very
  well if you never get into the question of escapes, but details count.

  Reading the RFC only proves my point.  SQLite v3 is older than that
  doc, and it pretty much admits the cat was out of the bag a long time
  ago.  There are a ton of optional and might/may/could sections that
  event the format they define has a lot of holes in it (i.e. headers,
  or no headers?).

> Everything I've seen uses this.

  According to the RFC Excel doesn't use double-quotes for anything.
  You might not care about Excel, but I'm willing to bet it is one of
  the most-- if not the most-- common exporters of CSV.  The question
  of getting data from Excel into SQLite shows up on the list every now
  and then.

> Some don't need delimiters  
> in values, so they don't need quotes, but the encompassing  
> specification works for all cases.

  No, it doesn't.  Working on a large website that provided CSV exports
  for a number of data sources, I've seen plenty of examples that don't
  work.  Finding a common format that could be exported into a handful
  of common desktop apps was so pointless we seriously considered
  getting rid of CSV all together, because we got tired of our users
  telling us how simple CSV was, and why couldn't we just do this one
  thing differently so it would work on *their* application.

> It's not that big a deal for SQLite to support it, so it should.

  If it is so simple, and you know where the code is...

  This is, perhaps, the biggest fallacy of CSV... people think it
  is a "simple" format (it isn't), and assume that code support to
  "correctly" (whatever that is) read it is simple.  It isn't.  The RFC
  has a formal grammar that requires over a dozen elements to define!

  Most people setting out to build a CSV reader never think to use a
  full grammar and parser-- after all, it is such a "simple" format--
  and find themselves in a mess of code soon enough.  Seriously, give
  it a try.

  Carlos's Python script (nice!) is a great example.  His comment "I am
  so grateful I did not have to write a parser for CSV" is dead on.
  And, as he points out, the reason the Python module is so good is
  that it is adaptive, and really reads five or six different variants
  of CSV (something a reader can do but a writer cannot).  He was
  also able to clobber it all together in a few hours or less (because
  someone else spent a few hundred hours on the CSV module), further
  proving that advanced support of this kind of thing is really outside
  of the scope of SQLite3.  After all, the .import command is part of
  the shell, not part of the core library.




  CSV is a great quick and dirty format to move data.  But it isn't
  "simple" and it isn't nearly as universal as many assume.  It works
  great if you're just moving simple numbers and strings that don't
  include commas, but becomes a mess when you get into exceptions. 

  Personally, I'd rather have the SQLite team working on core database
  features than trying to build a better CSV parser.  The problem
  is non-trivial and borders on unobtainable and, as Carlos
  proved so clearly, there are better, easier, faster ways.

   -j


-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to