On 6/26/2013 11:21 AM, RSmith wrote:
I meant that if a real CSV interpreter conforming to RFC4180 were to read the
garbage I posted, it would come up with the result specified.

How do you know what a standard-conforming interpreter would do when presented with input that's invalid under that standard? The standard only describes the meaning of valid input, naturally. What exactly is your claim based on?

Ahh, a question I can actually answer confidently. Firstly, these claims can only be made for the importers I did test; about the ones I did not, I can't say anything (naturally). I mentioned earlier that this is a hobby of mine of sorts, but I know this because I have done ludicrous amounts of testing and evaluating of imports to and from CSV (I originally had quite a few ideas, not unlike Reinhard, about how to make it better). I tried to devise importers specifically that would import nearly any format, and succeeded very well too (I could send you some test software if you'd like to try it). What I couldn't do is make a universal CSV importer that would be impervious to some weird quoting habits (for instance) and not mess up other "proper" CSV imports (well, not anything that withstood rigorous testing).

To both my delight and dismay, I found that most systems in the wild have their own interpretation and quirks. This required me to test other CSV importers with all kinds of data, trying to get them to break or seeing which non-conformances they would accept (OpenOffice, Google Docs, Excel, even SQLite's .import, to name a few). All this effort precisely because RFC4180 is less universally implemented than it should be. In fact, it is surprising how many systems (mostly proprietary, to be fair) export CSV that is atrociously non-conforming, but since Excel imports it OK, who cares, right?

I could almost jot down from memory CSV data that either conforms to RFC4180 and would break some standard importers, or that doesn't conform but would be accepted by many. With CSV and data manipulation and parsing I've been around the block a few times - so believe me when I say I really feel Reinhard's pain, but there really is no quick (but standard) fix.

I would imagine that most (well-written, not otherwise buggy) CSV interpreters agree in their interpretation of RFC4180-conforming input; RFC4180 describes a pretty strict subset of what's found in the wild. It's precisely in their handling of non-conforming input that CSV interpreters differ.
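
To make the distinction concrete, here is a minimal sketch using Python's standard csv module. Python is only a convenient stand-in, since none of the importers named in this thread were tested this way, and the sample strings and the helper names parse and try_parse are made up for illustration. The first input follows the RFC4180 quoting rules; the second has a stray character after a closing quote, which the RFC grammar does not allow. The same reader accepts the second input in its default lenient mode and rejects it when asked to be strict, and other importers may well choose differently.

```python
import csv
import io


def parse(text, strict=False):
    """Parse CSV text with the standard csv module and return a list of rows."""
    # newline="" leaves line endings untouched so the reader sees the raw CRLF.
    return list(csv.reader(io.StringIO(text, newline=""), strict=strict))


def try_parse(text, strict):
    """Return the parsed rows, or a short message if the reader rejects the input."""
    try:
        return parse(text, strict=strict)
    except csv.Error as exc:
        return "rejected: %s" % exc


# Conforming: the field is quoted and embedded quotes are doubled.
conforming = 'a,"b ""quoted"" text",c\r\n'

# Non-conforming (hypothetical sample): a character follows the closing quote.
nonconforming = 'a,"b"x,c\r\n'

if __name__ == "__main__":
    for label, text in (("conforming", conforming), ("non-conforming", nonconforming)):
        for strict in (False, True):
            print(label, "strict=%s" % strict, try_parse(text, strict))
```

Strictness does not change the result for the conforming line, while the non-conforming line is either silently repaired (the stray character is appended to the field) or refused outright, depending on a single flag.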

Couldn't agree more - but the real culprit is the purported "CSV" exporters, 
which is what prompted this thread too.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
