On 6/26/2013 11:21 AM, RSmith wrote:
I meant if a real CSV interpreter conforming to RFC4180 were to read the garbage I posted, it would come up with the
result specified.
How do you know what a standard-conforming interpreter would do when presented with input that's invalid under that standard? The
standard only describes the meaning of valid input, naturally. What exactly is your claim based on?
Ahh, a question I can actually answer confidently. Firstly, these claims can only be made for the importers I did test; about the ones
I did not test I can't say anything (naturally). I mentioned earlier that this is a hobby of mine of sorts, but I know this because I
have done ludicrous amounts of testing and evaluating of imports to and from CSV (I originally had quite a few ideas, not unlike
Reinhard, about how to make it better). I tried to devise importers specifically that would import nearly any format, and
succeeded very well too (I could send you some test software if you'd like to try it), but what I couldn't do is make a universal CSV
importer that would be impervious to some weird quoting habits (for instance) and not mess up other "proper" CSV imports (well, not
anything that withstood rigorous testing). To both my delight and dismay, I found that most systems in the wild have their own
interpretation and quirks. This required me to test other CSV importers with all kinds of data, trying to get them to break or seeing
which non-conformances they would accept (OpenOffice, Google Docs, Excel, even SQLite's .import, to name a few). All this effort
precisely because RFC4180 is less universally implemented than it should be. In fact, it is surprising how many systems (mostly
proprietary, to be fair) export CSV that is atrociously non-conforming, but since Excel imports it OK, who cares, right?
I could almost jot down from memory CSV data that either conforms to RFC4180 and would break some standard importers, or that doesn't
conform but would be accepted by many. With CSV and data manipulation and parsing I've been around the block a few times, so
believe me when I say I really feel Reinhard's pain, but there really is no quick (but standard) fix.
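
To illustrate the first kind of data (conforming, yet enough to break a careless importer), here is a minimal sketch. It assumes Python's standard csv module as the quote-aware reader and a hypothetical split-on-commas importer as the careless one; the sample record is invented for the example.

```python
import csv
import io

# A record that is valid under RFC4180: the quoted field contains a comma,
# doubled quotes, and an embedded line break.
data = 'id,comment\r\n1,"He said ""hi"", then left\r\nfor the day"\r\n'

# Hypothetical naive importer: split on line breaks, then on commas.
naive_rows = [line.split(",") for line in data.splitlines()]

# Quote-aware reader; newline="" passes line endings through untouched
# so the csv module can handle them itself.
proper_rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_rows), naive_rows)    # 3 fragments, quotes left in place
print(len(proper_rows), proper_rows)  # 2 records, comment field intact
```

The naive version splits the one data record into two broken rows, while the quote-aware reader returns a header row and a single record whose second field still contains the comma, the quotes, and the line break.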
I would imagine that most (well-written, not otherwise buggy) CSV interpreters agree in their interpretation of RFC4180-conforming
input; RFC4180 describes a pretty strict subset of what's found in the wild. It's precisely in their handling of non-conforming
input that CSV interpreters differ.
Couldn't agree more, but the real culprits are the purported "CSV" exporters, which is what prompted this thread too.
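
As a small illustration of how non-conforming input gets different treatment, here is a sketch using Python's standard csv module (an assumption on my part; it is not one of the importers named above). The sample line is invented: it has text following the closing quote of a quoted field, which RFC4180 does not allow. The same library accepts or rejects it depending on its strict setting.

```python
import csv

# Not valid under RFC4180: characters follow the closing quote of a quoted field.
bad_line = '"abc"def,x'

# Default dialect (strict=False): the stray text is glued onto the field.
lenient = next(csv.reader([bad_line]))

# strict=True: the same input is treated as an error.
try:
    next(csv.reader([bad_line], strict=True))
    strict_outcome = "accepted"
except csv.Error:
    strict_outcome = "rejected"

print(lenient)         # ['abcdef', 'x']
print(strict_outcome)  # rejected
```

Neither outcome is dictated by the standard, which is why two well-behaved importers can disagree on a file like this while agreeing on every conforming one.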
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users