Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Shawn Wagner
Still applies; just badly worded.

On Tue, May 21, 2019 at 8:47 AM Dominique Devienne 
wrote:

> On Tue, May 21, 2019 at 5:36 PM Shawn Wagner 
> wrote:
>
> > I have a handy script that can handle that sort of input with extra
> spaces
> > (With the --strip option), and other stuff that csv .import doesn't
> always
> > deal well with:
> >
> > https://github.com/shawnw/useful_sqlite_extensions/tree/master/tools
>
>
> Doc says from *unquoted fields*.
> Here it's about a space between the comma and the leading double-quote of a
> *quoted field*.
> So still applies? --DD
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Keith Medcalf

On Tuesday, 21 May, 2019 07:55, Richard Hipp  wrote:

>On 5/21/19, Patrick Sherrill  wrote:
>> I don’t know about ‘valid’ csv, it has been a moving target for
>>decades.

>> White space as far as my recollection should not be considered in
>>parsing a csv.

>I'm going by RFC 4180.  https://tools.ietf.org/html/rfc4180.  On page
>2 it says: "Spaces are considered part of a field and should not be
>ignored."

and also the RFC also says in section 5

Each field may or may not be enclosed in double quotes (however some programs, 
such as Microsoft Excel, do not use double quotes at all).  If fields are not 
enclosed with double quotes, then double quotes may not appear inside the 
fields.

So, since white space is not ignored and the first character of the field is 
not a quote (it is a space), then the quote inside the field is an error and 
should presumably toss some sort of a parse exception ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Dominique Devienne
On Tue, May 21, 2019 at 5:36 PM Shawn Wagner 
wrote:

> I have a handy script that can handle that sort of input with extra spaces
> (With the --strip option), and other stuff that csv .import doesn't always
> deal well with:
>
> https://github.com/shawnw/useful_sqlite_extensions/tree/master/tools


Doc says from *unquoted fields*.
Here it's about a space between the comma and the leading double-quote of a
*quoted field*.
So still applies? --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Shawn Wagner
I have a handy script that can handle that sort of input with extra spaces
(With the --strip option), and other stuff that csv .import doesn't always
deal well with:

https://github.com/shawnw/useful_sqlite_extensions/tree/master/tools

On Tue, May 21, 2019 at 6:14 AM Faheem Mitha  wrote:

>
> Hi,
>
> I'm seeing the same bug reported here, in an issue from 2009, supposedly
> fixed in 2014.
>
> https://www.sqlite.org/src/tktview?name=c25aab7e7e with the title:
> The ".import" command does not parse CSV correctly.
>
> I'm using Debian buster, with SQLite version 3.27.2-2.
>
> Here is a simple reproduction recipe. Consider the following simple CSV
> file.
>
> ###
> comma.csv
> ###
> somestuff, "some,stuff"
>
> Then one just needs to do
>
> sqlite3 comma.db 'create table comma(foo, bar);' '.mode csv' '.import
> comma.csv comma' '.exit'
>
> Which gives the error message:
>
> expected 2 columns but found 3 - extras ignored.
>
> As mentioned in that ticket, this is per the
> https://tools.ietf.org/html/rfc4180 standard.
>
> Pandas handles this case fine. I've not checked R.
>
> Should I attempt to reopen that ticket?
>
> Regards, Faheem
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Richard Hipp
On 5/21/19, Patrick Sherrill  wrote:
> I don’t know about ‘valid’ csv, it has been a moving target for decades.
> White space as far as my recollection should not be considered in parsing a
> csv.

I'm going by RFC 4180.  https://tools.ietf.org/html/rfc4180.  On page
2 it says: "Spaces are considered part of a field and should not be
ignored."

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Jose Isaias Cabrera

Patrick Sherrill, on Tuesday, May 21, 2019 09:38 AM, wrote...

>I don’t know about ‘valid’ csv, it has been a moving target for decades. White 
>space as far as my recollection should not be considered in parsing a csv.

In wikipedia [1], under Specification, #3, it states,

"with the records divided into fields separated by delimiters (typically a 
single reserved character such as comma, semicolon, or tab; sometimes the 
delimiter may include optional spaces),"

I don't know how the RFC agrees with wikipedia, but, it looks like spaces 
should be ignore when importing. However, the RFC [2], section 4 says,

"... Spaces are considered part of a field and should not be ignored... "

So, who is right?


[1] https://en.wikipedia.org/wiki/Comma-separated_values
[2] https://tools.ietf.org/html/rfc4180#section-2
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Simon Slavin
On 21 May 2019, at 2:38pm, Patrick Sherrill  wrote:

> I don’t know about ‘valid’ csv, it has been a moving target for decades. 
> White space as far as my recollection should not be considered in parsing a 
> csv. 

You can do it, but you have to be consistent.  There is a problem with this 
specific line:

somestuff, "some,stuff"

CSV files are expected to come in two different format.  One is to have 
double-quotes around all text values:

"somestuff", "some,stuff"

In this form, the comma is just, and obviously, a separator.  Blank space 
around separators is not a problem and parsers can identify it and ignore it as 
required.

The other commonly-found CSV format is this:

somestuff, otherstuff

In this format, strings are not indicated with the use of double quotes.  In 
this case the separator is a comma.  Space characters around it are not blank 
space, they're part of the value.  Because if they aren't there's no way to 
supply a string that begins or ends with a space.  So the second value parsed 
from this line starts with a space character.

Given those two possibilities, How would you expect a parser to figure out this 
line ?

somestuff, "some,stuff"

It's neither one format nor the other.  Parsers might handle this however they 
want.  It is permitted for a parser to look at the first character, decide that 
the line starts with a string with no delimiter, and identify the three strings 
on that line as

somestuff
 "some   <-- this value begins with 
stuff"   <-- this value ends with 

If you need include commas in your strings, delimit all your strings with 
double quotes and the problem will go away.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Patrick Sherrill
I don’t know about ‘valid’ csv, it has been a moving target for decades. White 
space as far as my recollection should not be considered in parsing a csv. 

My 2 cents. 

Pat...

Sent from my iPhone

> On May 21, 2019, at 9:28 AM, Richard Hipp  wrote:
> 
>> On 5/21/19, Faheem Mitha  wrote:
>> The ".import" command does not parse CSV correctly.
>> somestuff, "some,stuff"
> 
> This is not valid CSV.  There is an extra space character after the
> comma and before the double-quote.
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Richard Hipp
On 5/21/19, Faheem Mitha  wrote:
> The ".import" command does not parse CSV correctly.
> somestuff, "some,stuff"

This is not valid CSV.  There is an extra space character after the
comma and before the double-quote.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] CSV import does not handle fields with a comma surrounded by double

2019-05-21 Thread Faheem Mitha


Hi,

I'm seeing the same bug reported here, in an issue from 2009, supposedly
fixed in 2014.

https://www.sqlite.org/src/tktview?name=c25aab7e7e with the title:
The ".import" command does not parse CSV correctly.

I'm using Debian buster, with SQLite version 3.27.2-2.

Here is a simple reproduction recipe. Consider the following simple CSV file.

###
comma.csv
###
somestuff, "some,stuff"

Then one just needs to do

   sqlite3 comma.db 'create table comma(foo, bar);' '.mode csv' '.import 
comma.csv comma' '.exit'

Which gives the error message:

   expected 2 columns but found 3 - extras ignored.

As mentioned in that ticket, this is per the 
https://tools.ietf.org/html/rfc4180 standard.

Pandas handles this case fine. I've not checked R.

Should I attempt to reopen that ticket?

Regards, Faheem
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users