Re: Should CSV parsing be stricter about mid-field quotes?

2023-07-01 Thread Noah Misch
On Sat, May 20, 2023 at 09:16:30AM +0200, Joel Jacobson wrote: > On Fri, May 19, 2023, at 18:06, Daniel Verite wrote: > > COPY FROM file CSV somewhat differs as your example shows, > > but it still mishandle \. when unquoted. For instance, consider this > > file to load with COPY t FROM

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-22 Thread Daniel Verite
Kirk Wolak wrote: > We do NOT do "CSV", we mimic pg_dump. pg_dump uses the text format (as opposed to csv), where \. on a line by itself cannot appear in the data, so there's no problem. The problem is limited to the csv format. Best regards, -- Daniel Vérité
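As a rough illustration of the point above, here is a minimal Python sketch of a pre-load check that flags any line consisting solely of backslash-dot. The function name `find_marker_lines` and the sample data are hypothetical, with the sample modeled on the four-line t.csv example quoted elsewhere in this thread. It is a naive line scan rather than a CSV parser, so it would also flag a backslash-dot line inside a quoted multi-line field.

```python
from typing import Iterable, List

# The two-character sequence backslash-dot discussed in this thread.
END_OF_DATA_MARKER = "\\."


def find_marker_lines(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that consist solely of backslash-dot."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator so surrounding spaces still count as data.
        if line.rstrip("\r\n") == END_OF_DATA_MARKER:
            hits.append(number)
    return hits


# Hypothetical sample mirroring the t.csv contents quoted in the thread.
sample = ["line 1\n", "\\.\n", "line 3\n", "line 4\n"]
print(find_marker_lines(sample))
```

An empty result means this naive scan found no line that is only backslash-dot. Anything else lists the lines worth inspecting before loading the file in csv format.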

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-22 Thread Kirk Wolak
On Mon, May 22, 2023 at 12:13 PM Daniel Verite wrote: > Joel Jacobson wrote: > > > Is there a valid reason why \. is needed for COPY FROM filename? > > It seems to me it would only be necessary for the COPY FROM STDIN case, > > since files have a natural end-of-file and a known file

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-22 Thread Daniel Verite
Joel Jacobson wrote: > Is there a valid reason why \. is needed for COPY FROM filename? > It seems to me it would only be necessary for the COPY FROM STDIN case, > since files have a natural end-of-file and a known file size. Looking at CopyReadLineText() over at [1], I don't see a

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-20 Thread Joel Jacobson
On Fri, May 19, 2023, at 18:06, Daniel Verite wrote: > COPY FROM file CSV somewhat differs as your example shows, > but it still mishandle \. when unquoted. For instance, consider this > file to load with COPY t FROM '/tmp/t.csv' WITH CSV > $ cat /tmp/t.csv > line 1 > \. > line 3 > line 4 >

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-19 Thread Daniel Verite
Joel Jacobson wrote: > I understand its necessity for STDIN, given that the end of input needs to > be explicitly defined. > However, for files, we have a known file size and the end-of-file can be > detected without the need for special markers. > > Also, is the difference in how

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-19 Thread Joel Jacobson
On Thu, May 18, 2023, at 18:48, Daniel Verite wrote: > Joel Jacobson wrote: >> OTOH, one would then need to inspect the TSV file doesn't contain \. on an >> empty line... > > Note that this is the case for valid CSV contents, since backslash-dot > on a line by itself is both an end-of-data marker

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Daniel Verite
Joel Jacobson wrote: > I've been using that trick myself many times in the past, but thanks to this > deep-dive into this topic, it looks to me like TEXT would be a better format > fit when dealing with unquoted TSV files, or? > > OTOH, one would then need to inspect the TSV file doesn't

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Andrew Dunstan
On 2023-05-18 Th 02:19, Joel Jacobson wrote: On Thu, May 18, 2023, at 08:00, Joel Jacobson wrote: > 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction > with `COPY ... WITH CSV`? More ideas: [ QUOTE 'quote_character' | UNQUOTED ] or [ QUOTE 'quote_character' |

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 08:35, Pavel Stehule wrote: > Maybe there is another third implementation in Libre Office. > > Generally TSV is not well specified, and then the implementations are not > consistent. Thanks Pavel, that was a very interesting case indeed: Libre Office (tested on Mac)

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Pavel Stehule
On Thu, May 18, 2023 at 8:01, Joel Jacobson wrote: > On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote: > > Here you go. Not horrible handling. (I use DataGrip so I saved it from > there > > directly as TSV, just for an extra datapoint). > > > > FWIW, if you copy/paste in windows, the data,

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 08:00, Joel Jacobson wrote: > 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction > with `COPY ... WITH CSV`? More ideas: [ QUOTE 'quote_character' | UNQUOTED ] or [ QUOTE 'quote_character' | NO_QUOTE ] Thinking about it, I recall another hack;

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote: > Here you go. Not horrible handling. (I use DataGrip so I saved it from there > directly as TSV, just for an extra datapoint). > > FWIW, if you copy/paste in windows, the data, the field with the tab gets > split into another column in Excel. But

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-17 Thread Kirk Wolak
On Wed, May 17, 2023 at 5:47 PM Joel Jacobson wrote: > On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote: > > You can use CSV mode pretty reliably for TSV files. The trick is to use a > > quoting char that shouldn't appear, such as E'\x01' as well as setting > the > > delimiter to E'\t'. Yes,

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-17 Thread Joel Jacobson
On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote: > You can use CSV mode pretty reliably for TSV files. The trick is to use a > quoting char that shouldn't appear, such as E'\x01' as well as setting the > delimiter to E'\t'. Yes, it's far from obvious. I've been using that trick myself many

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-17 Thread Andrew Dunstan
On 2023-05-16 Tu 13:15, Joel Jacobson wrote: On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote: >If we made midfield quoting a CSV error, those users who are currently mistaken >about their TSV/TEXT files being CSV while also having balanced quotes in their >data, would encounter an error

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-16 Thread Joel Jacobson
On Tue, May 16, 2023, at 13:43, Joel Jacobson wrote: >If we made midfield quoting a CSV error, those users who are currently mistaken >about their TSV/TEXT files being CSV while also having balanced quotes in their >data, would encounter an error rather than a silent failure, which I believe

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-16 Thread Joel Jacobson
On Sun, May 14, 2023, at 16:58, Andrew Dunstan wrote: > And if people do follow the method you describe then their input with > unescaped quotes will be rejected 999 times out of 1000. It's only cases where > the field happens to have an even number of embedded quotes, like Joel's > somewhat

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-14 Thread Andrew Dunstan
On 2023-05-13 Sa 23:11, Greg Stark wrote: On Sat, 13 May 2023 at 09:46, Tom Lane wrote: Andrew Dunstan writes: I could see an argument for a STRICT mode which would disallow partially quoted fields, although I'd like some evidence that we're dealing with a real problem here. Is there really

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-13 Thread Greg Stark
On Sat, 13 May 2023 at 09:46, Tom Lane wrote: > > Andrew Dunstan writes: > > I could see an argument for a STRICT mode which would disallow partially > > quoted fields, although I'd like some evidence that we're dealing with a > > real problem here. Is there really a CSV producer that produces

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-13 Thread Tom Lane
Andrew Dunstan writes: > I could see an argument for a STRICT mode which would disallow partially > quoted fields, although I'd like some evidence that we're dealing with a > real problem here. Is there really a CSV producer that produces output > like that you showed in your example? And if

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-13 Thread Andrew Dunstan
On 2023-05-13 Sa 04:20, Joel Jacobson wrote: On Fri, May 12, 2023, at 21:57, Andrew Dunstan wrote: Maybe this is unexpected by you, but it's not by me. What other sane interpretation of that data could there be? And what CSV producer outputs such horrible content? As you've noted, ours

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-13 Thread Joel Jacobson
On Fri, May 12, 2023, at 21:57, Andrew Dunstan wrote: > Maybe this is unexpected by you, but it's not by me. What other sane > interpretation of that data could there be? And what CSV producer outputs > such horrible content? As you've noted, ours certainly does not. Our rules > are clear:

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-12 Thread Andrew Dunstan
On 2023-05-11 Th 10:03, Joel Jacobson wrote: Hi hackers, I've come across an unexpected behavior in our CSV parser that I'd like to bring up for discussion. % cat example.csv id,rating,review 1,5,"Great product, will buy again." 2,3,"I bought this for my 6" laptop but it didn't fit my 8"

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-12 Thread Greg Stark
On Thu, 11 May 2023 at 10:04, Joel Jacobson wrote: > > The parser currently accepts quoting within an unquoted field. This can lead > to > data misinterpretation when the quote is part of the field data (e.g., > for inches, like in the example). I think you're thinking about it differently than

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-11 Thread Pavel Stehule
On Thu, May 11, 2023 at 16:04, Joel Jacobson wrote: > Hi hackers, > > I've come across an unexpected behavior in our CSV parser that I'd like to > bring up for discussion. > > % cat example.csv > id,rating,review > 1,5,"Great product, will buy again." > 2,3,"I bought this for my 6" laptop