čt 11. 5. 2023 v 16:04 odesílatel Joel Jacobson <[email protected]> napsal:
> Hi hackers, > > I've come across an unexpected behavior in our CSV parser that I'd like to > bring up for discussion. > > % cat example.csv > id,rating,review > 1,5,"Great product, will buy again." > 2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet" > > % psql > CREATE TABLE reviews (id int, rating int, review text); > \COPY reviews FROM example.csv WITH CSV HEADER; > SELECT * FROM reviews; > > This gives: > > id | rating | review > ----+--------+------------------------------------------------------------- > 1 | 5 | Great product, will buy again. > 2 | 3 | I bought this for my 6 laptop but it didn't fit my 8 tablet > (2 rows) > > The parser currently accepts quoting within an unquoted field. This can > lead to > data misinterpretation when the quote is part of the field data (e.g., > for inches, like in the example). > > Our CSV output rules quote an entire field or not at all. But the import of > fields with mid-field quotes might lead to surprising and undetected > outcomes. > > I think we should throw a parsing error for unescaped mid-field quotes, > and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field > quotes are necessary. The error message could suggest this option when it > encounters an unescaped mid-field quote. > > I think the convenience of not having to use an extra option doesn't > outweigh > the risk of undetected data integrity issues. > > Thoughts? > +1 Pavel > /Joel > >
