The thread "Should CSV parsing be stricter about mid-field quotes?" [1] forked into a new topic, with two new ideas, hence this new thread.
1. COPY ... QUOTE NONE In the [1] thread, Andrew Dunstan suggested a trick on how to deal with unquoted but delimited files, such as TSV-files produced by Google Sheets: > You can use CSV mode pretty reliably for TSV files. > The trick is to use a quoting char that shouldn't appear, > such as E'\x01' as well as setting the delimiter to E'\t'. > Yes, it's far from obvious. Would it be an improvement to allow specifying `QUOTE NONE` instead? quotes.tsv: id quote 1 "E = mc^2" -- Albert Einstein COPY quotes FROM '/tmp/quotes.tsv' WITH CSV HEADER DELIMITER E'\t' QUOTE NONE; SELECT * FROM quotes; id | quote ----+------------------------------- 1 | "E = mc^2" -- Albert Einstein (1 row) 2. COPY ... DELIMITER NONE This is meant to improve the use-case when wanting to import e.g. an unstructured log file line-by-line into a single column. The current trick I've been using is similar to the first one, that is, to specify a non-existing delimiter. But that involves having to find some non-existing byte, which is error-prone since future log files might suddenly start to contain it. So I think it would be better to be to be explicit about not wanting to delimit fields at all, treating the entire whole line as a column. Example: % cat /tmp/apache.log 192.168.1.1 - - [19/May/2023:09:54:17 -0700] "GET /index.html HTTP/1.1" 200 431 "http://www.example.com/home.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" 192.168.1.2 - - [19/May/2023:09:55:12 -0700] "POST /form.php HTTP/1.1" 200 512 "http://www.example.com/form.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" CREATE TABLE unstructured_log (whole_line text); COPY unstructured_log FROM '/tmp/apache.log' WITH CSV DELIMITER NONE QUOTE NONE; SELECT * FROM unstructured_log; whole_line ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 192.168.1.1 - - [19/May/2023:09:54:17 -0700] "GET /index.html HTTP/1.1" 200 431 "http://www.example.com/home.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" 192.168.1.2 - - [19/May/2023:09:55:12 -0700] "POST /form.php HTTP/1.1" 200 512 "http://www.example.com/form.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" (2 rows) I hacked together a broken patch just to demonstrate the idea on syntax and basic idea. The `COPY ... FROM` examples above works. But it doesn't work at all for `COPY ... TO`, since it output \0 byte as delimiter and quote in the output, which is of course not what we want. Just wanted some feedback to see if there is any broader interest in this, before proceeding and looking into how to implement it properly. Is this something we want or are there just a few of us who have needed this in the past? /Joel [1] https://www.postgresql.org/message-id/31c81233-d707-0d2a-8111-a915f463459b%40dunslane.net
copy_delimiter_quote_none.patch
Description: Binary data