Hi hackers, Thread [1] renamed, since the format name has now been changed from 'raw' to 'single', as suggested by Andrew Dunstan and Jacob Champion.
[1] https://postgr.es/m/c12516b1-77dc-4ad3-94a7-88527360a...@app.fastmail.com Recap: This is about adding support to import/export text-based formats such as JSONL, or any unstructured text file, where wanting to import each line "as is" into a single column, or wanting to export a single column to a text file. Example importing the meson-logs/testlog.json file Meson generates when building PostgreSQL, which is in JSONL format: # create table meson_log (log_line jsonb); # \copy meson_log from meson-logs/testlog.json (format single); COPY 306 # select log_line->'name' name, log_line->'result' result from meson_log limit 3; name | result -----------------------------------------+-------- "postgresql:setup / tmp_install" | "OK" "postgresql:setup / install_test_files" | "OK" "postgresql:setup / initdb_cache" | "OK" (3 rows) Changes since v16: * EOL handling now works the same as for 'text' and 'csv'. In v16, we supported multi-byte delimiters to allow specifying e.g. Windows EOL (\r\n), but this seemed unnecessary, if we just do what we do for text/csv, that is, to auto-detect the EOL for COPY FROM, and use the OS default EOL for COPY TO. The DELIMITER option is therefore invalid for the 'single' format. This is the biggest change in the code, between v16 and v18. CopyReadLineRawText() has been renamed to CopyReadLineSingleText(), and changed accordingly. * A final EOL is now emitted to the last record in COPY TO. So now it works just like 'text' and 'csv'. * HEADER [ boolean | MATCH ] now supported This is now again supported, as previously suggested by Daniel Verite, possible thanks to the EOL handling. * Docs updated. Below is quoted directly from the copy.sgml, but in plaintext: --- Single Format This format option is used for importing and exporting files containing unstructured text, where each line is treated as a single field. It is useful for data that does not conform to a structured, tabular format and lacks delimiters. In the single format, each line of the input or output is considered a complete value without any field separation. There are no field delimiters, and all characters are taken literally. There is no special handling for quotes, backslashes, or escape sequences. All characters, including whitespace and special characters, are preserved exactly as they appear in the file. However, it's important to note that the text is still interpreted according to the specified ENCODING option or the current client encoding for input, and encoded using the specified ENCODING or the current client encoding for output. When using this format, the COPY command must specify exactly one column. Specifying multiple columns will result in an error. If the table has multiple columns and no column list is provided, an error will occur. The single format does not distinguish a NULL value from an empty string. Empty lines are imported as empty strings, not as NULL values. Encoding works the same as in the text and CSV formats. --- On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote: > I think it would be better to explain how to parse data in raw mode, > especially which steps in the pipeline we skip, in the comment at the > top of copyfromparse.c. Good idea. I've explained it in the comment. /Joel
v18-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
Description: Binary data
v18-0002-Add-COPY-format-single.patch
Description: Binary data
v18-0003-Reorganize-option-validations.patch
Description: Binary data