On Fri May 17, 2024 at 7:34 PM IST, Robert Haas wrote:
> On Fri, May 17, 2024 at 9:42 AM Christoph Berg <[email protected]> wrote:
>> Thanks for summarizing the thread.
>>
>> Things mentioned in the thread:
>>
>> 1) rendering of SQL NULLs - include or omit the column
>>
>> 2) rendering of JSON values - both "quoted string" and "inline as
>> JSON" make sense
>>
>> 3) not quoting numeric values and booleans
>>
>> 4) no special treatment of other datatypes like arrays or compound
>> values, just quote them
>>
>> 5) row format: JSON object or array (array would be close to CSV
>> format)
>>
>> 6) overall format: array of rows, or simply print each row separately
>> ("JSON Lines" format, https://jsonlines.org/)
>>
>> I think 1, 2 and perhaps 6 make sense to have configurable. Two or
>> three \pset options (or one option with a list of flags) don't sound
>> too bad complexity-wise.
>>
>> Or maybe just default to "omit NULL columns" and "inline JSON" (and
>> render null as NULL).
>
> If we're going to just have one option, I agree with making that the
> default, and I'd default to an array of row objects. If we're going to
> have something configurable, I'd at least consider making (4)
> configurable.
>
> It's tempting to just have one option, like \pset jsonformat
> nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array
> simply because adding a ton of new options just for this isn't very
> appealing. But looking at how long that is, it's probably not a great
> idea. So I guess separate options is probably better?
I came across another use case for the JSON output format, and after
being directed by Christoph Berg, I am adding it to this thread.

Suppose two PostgreSQL databases are deployed with the same data. Over
time, only one of them is updated, causing data drift. The question to
answer was: "Which values are present in database A but not in B?" A
simple set difference over the primary keys of the drifted tables tells
us which data needs to be migrated. PostgreSQL does not natively (i.e.,
without extensions) support cross-database queries.

Some of my personal biases and the decisions that followed:

1. I did not want to install an extension like postgres_fdw for a
simple, ad-hoc task.

2. I would rather work with JSON than with something like CSV, because
of my familiarity with its tooling (e.g., jq).

Following those decisions, I used CSV as an intermediary format,
converted it to a JSON array of primary keys, and finally used jq to
perform the actual set operation (a rough sketch is in the postscript
below). Looking back, I could have used the json_agg function instead,
but that would have meant changing the query itself. Neither approach
beats simply piping --json output to jq.

After going through the thread, here are my humble opinions:

1. For the row format, I strongly urge objects rather than arrays,
because the majority of JSON tooling expects rows to be modeled as
records with named fields. The JSON output format should mirror CSV in
behaviour while being structurally compatible with the default input
expectations of downstream JSON tooling.

2. Consider the three possible ways of rendering a SQL NULL (omitting
the column, JSON null, and the string "null") together with the two
options suggested by Robert Haas that decide on the rendering,
nullcolumns and inlinevalues. Three of the four option combinations
preserve the semantic difference between SQL NULL and JSON null; only
with nullcolumns=noomit and inlinevalues=json is there a single
rendering left for NULL, namely JSON null. If null semantics are more
important than consistency and simplicity, this behaviour is worth
keeping in mind.
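To make that concrete, take a row where column a is SQL NULL and column
b holds the JSON value null. The renderings below are my reading of the
options sketched upthread, not an implemented feature, and
"inlinevalues=quote" is a name I made up for the non-inlined setting:

    -- source row: a = NULL (SQL), b = 'null'::jsonb
    nullcolumns=omit,   inlinevalues=json   => {"b": null}
    nullcolumns=omit,   inlinevalues=quote  => {"b": "null"}
    nullcolumns=noomit, inlinevalues=quote  => {"a": null, "b": "null"}
    nullcolumns=noomit, inlinevalues=json   => {"a": null, "b": null}

In the last combination, the SQL NULL and the JSON null are
indistinguishable.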
Regards,
Pratik Thakare
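P.S. For reference, the ad-hoc diff described above looked roughly like
the sketch below. Database, table, and file names are made up, and the
--json invocation at the end assumes the output format proposed in this
thread (an array of row objects), not an existing psql option:

    # Export the primary keys from both databases as CSV (header "id").
    psql -d db_a --csv -c "SELECT id FROM orders" > a.csv
    psql -d db_b --csv -c "SELECT id FROM orders" > b.csv

    # Turn each CSV into a JSON array of ids: read the file as raw
    # text (-R -s), split into lines, drop the header and blank lines.
    jq -R -s 'split("\n") | .[1:] | map(select(length > 0))' a.csv > a.json
    jq -R -s 'split("\n") | .[1:] | map(select(length > 0))' b.csv > b.json

    # Set difference: ids present in A but not in B.
    jq -n --slurpfile a a.json --slurpfile b b.json '$a[0] - $b[0]'

With json_agg the query itself has to change (SELECT json_agg(id) FROM
orders), while with the proposed format the intermediate files collapse
to something like:

    psql -d db_a --json -c "SELECT id FROM orders" | jq 'map(.id)' > a.json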