On Fri May 17, 2024 at 7:34 PM IST, Robert Haas wrote:
> On Fri, May 17, 2024 at 9:42 AM Christoph Berg <[email protected]> wrote:
>> Thanks for summarizing the thread.
>>
>> Things mentioned in the thread:
>>
>> 1) rendering of SQL NULLs - include or omit the column
>>
>> 2) rendering of JSON values - both "quoted string" and "inline as
>>    JSON" make sense
>>
>> 3) not quoting numeric values and booleans
>>
>> 4) no special treatment of other datatypes like arrays or compound
>>    values, just quote them
>>
>> 5) row format: JSON object or array (array would be close to CSV
>>    format)
>>
>> 6) overall format: array of rows, or simply print each row separately
>>    ("JSON Lines" format, https://jsonlines.org/)
>>
>> I think 1, 2 and perhaps 6 make sense to have configurable. Two or
>> three \pset options (or one option with a list of flags) don't sound
>> too bad complexity-wise.
>>
>> Or maybe just default to "omit NULL columns" and "inline JSON" (and
>> render null as NULL).
>
> If we're going to just have one option, I agree with making that the
> default, and I'd default to an array of row objects. If we're going to
> have something configurable, I'd at least consider making (4)
> configurable.
>
> It's tempting to just have one option, like \pset jsonformat
> nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array
> simply because adding a ton of new options just for this isn't very
> appealing. But looking at how long that is, it's probably not a great
> idea. So I guess separate options is probably better?

I came across another use case for the JSON output format and, at Christoph 
Berg's suggestion, am adding it to this thread.

Suppose two PostgreSQL databases are deployed with the same data. Over time, 
only one of them is updated, causing data drift. The question to answer is: 
which values are present in database A but not in database B? A simple set 
difference over the PKs of the affected tables tells us which data needs to 
be migrated. However, PostgreSQL doesn’t support cross-database queries 
natively (i.e., without extensions).

Two personal biases and decisions shaped my approach:

1. I did not want to install an extension like postgres_fdw for a simple, 
ad-hoc task.
2. I would rather work with JSON than with something like CSV, because I am 
more familiar with its tooling (e.g., jq).

Given those biases, I used CSV as an intermediate format, converted it to a 
JSON array of PKs, and finally used jq to perform the actual set operation, 
roughly as sketched below. Looking back, I could have used the json_agg 
function instead, but that would have meant modifying the query itself.
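For the archives, the pipeline looked roughly like this (the table and column 
names are made up; a single-column primary key is assumed):

    $ psql -d db_a -c "\copy (SELECT id FROM orders) TO 'a.csv' CSV"
    $ psql -d db_b -c "\copy (SELECT id FROM orders) TO 'b.csv' CSV"

    # turn each single-column CSV into a JSON array of strings
    $ jq -R -s 'split("\n") | map(select(length > 0))' a.csv > a.json
    $ jq -R -s 'split("\n") | map(select(length > 0))' b.csv > b.json

    # PKs present in A but missing from B (jq array subtraction)
    $ jq -n --slurpfile a a.json --slurpfile b b.json '$a[0] - $b[0]'

The json_agg variant would have been something like SELECT json_agg(id) FROM 
orders, which already yields a JSON array but means touching the query.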

Neither workaround would be as convenient as simply piping --json output 
straight to jq.
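Something like the following is what I have in mind (the --json switch is 
hypothetical at this point, and I am assuming it prints an array of row 
objects):

    $ psql --json -d db_a -c "SELECT id FROM orders" > a.json
    $ psql --json -d db_b -c "SELECT id FROM orders" > b.json

    # set difference over the PK column, no CSV detour needed
    $ jq -n --slurpfile a a.json --slurpfile b b.json \
         '($a[0] | map(.id)) - ($b[0] | map(.id))'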

Having gone through the thread, here are my humble opinions:

1. For the row format, I strongly urge using objects rather than arrays. The 
majority of JSON tooling expects rows to be modeled as records with named 
fields. The JSON output format should mirror CSV in behaviour while remaining 
structurally compatible with the default input expectations of downstream 
JSON tooling (see the first example after this list).
2. If we consider null, "null", and omitting the column as the three possible 
ways to render SQL NULL, and `nullcolumns` and `inlinevalues` (the options 
suggested by Robert Haas) as the two knobs deciding the rendering, the 
semantic difference between SQL NULL and JSON null is preserved in 3 out of 4 
combinations. Only with nullcolumns=noomit and inlinevalues=json is there a 
single rendering left, a bare null, which collapses the two (see the second 
example below). If null semantics matter more than consistency and 
simplicity, that combination deserves special attention.
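To make opinion 1 concrete (again assuming the hypothetical --json switch and 
an array of row objects): with rows as objects, a filter like

    $ psql --json -c "SELECT id, name FROM users" | jq '.[].name'

just works, whereas with rows as arrays one is reduced to positional indices 
such as '.[] | .[1]', which silently break whenever the SELECT list changes.

For opinion 2, take a row (a int, b json) where a is SQL NULL and b is a JSON 
null. With illustrative values for the two options, the four combinations 
would render as:

    nullcolumns=omit,   inlinevalues=string -> {"b": "null"}
    nullcolumns=omit,   inlinevalues=json   -> {"b": null}
    nullcolumns=noomit, inlinevalues=string -> {"a": null, "b": "null"}
    nullcolumns=noomit, inlinevalues=json   -> {"a": null, "b": null}

In the first three, a reader can still tell the two nulls apart; only in the 
last do they become indistinguishable.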

Regards,
Pratik Thakare

