On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan <and...@dunslane.net> wrote:

> > The output size difference does say that maybe we should pay some
> > attention to the nearby request to not always label every field.
> > Perhaps there should be an option for each row to transform to
> > a JSON array rather than an object?
>
> I doubt it. People who want this are likely to want pretty much what
> this patch is providing, not something they would have to transform in
> order to get it. If they want space-efficient data they won't really be
> wanting JSON. Maybe they want Protocol Buffers or something in that vein.
>

For arrays v.s. objects, it's not just about data size. There are plenty of
situations where a JSON array is superior to an object (e.g. duplicate
column names). Lines of JSON arrays of strings is pretty much CSV with JSON
escaping rules and a pair of wrapping brackets. It's common for tabular
data in node.js environments as you don't need a separate CSV parser.

Each one has its place and a default of the row_to_json(...) representation
of the row still makes sense. But if the user has the option of outputting
a single json/jsonb field for each row without an object or array wrapper,
then it's possible to support all of these use cases as the user can
explicitly pick whatever envelope makes sense:

-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3)
a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]

-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT
WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"

I'm not sure how I feel about the behavior being automatic if it's a single
top level json / jsonb field rather than requiring the explicit option.
It's probably what a user would want but it also feels odd to change the
output wrapper automatically based on the fields in the response. If it is
automatic and the user wants the additional envelope, the option always
exists to wrap it further in another: json_build_object('some_field",
my_field_i_want_wrapped)

The duplicate field names would be a good test case too. I haven't gone
through this patch but I'm guessing it doesn't filter out duplicates so the
behavior would match up with row_to_json(...), i.e. duplicates are
preserved:

=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
  row_to_json
---------------
 {"a":1,"a":2}

If so, that's a good test case to add as however that's handled should be
deterministic.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Reply via email to