Hi Chao, On Tue, Sep 2, 2025 at 11:57 PM Chao Li <li.evan.c...@gmail.com> wrote:
> On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra.wang....@gmail.com> > wrote: > > This change would give an incorrect result for an accessor like > "[0].a" when the jsonb column data is a jsonb object instead of a > jsonb array. I've added two new test cases to cover this scenario: > > In jsonb.sql, the newly added tests are: > select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as > (jb).a > select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL > > > I think the latest patch is still wrong. Ultimately, dot notation “.a", > index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies > on the rest logic in jsonb_subscript_transform() in “foreach”. > > Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but > not checking index case. So that reason why your case “select (jb)[0].a” > works is because the second indirection is a dot nation. However, “select > (jb)[0][‘a’]” will not work. See my test: > > ``` > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; > name > --------- > "Alice" > (1 row) > > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; > jsonb > ------- > > (1 row) > ``` > > In my test, (jsonb)[0][’name’] should also return “Alice”. > > So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and > inaccurate checks, only jsonb_subscript_make_jsonpath() can make an > accurate decision and return a null jsonpath upon subscript “[‘a’]”. > > As a result, “json_check_jsonpath_needed()” feels not needed at all. In > jsonb_subscript_transform(), just go ahead to run > jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then > run rest of logic. > > With my dirty change of removing json_check_jsonpath_needed: > ``` > chaol@ChaodeMacBook-Air postgresql % git diff > diff --git a/src/backend/utils/adt/jsonbsubs.c > b/src/backend/utils/adt/jsonbsubs.c > index 374040b3b4e..d9faab5c799 100644 > --- a/src/backend/utils/adt/jsonbsubs.c > +++ b/src/backend/utils/adt/jsonbsubs.c > @@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > > - if (jsonb_check_jsonpath_needed(*indirection, isSlice)) > - { > + //if (jsonb_check_jsonpath_needed(*indirection, isSlice)) > + //{ > jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); > if (sbsref->refjsonbpath) > return; > - } > + //} > > /* > * We reach here only in two cases: (a) the JSON simplified > accessor is > ``` > > You can see: > ``` > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; > jsonb > --------- > "Alice" > (1 row) > > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; > name > --------- > "Alice" > (1 row) > ``` > > Then I found “make check” failed. For example the first broken test case: > > ``` > @@ -4998,7 +4998,7 @@ > select ('123'::jsonb)[0]; > jsonb > ------- > - > + 123 > (1 row) > ``` > > The test expected an empty result, which implies “strict” mode. > > But the problem is, which mode should be the default? JSON_QUERY() uses > “lax” as default mode. And from Peter Eisentraut’s blog: > https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new > > ``` > The semantics of this are defined in terms of JSON_QUERY and JSON_VALUE > constructions > (which have been available since SQL:2016), so this really just syntactic > sugar. > ``` > > Also feels like “lax” should be the default mode. If that is true, then my > dirty change of removing “json_check_jsonpath_need()” works properly. > > The current logic with this patch sounds strange. Because > “json_check_jsonpath_need()” iterate through unprocessed indirections to > decide if jsonpath is needed (lax mode). With this logic: > > 1) if index [0] directly following dot notation, like (data).a[0], it’s > lax mode > 2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], > it’s strict mode > 3) if index [0] directly following the data column, then if there is a dot > nation in indirection list, use lax mode, otherwise strict mode. For the > failed test case, as there is no more indirection following [0], so it > expected strict mode. > > I wonder where this behavior is defined? > > With my change, 1) and 2) are the same, for 3), if index [0] directly > following the data column, regardless what indirections are followed, it’s > by default lax mode. > > So, I think this is a design decision. Maybe I missed something from your > previous design, but I don’t find anything about that from the commit > comments. I feel this would be better aligned with 1) and 2). > Thanks for investigating this, and for the great questions! Here’s a bit of background. The pre-standard jsonb subscripting feature [1] was introduced in Postgres 14. Specifically, support for queries like (jb)[0] and (jb)[0]['a'] was added by this commit: commit 676887a3b0b8e3c0348ac3f82ab0d16e9a24bd43 (HEAD) Author: Alexander Korotkov <akorot...@postgresql.org> Date: Sun Jan 31 23:50:40 2021 +0300 Implementation of subscripting for jsonb Without my patch, from version 14 through the current master branch, all supported versions of Postgres return the following results: test=# select ('123'::jsonb)[0]; jsonb ------- (1 row) test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]; jsonb ------- (1 row) test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['age']; jsonb ------- (1 row) The simplified accessor syntax defined in the SQL standard includes dot-notation access as well as integer-based subscripts. It does not include text-based subscripts such as ['a']. Therefore, the only overlap between the pre-standard jsonb subscripting and the standard simplified accessor is non-slicing integer-based subscripts. (Since the pre-standard jsonb subscripting never supported slicing, we don’t need to consider that case either when comparing the two syntaxes.) When we compare the two syntaxes for non-slicing integer-based subscripts, the only discrepancy is that for a jsonb object jb, (jb)[0] in the pre-standard syntax returns NULL, while (jb)[0] in the standard syntax returns (jb) itself. As long as the integer subscript is non-zero, both syntaxes return the same results. I think this (jb)[0] case is rather trivial. However, since it's been behaving the pre-standard way since PG14, I hesitate to change the existing behavior as part of my patches, and I feel it could be a bikeshedding kind of discussion in a separate thread. The main goal of my patch is to add dot-notation. For now, my intention is for cases like (jb)[0].a to work in the standard way, because if a user chooses dot-notation instead of text-based subscripting, they are likely expecting the standard behavior. Thoughts? [1] https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING