On Sat, Apr 4, 2026 at 12:38 PM Florents Tselai <[email protected]> wrote:
> Here's a v5 which is v4 + pgindent > > > > > On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <[email protected]> > wrote: > >> >> >> >> On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote: >> >>> >>> >>> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> >>> wrote: >>> > >>> > >>> > >>> > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> >>> wrote: >>> > >>> > >>> > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> >>> wrote: >>> > > >>> > > >>> > > >>> > > >>> > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < >>> [email protected]> wrote: >>> > > Hi, >>> > > >>> > > in real-life I work a lot with json & fts search, here's a feature >>> I've always wished I had, >>> > > but never tackle it. Until yesterday that is. >>> > > >>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >>> @.body tsmatch "performance")'); >>> > > >>> > > This patch introduces a tsmatch boolean operator to the JSONPath >>> engine. >>> > > By integrating FTS natively into path expressions, >>> > > this operator allows for high-precision filtering of nested JSONB >>> structures— >>> > > solving issues with structural ambiguity and query complexity. >>> > > >>> > > Currently, users must choose between two suboptimal paths for >>> FTS-ing nested JSON: >>> > > - Imprecise Global Indexing >>> > > jsonb_to_tsvector aggregates text into a flat vector. >>> > > This ignores JSON boundaries, leading to false positives when the >>> same key (e.g., "body") >>> > > appears in different contexts (e.g., a "Product Description" vs. a >>> "Customer Review"). >>> > > >>> > > - Complex SQL Workarounds >>> > > Achieving 100% precision requires unnesting the document via >>> jsonb_array_elements and LATERAL joins. >>> > > This leads to verbose SQL and high memory overhead from generating >>> intermediate heap tuples. >>> > > >>> > > One of the most significant advantages of tsmatch is its ability to >>> participate in multi-condition predicates >>> > > within the same JSON object - something jsonb_to_tsvector cannot do. >>> > > >>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >>> @.body tsmatch "performance")'); >>> > > >>> > > In a flat vector, the association between "Alice" and "performance" >>> is lost. >>> > > tsmatch preserves this link by evaluating the FTS predicate in-place >>> during path traversal. >>> > > >>> > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly >>> define an FTS operator, >>> > > tsmatch is architecturally modeled after the standard-defined >>> like_regex. >>> > > >>> > > The implementation follows the like_regex precedent: >>> > > it is a non-indexable predicate that relies on GIN path-matching for >>> pruning and heap re-checks for precision. >>> > > Caching is scoped to the JsonPathExecContext, >>> > > ensuring 'compile-once' efficiency per execution without violating >>> the stability requirements of prepared statements. >>> > > >>> > > This initial implementation uses plainto_tsquery. >>> > > However, the grammar is designed to support a "mode" flag (similar >>> to like_regex flags) >>> > > in future iterations to toggle between to_tsquery, >>> websearch_to_tsquery, and phraseto_tsquery. >>> > > >>> > > Here's a v2, that implements the tsqparser clause >>> > > >>> > > So this should now work too >>> > > >>> > > select jsonb_path_query_array('["fast car", "slow car", "fast and >>> furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") >>> <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >>> > >>> > Hi Florents, >>> > >>> > Grant pinged me about this. I can review it in coming days. Can you >>> please rebase it? I failed to apply to current master. Also, the CF >>> reported a failure test case, please take a look. >>> > >>> > Hi Evan, >>> > thanks for having a look. The conflict was due to the intro of >>> pg_fallthrough. Not related to this patch . >>> > >>> > I noticed the failure too, but I'm having a hard time reproducing it >>> tbh. >>> > This fails for Debian Trixie with Meson. The same with Autoconf >>> passes... >>> > >>> > https://github.com/Florents-Tselai/postgres/runs/65098077968 >>> > >>> > >>> > >>> > >>> > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >>> >>> I have reviewed v3 and traced a few test cases. Here comes my review >>> comments: >>> >>> 1 >>> ``` >>> + <replaceable>string</replaceable> <literal>tsmatch</literal> >>> <replaceable>string</replaceable> >>> + <optional> <literal>tsconfig</literal> >>> <replaceable>string</replaceable> </optional> >>> + <optional> <literal>tsqparser</literal> >>> <replaceable>string</replaceable> </optional> >>> ``` >>> >>> For all “replaceable”, instead of “string”, would it be better to use >>> something more descriptive? For example: >>> ``` >>> <replaceable>json_string</replaceable> <literal>tsmatch</literal> >>> <replaceable>query</replaceable> >>> <optional> <literal>tsconfig</literal> >>> <replaceable>config_name</replaceable> </optional> >>> <optional> <literal>tsqparser</literal> >>> <replaceable>parser_mode</replaceable> </optional> >>> ``` >>> >>> 2 - jsonpath_gram.y >>> ``` >>> +static bool makeItemTsMatch(JsonPathParseItem *doc, >>> + JsonPathString >>> *tsquery, >>> + JsonPathString >>> *tsconfig, >>> + JsonPathString >>> *tsquery_parser, >>> + >>> JsonPathParseItem ** result, >>> + struct Node >>> *escontext); >>> ``` >>> >>> Format Nit: Looking at the existing code, the J in the second and >>> following lines, should be placed in the same column as the J in the first >>> line. >>> >>> 3 - jsonpath_gram.y >>> ``` >>> + | expr TSMATCH_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsconfig (3rd) and NULL for >>> tsquery_parser (4th) */ >>> + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsquery_parser (4th) */ >>> + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsconfig (3rd) */ >>> + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P >>> STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> ``` >>> >>> Feels a little redundant, repeatedly calls makeItemTsMatch. See the >>> attached diff for a simplification. But my version is a bit longer in terms >>> of number of lines. So, up to you. >>> >>> 4 - jsonpath_gram.y >>> ``` >>> +static bool >>> +makeItemTsMatch(JsonPathParseItem *doc, >>> + JsonPathString *tsquery, >>> + JsonPathString *tsconfig, >>> + JsonPathString *tsquery_parser, >>> + JsonPathParseItem **result, >>> + struct Node *escontext) >>> ``` >>> >>> makeItemTsMatch doesn’t need to return a bool. Actually, now it never >>> returns false, instead, it just ereport(ERROR). >>> >>> 5 - jsonpath.h >>> ``` >>> + struct >>> + { >>> + int32 doc; >>> + char *tsquery; >>> + uint32 tsquerylen; >>> + int32 tsconfig; >>> + char *tsqparser; >>> + uint32 tsqparser_len; >>> + } tsmatch; >>> >>> + struct >>> + { >>> + JsonPathParseItem *doc; >>> + char *tsquery; >>> + uint32 tsquerylen; >>> + JsonPathParseItem *tsconfig; >>> + char *tsqparser; >>> + uint32 tsqparser_len; >>> + } tsmatch; >>> } value; >>> ``` >>> >>> tsquerylen doesn’t have _ before len, and tsqparser_len, would it be >>> better to make naming conventions consistent in the same structure? >>> >>> 6 - jsonpath_exec.c >>> ``` >>> #include "tsearch/ts_utils.h" >>> #include "tsearch/ts_cache.h" >>> #include "utils/regproc.h" >>> #include "catalog/namespace.h" >>> >>> static JsonPathBool >>> executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, >>> void *param) >>> ``` >>> >>> Why don’t put these includes to the header section together with other >>> includes? >>> >>> 7 - jsonpath_exec.c >>> ``` >>> + else >>> + { >>> + /* >>> + * Fallback or Error for unknown flags >>> (should be caught by >>> + * parser) >>> + */ >>> + ereport(ERROR, >>> + >>> (errcode(ERRCODE_SYNTAX_ERROR), >>> + errmsg("unrecognized >>> tsqparser flag"))); >>> + } >>> ``` >>> >>> This “else” should never be entered as the same check has been done by >>> makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). >>> >>> 8 - jsonpath_exec.c >>> ``` >>> + /* Setup Context (Run ONLY once per predicate) */ >>> + if (!cxt->initialized) >>> ``` >>> >>> While tracing this SQL: >>> ``` >>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >>> evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; >>> ?column? >>> ---------- >>> >>> (1 row) >>> ``` >>> >>> I noticed that, when process “jogging”, cxt->initialized is still false, >>> meaning that, the cxt is not reused across array items. Given the same >>> tsconfig should apply to all array items, I think cxt should be reused. >>> >>> 9 - jsonpath_exec.c >>> ``` >>> + /* Select Parser and Compile Query */ >>> + parser_mode = jsp->content.tsmatch.tsqparser; >>> + parser_len = jsp->content.tsmatch.tsqparser_len; >>> + >>> + if (parser_len > 0) >>> + { >>> + /* Dispatch based on flag */ >>> + if (pg_strncasecmp(parser_mode, "pl", >>> parser_len) == 0) >>> ``` >>> >>> Nit: parser_mode is only used inside if (parser_len > 0), it can be >>> defined inside the “if”. >>> >>> 10 - jsonpath_gram.y >>> ``` >>> + ereport(ERROR, >>> + (errcode(ERRCODE_SYNTAX_ERROR), >>> + errmsg("invalid tsquery_parser >>> value: \"%s\"", tsquery_parser->val), >>> + errhint("Valid values are >>> \"pl\", \"ph\", and \"w\"."))); >>> ``` >>> >>> When tested a case with an invalid parser, I got: >>> ``` >>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >>> >>> @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser >>> "pss")'; >>> ERROR: invalid tsquery_parser value: "pss @" >>> LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... >>> ^ >>> HINT: Valid values are "pl", "ph", and "w". >>> ``` >>> >>> You can see the it shows a bad looking invalid value. I think that’s >>> because tsquery_parser->val is not NULL terminated. I fixed this problem >>> with: >>> ``` >>> errmsg("invalid tsquery_parser value: \"%.*s\"", (int) >>> tsquery_parser->len, tsquery_parser->val), >>> ``` >>> >>> This change is also included in the attached diff file. >>> >>> 11 - jsonpath.c >>> ``` >>> + if (printBracketes) >>> + appendStringInfoChar(buf, ')'); >>> + break; >>> + >>> if (printBracketes) >>> appendStringInfoChar(buf, ')'); >>> ``` >>> >>> Duplicate code. Looks like a copy-pasto. >>> >>> 12 - jsonpath.c >>> ``` >>> + /* Write the Main Query String */ >>> + appendBinaryStringInfo(buf, >>> + >>> &item->value.tsmatch.tsquerylen, >>> + >>> sizeof(item->value.tsmatch.tsquerylen)); >>> + appendBinaryStringInfo(buf, >>> + >>> item->value.tsmatch.tsquery, >>> + >>> item->value.tsmatch.tsquerylen); >>> + appendStringInfoChar(buf, '\0'); >>> ``` >>> >>> I don’t think we need to manually append ‘\0’ after >>> appendBinaryStringInfo. Looking at the header comment of >>> appendBinaryStringInfo, it says that a trailing null will be added. >>> ``` >>> /* >>> * appendBinaryStringInfo >>> * >>> * Append arbitrary binary data to a StringInfo, allocating more space >>> * if necessary. Ensures that a trailing null byte is present. >>> */ >>> void >>> appendBinaryStringInfo(StringInfo str, const void *data, int datalen) >>> ``` >>> >> >> Here's a v4 which incorporates most of Evan's comments & feedback >> - shifted tsquery compilation logic to use persistent cache within >> JsonPathExecContext >> - Fixed a binary serialization alignment issue which caused Dixie to fail >> earlier >> - I've refactor and simplified the grammar per Evan's input by adding a >> tsmatch_opts rule. >> - Also updated the docs per Evan's comments >> >> >> Andrew was kind enough to give some comments off-list. Here's a v6 - Fixed soft-error path in makeItemTsMatch - ereport replaced with errsave, return value now checked at the grammar call site with YYABORT - Fixed pg_strncasecmp prefix match bug in parser-mode validation - "p" was being accepted as "pl" - Fixed get_ts_config_oid to throw on unknown tsconfig instead of passing InvalidOid downstream - Grammar restructured to make tsconfig and tsqparser order-independent - Removed unused JsonTsMatchContext struct and some misplaced #includes
v6-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch
Description: Binary data
