> On Mar 18, 2026, at 15:52, Chao Li <[email protected]> wrote:
>
>
>
>> On Mar 18, 2026, at 05:33, Tom Lane <[email protected]> wrote:
>>
>> I got an off-list report that a query like this consumes
>> an unreasonable amount of memory:
>>
>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000)
>> i),
>> '$[*] ? (@ < $)');
>>
>> For me, that eats about 6GB by the time it's done executing.
>> If that doesn't seem like a lot to you, just add another zero to the
>> generate_series call, and then it'll be more like 600GB, because the
>> leakage is O(N^2).
>>
>> Admittedly, this isn't an especially useful query: its runtime is
>> also O(N^2), because that path expression basically requires us to
>> compare every element of the input JSON array to every other element.
>> But it's not cool that it leaks so much memory while at it.
>>
>> I poked into this and found that the leakage is entirely composed of
>> "JsonValueList"s that are built during path evaluation and then just
>> left to rot until the end of jsonb_path_query(). We can fix it by
>> being careful to free those lists on the way out of each jsonpath
>> evaluation function that creates one. However, just doing that would
>> mean adding pfree overhead on top of palloc overhead, so I went a bit
>> further and reimplemented JsonValueList to be more compact and cheaper
>> to allocate/free. The attached seems to be a bit faster than the
>> existing code as well as not leaking so much memory. See the draft
>> commit message for more details.
>>
>> regards, tom lane
>
> This patch looks like a big win. It not only saves memory, but also makes the
> query much faster.
>
> I tested the query on my MacBook M4, increasing the iteration count from
> 10000 to 50000.
>
> Current master (3b4c2b9db25):
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 208581.771 ms (03:28.582)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 217269.595 ms (03:37.270)
> ```
>
> With the patch:
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18674.580 ms (00:18.675)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18889.329 ms (00:18.889)
> ```
>
> My observations were:
>
> * Before the patch, the backend process memory usage fluctuated between
> roughly 50GB and 145GB, while CPU usage stayed around 30%.
> * With the patch, the backend process memory usage stayed stable at around
> 30MB, while CPU usage stayed around 100%.
>
> After reviewing the patch, I thought JsonValueListLength() might be worth
> optimizing, since it is O(n). I tried adding an ntotal_items field to
> JsonValueList to track the total number of items, similar to the last pointer
> that is only meaningful in the base chunk. But that did not help in my test,
> and I realized JsonValueListLength() is not on the hottest path, so I dropped
> that idea.
>
> From the MacOS Instruments tool, the most expensive parts seem to be
> fillJsonbValue, JsonbIteratorNext, cmp_var_common, and cmp_numerics. But
> those look like separate topics.
>
> Overall, this looks like a solid patch.
>
Forgot to mention that, to run the tests, I turned off debug and assertion, and
compiled with -O2.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/