> On Mar 18, 2026, at 15:52, Chao Li <[email protected]> wrote:
> 
> 
> 
>> On Mar 18, 2026, at 05:33, Tom Lane <[email protected]> wrote:
>> 
>> I got an off-list report that a query like this consumes
>> an unreasonable amount of memory:
>> 
>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) 
>> i),
>>                       '$[*] ? (@ < $)');
>> 
>> For me, that eats about 6GB by the time it's done executing.
>> If that doesn't seem like a lot to you, just add another zero to the
>> generate_series call, and then it'll be more like 600GB, because the
>> leakage is O(N^2).
>> 
>> Admittedly, this isn't an especially useful query: its runtime is
>> also O(N^2), because that path expression basically requires us to
>> compare every element of the input JSON array to every other element.
>> But it's not cool that it leaks so much memory while at it.
>> 
>> I poked into this and found that the leakage is entirely composed of
>> "JsonValueList"s that are built during path evaluation and then just
>> left to rot until the end of jsonb_path_query().  We can fix it by
>> being careful to free those lists on the way out of each jsonpath
>> evaluation function that creates one.  However, just doing that would
>> mean adding pfree overhead on top of palloc overhead, so I went a bit
>> further and reimplemented JsonValueList to be more compact and cheaper
>> to allocate/free.  The attached seems to be a bit faster than the
>> existing code as well as not leaking so much memory.  See the draft
>> commit message for more details.
>> 
>> regards, tom lane
> 
> This patch looks like a big win. It not only saves memory, but also makes the 
> query much faster.
> 
> I tested the query on my MacBook M4, increasing the iteration count from 
> 10000 to 50000.
> 
> Current master (3b4c2b9db25):
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM 
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 208581.771 ms (03:28.582)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM 
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 217269.595 ms (03:37.270)
> ```
> 
> With the patch:
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM 
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18674.580 ms (00:18.675)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM 
> generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18889.329 ms (00:18.889)
> ```
> 
> My observations were:
> 
> * Before the patch, the backend process memory usage fluctuated between 
> roughly 50GB and 145GB, while CPU usage stayed around 30%.
> * With the patch, the backend process memory usage stayed stable at around 
> 30MB, while CPU usage stayed around 100%.
> 
> After reviewing the patch, I thought JsonValueListLength() might be worth 
> optimizing, since it is O(n). I tried adding an ntotal_items field to 
> JsonValueList to track the total number of items, similar to the last pointer 
> that is only meaningful in the base chunk. But that did not help in my test, 
> and I realized JsonValueListLength() is not on the hottest path, so I dropped 
> that idea.
> 
> From the MacOS Instruments tool, the most expensive parts seem to be 
> fillJsonbValue, JsonbIteratorNext, cmp_var_common, and cmp_numerics. But 
> those look like separate topics.
> 
> Overall, this looks like a solid patch.
> 

Forgot to mention that, to run the tests, I turned off debug and assertion, and 
compiled with -O2.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to