On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote:
> Hi Lepikhov,
>
> Thanks for using my patch and I am glad that you found it useful.
>
> On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
> <a.lepik...@postgrespro.ru> wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner behaviour with the memory consumption patch [1], I
>> figured out that arrays increase memory consumption by the optimizer
>> significantly. See init.sql in the attachment.
>> The point here is that the planner does small memory allocations for each
>> element during estimation. As a result, it looks like the planner consumes
>> about 250 bytes for each integer element.
>
> I guess the numbers you mentioned in init.sql are the total memory used by
> the planner (as reported by the patch in the thread) when planning
> that query, and not the memory consumed by the Const nodes themselves. Am I
> right? I think the measurements need to be explained better, and also
> the realistic scenario you are trying to optimize.
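(For readers without the attachment: the scenario discussed above is, roughly,
planning a query whose WHERE clause carries a long constant integer array. The
sketch below is illustrative only, not the attached init.sql; the table and
column names and the use of psql's \gexec are assumptions, and the memory
figures quoted in this thread come from the instrumentation added by the patch
in [1], not from stock EXPLAIN.)

CREATE TABLE test (x integer);
INSERT INTO test SELECT generate_series(1, 10000);
ANALYZE test;

-- Build and EXPLAIN a query with a 10000-element constant array; during
-- selectivity estimation the planner constructs a Const node per element.
SELECT format('EXPLAIN SELECT * FROM test WHERE x = ANY (''{%s}''::integer[])',
              string_agg(i::text, ','))
FROM generate_series(1, 10000) AS i
\gexec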
Yes, it is the total memory consumed by the planner - I used the numbers
reported by your patch [1]. I increased the number of elements in the array to
factor out the memory the planner consumes for other purposes. As you can see,
with a 1-element array the planner consumes 12kB of memory, and with 1E4
elements it consumes 2.6MB. The whole increment comes from enlarging this
array alone: (2,600,000 - 12,000) / 10,000 ~= 260 bytes. So I conclude that
each 4-byte element causes about 260 bytes of memory consumption.

This scenario comes from a user complaint: they had strict restrictions on
memory usage and were stuck on this unusual memory consumption pattern.

> I guess the reason you think that partitioning will increase the
> memory consumed is because each partition will have the clause
> translated for it. Selectivity estimation for each partition will
> create those many Const nodes and hence consume memory. Am I right?

Yes.

> Can you please measure the memory consumed with and without your
> patch.

Done. See the test case and results in 'init_parts.sql' in the attachment
(a rough sketch of the setup also appears at the end of this message); a short
summary is below. I varied the number of elements from 1 to 10000 and the
number of partitions from 1 to 100. As you can see, partitioning adds a lot of
memory consumption by itself, but the effect of the patch is also visible.

master:
 parts \ elems       1      1E1      1E2      1E3      1E4
       1          28kB     50kB    0.3MB    2.5MB     25MB
      10          45kB    143kB    0.6MB    4.8MB     47MB
     100         208kB    125kB    3.3MB     27MB    274MB

patched:
 parts \ elems       1      1E1      1E2      1E3      1E4
       1          28kB     48kB   0.25MB    2.2MB   22.8MB
      10          44kB    100kB    313kB    2.4MB   23.7MB
     100         208kB    101kB    0.9MB    3.7MB   32.4MB

Just for comparison, without partitioning:
         elems       1      1E1      1E2      1E3      1E4
 master:          12kB     14kB     37kB    266kB    2.5MB
 patched:         12kB   11.5kB     13kB     24kB    141kB

>> It is maybe not a problem most of the time. However, in the case of
>> partitions, memory consumption multiplies by each partition. Such a corner
>> case looks weird, but the fix is simple. So, why not?
>
> With vectorized operations becoming a norm these days, it's possible
> to have thousands of elements in the array of an ANY or IN clause. It
> will also be common to have thousands of partitions. But I think what we
> need to do here is to write a selectivity estimation function which
> takes a Const array and returns a selectivity without requiring a
> Const node for each element.

Maybe you're right. Could you show some examples of vectorized usage of
Postgres so I can understand your idea more clearly? Here I propose only a
quick, simple solution; I don't think it would get in the way of that
development.

>> The diff in the attachment is a proof of concept showing how to reduce
>> this waste of memory. Having benchmarked it a bit, I didn't find any
>> overhead.
>
> You might want to include your benchmarking results as well.

Nothing interesting there: pgbench TPS and planning time for the cases above
do not change noticeably.

[1] Report planning memory in EXPLAIN ANALYZE

--
Regards,
Andrei Lepikhov
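(As referenced above, here is a rough sketch of the partitioned variant of the
same test. It is illustrative only, not the attached init_parts.sql; the table
names, the hash partitioning scheme, the partition count and the use of psql's
\gexec are assumptions.)

CREATE TABLE ptest (x integer) PARTITION BY HASH (x);

-- Create 100 partitions.
SELECT format('CREATE TABLE ptest_%s PARTITION OF ptest FOR VALUES WITH (MODULUS 100, REMAINDER %s)', i, i)
FROM generate_series(0, 99) AS i
\gexec

-- The constant-array clause is estimated for each partition, so the
-- per-element Const allocations are multiplied by the partition count.
SELECT format('EXPLAIN SELECT * FROM ptest WHERE x = ANY (''{%s}''::integer[])',
              string_agg(i::text, ','))
FROM generate_series(1, 10000) AS i
\gexec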