On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote:
> Hi Lepikhov,
>
> Thanks for using my patch and I am glad that you found it useful.
>
> On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
> <a.lepik...@postgrespro.ru> wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner behaviour with the memory consumption patch [1], I
>> figured out that arrays increase memory consumption by the optimizer
>> significantly. See init.sql in the attachment.
>> The point here is that the planner does small memory allocations for each
>> element during estimation. As a result, it looks like the planner consumes
>> about 250 bytes for each integer element.
>
> I guess the numbers you mentioned in init.sql are the total memory used by
> the planner (as reported by the patch in the thread) when planning
> that query, and not the memory consumed by the Const nodes themselves. Am I
> right? I think the measurements need to be explained better, and also
> the realistic scenario you are trying to optimize.
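(For readers without the attachment: the scenario discussed above is, roughly,
planning a query whose WHERE clause carries a long constant integer array. The
sketch below is illustrative only, not the attached init.sql; the table and
column names and the use of psql's \gexec are assumptions, and the memory
figures quoted in this thread come from the instrumentation added by the patch
in [1], not from stock EXPLAIN.)

CREATE TABLE test (x integer);
INSERT INTO test SELECT generate_series(1, 10000);
ANALYZE test;

-- Build and EXPLAIN a query with a 10000-element constant array; during
-- selectivity estimation the planner constructs a Const node per element.
SELECT format('EXPLAIN SELECT * FROM test WHERE x = ANY (''{%s}''::integer[])',
              string_agg(i::text, ','))
FROM generate_series(1, 10000) AS i
\gexec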
Yes, it is the total memory consumed by the planner - I used the numbers
reported by your patch [1]. I increased the number of elements in the array to
factor out the memory the planner consumes for other purposes. As you can see,
with a 1-element array the planner consumes 12kB of memory, and with 1E4
elements it consumes 2.6MB. The whole increment comes from enlarging this
array alone: (2,600,000 - 12,000) / 10,000 ~= 260 bytes. So I conclude that
each 4-byte element causes about 260 bytes of memory consumption.

This scenario comes from a user complaint: they had strict restrictions on
memory usage and were stuck on this unusual memory consumption pattern.

> I guess the reason you think that partitioning will increase the
> memory consumed is because each partition will have the clause
> translated for it. Selectivity estimation for each partition will
> create those many Const nodes and hence consume memory. Am I right?

Yes.

> Can you please measure the memory consumed with and without your
> patch.

Done. See the test case and results in 'init_parts.sql' in the attachment
(a rough sketch of the setup also appears at the end of this message); a short
summary is below. I varied the number of elements from 1 to 10000 and the
number of partitions from 1 to 100. As you can see, partitioning adds a lot of
memory consumption by itself, but the effect of the patch is also visible.

master:
 parts \ elems       1      1E1      1E2      1E3      1E4
       1          28kB     50kB    0.3MB    2.5MB     25MB
      10          45kB    143kB    0.6MB    4.8MB     47MB
     100         208kB    125kB    3.3MB     27MB    274MB

patched:
 parts \ elems       1      1E1      1E2      1E3      1E4
       1          28kB     48kB   0.25MB    2.2MB   22.8MB
      10          44kB    100kB    313kB    2.4MB   23.7MB
     100         208kB    101kB    0.9MB    3.7MB   32.4MB

Just for comparison, without partitioning:
         elems       1      1E1      1E2      1E3      1E4
 master:          12kB     14kB     37kB    266kB    2.5MB
 patched:         12kB   11.5kB     13kB     24kB    141kB

>> It is maybe not a problem most of the time. However, in the case of
>> partitions, memory consumption multiplies by each partition. Such a corner
>> case looks weird, but the fix is simple. So, why not?
>
> With vectorized operations becoming a norm these days, it's possible
> to have thousands of elements in the array of an ANY or IN clause. It
> will also be common to have thousands of partitions. But I think what we
> need to do here is to write a selectivity estimation function which
> takes a Const array and returns a selectivity without requiring a
> Const node for each element.

Maybe you're right. Could you show some examples of vectorized usage of
Postgres so I can understand your idea more clearly? Here I propose only a
quick, simple solution; I don't think it would get in the way of that
development.

>> The diff in the attachment is a proof of concept showing how to reduce
>> this waste of memory. Having benchmarked it a bit, I didn't find any
>> overhead.
>
> You might want to include your benchmarking results as well.

Nothing interesting there: pgbench TPS and planning time for the cases above
do not change noticeably.

[1] Report planning memory in EXPLAIN ANALYZE

--
Regards,
Andrei Lepikhov
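(As referenced above, here is a rough sketch of the partitioned variant of the
same test. It is illustrative only, not the attached init_parts.sql; the table
names, the hash partitioning scheme, the partition count and the use of psql's
\gexec are assumptions.)

CREATE TABLE ptest (x integer) PARTITION BY HASH (x);

-- Create 100 partitions.
SELECT format('CREATE TABLE ptest_%s PARTITION OF ptest FOR VALUES WITH (MODULUS 100, REMAINDER %s)', i, i)
FROM generate_series(0, 99) AS i
\gexec

-- The constant-array clause is estimated for each partition, so the
-- per-element Const allocations are multiplied by the partition count.
SELECT format('EXPLAIN SELECT * FROM ptest WHERE x = ANY (''{%s}''::integer[])',
              string_agg(i::text, ','))
FROM generate_series(1, 10000) AS i
\gexec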