On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer <hjp-pg...@hjp.at> wrote:

> On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> > On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer <hjp-pg...@hjp.at>
> wrote:
> >
> >     On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> >     >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=
> >     97)
> >     >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=
> >     1021522829864 width=97)
> >     ...
> >     >                ->  Parallel Hash Left Join  (cost=
> >     604502.76..1276224253.51 rows=204304565973 width=97)
> >     >                      Hash Cond: ((t1.col_ano)::text =
> (t2.col_ano)::text)
> >     ...
> >     >
> >     > //so.. the planner guess that those 2 join will generate 1000
> billions
> >     rows...
> >
> >     Are some of the col_ano values very frequent? If say the value 42
> occurs
> >     1 million times in both table_a and table_b, the join will create 1
> >     trillion rows for that value alone. That doesn't explain the crash
> or the
> >     disk usage, but it would explain the crazy cost (and would probably
> be a
> >     hint that this query is unlikely to finish in any reasonable time).
> >
> >
> > good guess, even if a bit surprising: there is one (and only one)
> "value" which
> > fit your supposition: NULL
>
> But NULL doesn't equal NULL, so that would result in only one row in the
> left join. So that's not it.
>


so, apo...

the 750000 lines in each tables are not NULLs but '' empty varchar, which,
obviously is not the same thing.
and which perfectly generates 500 billions lines for the left join.
So, no planner or statistics pbs. apologies for the time wasted.
Back to the initial pb:
if, with temp_file_limit positioned to 210 GB, I try to run the select *
from table_a left join table_b on the col_a (which contains the 750000 ''
on both tables)
then postgres do crash, killed by oom, after having taken 1.1 TB of
additional disk space.
the explain plan guess 512 planned partitions. (obviously, I cannot provide
an explain analyze...)

to my understanding, before postgres 13, hash aggregate did eat RAM
limitless in such circumstances.
but in 14.2 ??
(I know, 14.8 is up...)


>         hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"
>



Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

Reply via email to