Re: planner chooses incremental but not the best one

2024-02-15 Thread Tomas Vondra



On 2/15/24 13:45, Andrei Lepikhov wrote:
> On 15/2/2024 18:10, Tomas Vondra wrote:
>>
>>
>> On 2/15/24 07:50, Andrei Lepikhov wrote:
>>> On 18/12/2023 19:53, Tomas Vondra wrote:
>>>> On 12/18/23 11:40, Richard Guo wrote:
>>>> The challenge is where to get usable information about correlation
>>>> between columns. I only have a couple of very rough ideas of what we
>>>> might try. For example, if we have multi-column ndistinct statistics,
>>>> we might look at ndistinct(b,c) and ndistinct(b,c,d) and deduce
>>>> something from
>>>>
>>>>   ndistinct(b,c,d) / ndistinct(b,c)
>>>>
>>>> If we know how many distinct values we have for the predicate column,
>>>> we could then estimate the number of groups. I mean, we know that for
>>>> the restriction "WHERE b = 3" we only have 1 distinct value, so we
>>>> could estimate the number of groups as
>>>>
>>>>   1 * ndistinct(b,c)
>>> Did you mean here ndistinct(c,d) and the formula:
>>> ndistinct(b,c,d) / ndistinct(c,d) ?
>>
>> Yes, I think that's probably a more correct ... Essentially, the idea is
>> to estimate the change in number of distinct groups after adding a
>> column (or restricting it in some way).
> Thanks, I got it. I'm just thinking about how to implement such
> techniques in an extension, just to test the idea in action. In the case
> of GROUP BY we can use a path hook, of course. But what about inventing a
> hook on clauselist estimation?

Maybe.

I have thought about introducing such a hook to alter the estimation of
clauses, so I'm not opposed to it. Ofc, it depends on where the hook would
be, what it would be allowed to do, etc. And as it doesn't exist yet,
separating the changes into an extension would be more of a "local"
improvement.

>>> Do you implicitly bear in mind here the necessity of tracking clauses
>>> that were applied to the data up to the moment of grouping?
>>>
>>
>> I don't recall what exactly I considered two months ago when writing the
>> message, but I don't see why we would need to track that beyond what we
>> already have. Shouldn't it be enough for the grouping to simply inspect
>> the conditions on the lower levels?
> Yes, exactly. I've thought about looking into baserestrictinfos and, if
> group-by references a subquery targetlist, into subqueries too.
> 

True. Something like that.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: planner chooses incremental but not the best one

2024-02-15 Thread Andrei Lepikhov

On 15/2/2024 18:10, Tomas Vondra wrote:



On 2/15/24 07:50, Andrei Lepikhov wrote:

On 18/12/2023 19:53, Tomas Vondra wrote:

On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

  ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

  1 * ndistinct(b,c)

Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?


Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Thanks, I got it. I'm just thinking about how to implement such techniques
in an extension, just to test the idea in action. In the case of GROUP BY we
can use a path hook, of course. But what about inventing a hook on
clauselist estimation?

Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?



I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if 
group-by references a subquery targetlist, into subqueries too.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2024-02-15 Thread Tomas Vondra



On 2/15/24 07:50, Andrei Lepikhov wrote:
> On 18/12/2023 19:53, Tomas Vondra wrote:
>> On 12/18/23 11:40, Richard Guo wrote:
>> The challenge is where to get usable information about correlation
>> between columns. I only have a couple of very rough ideas of what we might
>> try. For example, if we have multi-column ndistinct statistics, we might
>> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>>
>>  ndistinct(b,c,d) / ndistinct(b,c)
>>
>> If we know how many distinct values we have for the predicate column, we
>> could then estimate the number of groups. I mean, we know that for the
>> restriction "WHERE b = 3" we only have 1 distinct value, so we could
>> estimate the number of groups as
>>
>>  1 * ndistinct(b,c)
> Did you mean here ndistinct(c,d) and the formula:
> ndistinct(b,c,d) / ndistinct(c,d) ?

Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).

> 
> Do you implicitly bear in mind here the necessity of tracking clauses
> that were applied to the data up to the moment of grouping?
> 

I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: planner chooses incremental but not the best one

2024-02-14 Thread Andrei Lepikhov

On 15/12/2023 15:58, Richard Guo wrote:

With the patch the estimate for the number of distinct 'b' values is
more accurate.

+1 to commit this patch.
It looks good and resolves what is essentially a bug in the code.


BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?
The test that was changed covers different features. Its behaviour may
change in the future and mask testing of this code. IMO, you should add a
test that directly checks the appendrel->tuples correction.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2024-02-14 Thread Andrei Lepikhov

On 18/12/2023 19:53, Tomas Vondra wrote:

On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

 ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

 1 * ndistinct(b,c)

Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?

Do you implicitly bear in mind here the necessity of tracking clauses 
that were applied to the data up to the moment of grouping?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2023-12-26 Thread ywgrit
Hi, Tomas

Recently, I looked at papers related to the estimation of cardinality under
selection. I may be biased towards the scheme provided by the paper
"Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries
and Event Reports". This paper uses distinct sampling as opposed to the
current uniform sampling, and the main differences between the two are as
follows.

1) It not only counts the distinct values on each column or combination of
columns, but also saves some rows corresponding to each distinct value,
i.e., it saves part of the rows of the original relation as samples. The
purpose of saving these rows is to accommodate restrictions on the queries,
such as where clauses.

2) The queries are executed on the samples, and the result of the execution
is used as the cardinality estimate.

The advantages of this paper over existing practices are as follows.

1) The collected samples can be used with arbitrary predicates, e.g.
predicates that are correlated with the columns of the group by clause.

2) The accuracy is very high; in some scenarios the estimation error can be
hundreds of times smaller than with uniform sampling.

However, the scheme provided in this paper also has some drawbacks: as
mentioned above, it relies on the collected samples, which significantly
increases the storage overhead of the statistics.
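
To make the idea more concrete, here is a very rough SQL sketch (my own
illustration, not the paper's actual algorithm) on the t(a, b, c) example
from earlier in the thread: keep at most N sample rows per distinct value
of the grouping column, and evaluate the query's predicate against that
sample only.

create table t_sample as
select a, b, c
from (select t.*,
             row_number() over (partition by b order by random()) as rn
      from t) s
where rn <= 10;   -- at most 10 sampled rows per distinct value of b

-- estimated number of groups for "where c < 0.1 group by b"
select count(distinct b) from t_sample where c < 0.1;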

I'd like to hear your opinions.

ywgrit.

ywgrit wrote on Fri, Dec 22, 2023 at 16:20:

> The one scenario for which I can come up with a possible solution so far
> is when the query's predicate columns and grouping columns belong to one
> table.
>
> For a query that contains where clause, perhaps num_groups could be
> estimated according to the following formula.
>
> num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where
> clause * ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
> sort_col_2, ... sort_col_m) / ndistinct(pred_col_1, pred_col_2, ...
> pred_col_n).
>
> ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where clause =
> ndistinct(pred_var_1, pred_var_2, ... pred_var_n) * selectivity of rel.
>
> pred_col_n belong to the columns involved in the where clause and
> sort_col_m belong to the columns involved in the group by clause.
>
> The reason for multiplying by selectivity of rel directly is that the
> selectivity of rel depends on only pred_col not sort_col. So the above
> formula can be simplified as follows.
>
> num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
> sort_col_2, ... sort_col_m) * selectivity of rel.
>
> The correctness of the above formula depends on the following conditions.
>
>    - The ndistinct(pred_col_1, pred_col_2, ... pred_col_n) and
>      ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
>      sort_col_2, ... sort_col_m) statistics already exist, and they need
>      to be accurate.
>
>    - Both the pred_col and sort_col columns are uniformly distributed;
>      if not, statistics such as MCV lists are needed for correction.
>
>    - The tuples count of the rel is the total number of tuples of the
>      table, not the number of filtered tuples.
>
> After experimentation, in the scenario mentioned earlier in the thread the
> estimated num_groups is 3. The accuracy of the result relies strongly on
> the uniform distribution of b, which allows ndistinct(pred_col_1,
> pred_col_2, ... pred_col_n) with the where clause to be estimated
> accurately.
>
> I'd like to hear your opinions.
>
> Regards.
>
> ywgrit.
>
Tomas Vondra wrote on Mon, Dec 18, 2023 at 20:53:
>
>>
>>
>> On 12/18/23 11:40, Richard Guo wrote:
>> >
>> > On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
>> > mailto:tomas.von...@enterprisedb.com>>
>> > wrote:
>> >
>> > Oh! Now I see what you meant by using the new formula in 84f9a35e3
>> > depending on how we sum tuples. I agree that seems like the right
>> thing.
>> >
>> > I'm not sure it'll actually help with the issue, though - if I
>> apply the
>> > patch, the plan does not actually change (and the cost changes just
>> a
>> > little bit).
>> >
>> > I looked at this only very briefly, but I believe it's due to the
>> > assumption of independence I mentioned earlier - we end up using
>> the new
>> > formula introduced in 84f9a35e3, but it assumes it assumes the
>> > selectivity and number of groups are independent. But that'd not the
>> > case here, because the groups are very clearly correlated (with the
>> > condition on "b").
>> >
>> >
>> > You're right.  The patch allows us to adjust the estimate of distinct
>> > values for appendrels using the new formula introduced in 84f9a35e3.
>> > But if the restrictions are correlated with the grouping expressions,
>> > the new formula does not behave well.  That's why the patch does not
>> > help in case [1], where 'b' and 'c' are correlated.
>> >
>> > OTOH, if the restrictions are not correlated with the grouping
>> > expressions, the new formula would perform quite well.  And in this case
>> > the patch would 

Re: planner chooses incremental but not the best one

2023-12-22 Thread Sébastien Lardière

On 15/12/2023 09:58, Richard Guo wrote:


On Thu, Dec 14, 2023 at 6:02 PM Richard Guo  
wrote:


It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.

In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.


I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.  As I experimented, this can improve
the estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (10 * random())::int, random() from 
generate_series(1,100) i;

analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
                (actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
                (actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?



Hi,

thank you for the patch; I've tried it and it works with the scenario
you provided.


As Nicolas's co-worker, I've been involved in this case. Unfortunately,
we're not able to test the patch with the actual data at the moment, but
I'll ask the real owner for a dump.


About the regression test, I don't know how to implement it either.

best regards,

--
Sébastien


Re: planner chooses incremental but not the best one

2023-12-22 Thread ywgrit
The one scenario for which I can come up with a possible solution so far is
when the query's predicate columns and grouping columns belong to one table.

For a query that contains a where clause, perhaps num_groups could be
estimated according to the following formula.

num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where
clause * ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
sort_col_2, ... sort_col_m) / ndistinct(pred_col_1, pred_col_2, ...
pred_col_n).

ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where clause =
ndistinct(pred_col_1, pred_col_2, ... pred_col_n) * selectivity of rel.

pred_col_1 ... pred_col_n are the columns involved in the where clause and
sort_col_1 ... sort_col_m are the columns involved in the group by clause.

The reason for multiplying by the selectivity of the rel directly is that
this selectivity depends only on the pred_col columns, not the sort_col
columns. So the above formula can be simplified as follows.

num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
sort_col_2, ... sort_col_m) * selectivity of rel.

The correctness of the above formula depends on the following conditions.

   - The ndistinct(pred_col_1, pred_col_2, ... pred_col_n) and
     ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
     sort_col_2, ... sort_col_m) statistics already exist, and they need
     to be accurate.

   - Both the pred_col and sort_col columns are uniformly distributed;
     if not, statistics such as MCV lists are needed for correction.

   - The tuples count of the rel is the total number of tuples of the
     table, not the number of filtered tuples.

After experimentation, in the scenario mentioned earlier in the thread the
estimated num_groups is 3. The accuracy of the result relies strongly on the
uniform distribution of b, which allows ndistinct(pred_col_1, pred_col_2,
... pred_col_n) with the where clause to be estimated accurately.
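
As a rough illustration of where the inputs could come from on that test
case (predicate column b, presorted/grouping column c; the statistics name
below is just an example):

create statistics t_bc (ndistinct) on b, c from t;
analyze t;

select n_distinct from pg_stats_ext where statistics_name = 't_bc';

With b uniformly distributed, the selectivity of "b = 3" is roughly
1 / ndistinct(b), so num_groups ~ ndistinct(b, c) * selectivity(b = 3),
which should line up with the estimate of 3 groups mentioned above.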

I'd like to hear your opinions.

Regards.

ywgrit.

Tomas Vondra wrote on Mon, Dec 18, 2023 at 20:53:

>
>
> On 12/18/23 11:40, Richard Guo wrote:
> >
> > On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
> > mailto:tomas.von...@enterprisedb.com>>
> > wrote:
> >
> > Oh! Now I see what you meant by using the new formula in 84f9a35e3
> > depending on how we sum tuples. I agree that seems like the right
> thing.
> >
> > I'm not sure it'll actually help with the issue, though - if I apply
> the
> > patch, the plan does not actually change (and the cost changes just a
> > little bit).
> >
> > I looked at this only very briefly, but I believe it's due to the
> > assumption of independence I mentioned earlier - we end up using the
> new
> > formula introduced in 84f9a35e3, but it assumes the
> > selectivity and number of groups are independent. But that's not the
> > case here, because the groups are very clearly correlated (with the
> > condition on "b").
> >
> >
> > You're right.  The patch allows us to adjust the estimate of distinct
> > values for appendrels using the new formula introduced in 84f9a35e3.
> > But if the restrictions are correlated with the grouping expressions,
> > the new formula does not behave well.  That's why the patch does not
> > help in case [1], where 'b' and 'c' are correlated.
> >
> > OTOH, if the restrictions are not correlated with the grouping
> > expressions, the new formula would perform quite well.  And in this case
> > the patch would help a lot, as shown in [2] where estimate_num_groups()
> > gives a much more accurate estimate with the help of this patch.
> >
> > So this patch could be useful in certain situations.  I'm wondering if
> > we should at least have this patch (if it is right).
> >
>
> I do agree the patch seems to do the right thing, and it's worth pushing
> on its own.
>
> >
> > If that's the case, I'm not sure how to fix this :-(
> >
> >
> > The commit message of 84f9a35e3 says
> >
> > This could possibly be improved upon in the future by identifying
> > correlated restrictions and using a hybrid of the old and new
> > formulae.
> >
> > Maybe this is something we can consider trying.  But anyhow this is not
> > an easy task I suppose.
>
> Yeah, if it was easy, it'd have been done in 84f9a35e3 already ;-)
>
> The challenge is where to get usable information about correlation
> between columns. I only have a couple of very rough ideas of what we might
> try. For example, if we have multi-column ndistinct statistics, we might
> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>
> ndistinct(b,c,d) / ndistinct(b,c)
>
> If we know how many distinct values we have for the predicate column, we
> could then estimate the number of groups. I mean, we know that for the
> restriction "WHERE b = 3" we only have 1 distinct value, so we could
> estimate the number of groups as
>
> 1 * ndistinct(b,c)
>
> I'm well aware this is only a very trivial example, and for more complex
> examples it's likely way more complicated. But hopefully it illustrates
> the 

Re: planner chooses incremental but not the best one

2023-12-18 Thread Tomas Vondra



On 12/18/23 11:40, Richard Guo wrote:
> 
> On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
> mailto:tomas.von...@enterprisedb.com>>
> wrote:
> 
> Oh! Now I see what you meant by using the new formula in 84f9a35e3
> depending on how we sum tuples. I agree that seems like the right thing.
> 
> I'm not sure it'll actually help with the issue, though - if I apply the
> patch, the plan does not actually change (and the cost changes just a
> little bit).
> 
> I looked at this only very briefly, but I believe it's due to the
> assumption of independence I mentioned earlier - we end up using the new
> formula introduced in 84f9a35e3, but it assumes the
> selectivity and number of groups are independent. But that's not the
> case here, because the groups are very clearly correlated (with the
> condition on "b").
> 
> 
> You're right.  The patch allows us to adjust the estimate of distinct
> values for appendrels using the new formula introduced in 84f9a35e3.
> But if the restrictions are correlated with the grouping expressions,
> the new formula does not behave well.  That's why the patch does not
> help in case [1], where 'b' and 'c' are correlated.
> 
> OTOH, if the restrictions are not correlated with the grouping
> expressions, the new formula would perform quite well.  And in this case
> the patch would help a lot, as shown in [2] where estimate_num_groups()
> gives a much more accurate estimate with the help of this patch.
> 
> So this patch could be useful in certain situations.  I'm wondering if
> we should at least have this patch (if it is right).
>

I do agree the patch seems to do the right thing, and it's worth pushing
on its own.

> 
> If that's the case, I'm not sure how to fix this :-(
> 
> 
> The commit message of 84f9a35e3 says
> 
>     This could possibly be improved upon in the future by identifying
>     correlated restrictions and using a hybrid of the old and new
>     formulae.
> 
> Maybe this is something we can consider trying.  But anyhow this is not
> an easy task I suppose.

Yeah, if it was easy, it'd have been done in 84f9a35e3 already ;-)

The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

1 * ndistinct(b,c)

I'm well aware this is only a very trivial example, and for more complex
examples it's likely way more complicated. But hopefully it illustrates
the general idea.
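
For instance, on the test table t(a, b, c, d) from earlier in this thread,
the ndistinct inputs above could come from extended statistics, roughly
like this (just a sketch, the statistics name is made up):

create statistics t_nd (ndistinct) on b, c, d from t;
analyze t;

select statistics_name, n_distinct
from pg_stats_ext
where statistics_name = 't_nd';

The n_distinct column then contains ndistinct(b,c), ndistinct(b,d),
ndistinct(c,d) and ndistinct(b,c,d), i.e. exactly the inputs for the ratio
above.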

The other idea would be to look at multi-column MCV lists, and try using
them to deduce cross-column correlation too (they could be more accurate for
arbitrary predicates).
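
For completeness, such an MCV list could be built and inspected like this
(again just a sketch; how the estimator would consume it is the open
question):

create statistics t_mcv (mcv) on b, c from t;
analyze t;

select m.* from pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
     pg_mcv_list_items(stxdmcv) m
where stxname = 't_mcv';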

And finally, we might try being a bit more pessimistic and look at what
the "worst case" behavior could be. So for example instead of trying to
estimate the real number of groups, we'd ask "What's the minimum number
of groups we're likely to get?". And use that to cost the incremental
sort. But I don't think we do this elsewhere, and I'm not sure we want
to start with this risk-based approach here. It might be OK, because we
usually expect the incremental sort to be much cheaper, ...

Is this something you would be interested in exploring? I don't have the
capacity to work on this myself, but I'd be available for reviews,
feedback and so on.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: planner chooses incremental but not the best one

2023-12-18 Thread Richard Guo
On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra 
wrote:

> Oh! Now I see what you meant by using the new formula in 84f9a35e3
> depending on how we sum tuples. I agree that seems like the right thing.
>
> I'm not sure it'll actually help with the issue, though - if I apply the
> patch, the plan does not actually change (and the cost changes just a
> little bit).
>
> I looked at this only very briefly, but I believe it's due to the
> assumption of independence I mentioned earlier - we end up using the new
> formula introduced in 84f9a35e3, but it assumes the
> selectivity and number of groups are independent. But that's not the
> case here, because the groups are very clearly correlated (with the
> condition on "b").


You're right.  The patch allows us to adjust the estimate of distinct
values for appendrels using the new formula introduced in 84f9a35e3.
But if the restrictions are correlated with the grouping expressions,
the new formula does not behave well.  That's why the patch does not
help in case [1], where 'b' and 'c' are correlated.

OTOH, if the restrictions are not correlated with the grouping
expressions, the new formula would perform quite well.  And in this case
the patch would help a lot, as shown in [2] where estimate_num_groups()
gives a much more accurate estimate with the help of this patch.

So this patch could be useful in certain situations.  I'm wondering if
we should at least have this patch (if it is right).


> If that's the case, I'm not sure how to fix this :-(


The commit message of 84f9a35e3 says

This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.

Maybe this is something we can consider trying.  But anyhow this is not
an easy task I suppose.

[1]
https://www.postgresql.org/message-id/CAMbWs4-FtsJ0dQUv6g%3D%3DXR_gsq%3DFj9oiydW6gbqwQ_wrbU0osw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw%40mail.gmail.com

Thanks
Richard


Re: planner chooses incremental but not the best one

2023-12-17 Thread Tomas Vondra



On 12/14/23 11:02, Richard Guo wrote:
> 
> On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic  > wrote:
> 
> I've come across a behaviour of the planner I can't explain.
> After a migration from 11 to 15 (on RDS) we noticed a degradation in
> response time on a query, it went from a few seconds to ten minutes.
> A vacuum(analyze) has been realized to be sure that all is clean.
> The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
> `incremental sort` with an index corresponding to the ORDER BY clause
> (on the created_at column). The previous v11 plan used a more efficient
> index.
> 
> By deactivating incremental sort, response times in v15 are equal to
> v11
> one.
> 
> 
> I think this issue is caused by under-estimating the startup cost of
> incremental sort, which in turn is caused by over-estimating the number
> of groups with equal presorted keys by estimate_num_groups().
> 
> We can simulate the same issue with the query below.
> 
> create table t (a int, b int, c int, d int) partition by range (a);
> create table tp1 partition of t for values from (0) to (1000);
> create table tp2 partition of t for values from (1000) to (2000);
> 
> insert into t select i%2000, i%1000, i%300, i from
> generate_series(1,100)i;
> 
> create index on t(b);
> create index on t(c);
> 
> analyze t;
> 
> -- by default incremental sort is chosen
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                    QUERY
> PLAN
> 
>  Limit  (cost=143.03..570.89 rows=10 width=16) (actual
> time=375.399..375.402 rows=10 loops=1)
>    ->  Incremental Sort  (cost=143.03..42671.85 rows=994 width=16)
> (actual time=375.397..375.399 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Presorted Key: t.c
>          Full-sort Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average
> Memory: 25kB  Peak Memory: 25kB
>          ->  Merge Append  (cost=0.85..42644.84 rows=994 width=16)
> (actual time=11.415..375.289 rows=335 loops=1)
>                Sort Key: t.c
>                ->  Index Scan using tp1_c_idx on tp1 t_1
>  (cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171537
>                ->  Index Scan using tp2_c_idx on tp2 t_2
>  (cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
> rows=168 loops=1)
>                      Filter: (b = 3)
>                      Rows Removed by Filter: 171534
>  Planning Time: 0.501 ms
>  Execution Time: 375.477 ms
> (16 rows)
> 
> -- disable incremental sort
> set enable_incremental_sort to off;
> SET
> 
> explain analyze select * from t where b = 3 order by c, d limit 10;
>                                                                QUERY PLAN
> 
>  Limit  (cost=2577.51..2577.53 rows=10 width=16) (actual
> time=2.928..2.930 rows=10 loops=1)
>    ->  Sort  (cost=2577.51..2579.99 rows=994 width=16) (actual
> time=2.925..2.927 rows=10 loops=1)
>          Sort Key: t.c, t.d
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Append  (cost=8.28..2556.03 rows=994 width=16) (actual
> time=0.627..2.670 rows=1000 loops=1)
>                ->  Bitmap Heap Scan on tp1 t_1  (cost=8.28..1276.62
> rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp1_b_idx
>  (cost=0.00..8.16 rows=498 width=0) (actual time=0.330..0.330 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>                ->  Bitmap Heap Scan on tp2 t_2  (cost=8.27..1274.43
> rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
>                      Recheck Cond: (b = 3)
>                      Heap Blocks: exact=500
>                      ->  Bitmap Index Scan on tp2_b_idx
>  (cost=0.00..8.14 rows=496 width=0) (actual time=0.093..0.093 rows=500
> loops=1)
>                            Index Cond: (b = 3)
>  Planning Time: 0.481 ms
>  Execution Time: 3.031 ms
> (17 rows)
> 
> As we can see the execution time is 375.477 ms by default and 3.031 ms
> if we disable incremental sort.
> 
> When we calculate the cost of incremental sort, we need to estimate the
> number of groups into which the relation is divided by the presorted
> keys, and then calculate the cost of sorting a single group.  If we
> over-estimate the number of groups, the startup cost of incremental sort
> would 

Re: planner chooses incremental but not the best one

2023-12-17 Thread Tom Lane
Tomas Vondra  writes:
> Yeah, seems like that's the right thing to do. FWIW I've been often
> confused by these fields, because we use tuples and rows as synonyms,
> but in this particular case that's not the same. I wonder if this is
> just a manifestation of this confusion.

For tables, one is the raw number of rows on-disk and the other is the
number of rows predicted to pass the relation's quals.  For something
like an appendrel that doesn't enforce any quals (today anyway), they
should probably be the same; but you need to be sure you're adding
up the right numbers from the inputs.

regards, tom lane




Re: planner chooses incremental but not the best one

2023-12-17 Thread Tomas Vondra



On 12/15/23 09:58, Richard Guo wrote:
> 
> On Thu, Dec 14, 2023 at 6:02 PM Richard Guo  > wrote:
> 
> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
> 
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
> 
> 
> I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
> Why don't we set it to the accumulated estimate of tuples from each live
> child, like attached?  I believe this aligns more closely with reality.
> 

Yeah, seems like that's the right thing to do. FWIW I've often been
confused by these fields, because we use tuples and rows as synonyms,
but in this particular case they're not the same. I wonder if this is
just a manifestation of that confusion.

> And this would also allow us to adjust the estimate for the number of
> distinct values in estimate_num_groups() for appendrels using the new
> formula introduced in 84f9a35e3.

I don't follow. Why wouldn't it be using the new formula even without
your patch? (using the "wrong" value, ofc, but the same formula).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: planner chooses incremental but not the best one

2023-12-15 Thread Richard Guo
On Thu, Dec 14, 2023 at 6:02 PM Richard Guo  wrote:

> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
>

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.  As I experimented, this can improve
the estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (10 * random())::int, random() from
generate_series(1,100) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.
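
For reference, the actual number of groups can be cross-checked directly on
the generated data; it should match the rows=63224 reported by both plans:

select count(distinct b) from t where c < 0.1;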

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?

Thanks
Richard


v1-0001-Adjust-tuples-estimate-for-appendrel.patch
Description: Binary data


Re: planner chooses incremental but not the best one

2023-12-14 Thread Richard Guo
On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic  wrote:

> I've come across a behaviour of the planner I can't explain.
> After a migration from 11 to 15 (on RDS) we noticed a degradation in
> response time on a query, it went from a few seconds to ten minutes.
> A vacuum(analyze) has been realized to be sure that all is clean.
> The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
> `incremental sort` with an index corresponding to the ORDER BY clause
> (on the created_at column). The previous v11 plan used a more efficient
> index.
>
> By deactivating incremental sort, response times in v15 are equal to v11
> one.


I think this issue is caused by under-estimating the startup cost of
incremental sort, which in turn is caused by over-estimating the number
of groups with equal presorted keys by estimate_num_groups().

We can simulate the same issue with the query below.

create table t (a int, b int, c int, d int) partition by range (a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, i%1000, i%300, i from
generate_series(1,100)i;

create index on t(b);
create index on t(c);

analyze t;

-- by default incremental sort is chosen
explain analyze select * from t where b = 3 order by c, d limit 10;
   QUERY
PLAN

 Limit  (cost=143.03..570.89 rows=10 width=16) (actual
time=375.399..375.402 rows=10 loops=1)
   ->  Incremental Sort  (cost=143.03..42671.85 rows=994 width=16) (actual
time=375.397..375.399 rows=10 loops=1)
 Sort Key: t.c, t.d
 Presorted Key: t.c
 Full-sort Groups: 1  Sort Method: top-N heapsort  Average Memory:
25kB  Peak Memory: 25kB
 Pre-sorted Groups: 1  Sort Method: top-N heapsort  Average Memory:
25kB  Peak Memory: 25kB
 ->  Merge Append  (cost=0.85..42644.84 rows=994 width=16) (actual
time=11.415..375.289 rows=335 loops=1)
   Sort Key: t.c
   ->  Index Scan using tp1_c_idx on tp1 t_1
 (cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
rows=168 loops=1)
 Filter: (b = 3)
 Rows Removed by Filter: 171537
   ->  Index Scan using tp2_c_idx on tp2 t_2
 (cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
rows=168 loops=1)
 Filter: (b = 3)
 Rows Removed by Filter: 171534
 Planning Time: 0.501 ms
 Execution Time: 375.477 ms
(16 rows)

-- disable incremental sort
set enable_incremental_sort to off;
SET

explain analyze select * from t where b = 3 order by c, d limit 10;
   QUERY PLAN

 Limit  (cost=2577.51..2577.53 rows=10 width=16) (actual time=2.928..2.930
rows=10 loops=1)
   ->  Sort  (cost=2577.51..2579.99 rows=994 width=16) (actual
time=2.925..2.927 rows=10 loops=1)
 Sort Key: t.c, t.d
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Append  (cost=8.28..2556.03 rows=994 width=16) (actual
time=0.627..2.670 rows=1000 loops=1)
   ->  Bitmap Heap Scan on tp1 t_1  (cost=8.28..1276.62
rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
 Recheck Cond: (b = 3)
 Heap Blocks: exact=500
 ->  Bitmap Index Scan on tp1_b_idx  (cost=0.00..8.16
rows=498 width=0) (actual time=0.330..0.330 rows=500 loops=1)
   Index Cond: (b = 3)
   ->  Bitmap Heap Scan on tp2 t_2  (cost=8.27..1274.43
rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
 Recheck Cond: (b = 3)
 Heap Blocks: exact=500
 ->  Bitmap Index Scan on tp2_b_idx  (cost=0.00..8.14
rows=496 width=0) (actual time=0.093..0.093 rows=500 loops=1)
   Index Cond: (b = 3)
 Planning Time: 0.481 ms
 Execution Time: 3.031 ms
(17 rows)

As we can see the execution time is 375.477 ms by default and 3.031 ms
if we disable incremental sort.

When we calculate the cost of incremental sort, we need to estimate the
number of groups into which the relation is divided by the presorted
keys, and then calculate the cost of sorting a single group.  If we
over-estimate the number of groups, the startup cost of incremental sort
would be under-estimated.  In the first plan above, the number of groups
with equal 't.c' is estimated to be 300 by estimate_num_groups(), but is
actually 3 after applying the restriction 'b = 3'.  As a result, the
startup cost of the incremental sort is estimated to be 143.03, but is
actually 14222.68.  That's why the 

planner chooses incremental but not the best one

2023-12-12 Thread Nicolas Lutic


Dear Hackers,

I've come across a behaviour of the planner I can't explain.
After a migration from 11 to 15 (on RDS) we noticed a degradation in
response time on a query: it went from a few seconds to ten minutes.

A VACUUM (ANALYZE) has been run to be sure that everything is clean.
The 'explain analyze' shows us a change of plan. PostgreSQL 15 chooses
`incremental sort` with an index corresponding to the ORDER BY clause
(on the created_at column). The previous v11 plan used a more efficient
index.


By deactivating incremental sort, response times in v15 are equal to the
v11 ones.


Here is the query

    SELECT inputdocum0_.id AS col_0_0_
    FROM document_management_services.input_document inputdocum0_
    WHERE (inputdocum0_.indexation_domain_id in 
('2d29daf6-e151-479a-a52a-78b08bb3009d'))
  AND (inputdocum0_.indexation_subsidiary_id in 
('9f9df402-f70b-40d9-b283-a3c35232469a'))

  AND (inputdocum0_.locked_at IS NULL)
  AND (inputdocum0_.locked_by_app IS NULL)
  AND (inputdocum0_.locked_by_user IS NULL)
  AND (inputdocum0_.lock_time_out IS NULL)
  AND inputdocum0_.archiving_state<> 'DESTROYED'
  AND (inputdocum0_.creation_state in ('READY'))
  AND inputdocum0_.active_content=true
  AND (inputdocum0_.processing_state in ('PENDING_INDEXATION'))
    ORDER BY inputdocum0_.created_at ASC,
 inputdocum0_.reception_id ASC,
 inputdocum0_.reception_order ASC
    LIMIT 50 ;

Here are some details: the table `input_document` is partitioned by hash
into 20 partitions, with a lot of indexes


Indexes:
    "input_document_pkey" PRIMARY KEY, btree (id)
    "input_document_api_version_idx" btree (api_version) INVALID
    "input_document_created_at_idx" btree (created_at)
    "input_document_created_by_user_profile_idx" btree 
(created_by_user_profile)
    "input_document_dashboard_idx" btree (processing_state, 
indexation_family_id, indexation_group_id, reception_id) INCLUDE 
(active_content, archiving_state, creation_state) WHERE active_content = 
true AND archiving_state <> 'DESTROYED'::text AND creation_state <> 
'PENDING'::text
    "input_document_fts_description_idx" gin 
(to_tsvector('simple'::regconfig, description))
    "input_document_fts_insured_firstname_idx" gin 
(to_tsvector('simple'::regconfig, indexation_insured_firstname))
    "input_document_fts_insured_lastname_idx" gin 
(to_tsvector('simple'::regconfig, indexation_insured_lastname))
    "input_document_indexation_activity_id_idx" btree 
(indexation_activity_id)

    "input_document_indexation_agency_id_idx" btree (indexation_agency_id)
    "input_document_indexation_distributor_id_idx" btree 
(indexation_distributor_id)

    "input_document_indexation_domain_id_idx" btree (indexation_domain_id)
    "input_document_indexation_family_id_idx" btree (indexation_family_id)
    "input_document_indexation_group_id_idx" btree (indexation_group_id)
    "input_document_indexation_insurer_id_idx" btree 
(indexation_insurer_id)

    "input_document_indexation_nature_id_idx" btree (indexation_nature_id)
    "input_document_indexation_reference_idx" btree (indexation_reference)
    "input_document_indexation_subsidiary_id_idx" btree 
(indexation_subsidiary_id)
    "input_document_indexation_warranty_id_idx" btree 
(indexation_warranty_id)

    "input_document_locked_by_user_idx" btree (locked_by_user)
    "input_document_modified_at_idx" btree (modified_at)
    "input_document_modified_by_user_profile_idx" btree 
(modified_by_user_profile)

    "input_document_processing_state_idx" btree (processing_state)
    "input_document_stock_idx" btree (active_content, archiving_state, 
creation_state, processing_state) WHERE active_content AND 
archiving_state <> 'DESTROYED'::text AND creation_state <> 
'PENDING'::text AND (processing_state = ANY 
('{PENDING_PROCESSING,PENDING_INDEXATION,READY}'::text[]))
    "input_dom_act_pi_idx" btree (indexation_activity_id, 
indexation_domain_id) WHERE processing_state = 'PENDING_INDEXATION'::text
    "input_dom_act_pp_idx" btree (indexation_activity_id, 
indexation_domain_id) WHERE processing_state = 'PENDING_PROCESSING'::text
    "input_dom_act_sub_idx" btree (indexation_activity_id, 
indexation_domain_id, indexation_subsidiary_id)

    "input_reception_id_created_at_idx" btree (reception_id, created_at)
    "input_reception_id_reception_order_idx" btree (reception_id, 
reception_order)
    "operational_perimeter_view_idx" btree (processing_state, 
indexation_distributor_id) WHERE processing_state = 
'PENDING_PROCESSING'::text


Please find attached the 3 plans

explain_analyse_incremental_off.txt with enable_incremental_sort to off
explain_analyse_incremental_on.txt with enable_incremental_sort to on
explain_analyse_incremental_on_limit5000 with enable_incremental_sort to
on but with the limit increased to 5000; in this case the chosen plan
doesn't use `Incremental Sort`


The point that I don't understand in the plan (incremental_sort to on)
is the top-level one, the Limit