Re: Indexes on expressions with multiple columns and operators

2025-10-18 Thread Andrei Lepikhov
On 22/9/2025 18:09, Frédéric Yhuel wrote: On 9/22/25 15:57, Andrei Lepikhov wrote: I wonder if we could devise another kind of extended statistic that would provide these "partitioned statistics" without actually partitioning.I'm not sure I fully understand your case, but SQL Server demonstrates

Re: Indexes on expressions with multiple columns and operators

2025-10-18 Thread Tom Lane
Andrei Lepikhov writes: > On 25/9/2025 12:41, Frédéric Yhuel wrote: >> So, on SQL Server, you can do this: >> CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; > Nice! Thanks for the report. I think the only reason why Postgres > doesn't have it yet is the computational cos

Re: Indexes on expressions with multiple columns and operators

2025-10-18 Thread Andrei Lepikhov
On 25/9/2025 12:41, Frédéric Yhuel wrote: So, on SQL Server, you can do this: CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; It would be great to have a similar feature in PostgreSQL.Nice! Thanks for the report. I think the only reason why Postgres doesn't have it yet i

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Frédéric Yhuel
On 9/23/25 12:43, Andrei Lepikhov wrote: But is it the same for the 'distinct' statistics? It seems you should love it - the number of groups in GROUP-BY, DISTINCT, and even HashJoin should be estimated more precisely, no? I think it has more potential, and I would love to use this weapon,

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov
On 23/9/2025 15:31, Frédéric Yhuel wrote: To get back to the topic of partitioned statistics, do you know if SQL Server is smart enough to handle this case [1] that we discussed last year? (with filtered statistics) [1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- ad87-8c1f2

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov
On 13/10/2025 16:55, Tom Lane wrote: Andrei Lepikhov writes: On 25/9/2025 12:41, Frédéric Yhuel wrote: So, on SQL Server, you can do this: CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; Nice! Thanks for the report. I think the only reason why Postgres doesn't have i

Re: Indexes on expressions with multiple columns and operators

2025-09-27 Thread Frédéric Yhuel
On 9/22/25 23:15, Andrei Lepikhov wrote: I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole range of values inside the table into 'partitions'

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel
On 9/23/25 12:20, Frédéric Yhuel wrote: On 9/22/25 23:15, Andrei Lepikhov wrote: I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole range of

Re: Indexes on expressions with multiple columns and operators

2025-09-25 Thread Frédéric Yhuel
On 9/23/25 15:31, Frédéric Yhuel wrote: To get back to the topic of partitioned statistics, do you know if SQL Server is smart enough to handle this case [1] that we discussed last year? (with filtered statistics) [1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- ad87-8c1f

Re: Indexes on expressions with multiple columns and operators

2025-09-23 Thread Andrei Lepikhov
On 23/9/2025 12:20, Frédéric Yhuel wrote: On 9/22/25 23:15, Andrei Lepikhov wrote: It may solve at least one issue with the 'dependencies' statistics: a single number describing the dependency between any two values in the columns often leads to incorrect estimations, as I see. For what it's

Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel
On 9/22/25 15:57, Andrei Lepikhov wrote: On 22/9/2025 15:37, Frédéric Yhuel wrote: I wonder if this is an argument in favour of decoupling the sample size and the precision of the statistics. Here, we basically want the sample size to be as big as the table in order to include the few (NULL

Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Andrei Lepikhov
On 22/9/2025 15:37, Frédéric Yhuel wrote: I wonder if this is an argument in favour of decoupling the sample size and the precision of the statistics. Here, we basically want the sample size to be as big as the table in order to include the few (NULL, WARNING) values. I also have seen how repea

Re: Indexes on expressions with multiple columns and operators

2025-09-22 Thread Frédéric Yhuel
On 9/20/25 18:51, Tom Lane wrote: I concluded that maybe I was overthinking this part. We only really need to check the rowcount estimate, since the indexscan cost estimate is already okay. And stats_ext.sql seems to have gotten away with assuming that rowcount estimates are reliably reprodu

Re: Indexes on expressions with multiple columns and operators

2025-09-20 Thread Tom Lane
=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: > On 9/18/25 18:40, Tom Lane wrote: >> The attached fixes things so it works like it did pre-a391ff3c3. > Indeed, it works well! Thanks for testing! >> I spent some time trying to devise a test case, and was reminded >> of why I didn't have one befor

Re: Indexes on expressions with multiple columns and operators

2025-09-19 Thread Jehan-Guillaume de Rorthais
On Thu, 18 Sep 2025 12:59:11 -0400 Tom Lane wrote: > Jehan-Guillaume de Rorthais writes: > > On a fresh instance from HEAD with its default configuration, it shows: > > > Index Scan using foo_s_idx on foo (cost=0.29..8.39 rows=3 width=13) > > Index Cond: (s(crit, ackid) = true) > >

Re: Indexes on expressions with multiple columns and operators

2025-09-19 Thread Frédéric Yhuel
On 9/18/25 18:40, Tom Lane wrote: The attached fixes things so it works like it did pre-a391ff3c3. Indeed, it works well! I spent some time trying to devise a test case, and was reminded of why I didn't have one before: it's hard to make a case that will be robust enough to not show diffs

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane
=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: > On 9/17/25 16:41, Tom Lane wrote: >> =?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: >>> 2) the number of estimated rows is completely off in the second EXPLAIN, >>> whereas the planner could easily use the statistics of foo_f_idx. >> Hmm, not sure abo

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane
Jehan-Guillaume de Rorthais writes: > On a fresh instance from HEAD with its default configuration, it shows: > Index Scan using foo_s_idx on foo (cost=0.29..8.39 rows=3 width=13) > Index Cond: (s(crit, ackid) = true) > It seems statistics shown in "pg_stats" view for function "s()" a

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Ranier Vilela
Em qui., 18 de set. de 2025 às 13:40, Tom Lane escreveu: > I wrote: > > Sigh ... so the answer is this used to work (since commit 39df0f150) > > and then I carelessly broke it in commit a391ff3c3. If you try this > > test case in versions 9.5..11 you get a spot-on rowcount estimate. > > Serves m

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Tom Lane
I wrote: > Sigh ... so the answer is this used to work (since commit 39df0f150) > and then I carelessly broke it in commit a391ff3c3. If you try this > test case in versions 9.5..11 you get a spot-on rowcount estimate. > Serves me right for not having a test case I guess, but I'm astonished > that

Re: Indexes on expressions with multiple columns and operators

2025-09-18 Thread Jehan-Guillaume de Rorthais
Hi there, I think this discussion has a nice solution, thank you! However, while poking around this issue yesterday, we also found something surprising between estimated rows and costs when using a function. Bellow the scenario to apply on top of Frederic's one to quickly expose the weirdness:

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
Thank you Laurenz and Tom! I'm going to quote Tom's email here: On 9/17/25 16:41, Tom Lane wrote: =?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: Hello, in the following, I don't understand why: 1) the expression index isn't used in the first EXPLAIN The planner doesn't look for multi-clause ma

Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
Hello, in the following, I don't understand why: 1) the expression index isn't used in the first EXPLAIN 2) the number of estimated rows is completely off in the second EXPLAIN, whereas the planner could easily use the statistics of foo_f_idx. (SQL script attached, tested with master and v17)

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Frédéric Yhuel
On 9/17/25 16:57, Frédéric Yhuel wrote: Yes, Laurenz made a similar suggestion, but the problem is that I'm mostly interested in the estimated number of output rows... because in the real query, there's a very bad Hash Join above (the Nested Loop is *much* faster). BTW, I've also tested an

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Tom Lane
=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= writes: > Hello, in the following, I don't understand why: > 1) the expression index isn't used in the first EXPLAIN The planner doesn't look for multi-clause matches of that sort. You could apply a little ju-jitsu perhaps: regression=# EXPLAIN (ANALYZE, SUMM

Re: Indexes on expressions with multiple columns and operators

2025-09-17 Thread Laurenz Albe
On Wed, 2025-09-17 at 15:55 +0200, Frédéric Yhuel wrote: > Hello, in the following, I don't understand why: > > 1) the expression index isn't used in the first EXPLAIN > > 2) the number of estimated rows is completely off in the second EXPLAIN, > whereas the planner could easily use the statisti