Re: Indexes on expressions with multiple columns and operators
On 25/9/2025 12:41, Frédéric Yhuel wrote: So, on SQL Server, you can do this: CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; It would be great to have a similar feature in PostgreSQL.Nice! Thanks for the report. I think the only reason why Postgres doesn't have it yet is the computational cost. SQL Server utilises a separate background worker to manage this vast amount of statistical data. Not sure that Postgres core wants it. Maybe one more contrib extension can be a solution? -- regards, Andrei Lepikhov, pgEdge
Re: Question about nested loops..
On 10/10/2025 10:51, Frits Jalvingh wrote: Hi Andrei, Thanks a lot for your response. I do not fully get it though because the well-performing query also references the outer (external?) relation: -> where eenheid.id_h_eenheid = eenheid_s.id_h_eenheid But your response made me think a but more.This is the schema of EXPLAIN for the second 'optimal' query: Seq Scan on s_h_eenheid_ssm D SubPlan 2 -> Hash Join Hash Cond: (A.x = B.x) -> Seq Scan on A Filter: ... -> Hash -> Hash Join Hash Cond: (C.y = B.x) -> Seq Scan on s_h_huurovereenkomst_ssm C -> Hash -> Seq Scan on s_h_eenheid_ssm B There is no reference to the upper relation D - each HashJoin refers to its left and right side only. -- regards, Andrei Lepikhov, pgEdge
Re: Indexes on expressions with multiple columns and operators
On 22/9/2025 18:09, Frédéric Yhuel wrote: On 9/22/25 15:57, Andrei Lepikhov wrote: I wonder if we could devise another kind of extended statistic that would provide these "partitioned statistics" without actually partitioning.I'm not sure I fully understand your case, but SQL Server demonstrates an interesting approach: they have a WHERE clause attached to statistics. So, having implemented this, you may separate the whole range of values inside the table into 'partitions' by such a WHERE condition. It may solve at least one issue with the 'dependencies' statistics: a single number describing the dependency between any two values in the columns often leads to incorrect estimations, as I see. -- regards, Andrei Lepikhov
Re: Partition pruning is not happening (even in PG18)
On Tue, 30 Sept 2025 at 02:49, Lauro Ojeda wrote: > By looking into it, I have the impression there is a bug in the costing sum > in that situation, where the cost of the "never executed" partitions should > be deducted from the final cost estimation, which would make pruning to be > the preferred option in this case. > > Are my assumptions correct? The "never executed" part is not determined at planning time. It's only something the executor gets to figure out, and by that time, it's no good adjusting the plan's costs since the planner has already decided on what it thinks the best plan is and it's too late to change that. > Is there anything I could do to influence the planner to dismiss the cost of > "never executed" scans? Not in the general sense, but for nodes that are "never executed" due to run-time partition pruning. It would be possible to make some assumptions during planning. The planner would need to do some extra work during planning to figure out if an AppendPath or MergeAppend path has run-time prunable parameters. If those parameters plus any non-parameterized quals result in pruning being possible at run-time, then *maybe* there's something we can do better. For equi joins, you could assume that only 1 of the Append children will be scanned. The planner will have no idea which one, but it could do something like total_append_cost / number_of_append_children. For other join types, it's much less clear how that would work. There have been previous proposals to do something with DEFAULT_INEQ_SEL or maybe DEFAULT_RANGE_INEQ_SEL. I've forgotten the exact details as it was about 8 years ago. David
Re: Indexes on expressions with multiple columns and operators
Andrei Lepikhov writes: > On 25/9/2025 12:41, Frédéric Yhuel wrote: >> So, on SQL Server, you can do this: >> CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING'; > Nice! Thanks for the report. I think the only reason why Postgres > doesn't have it yet is the computational cost. I think it's more lack of round tuits. If we had such an option for statistics objects, presumably we'd determine the applicability of a particular statistics object to a query the same way we do for partial indexes, namely try to prove the statistics' restriction condition from the query WHERE clauses. I've not heard complaints about that being unduly expensive. In the meantime, I believe the old-fashioned approach of creating a partial expression index and letting ANALYZE collect stats on that will serve, at least for simple statistics. regards, tom lane
Re: Partition pruning is not happening (even in PG18)
Hi David, Thank you for your nice reply. I have the impression there is something heavily penalizing the usage of partition pruning. While trying to go a bit further, I realized that partition pruning is not occurring because the planner gives the cost of seq-scanning all partitions to just over 2M units, but the query takes over 3 minutes. When forcing the usage of PK index via hinting (which down the line forces the partition pruning to happen), the cost sky-rocket to 53M units (however scanning 4 partitions out of 12). Even though with much higher cost, the query launches less workers and completes in a third (or less) of the time of all partition scans. My observation is that the cost of seq-scanning each partition is estimated in ~124k units, but index-scanning each partition is 440k (say, 3x higher). Therefore, scanning 4 partitions, the total cost should be ~1.7M units, which is below the 2M units from the cost of all partition scanning, thus the planner should prefer the indexed path instead. However when analyzing the hinted query (doing index scan and thus partition pruning) we can note that the cost estimated to each partition scan was added to the final cost, even though 8 out of 12 partitions were not scanned (never executed). postgres=> explain analyze /*+ IndexScan(accounts accounts_pkey) */ select aid, abalance from accounts where transaction_date in (select dt_col from t2); QUERY PLAN - Nested Loop (cost=1.56..53359847.63 rows=2767123 width=8) (actual time=2.299..111346.649 rows=1664742 loops=1) -> HashAggregate (cost=1.12..1.23 rows=10 width=4) (actual time=1.060..40.562 rows=6 loops=1) Group Key: t2.dt_col Batches: 1 Memory Usage: 24kB -> Seq Scan on t2 (cost=0.00..1.10 rows=10 width=4) (actual time=1.048..1.050 rows=6 loops=1) -> Append (cost=0.43..5302777.61 rows=3320703 width=12) (actual time=1.554..18514.181 rows=277457 loops=6) -> Index Scan using accounts_p1_pkey on accounts_p1 accounts_1 (cost=0.43..434635.82 rows=273011 width=12) (actual time=2.214..17627.113 rows=276971 loops=1) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p2_pkey on accounts_p2 accounts_2 (cost=0.43..441097.44 rows=277283 width=12) (actual time=2.185..16280.020 rows=276634 loops=1) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p3_pkey on accounts_p3 accounts_3 (cost=0.43..441666.94 rows=277392 width=12) (actual time=0.751..25992.220 rows=278390 loops=1) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p4_pkey on accounts_p4 accounts_4 (cost=0.43..441689.70 rows=277483 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p5_pkey on accounts_p5 accounts_5 (cost=0.43..441883.26 rows=277546 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p6_pkey on accounts_p6 accounts_6 (cost=0.43..*441857.26 *rows=277591 width=12) (actual time=1.376..17001.777 rows=277582 loops=3) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p7_pkey on accounts_p7 accounts_7 (cost=0.43..441837.57 rows=277503 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p8_pkey on accounts_p8 accounts_8 (cost=0.43..441843.95 rows=277511 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p9_pkey on accounts_p9 accounts_9 (cost=0.43..441711.03 rows=277506 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p10_pkey on accounts_p10 accounts_10 (cost=0.43..441918.96 rows=277554 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p11_pkey on accounts_p11 accounts_11 (cost=0.43..441501.86 rows=277377 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) -> Index Scan using accounts_p12_pkey on accounts_p12 accounts_12 (cost=0.43..434530.31 rows=272946 width=12) (never executed) Index Cond: (transaction_date = t2.dt_col) Planning Time: 0.879 ms Execution Time: 111432.755 ms (32 rows) By looking into it, I have the impression there is a bug in the costing sum in that situation, where the cost of the "never executed" partitions should be deducted from the final cost estimation, which would make pruning to be the preferred option in this case. Are my assumptions correct? Is there anything I could do to influence the planner to dismiss the cost of
