Re: Poor performance with row wise comparisons

2025-10-17 Thread Peter Geoghegan
On Fri, Feb 7, 2025 at 2:05 AM Jon Emord  wrote:
> My expectation is that the following two queries would have roughly the same 
> performance.
> They both use the same index only scans and return the same 100 rows of data.
> The main difference I see in the explain output is that the row wise 
> comparison has 3,000 times
> the shared buffer hits, but it is unclear why there would need to be more 
> hits.

I agree that this doesn't make sense.

The problem here is that the row compare condition that terminates the
scan (namely "(ROW(data_model_id, primary_key) <= ROW(123,
'DEF'::text))") was unable to recognize that we've reached the end of
all matching tuples upon reaching the first tuple that's > "(123,
'DEF')". The scan would only terminate upon reaching the first tuple
whose data_model_id was > 123. Which (in this particular case) meant
that the scan read far more index leaf pages than necessary. Note that
this wouldn't have mattered very much if there weren't so many
irrelevant tuples that were "data_model_id = 123 AND > '(123, 'DEF')'"
-- but there were.

I fixed this problem in passing, in the context of a bug fix that went
into Postgres 18 (see commit bd3f59fd, in particular the part about
marking lower-order subkeys as required to continue the scan,
described towards the end of the commit message). You should look into
upgrading to Postgres 18 if this issue is important to you.

-- 
Peter Geoghegan




Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Frédéric Yhuel




On 9/23/25 12:43, Andrei Lepikhov wrote:
But is it the same for the 'distinct' statistics? It seems you should 
love it - the number of groups in GROUP-BY, DISTINCT, and even HashJoin 
should be estimated more precisely, no?


I think it has more potential, and I would love to use this weapon, but 
I haven't had the opportunity yet. It would be interesting to know how 
much it is used in real life.


To get back to the topic of partitioned statistics, do you know if SQL 
Server is smart enough to handle this case [1] that we discussed last 
year? (with filtered statistics)


[1] 
https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88-ad87-8c1f2eea9ae8%40dalibo.com





Re: Poor performance with row wise comparisons

2025-10-17 Thread David G. Johnston
On Friday, October 10, 2025,  wrote:

> Looking for help on storing and retrieving the personal data as masked.
> Any references and implementation details would help
>

Don’t reply to existing threads with unrelated stuff.  Just send an email
to begin your own thread.  And choose an appropriate place to send it.

https://www.postgresql.org/list/

David J.


Re: Question about nested loops..

2025-10-17 Thread Andrei Lepikhov

On 9/10/2025 10:52, Frits Jalvingh wrote:
I do not understand why the simpler query (without the self join) 
produces a plan that seems to require nested loops, I hope someone can 
explain.

It seems obvious. You have a join clause:
'enheid.id_h_eenheid = huurovereenkomst_s._l_eenheid'

One side of this clause fits the underlying relation, but another one 
does not (references the external relation).
According to the HashJoin rules, the hash join clause is quite strict: 
each side should depend on only the left or right side of the join. So, 
it is just impossible here.
What can be done here without rewriting the query? This subject requires 
investigation to determine if the outer part of the hash clause can be 
parameterised or not.


-- regards, Andrei Lepikhov
pgEdge




Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov

On 13/10/2025 16:55, Tom Lane wrote:

Andrei Lepikhov  writes:

On 25/9/2025 12:41, Frédéric Yhuel wrote:

So, on SQL Server, you can do this:
CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING';



Nice! Thanks for the report. I think the only reason why Postgres
doesn't have it yet is the computational cost.


I think it's more lack of round tuits.  If we had such an option for
statistics objects, presumably we'd determine the applicability of a
particular statistics object to a query the same way we do for partial
indexes, namely try to prove the statistics' restriction condition
from the query WHERE clauses.  I've not heard complaints about that
being unduly expensive.

In the meantime, I believe the old-fashioned approach of creating
a partial expression index and letting ANALYZE collect stats on that
will serve, at least for simple statistics.I know at least two extensions (one of which is mine) that attempt to 
analyse query post-execution state, identify unsuccessful predictions on 
cardinality, number of groups, and work_mem, and fix these issues by 
creating MCV and distinct extended statistics.


Of course, without extended statistics on join clauses, their effect is 
highly limited, but we are preparing ;).


Many combinations of clauses may occur. Partial indexes can affect the 
whole system's performance in automatic mode. Additionally, I would 
personally like to play the same game as SQL Server already does - 
compute statistics in an efficient manner - during a Scan. The filter of 
such a scan may serve as a WHERE condition in the extended statistics.


--
regards, Andrei Lepikhov,
pgEdge




Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov

On 23/9/2025 15:31, Frédéric Yhuel wrote:
To get back to the topic of partitioned statistics, do you know if SQL 
Server is smart enough to handle this case [1] that we discussed last 
year? (with filtered statistics)


[1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- 
ad87-8c1f2eea9ae8%40dalibo.com
Not sure. After changing employers recently, I've lost access to the SQL 
Server instance :( and can't check how smart it is. However, as I 
recall, they design many features, and sometimes their optimiser finds a 
good solution/estimation through an unexpected approach.


--
regards, Andrei Lepikhov,
pgEdge