date:20251017

Re: Poor performance with row wise comparisons

2025-10-17 Thread Peter Geoghegan

On Fri, Feb 7, 2025 at 2:05 AM Jon Emord  wrote:
> My expectation is that the following two queries would have roughly the same 
> performance.
> They both use the same index only scans and return the same 100 rows of data.
> The main difference I see in the explain output is that the row wise 
> comparison has 3,000 times
> the shared buffer hits, but it is unclear why there would need to be more 
> hits.

I agree that this doesn't make sense.

The problem here is that the row compare condition that terminates the
scan (namely "(ROW(data_model_id, primary_key) <= ROW(123,
'DEF'::text))") was unable to recognize that we've reached the end of
all matching tuples upon reaching the first tuple that's > "(123,
'DEF')". The scan would only terminate upon reaching the first tuple
whose data_model_id was > 123. Which (in this particular case) meant
that the scan read far more index leaf pages than necessary. Note that
this wouldn't have mattered very much if there weren't so many
irrelevant tuples that were "data_model_id = 123 AND > '(123, 'DEF')'"
-- but there were.

I fixed this problem in passing, in the context of a bug fix that went
into Postgres 18 (see commit bd3f59fd, in particular the part about
marking lower-order subkeys as required to continue the scan,
described towards the end of the commit message). You should look into
upgrading to Postgres 18 if this issue is important to you.

-- 
Peter Geoghegan

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Frédéric Yhuel





On 9/23/25 12:43, Andrei Lepikhov wrote:
But is it the same for the 'distinct' statistics? It seems you should 
love it - the number of groups in GROUP-BY, DISTINCT, and even HashJoin 
should be estimated more precisely, no?


I think it has more potential, and I would love to use this weapon, but 
I haven't had the opportunity yet. It would be interesting to know how 
much it is used in real life.


To get back to the topic of partitioned statistics, do you know if SQL 
Server is smart enough to handle this case [1] that we discussed last 
year? (with filtered statistics)


[1] 
https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88-ad87-8c1f2eea9ae8%40dalibo.com

Re: Poor performance with row wise comparisons

2025-10-17 Thread David G. Johnston

On Friday, October 10, 2025,  wrote:

> Looking for help on storing and retrieving the personal data as masked.
> Any references and implementation details would help
>

Don’t reply to existing threads with unrelated stuff.  Just send an email
to begin your own thread.  And choose an appropriate place to send it.

https://www.postgresql.org/list/

David J.

Re: Question about nested loops..

2025-10-17 Thread Andrei Lepikhov


On 9/10/2025 10:52, Frits Jalvingh wrote:
I do not understand why the simpler query (without the self join) 
produces a plan that seems to require nested loops, I hope someone can 
explain.

It seems obvious. You have a join clause:
'enheid.id_h_eenheid = huurovereenkomst_s._l_eenheid'

One side of this clause fits the underlying relation, but another one 
does not (references the external relation).
According to the HashJoin rules, the hash join clause is quite strict: 
each side should depend on only the left or right side of the join. So, 
it is just impossible here.
What can be done here without rewriting the query? This subject requires 
investigation to determine if the outer part of the hash clause can be 
parameterised or not.


-- regards, Andrei Lepikhov
pgEdge

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov

On 13/10/2025 16:55, Tom Lane wrote:

Andrei Lepikhov writes:

On 25/9/2025 12:41, Frédéric Yhuel wrote:

So, on SQL Server, you can do this:
CREATE STATISTICS FooStats ON foo (ackid, crit) WHERE crit = 'WARNING';

Nice! Thanks for the report. I think the only reason why Postgres
doesn't have it yet is the computational cost.

I think it's more lack of round tuits. If we had such an option for
statistics objects, presumably we'd determine the applicability of a
particular statistics object to a query the same way we do for partial
indexes, namely try to prove the statistics' restriction condition
from the query WHERE clauses. I've not heard complaints about that
being unduly expensive.

In the meantime, I believe the old-fashioned approach of creating
a partial expression index and letting ANALYZE collect stats on that
will serve, at least for simple statistics.I know at least two extensions (one of which is mine) that attempt to
analyse query post-execution state, identify unsuccessful predictions on
cardinality, number of groups, and work_mem, and fix these issues by
creating MCV and distinct extended statistics.

Of course, without extended statistics on join clauses, their effect is
highly limited, but we are preparing ;).

Many combinations of clauses may occur. Partial indexes can affect the
whole system's performance in automatic mode. Additionally, I would
personally like to play the same game as SQL Server already does -
compute statistics in an efficient manner - during a Scan. The filter of
such a scan may serve as a WHERE condition in the extended statistics.

--
regards, Andrei Lepikhov,
pgEdge

Re: Indexes on expressions with multiple columns and operators

2025-10-17 Thread Andrei Lepikhov


On 23/9/2025 15:31, Frédéric Yhuel wrote:
To get back to the topic of partitioned statistics, do you know if SQL 
Server is smart enough to handle this case [1] that we discussed last 
year? (with filtered statistics)


[1] https://www.postgresql.org/message-id/flat/b860c71a-7cab-4d88- 
ad87-8c1f2eea9ae8%40dalibo.com
Not sure. After changing employers recently, I've lost access to the SQL 
Server instance :( and can't check how smart it is. However, as I 
recall, they design many features, and sometimes their optimiser finds a 
good solution/estimation through an unexpected approach.


--
regards, Andrei Lepikhov,
pgEdge

Re: Poor performance with row wise comparisons

Re: Indexes on expressions with multiple columns and operators

Re: Poor performance with row wise comparisons

Re: Question about nested loops..

Re: Indexes on expressions with multiple columns and operators

Re: Indexes on expressions with multiple columns and operators

6 matches

Site Navigation

Mail list logo

Footer information