Andrei, Tom, thank you for the explanation. I had already tried using extended statistics but without success; I probably did something wrong.
After creating statistics on all the tables, I'm now getting optimal timings in production. This is not the ideal solution, because the issue will delay our PostgreSQL version upgrade: the application will also need to be tested with respect to response times, and we will need to create dedicated statistics for the specific cases. In addition, some cases may slip through testing and only surface in production due to a different data distribution.

Based on your explanation, my take is that the planner is not able to produce adequate estimates when there are filters on many columns; with millions of rows involved this becomes noticeable, and extended statistics may be needed. One or more commits in PG16 changed the context slightly and caused the unfortunate path to be chosen.

In any case, I don't think there are other workable solutions without changing the code. I get good results using CTEs or LATERAL with OFFSET 0, but the point was to find a solution without having to change the code.

Thanks again, regards.

Mauro Gatti

On Thu, Mar 5, 2026 at 10:44 PM Tom Lane <[email protected]> wrote:
> Andrei Lepikhov <[email protected]> writes:
> > On 5/3/26 17:25, Mauro Gatti wrote:
> >> ## Questions for the community
>
> > Thanks for stable reproduction!
>
> Yes, we appreciate that much effort being put into trouble reports.
> Makes it a lot easier to see what's going wrong.
>
> > Your case is typical for 'never executed' nodes. As you can see, the
> > costs of your query plans are very close, and the estimation error is
> > large due to multiple clauses in your filter. As I see, for the planner,
> > there is no difference in which version of the plan to choose - it is
> > just a game of chance.
>
> Right.  Given the very-far-off rowcount estimates for some of the
> index scans, it'd be surprising if the planner arrived at a good
> join order.  It's a "garbage in, garbage out" situation.
> As Andrei suggested, you can often improve bad rowcount estimates
> by creating custom statistics.
>
> I found it was sufficient to do
>
> =# create statistics on
> brand_id,line_code,model_year,model_code,version_code,pricelist_id from
> pricelist_options;
> CREATE STATISTICS
> =# analyze pricelist_options;
> ANALYZE
>
> That doesn't result in fully accurate estimates:
>
> =# explain analyze select * from pricelist_options pl where ((pl.brand_id
> = 10) AND ((pl.line_code)::text = 'ABC'::text) AND ((pl.model_year)::text =
> 'YR'::text) AND ((pl.model_code)::text = 'ABC'::text) AND
> ((pl.version_code)::text = 'VER-001'::text) AND (pl.pricelist_id = 100));
>
>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Index Scan using ix_pricelist_options_1 on pricelist_options pl
> (cost=0.43..117.12 rows=28 width=106) (actual time=0.015..0.037 rows=165.00
> loops=1)
>    Index Cond: ((brand_id = 10) AND ((line_code)::text = 'ABC'::text) AND
> ((model_year)::text = 'YR'::text) AND ((model_code)::text = 'ABC'::text)
> AND ((version_code)::text = 'VER-001'::text) AND (pricelist_id = 100))
>
> but "28 rows" is a lot closer to 165 than "1 row", and it's enough
> to push the planner to choose the plan you want.
>
> I do concur with Andrei's recommendation to create stats matching
> your other multicolumn indexes, though.
>
>                         regards, tom lane
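[For the archives, a variant of Tom's statement with the statistics object named and the statistics kinds spelled out explicitly. This is only a sketch: the object name st_pricelist_options_cols is invented, and the column list is copied from Tom's example.]

```sql
-- Sketch: named extended statistics covering the same columns as the
-- example above, requesting all three multivariate statistics kinds.
-- The name st_pricelist_options_cols is made up for illustration.
CREATE STATISTICS st_pricelist_options_cols (ndistinct, dependencies, mcv)
    ON brand_id, line_code, model_year, model_code, version_code, pricelist_id
    FROM pricelist_options;

-- Extended statistics are only populated when the table is analyzed:
ANALYZE pricelist_options;
```

Naming the object makes it easier to inspect later via pg_statistic_ext and to drop or recreate it when the matching index changes.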

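[For completeness, the CTE / LATERAL-with-OFFSET-0 workarounds mentioned above look roughly like this. This is a sketch against an invented join: some_other_table and the filter values are hypothetical, not from the real schema.]

```sql
-- A MATERIALIZED CTE is an explicit optimization barrier (PostgreSQL 12+);
-- the planner computes the CTE once and cannot flatten it into the outer query:
WITH opts AS MATERIALIZED (
    SELECT *
    FROM pricelist_options
    WHERE brand_id = 10
      AND pricelist_id = 100
)
SELECT *
FROM opts
JOIN some_other_table t USING (pricelist_id);

-- OFFSET 0 in a (LATERAL) subquery likewise prevents the planner from
-- pulling the subquery up and re-planning it as part of the outer join:
SELECT *
FROM some_other_table t
CROSS JOIN LATERAL (
    SELECT *
    FROM pricelist_options po
    WHERE po.pricelist_id = t.pricelist_id
    OFFSET 0
) o;
```

Both forms trade planner flexibility for predictability, which is why they help when the estimates feeding the join order are badly off.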