Re: Duplicate WHERE condition changes performance and plan
> If you're using SSD storage, or if the DB is small compared with
> shared_buffers or RAM, then random_page_cost should be closer to
> seq_page_cost.

I don't *think* we are using SSDs, but I'll need to confirm that.

> How large are the indexes? problem_id_idx1?

Using the query from here:
https://wiki.postgresql.org/wiki/Index_Maintenance#Index_size.2Fusage_statistics

Output here:
https://gist.github.com/indy-singh/e33eabe5cc937043c93b42a8783b3bfb

I've set up a repo here where it is possible to reproduce the weird behaviour I'm getting:
https://github.com/indy-singh/postgres-duplicate-where-conditon

That contains the data (amended to remove any private information) as well as the statements needed to recreate the tables, indices, and constraints. After some trial and error, I think this is something to do with the size of the table and the statistics. I've been trying to put together a Short, Self Contained, Correct example (http://sscce.org/), but the problem only appears when I fill problem_instance.message with junk, and I have to do that in two steps as outlined in the README in the repo.

Indy
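[A minimal psql sketch of the tuning advice quoted above. The value 1.1 is a commonly suggested starting point for SSD-backed or well-cached databases, not a number from this thread, and the query placeholder is illustrative.]

```sql
-- Check the current cost settings (server defaults are
-- seq_page_cost = 1.0 and random_page_cost = 4.0).
SHOW seq_page_cost;
SHOW random_page_cost;

-- Try a lower random_page_cost for this session only, then re-run
-- the problem query to see whether the chosen plan changes.
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT 1;  -- substitute the query under investigation

-- Revert to the server default for this session.
RESET random_page_cost;
```

Because SET only affects the current session, this is a safe way to test the advice before changing postgresql.conf.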
Re: PostgreSQL does not choose my indexes well
On Thu, Apr 23, 2020 at 7:36 AM Arcadio Ortega Reinoso <arcadio.ort...@gmail.com> wrote:

> explain (analyze, buffers, format text) select * from entidad where cod_tabla = 4
>
> Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41 rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
>   Index Cond: ((cod_tabla)::bigint = 4)
>   Buffers: shared hit=12839
> Planning Time: 0.158 ms
> Execution Time: 311.828 ms

In order to read 1409985 / 12839 = 109 rows per buffer page, the table must be extraordinarily well clustered on this index. That degree of clustering is going to steal much of the thunder from the index-only scan. But in my hands, it does still prefer the partial index with index-only scan by a cost estimate ratio of 3 to 1 (despite it actually being slightly slower), so I don't know why you don't get it being used.

This was how I populated the table:

insert into entidad select id, floor(random()*25)::int, floor(random()*1000)::int from generate_series(1,3400) f(id);
cluster entidad USING idx_tabla_entidad;

0.3 seconds for 1.4 million rows is pretty good. How much better are you hoping to get by micro-managing the planner?

To figure it out, it might help to see the explain (analyze, buffers, format text) of the plan you want it to use. But the only way I see to do that is to drop the other index. If you don't want to "really" drop the index, you can drop it in a transaction, run the "explain (analyze, buffers, format text)" query, and roll back the transaction. (Note this will lock the table for the entire duration of the transaction, so it is not something to do cavalierly in production.)

Cheers,

Jeff
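[The drop-in-a-transaction trick Jeff describes can be sketched like this. The index and query names are taken from the quoted EXPLAIN output; adapt them to whichever index you want the planner to stop considering.]

```sql
BEGIN;

-- DROP INDEX takes an ACCESS EXCLUSIVE lock on the table, and the lock
-- is held until the transaction ends -- do not leave this open on a
-- busy production server.
DROP INDEX idx_tabla_entidad;

-- With the index gone, the planner must pick the alternative plan;
-- this shows its real cost estimate and runtime.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM entidad WHERE cod_tabla = 4;

-- Undo the drop; the index is exactly as it was before.
ROLLBACK;
```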
Re: PostgreSQL does not choose my indexes well
Greetings,

* Jeff Janes (jeff.ja...@gmail.com) wrote:
> In order to read 1409985 / 12839 = 109 rows per buffer page, the table must
> be extraordinarily well clustered on this index. That degree of clustering
> is going to steal much of the thunder from the index-only scan. But in my
> hands, it does still prefer the partial index with index-only scan by a
> cost estimate ratio of 3 to 1 (despite it actually being slightly slower)
> so I don't know why you don't get it being used.

Turns out to be because what was provided wasn't actually what was being used- there's a domain in there and that seems to gum up the works and make it so we don't consider the partial index as being something we can use (see the discussion at the end of the other sub-thread).

Thanks,

Stephen
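[A minimal sketch of the kind of schema being described: a column whose declared type is a domain over bigint, plus a partial index whose predicate the planner has to match against the query's WHERE clause. All names here are hypothetical; the thread's actual DDL is in the linked repo and the other sub-thread.]

```sql
-- A domain over bigint, used as the column type instead of bare bigint.
CREATE DOMAIN tabla_id AS bigint;

CREATE TABLE entidad_demo (
    id        bigint NOT NULL,
    cod_tabla tabla_id NOT NULL
);

-- A partial index: usable only if the planner can prove the query's
-- WHERE clause implies the index predicate. Implicit casts between the
-- domain and its base type can interfere with that proof.
CREATE INDEX entidad_demo_cod4_idx
    ON entidad_demo (id)
    WHERE cod_tabla = 4;
```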
Re: PostgreSQL does not choose my indexes well
Stephen Frost writes:
> Turns out to be because what was provided wasn't actually what was being
> used- there's a domain in there and that seems to gum up the works and
> make it so we don't consider the partial index as being something we can
> use (see the discussion at the end of the other sub-thread).

Some simple experiments here don't find that a domain-type column prevents use of the partial index. So it's still not entirely clear what's happening for the OP. I concur with Jeff's suggestion to try forcing use of the desired index, and see whether it happens at all and what the cost estimate is. I'm also wondering exactly which Postgres version this is.

regards, tom lane
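[For the version question Tom raises, the exact server version can be reported from any psql session; both statements below are standard PostgreSQL.]

```sql
-- Full version string, including build details.
SELECT version();

-- Just the version number, e.g. "12.2".
SHOW server_version;
```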
Re: PostgreSQL does not choose my indexes well
On Fri, Apr 24, 2020 at 2:33 PM Stephen Frost wrote:

> Greetings,
>
> * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > In order to read 1409985 / 12839 = 109 rows per buffer page, the table must
> > be extraordinarily well clustered on this index. That degree of clustering
> > is going to steal much of the thunder from the index-only scan. But in my
> > hands, it does still prefer the partial index with index-only scan by a
> > cost estimate ratio of 3 to 1 (despite it actually being slightly slower)
> > so I don't know why you don't get it being used.
>
> Turns out to be because what was provided wasn't actually what was being
> used- there's a domain in there and that seems to gum up the works and
> make it so we don't consider the partial index as being something we can
> use (see the discussion at the end of the other sub-thread).

Thanks. I somehow managed to overlook the existence of the entire last 24 hours of discussion.

But if I change the type of entidad.cod_tabla to match the domain now shown in table.cod_table, I can still get the index-only scan over the partial index. Now the cost estimate has changed so it slightly prefers the other index instead (in agreement with the original report), but use of the partial index-only scan is still possible (e.g. if I drop the single-column full-table index). I don't understand why the domain changes the estimate without changing the execution, but that isn't something that is very important to me.

I'm more interested in why the index-only scan is not actually much, if any, faster. Even if there is no IO benefit due to the clustering, I'd still expect some CPU benefit from not jumping back and forth between index pages and heap pages, but I don't know how much effort it is worth putting into that either.

Cheers,

Jeff
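[One general PostgreSQL point worth checking when an index-only scan fails to beat a plain index scan (not something established in this thread): if the visibility map is stale, the "index-only" scan still has to visit heap pages to check tuple visibility, which EXPLAIN ANALYZE reports as "Heap Fetches". A sketch, using the table name from the thread and an illustrative query:]

```sql
-- Freshen the visibility map so the index-only scan can actually
-- skip heap pages for all-visible blocks.
VACUUM (ANALYZE) entidad;

-- In the output, "Heap Fetches: 0" means no heap visits were needed;
-- a large number explains why the index-only scan saves little.
EXPLAIN (ANALYZE, BUFFERS)
SELECT cod_tabla FROM entidad WHERE cod_tabla = 4;
```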