On Mon, 6 Dec 2021 at 18:03, Alan Hodgson <ahodg...@lists.simkin.ca> wrote:
...
> The table has nearly 29 million records. 5069 of them match
> shipment_import_id = 5090609. There is an index on shipment_import_id, which
> the planner happily uses without the LIMIT specifically. Yet with it the
> query planner will always do something like:
>
> # explain SELECT "shipment_import_records".* FROM "shipment_import_records"
> WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> "shipment_import_records"."id" ASC LIMIT 1;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.44..873.35 rows=1 width=243)
>    ->  Index Scan using shipment_import_records_pkey on
> shipment_import_records  (cost=0.44..5122227.70 rows=5868 width=243)
>          Filter: (shipment_import_id = 5090609)
>
> .. which takes minutes.
Can you post an explain analyze?

To me it seems like the planner thinks shipment_import_id is randomly
distributed and the table is well correlated with its PK, so scanning it for
the first id should be fast.

But from the names of the fields you may have a hidden correlation between
shipment_import_id and id (e.g. they are two serials growing together, you
query for the latest shipment ids, and it ends up scanning nearly the whole
table). An explain analyze should show that (or three: one for that
shipment_import_id, one for 1, and one for a really big one).

> Just wondering if there's a knob I can turn to make these more likely to work
> without constantly implementing workarounds?

You may try a composite index.

Francisco Olarte.
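A minimal sketch of that composite-index suggestion, using the table and
column names from the quoted query (the index name itself is made up):

```sql
-- With (shipment_import_id, id), rows for one shipment_import_id are
-- stored in the index already ordered by id, so the planner can satisfy
-- WHERE shipment_import_id = ? ORDER BY id LIMIT 1 by reading a single
-- index entry instead of walking the PK index and filtering.
CREATE INDEX shipment_import_records_import_id_id_idx
    ON shipment_import_records (shipment_import_id, id);
```

Column order matters here: the equality column goes first, the ORDER BY
column second.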