Tim Uckun <timuc...@gmail.com> writes: >> Am I right in guessing that pg_stats.n_distinct is much too low for >> the domain_id column?
> the domain_id is in the topical urls. A select count of domains shows > that there are 700 domains, the pg_stats shows 170 which seems kind of > low but maybe is not out of bounds because most domains are probably > inactive. It sounds like you've got one of those long-tail distributions where there are lots and lots of the first couple hundred domains, and not many at all of the last few. The problem with the stats as you've got them is that the planner isn't aware of the long tail, so for this specific domain id that's not even there at all, you're getting an estimate of a couple of thousand matches --- which is why it goes for the indexscan-according-to-ORDER-BY plan. It's figuring it will hit one of those matches and be able to end the scan after reading much less than all of the table. As I've stated repeatedly, your next move needs to be to increase the stats target, at least for that column if not globally. You probably don't need to have it know about every last domain id, but you need to have it know about enough that it realizes that domains not included in the MCV list are going to appear less than a couple of thousand times. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs