"Jeremy Dunn" <[EMAIL PROTECTED]> writes:
> Agreed. However, given that count(*) is a question that can be answered
> _solely_ using the index (without reference to the actual data blocks),
As Bruno noted, that is not the case in Postgres; we must visit the
table rows anyway.
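(To make that concrete: even when an index on the column exists, a count
still has to fetch each heap tuple to check its visibility for the current
transaction, so a query like this sketch -- table and column names are
hypothetical -- cannot be answered from the index alone:

```sql
-- Index on cid does not let Postgres skip the heap:
-- each matching row must be visited to test tuple visibility.
SELECT count(*) FROM orders WHERE cid = 42;
```

The index can narrow which heap pages get visited, but not replace them.)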
> When I just tried it again with a value of 300, analyze, then run the
> query, I get a *worse* result for an estimate. I don't understand this.
That's annoying. How repeatable are these results --- if you do ANALYZE
over again several times, how much does the row count estimate change
each time? (It should change somewhat, since ANALYZE is taking a random
sample, but one would like to think not a whole lot.) Is the variance
larger or smaller at the higher stats target? Take a look at a few different
CID values to get a sense of the accuracy, don't look at just one ...
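Something along these lines would do it -- "orders" and "cid" here are
stand-ins for your actual table and column:

```sql
-- Raise the stats target for just this column, then re-analyze:
ALTER TABLE orders ALTER COLUMN cid SET STATISTICS 300;
ANALYZE orders;

-- Planner's row estimate for one CID value:
EXPLAIN SELECT * FROM orders WHERE cid = 42;

-- Actual row count, for comparison:
SELECT count(*) FROM orders WHERE cid = 42;
```

Repeat the ANALYZE-then-EXPLAIN cycle a few times, for a few CID values,
and see how much the estimate bounces around.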
(Actually, you might find it more profitable to look at the pg_stats
entry for the CID column rather than reverse-engineering the stats via
ANALYZE. Look at how well the most-common-values list and associated
frequency numbers track reality.)
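That is, something like this (again with hypothetical table/column names):

```sql
-- Inspect the stored statistics for the cid column directly:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'cid';
```

Multiply each entry of most_common_freqs by the table's row count and
compare against a real count(*) for that value.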
Also, can you think of any reason for the distribution of CID values
to be nonuniform within the table? For instance, do rows get inserted
in order of increasing CID, or is there any clustering of rows with the
same CID?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match