On Wed, Mar 25, 2015 at 1:00 PM, Feike Steenbergen <feikesteenber...@gmail.com> wrote:
> On 25 March 2015 at 19:07, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> > Also, I doubt that that is the problem in the first place. If you
> > collect a sample of 30,000 (which the default target size of 100 does),
> > and the frequency of the second most common is really 0.00307333 at the
> > time you sampled it, you would expect to find it 92 times in the sample.
> > The chances against actually finding 1 instead of around 92 due to
> > sampling error are astronomical.
>
> It can be that the distribution of values is very volatile; we hope
> the increased stats target (from the default=100 to 1000 for this
> column) and frequent autovacuum and autoanalyze help to keep the
> estimates correct.
>
> It seems that it did find some other records (<> 'PRINTED'), as is
> demonstrated in the stats where there was only one value in the MCV
> list: the frequency was 0.996567 and the fraction of nulls was 0,
> therefore leaving 0.03+ for other values. But because none of them
> were in the MCV and MCF lists, they were all treated as equals. They
> are certainly not equal.

Now that I look back at the first post you made, it certainly looks like
the statistics target was set to 1 when that was analyzed, not to 100.
But it doesn't look quite correct for that, either.

What version of PostgreSQL are you running? 'select version();'

What do you get when you do "analyze verbose print_list"?

How can the avg_width be 4 when the vast majority of entries are 7
characters long?

Cheers,

Jeff
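
P.S. A quick back-of-the-envelope sketch of the sampling-error claim quoted above, assuming ANALYZE's usual sample of 300 × statistics_target rows and using a Poisson approximation to the binomial (the exact numbers here are illustrative, not from the actual table):

```python
import math

# ANALYZE samples 300 * statistics_target rows; with the default
# target of 100 that is 30,000 rows, as mentioned above.
sample_size = 300 * 100
freq = 0.00307333  # claimed frequency of the second most common value

# Expected number of occurrences of that value in the sample.
expected = sample_size * freq
print(f"expected occurrences: {expected:.1f}")  # ~92.2

# Poisson approximation: probability of seeing at most 1 occurrence
# when ~92 are expected: P(X <= 1) = e^(-lambda) * (1 + lambda).
p_at_most_one = math.exp(-expected) * (1 + expected)
print(f"P(count <= 1) ~= {p_at_most_one:.2e}")
```

That probability comes out on the order of 1e-38, which is what makes "astronomical" the right word: such a gap cannot plausibly be sampling error, so either the distribution shifted between ANALYZE runs or the statistics target really was much smaller than 100.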