Jan Urbański <j.urban...@students.mimuw.edu.pl> writes:
> Tom Lane wrote:
>> I came across this bit in ts_typanalyze.c:
>> 
>>      /* We want statistic_target * 100 lexemes in the MCELEM array */
>>      num_mcelem = stats->attr->attstattarget * 100;
>> 
>> I wonder whether the multiplier here should be changed?

> The origin of that bit is this post:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00556.php
> and the following few downthread ones.

> If we bump the default statistics target 10 times, then changing the 
> multiplier to 10 seems the right thing to do.

OK, will do.
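
To spell out the arithmetic: a minimal sketch, assuming for illustration
that the default target goes from 10 to 100 (the "10 times" above) while
the multiplier drops from 100 to 10.  The helper name mcelem_target is
hypothetical; ts_typanalyze.c does the multiplication inline.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.  ts_typanalyze.c computes
 * this inline as stats->attr->attstattarget * <multiplier>.
 */
static int
mcelem_target(int attstattarget, int multiplier)
{
    assert(attstattarget >= 0 && multiplier > 0);
    return attstattarget * multiplier;
}
```

Under those assumed values, mcelem_target(10, 100) and
mcelem_target(100, 10) give the same MCELEM length, so a column left at
the default target would keep the same number of lexemes as before.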

> Only thing that needs 
> caution is the frequency of pruning we do in the Lossy Counting 
> algorithm, that IIRC is correlated with the desired target length of the 
> MCELEM array.

Right below that we have

        /*
         * We set bucket width equal to the target number of result lexemes.
         * This is probably about right but perhaps might need to be scaled
         * up or down a bit?
         */
        bucket_width = num_mcelem;

so it should track automatically.  AFAICS the argument in the above
thread that this is an appropriate pruning distance holds good
regardless of just how we obtain the target mcelem count.
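
For readers who don't have the algorithm in their heads, here is a rough
sketch of textbook Lossy Counting showing where the bucket width enters
as the pruning distance.  This is not the code in ts_typanalyze.c: the
names (LcState, lc_add, and so on), the use of ints in place of lexemes,
the linear search, and the fixed capacity are all assumptions made to
keep the example short.

```c
#include <assert.h>

#define LC_MAX_TRACKED 1024     /* arbitrary capacity for this sketch */

typedef struct
{
    int     element;            /* stand-in for a lexeme */
    int     frequency;          /* occurrences counted since entry creation */
    int     delta;              /* maximum possible undercount */
} LcEntry;

typedef struct
{
    LcEntry entries[LC_MAX_TRACKED];
    int     num_entries;
    int     bucket_width;       /* pruning distance, in elements */
    int     num_seen;           /* elements processed so far */
} LcState;

static void
lc_init(LcState *st, int bucket_width)
{
    assert(bucket_width > 0);
    st->num_entries = 0;
    st->bucket_width = bucket_width;
    st->num_seen = 0;
}

/* Drop entries whose frequency + delta does not exceed the bucket number. */
static void
lc_prune(LcState *st, int bucket_current)
{
    int     keep = 0;

    for (int i = 0; i < st->num_entries; i++)
    {
        if (st->entries[i].frequency + st->entries[i].delta > bucket_current)
            st->entries[keep++] = st->entries[i];
    }
    st->num_entries = keep;
}

static void
lc_add(LcState *st, int element)
{
    int     bucket_current;
    int     i;

    st->num_seen++;
    /* ceil(num_seen / bucket_width) */
    bucket_current = (st->num_seen + st->bucket_width - 1) / st->bucket_width;

    for (i = 0; i < st->num_entries; i++)
    {
        if (st->entries[i].element == element)
            break;
    }

    if (i < st->num_entries)
        st->entries[i].frequency++;
    else
    {
        assert(st->num_entries < LC_MAX_TRACKED);
        st->entries[i].element = element;
        st->entries[i].frequency = 1;
        st->entries[i].delta = bucket_current - 1;
        st->num_entries++;
    }

    /* Prune once per bucket, i.e. every bucket_width elements. */
    if (st->num_seen % st->bucket_width == 0)
        lc_prune(st, bucket_current);
}

/* Counted frequency of element, or 0 if it is not currently tracked. */
static int
lc_frequency(const LcState *st, int element)
{
    for (int i = 0; i < st->num_entries; i++)
    {
        if (st->entries[i].element == element)
            return st->entries[i].frequency;
    }
    return 0;
}
```

In this sketch, calling lc_init(&st, num_mcelem) ties the pruning
interval directly to the target count, which is the sense in which
bucket_width "tracks" whatever multiplier is chosen.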

> BTW: I've been occupied with other things and might have missed some 
> discussions, but at some point it has been considered to use Lossy 
> Counting to gather statistics from regular columns, not only tsvectors. 
> Wouldn't this help the performance hit ANALYZE takes from upping 
> default_stats_target?

Perhaps, but it's not likely to get done for 8.4 ...

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)