On Thu, Dec 12, 2013 at 3:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> It would be relatively easy to fix this if we trusted the number of visible
>> rows in each block to be fairly constant.  But without that assumption, I
>> don't see a way to fix the sample selection process without reading the
>> entire table.
>
> Yeah, varying tuple density is the weak spot in every algorithm we've
> looked at.  The current code is better than what was there before, but as
> you say, not perfect.  You might be entertained to look at the threads
> referenced by the patch that created the current sampling method:
> http://www.postgresql.org/message-id/1tkva0h547jhomsasujt2qs7gcgg0gt...@email.aon.at
>
> particularly
> http://www.postgresql.org/message-id/flat/ri5u70du80gnnt326k2hhuei5nlnimo...@email.aon.at#ri5u70du80gnnt326k2hhuei5nlnimo...@email.aon.at
>
> However ... where this thread started was not about trying to reduce
> the remaining statistical imperfections in our existing sampling method.
> It was about whether we could reduce the number of pages read for an
> acceptable cost in increased statistical imperfection.
Well, why not take a supersample containing all visible tuples from N selected
blocks, and do bootstrapping over it, with subsamples of M independent rows
each?
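To make the idea concrete, here is a rough sketch (Python, all names and
parameters are placeholders I made up, not ANALYZE code): read every visible
tuple from the N chosen blocks into a supersample, then repeatedly draw M-row
subsamples from it with replacement and apply the statistic of interest to
each; the spread of the replicate estimates also gives a rough error bar.

    import random
    import statistics

    def bootstrap_estimate(supersample, stat_fn, m, replicates=200, seed=0):
        """Draw `replicates` bootstrap subsamples of size m (sampled with
        replacement) from the supersample, apply stat_fn to each, and return
        the mean of the replicate estimates plus their standard deviation."""
        rng = random.Random(seed)
        estimates = []
        for _ in range(replicates):
            subsample = [rng.choice(supersample) for _ in range(m)]
            estimates.append(stat_fn(subsample))
        return statistics.mean(estimates), statistics.stdev(estimates)

    # Toy usage: the "supersample" stands in for all visible tuples read
    # from the N selected blocks; here the statistic is the NULL fraction
    # of a column.
    supersample = [None if random.random() < 0.1 else random.random()
                   for _ in range(10000)]
    null_frac = lambda rows: sum(r is None for r in rows) / len(rows)
    est, err = bootstrap_estimate(supersample, null_frac, m=300)
    print(est, err)

The point is that the extra resampling work happens entirely in memory over
tuples we have already read, so it costs CPU rather than additional page
fetches.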