Florian Pflug <f...@phlo.org> wrote:
> One problem I see with this approach is that its efficiency
> depends on the average tuple length, at least with a naive
> approach to random ctid generation. The simplest way to generate
> those randomly without introducing bias is to generate a random
> page index between 0 and the relation's size in pages,
> and then generate a random tuple index between 0 and
> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
> page size of 8k.
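(For concreteness, the naive generator described above amounts to
something like the sketch below; rel_npages stands in for the
relation's size in pages, and modulo bias is ignored.)

    /* Naive random ctid: random block, random offset up to the
     * per-page maximum.  Offsets are 1-based. */
    BlockNumber     blkno  = random() % rel_npages;
    OffsetNumber    offnum = 1 + random() % MaxHeapTuplesPerPage;
    ItemPointerData tid;

    ItemPointerSet(&tid, blkno, offnum);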
I think we can do better than that without moving too far from "the
simplest way".  It seems like we should be able to get a more
accurate calculation of a minimum base tuple size based on the table
definition, and calculate the maximum number of those which could
fit on a page.  On the other hand, ctid uses a line pointer, doesn't
it?  Do we need to worry about dead line pointers allowing higher
tuple indexes than the calculated maximum number of tuples per page?
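A rough sketch of what that per-table bound might look like,
patterned on the MaxHeapTuplesPerPage macro (min_data_width is a
hypothetical value derived from the table definition, e.g. the sum
of the attributes' minimum lengths; it ignores the null bitmap and
per-column alignment padding):

    /* Sketch only, not tested: per-table ceiling on tuple indexes,
     * same shape as MaxHeapTuplesPerPage but with a nonzero data
     * width taken from the table definition. */
    #define MinTupleSize(min_data_width) \
        MAXALIGN(SizeofHeapTupleHeader + (min_data_width))

    #define MaxSampleTuplesPerPage(min_data_width) \
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
                (MinTupleSize(min_data_width) + sizeof(ItemIdData))))

If the table definition really implied a minimum of around 128 bytes
per tuple, that bound drops from 291 to roughly 60, so most generated
ctids in Florian's example would hit a live tuple; of course with
variable-width columns the definition-derived minimum can be far
below the average width, so the gain isn't guaranteed.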
> The current toasting threshold (TOAST_TUPLE_THRESHOLD) is
> approximately 2k, so having tables with an average heap tuple size
> of a few hundred bytes doesn't seem unlikely. Now, assume the
> average tuple length is 128 bytes, i.e. on average you'll have ~
> 8k/128 = 64 live tuples / page if the fill factor is 100% and all
> tuples are live. To account for lower fill factors and dead
> tuples, let's say there are 50 live tuples / page. Then, since
> tuple indexes are generated up to MaxHeapTuplesPerPage = 291, on
> average only every 6th randomly generated ctid (50/291) will point
> to a live tuple. But whether or not it does can only be decided
> after reading the page from disk, so you end up with a rate of 6
> random-access reads per returned tuple.
> IIRC, the cutoff point where an index scan loses compared to a
> sequential scan is somewhere around 10% of the table read, i.e. if
> a predicate selects more than 10% of the available rows, a
> sequential scan is more efficient than an index scan.
That ratio *might* be better for a ctid scan, since you don't have
the index access in the mix.
> Scaling that with the 1/6-th success rate from above means that
> Kevin's approach would only beat a sequential scan if the sampling
> percentage isn't much larger than 1%, assuming an average row size
> of 128 bytes.
> The algorithm still seems like a good choice for very small
> sampling percentages, though.
Yeah, even with a maximum tuple count calculated per page, there are
certainly going to be cases where another approach will be faster,
especially where the sample is a relatively high percentage of the
table.  It would be good to have multiple plans compete on costs, if
possible.  It would not surprise me at all if the typical break-even
point between the two techniques was somewhere on the order of a 1%
sample size, but one would hope we could get there on the basis of
estimated costs rather than using arbitrary rules or forcing the
user to guess and code a choice explicitly.
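As a back-of-the-envelope illustration of what such a cost comparison
might look like (not proposed planner code; the names are
placeholders, and it ignores caching, duplicate probes landing on the
same page, and visibility checks):

    /* Sketch: compare expected page fetches for random ctid probing
     * against a full sequential scan, weighted by the usual
     * random-vs-sequential page cost parameters. */
    static bool
    random_ctid_scan_wins(double rel_pages, double live_per_page,
                          double max_tid_index, double sample_fraction,
                          double random_page_cost, double seq_page_cost)
    {
        double live_tuples   = rel_pages * live_per_page;
        double tuples_wanted = live_tuples * sample_fraction;

        /* each generated ctid costs one page read, but only a
         * live_per_page / max_tid_index fraction returns a tuple */
        double probe_pages = tuples_wanted * (max_tid_index / live_per_page);

        return probe_pages * random_page_cost < rel_pages * seq_page_cost;
    }

Every input there is something the planner already estimates or could
cheaply derive, so letting the two paths compete on estimated cost
seems within reach.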
