Re: Index Skip Scan

Jesper Pedersen Thu, 13 Sep 2018 08:40:59 -0700

Hi Alexander.

On 9/13/18 9:01 AM, Alexander Kuzmenkov wrote:

While testing this patch


Thanks for the review !

I noticed that current implementation doesn't perform well when we have lots of small groups of equal values. Here is the execution time of index skip scan vs unique over index scan, in ms, depending on the size of group. The benchmark script is attached.
group size    skip        unique
1             2,293.85    132.55
5             464.40      106.59
10            239.61      102.02
50            56.59       98.74
100           32.56       103.04
500           6.08        97.09

Yes, this doesn't look good. Using your test case I'm seeing that unique is being chosen when the group size is below 34, and skip above. This is with the standard initdb configuration; did you change something else? Or did you force the default plan?

So, the current implementation can lead to performance regression, and the choice of the plan depends on the notoriously unreliable ndistinct statistics.


Yes, Peter mentioned this, which I'm still looking at.

The regression is probably because skip scan always does _bt_search to find the next unique tuple.


Very likely.
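
To illustrate why a fresh search per distinct value hurts with small groups, here is a minimal sketch on a sorted Python list rather than a real B-tree. It is only an analogy: upper_bound stands in for a descent from the root, and the function names and probe counters are hypothetical, not PostgreSQL code.

```python
def upper_bound(keys, value):
    """Binary search over the whole list: first index with keys[idx] > value.
    Also returns the number of key comparisons (probes) performed."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] <= value:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


def distinct_by_scan(keys):
    """Analogue of Unique over an index scan: visit every key once."""
    out, visited = [], 0
    prev = object()  # sentinel that compares unequal to any key
    for k in keys:
        visited += 1
        if k != prev:
            out.append(k)
            prev = k
    return out, visited


def distinct_by_research(keys):
    """Analogue of a skip scan that restarts the search for every value."""
    out, probes, i = [], 0, 0
    while i < len(keys):
        out.append(keys[i])
        i, p = upper_bound(keys, keys[i])
        probes += p
    return out, probes


def make_keys(groups, group_size):
    """Sorted list with `groups` distinct values, each repeated `group_size` times."""
    return [g for g in range(groups) for _ in range(group_size)]
```

In this model the restart-per-value variant does roughly log2(N) probes per distinct value, so it loses when groups are small and wins when they are large, which matches the shape of the numbers in the table above.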

I think we can improve this, and the skip scan can be strictly faster than index scan regardless of the data. As a first approximation, imagine that we somehow skipped equal tuples inside _bt_next instead of sending them to the parent Unique node. This would already be marginally faster than Unique + Index scan. A more practical implementation would be to remember our position in tree (that is, BTStack returned by _bt_search) and use it to skip pages in bulk. This looks straightforward to implement for a tree that does not change, but I'm not sure how to make it work with concurrent modifications. Still, this looks a worthwhile direction to me, because if we have a strictly faster skip scan, we can just use it always and not worry about our unreliable statistics. What do you think?
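
One way to picture "skip from the remembered position" is a galloping (exponential) search that starts at the current index instead of at the root. The sketch below is a swapped-in technique on a sorted, unchanging Python list, not the BTStack-based approach itself, and it ignores concurrent modifications entirely. The function names are hypothetical.

```python
def next_distinct_gallop(keys, i):
    """Starting at index i, find the first index whose key is > keys[i].
    Probes at doubling distances from i, then binary-searches the last gap.
    Returns (index, probes)."""
    n = len(keys)
    value = keys[i]
    probes, step = 0, 1
    lo, hi = i, i + 1  # invariant: keys[lo] == value
    while hi < n:
        probes += 1
        if keys[hi] > value:
            break
        lo = hi
        step *= 2
        hi = lo + step
    hi = min(hi, n)
    lo += 1
    # The answer now lies in [lo, hi]; narrow it down by binary search.
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] <= value:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


def distinct_by_gallop(keys):
    """Collect distinct values, skipping each group with a local gallop."""
    out, probes, i = [], 0, 0
    while i < len(keys):
        out.append(keys[i])
        i, p = next_distinct_gallop(keys, i)
        probes += p
    return out, probes
```

In this model a group of size 1 costs a single probe, and a group of size g costs on the order of log2(g) probes, so the total stays within a small constant factor of a plain scan for tiny groups while still skipping large groups cheaply. Whether the same property carries over to real B-tree pages is an open question here.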

This is something to look at -- maybe there is a way to use btpo_next/btpo_prev instead/too in order to speed things up. Atm we just have the scan key in BTScanOpaqueData. I'll take a look after my upcoming vacation; feel free to contribute those changes in the meantime of course.


Thanks again !

Best regards,
 Jesper
