> On 15 Oct 2018, at 2:38, Thomas Munro <thomas.mu...@enterprisedb.com>
> wrote:
>
> On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
>> On 31 Aug 2018, at 2:40, Thomas Munro <thomas.mu...@enterprisedb.com>
>> wrote:
>> A related topic is the cache-unfriendliness of traditional binary
>> searches of sorted data. Check out "Array Layouts for
>> Comparison-Based Searching"[1] if you haven't already. It says that
>> if your array fits in L2 cache, as our btree pages do, then binary
>> search still wins against Eytzinger et al, which I was a bit
>> disappointed by.
>>
>> I've re-read that paper. Their results do not apply to our case; here's a
>> quote:
>>> For arrays small enough _to be kept_ in L2 cache, the branch-free binary
>>> search code listed in Listing 2 is the fastest algorithm
>>
>> Surely, we cannot keep all pages in L2. The Eytzinger layout could be a
>> realistic possibility, except for one simple problem: I do not see how we
>> can insert items into this layout.
>
> I don't know. Aren't we talking about binary search within one 8KB
> page here, anyway?
The paper discusses multiple searches inside one 8KB region, while we do a
single search in a page that is entirely uncached and then move on. That's the
difference. Binary search is optimal if the page is already in L1 before we
start the search.
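
For reference, the paper's branch-free search (its Listing 2) translated to C
looks roughly like this -- just a sketch over a plain int array, not a drop-in
replacement for _bt_binsrch():

/*
 * Branch-free binary search: returns the position of the first element
 * >= key.  The ternary compiles to a conditional move, so the loop has
 * no hard-to-predict branch.
 */
static int
search_branchfree(const int *a, int n, int key)
{
    const int *base = a;

    while (n > 1)
    {
        int half = n / 2;

        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return (*base < key) + (base - a);
}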
>
> Thinking about some other ideas for cache prefetching in btree code,
> here's an idea to improve pipelining of index scans (a bit like the
> two-step pipeline idea from the hash join prefetch thread). We know
> the TIDs of heap tuples we'll be looking up in the near future, so can
> we prefetch all the way to the tuple? There are three random
> accesses that follow soon after reading an index tuple:
> BufTableLookup(), then page header, then the tuple itself. At the
> logical extreme, we could anticipate the probe of SharedBufHash for
> the TID from index tuple i + 3 (perhaps dynahash could expose a
> hash_prefetch_for_hash() facility), and anticipate the read of the
> page header for the tuple pointed to by index tuple i + 2 (perhaps a
> dirty read of SharedBufHash that is only prepared to look at the first
> entry in the bucket would be OK for this), and anticipate the read of
> the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
> the page item table). Or some less ambitious subset, if that pipeline
> turns out to be too deep; it seems like the simplest experiment would
> be to try prefetching just SharedBufHash lookups. This may all be
> completely crazy, I haven't tried any of it.
This idea also looks good to me. One thing bothers me: if we do
__builtin_prefetch(&a[b[c[d]]], 0, 1), and a, b, c, and d are all not in cache,
will we incur CPU wait time fetching d, c[d], and b[c[d]]?
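My understanding is yes: the address argument of __builtin_prefetch() is
computed with ordinary demand loads, so we would stall on d, c[d], and
b[c[d]] anyway; only the final a[...] line is fetched asynchronously. Your
staged i+1/i+2/i+3 pipeline avoids exactly that. A toy model (my own arrays
standing in for SharedBufHash, page headers, and tuples -- not real bufmgr
structures), with d generalized to a batch of lookups d[0..n-1]:

/*
 * Each iteration prefetches one level for a *future* lookup, so apart
 * from pipeline warm-up, every pointer we actually dereference was
 * prefetched one, two, or three iterations earlier.  d[] is read
 * sequentially, so it should stay cached by itself.
 */
static long
pipelined_lookups(const long *a, const int *b, const int *c,
                  const int *d, int n)
{
    long sum = 0;

    for (int i = 0; i < n; i++)
    {
        if (i + 3 < n)
            __builtin_prefetch(&c[d[i + 3]], 0, 1);       /* level 1 for i+3 */
        if (i + 2 < n)
            __builtin_prefetch(&b[c[d[i + 2]]], 0, 1);    /* c[d[i+2]] was
                                                           * prefetched at i-1 */
        if (i + 1 < n)
            __builtin_prefetch(&a[b[c[d[i + 1]]]], 0, 1); /* level 3 for i+1 */
        sum += a[b[c[d[i]]]];                             /* all levels cached */
    }
    return sum;
}

Whether one iteration of real work is enough to hide the memory latency
between stages is the open question, of course.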
These guys [0] are exploiting C++ coroutines for prefetching this kind of
stuff, but it seems like too much hassle.
>
>> Also, that research is quite distant from B-tree binary search: their data
>> items are small and do not reference actual data. Eytzinger exploits the
>> fact that two possible future keys share the same cache line. But that is
>> hard to guarantee, given that we place items at fairly random offsets
>> within the page.
>
> Based on the above quotes, I'm not suggesting we should use
> Eytzinger for search-within-page. But out of curiosity, I checked,
> and saw that you can actually get a pair of index tuples holding a
> six-character text key or an integer key into a 64 byte cache line.
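For the record, the sizes seem to work out (assuming 8-byte MAXALIGN and the
1-byte short varlena header):

  IndexTupleData header: 6-byte ItemPointerData + 2-byte t_info = 8 bytes
  int4 key:              8 + 4 = 12, MAXALIGN'd to 16 bytes
  text key of 6 chars:   8 + (1 + 6) = 15, MAXALIGN'd to 16 bytes

So a pair of such tuples occupies 32 bytes, half of a 64-byte cache line.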
Well, this shows that, in theory, Eytzinger is an opportunity.
Best regards, Andrey Borodin.
[0] http://www.vldb.org/pvldb/vol11/p1702-jonathan.pdf