> On 15 Oct 2018, at 2:38, Thomas Munro <thomas.mu...@enterprisedb.com>
> wrote:
>
> On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
>> On 31 Aug 2018, at 2:40, Thomas Munro <thomas.mu...@enterprisedb.com>
>> wrote:
>> A related topic is the cache-unfriendliness of traditional binary
>> searches of sorted data. Check out "Array Layouts for
>> Comparison-Based Searching"[1] if you haven't already. It says that
>> if your array fits in L2 cache, as our btree pages do, then binary
>> search still wins against Eytzinger et al, which I was a bit
>> disappointed by.
>>
>> I've re-read that paper. Their results do not apply to our case; here's a
>> quote:
>>> For arrays small enough _to be kept_ in L2 cache, the branch-free binary
>>> search code listed in Listing 2 is the fastest algorithm
>>
>> Surely, we cannot keep all pages in L2. The Eytzinger layout could be a
>> realistic possibility, except for one simple problem: I do not see how we
>> can insert items into this layout.
>
> I don't know. Aren't we talking about binary search within one 8KB
> page here, anyway?
The paper discusses multiple searches inside one 8KB region, while we do a
single search in a page that is entirely uncached and then move on. That's the
difference. Binary search is optimal if the page is already in L1 before we
start the search.
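
For reference, the paper's branch-free search (its Listing 2) translated to C
looks roughly like this -- just a sketch over a plain int array, not a drop-in
replacement for _bt_binsrch():

/*
 * Branch-free binary search: returns the position of the first element
 * >= key.  The ternary compiles to a conditional move, so the loop has
 * no hard-to-predict branch.
 */
static int
search_branchfree(const int *a, int n, int key)
{
    const int *base = a;

    while (n > 1)
    {
        int half = n / 2;

        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return (*base < key) + (base - a);
}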
>
> Thinking about some other ideas for cache prefetching in btree code,
> here's an idea to improve pipelining of index scans (a bit like the
> two-step pipeline idea from the hash join prefetch thread). We know
> the TIDs of heap tuples we'll be looking up in the near future, so can
> we prefetch all the way to the tuple? There are three random
> accesses that follow soon after reading an index tuple:
> BufTableLookup(), then page header, then the tuple itself. At the
> logical extreme, we could anticipate the probe of SharedBufHash for
> the TID from index tuple i + 3 (perhaps dynahash could expose a
> hash_prefetch_for_hash() facility), and anticipate the read of the
> page header for the tuple pointed to by index tuple i + 2 (perhaps a
> dirty read of SharedBufHash that is only prepared to look at the first
> entry in the bucket would be OK for this), and anticipate the read of
> the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
> the page item table). Or some less ambitious subset, if that pipeline
> turns out to be too deep; it seems like the simplest experiment would
> be to try prefetching just SharedBufHash lookups. This may all be
> completely crazy, I haven't tried any of it.
This idea also looks good to me. One thing bothers me: if we do
__builtin_prefetch(&a[b[c[d]]], 0, 1), and a, b, c, and d are all not in cache,
will we incur CPU wait time fetching d, c[d], and b[c[d]]?
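My understanding is yes: the address argument of __builtin_prefetch() is
computed with ordinary demand loads, so we would stall on d, c[d], and
b[c[d]] anyway; only the final a[...] line is fetched asynchronously. Your
staged i+1/i+2/i+3 pipeline avoids exactly that. A toy model (my own arrays
standing in for SharedBufHash, page headers, and tuples -- not real bufmgr
structures), with d generalized to a batch of lookups d[0..n-1]:

/*
 * Each iteration prefetches one level for a *future* lookup, so apart
 * from pipeline warm-up, every pointer we actually dereference was
 * prefetched one, two, or three iterations earlier.  d[] is read
 * sequentially, so it should stay cached by itself.
 */
static long
pipelined_lookups(const long *a, const int *b, const int *c,
                  const int *d, int n)
{
    long sum = 0;

    for (int i = 0; i < n; i++)
    {
        if (i + 3 < n)
            __builtin_prefetch(&c[d[i + 3]], 0, 1);       /* level 1 for i+3 */
        if (i + 2 < n)
            __builtin_prefetch(&b[c[d[i + 2]]], 0, 1);    /* c[d[i+2]] was
                                                           * prefetched at i-1 */
        if (i + 1 < n)
            __builtin_prefetch(&a[b[c[d[i + 1]]]], 0, 1); /* level 3 for i+1 */
        sum += a[b[c[d[i]]]];                             /* all levels cached */
    }
    return sum;
}

Whether one iteration of real work is enough to hide the memory latency
between stages is the open question, of course.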
These guys [0] are exploiting C++ coroutines for prefetching this kind of
stuff, but it seems like too much hassle.
>
>> Also, that research is quite distant from B-tree binary search: their data
>> items are small and do not reference actual data. Eytzinger exploits the
>> fact that two possible future keys share the same cache line. But that is
>> hard to guarantee, given that we place items at fairly random offsets
>> within the page.
>
> Based on the above quotes, I'm not suggesting we should use
> Eytzinger for search-within-page. But out of curiosity, I checked,
> and saw that you can actually get a pair of index tuples holding a
> six-character text key or an integer key into a 64 byte cache line.
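For the record, the sizes seem to work out (assuming 8-byte MAXALIGN and the
1-byte short varlena header):

  IndexTupleData header: 6-byte ItemPointerData + 2-byte t_info = 8 bytes
  int4 key:              8 + 4 = 12, MAXALIGN'd to 16 bytes
  text key of 6 chars:   8 + (1 + 6) = 15, MAXALIGN'd to 16 bytes

So a pair of such tuples occupies 32 bytes, half of a 64-byte cache line.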
Well, this shows that, in theory, Eytzinger is an opportunity.
Best regards, Andrey Borodin.
[0] http://www.vldb.org/pvldb/vol11/p1702-jonathan.pdf