On 12 September 2015 at 21:19, Andres Freund <and...@anarazel.de> wrote:

> On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> > Why do we have to do buffer lookups using the full buffer tag?
> We don't necessarily.
> > Why not just use (relNode, blockNum) and resolve hash collisions, if any?
> I tried that and unfortunately it came out as a negative - the number of
> collision gets higher and collisions in a hashtable will often imply a
> cache miss. You can even see the comparison function rising in the
> profile.
> I've actually changed course a bit and I'm trying something different: A
> two level structure. One hashtable that maps (RelFileNode, ForkNumber)
> to a 'open relation' data structure, and from there a radix tree over
> just the block number. To avoid having to look up in the hashtable
> frequently there's a pointer in RelationData to a fork's radix tree.
> This seems to have several advantages:
> * It doesn't require changes to the physical representation

Not changing the physical representation seems necessary anyway.

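To make the shape of the proposal concrete, here is a minimal user-space sketch of the two-level structure: a small hash table keyed on (relNode, forkNum) that hands back a fixed-depth, 256-way radix tree over the block number. All names and sizes here are hypothetical stand-ins, not the actual backend structs, and a real version would of course live in shared memory:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's BlockNumber. */
typedef uint32_t BlockNumber;

/* Fixed-depth, 256-way radix tree over 32-bit block numbers:
 * three internal levels plus a leaf level, one byte per level. */
typedef struct RadixNode {
    void *slots[256];
} RadixNode;

static RadixNode *radix_new(void) {
    return calloc(1, sizeof(RadixNode));
}

static void radix_insert(RadixNode *root, BlockNumber blkno, void *value) {
    RadixNode *node = root;
    for (int shift = 24; shift > 0; shift -= 8) {
        int idx = (blkno >> shift) & 0xff;
        if (node->slots[idx] == NULL)
            node->slots[idx] = radix_new();
        node = node->slots[idx];
    }
    node->slots[blkno & 0xff] = value;  /* leaf slot holds the buffer */
}

static void *radix_lookup(RadixNode *root, BlockNumber blkno) {
    RadixNode *node = root;
    for (int shift = 24; shift > 0; shift -= 8) {
        node = node->slots[(blkno >> shift) & 0xff];
        if (node == NULL)
            return NULL;                /* no buffer cached for this block */
    }
    return node->slots[blkno & 0xff];
}

/* First level: a toy open-addressing hash table mapping
 * (relNode, forkNum) to the per-fork radix tree. */
#define NFORKS 64
typedef struct ForkEntry {
    uint32_t   relNode;
    int        forkNum;
    RadixNode *tree;
} ForkEntry;

static ForkEntry fork_table[NFORKS];

static RadixNode *fork_tree(uint32_t relNode, int forkNum) {
    uint32_t h = (relNode * 2654435761u + (uint32_t) forkNum) % NFORKS;
    for (uint32_t i = 0; i < NFORKS; i++) {       /* linear probing */
        ForkEntry *e = &fork_table[(h + i) % NFORKS];
        if (e->tree == NULL) {                    /* empty slot: claim it */
            e->relNode = relNode;
            e->forkNum = forkNum;
            e->tree = radix_new();
            return e->tree;
        }
        if (e->relNode == relNode && e->forkNum == forkNum)
            return e->tree;
    }
    return NULL;                                  /* table full */
}
```

The point of caching the radix-tree pointer in RelationData would then be to skip the `fork_tree()` step entirely on repeat access to the same fork.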
> * A radix tree allows ordered traversal, allowing for efficient
>   prefetching et al.

Could be very important... some benchmarks showing whether the prefetching is
actually worthwhile would really help.

> * The ability to efficiently get all pages that belong to a relation
>   makes dropping/truncating tables radically more efficient than now.

Not very important, since there are other approaches

> * A single lookup in the radix tree is on average significantly faster
>   than in the hash table. I've measured ~350 vs ~60 cycles in a
>   nonconcurrent workload and ~820 vs ~72 cycles in a concurrent
>   workload, using recorded buffer access traces.


> * Having a 'open relation' representation in shared memory also makes it
>   much easier to implement more efficient relation extension and
>   truncation - we can have pointers in there that signal how far the
>   relation has been extended and how far it has been initialized.

Probably important, for extension
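To make the extension point concrete: with a shared per-fork structure, reserving a new block can become a single atomic fetch-and-add on an "extended so far" counter, rather than serializing every extender on the relation extension lock. A sketch using C11 atomics (names are hypothetical; a real version would live in shared memory and handle truncation, errors mid-extension, etc.):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared 'open relation' state for one fork. */
typedef struct SharedFork {
    _Atomic uint32_t nblocks_extended;     /* space reserved on disk */
    _Atomic uint32_t nblocks_initialized;  /* pages actually written */
} SharedFork;

/* Reserve one new block; each caller gets a distinct block number
 * without queueing on a relation extension lock. */
static uint32_t fork_extend(SharedFork *fork) {
    return atomic_fetch_add(&fork->nblocks_extended, 1);
}

/* Writers advance the initialized high-water mark once a page has been
 * written out; the mark only ever moves forward. */
static void fork_mark_initialized(SharedFork *fork, uint32_t upto) {
    uint32_t cur = atomic_load(&fork->nblocks_initialized);
    while (cur < upto &&
           !atomic_compare_exchange_weak(&fork->nblocks_initialized,
                                         &cur, upto))
        ;   /* failed CAS reloads cur; retry while still behind */
}
```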

> The biggest disadvantage is that it's a good bit more complex than what
> we have right now. That we need dynamically many radix trees makes
> memory management nontrivial.

Which might mean we lose some of the benefit by requiring many additional
memory allocations.

> Sounds halfway sane?

I think it is a good research direction, as long as we get the focus right.

Another similar approach is direct allocation of chunks of memory for
in-memory access. That would be much less flexible, but access would be in
<5 cycles. I mention this for comparison only.

So if the focus is on more efficient and reasonably flexible access to data
in memory, then it sounds good.

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
