On 12 September 2015 at 21:19, Andres Freund <and...@anarazel.de> wrote:
> On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> > Why do we have to do buffer lookups using the full buffer tag?
> We don't necessarily.
> > Why not just use (relNode, blockNum) and resolve hash collisions, if any?
> I tried that and unfortunately it came out as a negative - the number of
> collisions gets higher, and collisions in a hashtable will often imply a
> cache miss. You can even see the comparison function rising in the profile.
> I've actually changed course a bit and I'm trying something different: A
> two level structure. One hashtable that maps (RelFileNode, ForkNumber)
> to a 'open relation' data structure, and from there a radix tree over
> just the block number. To avoid having to look up in the hashtable
> frequently there's a pointer in RelationData to a fork's radix tree.
> This seems to have several advantages:
> * It doesn't require changes to the physical representation
> * A radix tree allows ordered traversal, allowing for efficient
> prefetching et al.
Could be very important... some benchmarks showing whether that is worthwhile
would really help
> * The ability to efficiently get all pages that belong to a relation
> makes dropping/truncating tables radically more efficient than now.
Not very important, since there are other approaches
> * A single lookup in the radix tree is on average significantly faster
> than in the hash table. I've measured ~350 vs 60 cycles in a
> nonconcurrent workload and ~820 vs ~72 cycles in a concurrent
> workload, using recorded buffer access traces.
> * Having a 'open relation' representation in shared memory also makes it
> much easier to implement more efficient relation extension and
> truncation - we can have pointers in there that signal how far the
> relation has been extended and how far it has been initialized.
Probably important, for extension
> The biggest disadvantage is that it's a good bit more complex than what
> we have right now. That we need dynamically many radix trees makes
> memory management nontrivial.
Which might mean we lose benefit by requiring many additional memory
allocations.
> Sounds halfway sane?
I think it is a good research direction, as long as we get the focus right.
Another, similar approach is direct allocation of chunks of memory for
in-memory access. That would be much less flexible, but access would take
<5 cycles. I mention this for comparison only.
So if the focus is on more efficient and reasonably flexible access to data
in memory, then it sounds good.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services