On 12 September 2015 at 21:19, Andres Freund <and...@anarazel.de> wrote:
> On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> > Why do we have to do buffer lookups using the full buffer tag?
> We don't necessarily.
> > Why not just use (relNode, blockNum) and resolve hash collisions, if any?
> I tried that and unfortunately it came out as a negative - the number of
> collisions gets higher, and collisions in a hashtable will often imply a
> cache miss. You can even see the comparison function rising in the profile.
> I've actually changed course a bit and I'm trying something different: A
> two level structure. One hashtable that maps (RelFileNode, ForkNumber)
> to a 'open relation' data structure, and from there a radix tree over
> just the block number. To avoid having to look up in the hashtable
> frequently there's a pointer in RelationData to a fork's radix tree.
> This seems to have several advantages:
> * It doesn't require changes to the physical representation
> * A radix tree allows ordered traversal, allowing for efficient
> prefetching et al.
Could be very important... some benchmarks showing whether that is worthwhile
would really help
> * The ability to efficiently get all pages that belong to a relation
> makes dropping/truncating tables radically more efficient than now.
Not very important, since there are other approaches
> * A single lookup in the radix tree is on average significantly faster
> than in the hash table. I've measured ~350 vs 60 cycles in a
> nonconcurrent workload and ~820 vs ~72 cycles in a concurrent
> workload, using recorded buffer access traces.
> * Having a 'open relation' representation in shared memory also makes it
> much easier to implement more efficient relation extension and
> truncation - we can have pointers in there that signal how far the
> relation has been extended and how far it has been initialized.
Probably important, for extension
> The biggest disadvantage is that it's a good bit more complex than what
> we have right now. That we need dynamically many radix trees makes
> memory management nontrivial.
Which might mean we lose benefit by requiring many additional memory
allocations.
> Sounds halfway sane?
I think it is a good research direction, as long as we get the focus right.
Another, similar approach is direct allocation of chunks of memory for
in-memory access. That would be much less flexible, but access would take
<5 cycles. I mention this for comparison only.
So if the focus is on more efficient and reasonably flexible access to data
in memory, then it sounds good.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services