On Thu, Aug 18, 2016 at 5:05 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Aug 17, 2016 at 10:54 PM, Claudio Freire <klaussfre...@gmail.com> > wrote: >> The attached patch tries to maintain the initial status of B-Tree >> indexes, which are created with equal-key runs in physical order, >> during the whole life of the B-Tree, and make key-tid pairs >> efficiently searchable in the process. > > I have two pretty important questions that doesn't seem to be > addressed in your email.
I believe I addressed that in the README, but if I wasn't clear, let me know and I'll expand there too. Summarizing here: > 1. How does it do that? It adds the block number and the offset number of the index tuple's t_tid (that is, the pointer into the heap) as a final (implicit) sort key, as is done already in tuplesort. It just keeps doing that while comparing keys in the B-Tree when searching the leaf page for insertion, so the insertion point now points to the exact point where the insertion has to happen to maintain that condition, and not just to the first valid position within the same-key run. In fact, that's why non-leaf index tuples need a different format, because while leaf index tuples contain the heap pointer already, non-leaf ones contain only the downlink, not the pointer into the heap. To be able to do comparisons and pick the right downlink, the original heap pointer in the leaf index tuple is copied into the downlink index tuple when splitting pages into an additional IndexTupleData header that is prepended only to non-leaf index tuples. This representation is chosen to minimize code changes, such that (itup+1) is usable as before, and also because it's reasonably compact. But it *is* necessary to check whether it's a leaf or non-leaf tuple to choose whether to use (itup+1) or just itup, which is why the patch requires so many changes in so many places. > 2. What's the benefit? The idea came up in the discussion about WARM updates. Basically, in various situations it is convenient to be able to find the index tuples that point to a specific heap tuple. Without this, doing so isn't efficient - the only way to find such index tuples is to find the leftmost index tuple that has the same key, and walk the index right links until the right tid is found. For non-unique indexes, but also for unique indexes with heavy bloat, this could involve a large amount of I/O, so it isn't efficient in several contexts. Think vacuum and WARM updates mostly (like HOT updates, but where only some indexes don't need updating, and some do). Having the ability to find a heap tuple by key-ctid pair is thus beneficial because it allows efficient implementations for those operations. It may benefit other parts of the code, but those are the ones that come to mind. It also makes index scans of the form SELECT * FROM t WHERE some_col = some_const; Scan the heap in sequential order, even if some_col has low cardinality (ie: the query returns lots of tuples), which is a nice performance side effect. -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers