On Thu, Aug 4, 2016 at 3:07 PM, Bruce Momjian <br...@momjian.us> wrote: > On Thu, Aug 4, 2016 at 02:35:32PM -0300, Claudio Freire wrote: >> > Vacuum will need to be taught not to break the invariant, but I >> > believe this is viable and efficient. >> >> >> And I didn't mention the invariant. >> >> The invariant would be that all WARM chains: >> >> * contain entire HOT chains (ie: no item pointers pointing to the >> middle of a HOT chain) > > You mean no _index_ item pointers, right?
Yes >> * start at the item pointer, and end when the t_ctid of the heap >> tuple either points to another page, or a tuple with a different index >> key > > Uh, there is only one visible tuple in a HOT chain, so I don't see the > different index key as being a significant check. For a HOT chain, sure. That's why I mentioned it as an optimization. But t_ctid, I checked, unless I read the code wrong, is set for non-HOT updates too. So following t_ctid regardless of the HOT-update flags should yield update chains that can walk several buffers, and that's in fact the mechanism I'm proposing, only just stop following the chain when it jumps buffers. > That is, I think we > should check the visibility first, as we do now, and then, for the > only-visible tuple, check the key. No, because, as I explained in the example, doing that will result in duplicates being returned. The whole update chain can span multiple WARM chains, and each WARM chain can span multiple HOT chains. Consider, assuming there's an index on id and another on val: INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla'); UPDATE t SET dat='blabla2' where id = 32; UPDATE t SET dat='blabla3', id=33 where id = 32; UPDATE t SET val='b' where id = 33; UPDATE t SET dat='blabla4' where id = 33; UPDATE t SET val='a' where id = 33; This produces a case similar to the one I described. Assuming all updates happen on the same page, the first will be HOT, so no key check is needed. The second cannot be HOT, because it updates the id. But it can be WARM, if it's done on the same page, so it can avoid updating the index on val. The third will need a new index item pointer, because it has a new key. The t_ctid chain will walk all the versions of the row, but the one with 'blabla2' will not have the HOT-updated bit set. So regular HOT chain walking will stop there, I propose to not stop there, so the index on val can follow the chain into blabla3 as if it were a HOT update. Correctness is achieved only if there is no index item pointer on that index pointing to that index version, and since there is no efficient way of checking, the invariants I propose aim to guarantee it so it can be assumed. Since WARM updates don't create new index item pointers, what needs to be checked is whether updating from version 2 to version 3 would be a WARM update on this index or not, and that equals to checking the index keys for equality. > I don't see a value in (expensive) > checking the key of an invisible tuple in hopes of stoping the HOT chain > traversal. Not HOT chain, the point is to guarantee correctness by properly identifying the end of the WARM (not HOT) chain, which is left implicit. > Also, what happens when a tuple chain goes off-page, then returns to the > original page? The WARM chain ends on the page hop, and when it returns it's a new WARM chain. So traversal would not follow t_ctid pointers that hop pages, which makes it efficient, and also guarantees correctness (since if the is a page hop, we know the index will contain an item pointer to the version in that other page, so we'll visit it when we get there). > That is a new HOT chain given our current code, but > would the new tuple addition realize it needs to create a new index > tuple? The new tuple code needs to check the index's root HOT tid for a > non-match. I'm not proposing to change how HOT works, but adding another very similar mechanism on top of it called WARM. So HOT will keep working like that, HOT pruning will happen as usual, but we'll have the concept (immaterialized in the physical representation of the data, just implicit) of WARM chains. WARM chains would span one or several HOT chains, so they're "bigger". >> This is maintained by: >> >> * No item pointer will be created for row versions whose immediately >> previous version is already on the index with the same key > > You really only have the ctid HOT head stored in the index, not the > immediate ctid. I know. I just mentioned it just in case there was a transient time during cleanup when it isn't true, but thinking it a little bit more, if it weren't, HOT would also cause duplicates during index scans, so it's probably not the case (or protected with a lock). -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers