On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Well, my original email did contain a discussion of the need for
>> delete-marking.  I said that you couldn't do in-place updates when
>> indexed columns were modified unless the index AMs had support for
>> delete-marking, which is the same point you are making here.
>
> Sorry, I had not read that part earlier, but now that I read it, I
> think there is a slight difference in what I am saying.  I thought
> along with delete-marking, we might need transaction visibility
> information in the index as well.

I think we need to avoid putting the visibility information in the
index because that will make the index much bigger.  It would be a
shame to get the visibility information out of the heap (as this
design does) only to be forced to add it to the index.  Note that,
percentage-wise, it's quite a bit worse to have visibility information
in the index than in the heap, because the index tuples are going to
be much narrower than the heap tuples, often just one column.  (The
cost of having this information in the heap can be amortized across
the whole width of the table.)

I don't have the whole design for the delete-marking stuff worked out
yet.  I'm thinking we could have a visibility map where the bit for
each page is 1 if the page certainly has no pending UNDO and 0 if it
might have some.  In addition, if a tuple is deleted or the indexed
column value is changed, we delete-mark the associated index entries.
If we later discover that the page has no current UNDO (i.e. is
all-visible) and the tuple at a given TID matches our index entry, we
can clear the delete-mark for that index entry.  So, an index-only
scan rechecks the heap if the tuple is delete-marked OR the
visibility-map bit for the page is not set; if neither is the case, it
can assume the heap tuple is visible.

Another option would be to get rid of the visibility map and rely only
on the delete-marks.  If we did that, then tuples would have to be
delete-marked when first inserted since they wouldn't be all-visible
until sometime after the commit of the inserting transaction.

> BTW, it is not completely clear
> whether you want a delete-marking system or you think we could do
> without that by avoiding in-place updates, it seems to me from what
> you have written that you are inclined towards having a delete-marking
> system.

Yes, that's my inclination.
I don't think it would be necessary to have the delete-marking in
order to produce a committable patch, but the benefit of this approach
would be much reduced without that.

>> However,
>> I agree that the case where the indexed column gets set back to a
>> previous value while the old index tuple for that value still exists
>> is particularly tricky.  I think that what we need to do there is
>> avoid inserting an index entry for a given value/TID pair if an index
>> entry for that value/TID pair already exists.
>
> Are you saying this for indexes with a delete-marking system or for
> indexes without that or for both?

I'm saying that in any case in which you allow in-place update, you
have to make sure you don't get multiple entries pointing at the same
TID unless they have different values in the index tuple.

>> That's a little different from your analysis.  In your step-3
>> analysis, you say that an index scan for 2 will find the step-1 tuple,
>> but that's not really right.  The index scan will find the index tuple
>> which points to the whole chain of tuples, step-3 => step-2 => step-1,
>> and it will decide which heap tuple from that chain the user can see
>> based on the user's snapshot.
>
> I think the scan will not traverse the chain if it starts after
> step-3's commit and that's what I intend to say.

I don't see why that would be true.

>> That's OK, but we're in real trouble if
>> step-3 inserted an additional index tuple pointing to that chain
>> rather than simply noting that one already exists.  If it adds an
>> additional one, then we'll visit two index tuples for that TID.  Each
>> time, the visibility information in UNDO will cause us to return the
>> correct tuple, but we've erred by returning it twice (once per index
>> entry) rather than just once.
>
> Why will scan traverse the UNDO chain if it starts after step-3 commit?

Why wouldn't it?  I think that if an index scan hits a page with an
UNDO chain, it always needs to traverse the whole thing.
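
As a rough illustration of the two rules discussed above (the
index-only-scan recheck condition, and not inserting a second index
entry for a value/TID pair that already exists), here is a toy Python
sketch.  It is not PostgreSQL code: IndexEntry, ToyIndex,
needs_heap_recheck and the set-of-page-numbers stand-in for the
visibility map are all hypothetical names invented for this example,
and leaving a reused entry delete-marked is an assumption, not
something the design above settles.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    value: int                   # indexed column value
    tid: tuple                   # (page, offset) of the heap tuple
    delete_marked: bool = False  # set when the tuple is deleted or the value changes


def needs_heap_recheck(entry, all_visible_pages):
    """Recheck the heap if the entry is delete-marked OR the
    visibility-map bit for its page is not set.

    all_visible_pages is a set of page numbers standing in for the
    visibility map (assumption: membership means "bit is 1").
    """
    page = entry.tid[0]
    return entry.delete_marked or page not in all_visible_pages


@dataclass
class ToyIndex:
    # Keyed by (value, tid), so at most one entry per value/TID pair.
    entries: dict = field(default_factory=dict)

    def insert(self, value, tid):
        """Add an entry unless one for this value/TID pair already exists.

        Returns True if a new entry was added, False if one was reused.
        """
        key = (value, tid)
        if key in self.entries:
            # Assumption: the existing entry is left as-is (possibly still
            # delete-marked), so scans keep rechecking the heap for it.
            return False
        self.entries[key] = IndexEntry(value, tid)
        return True

    def delete_mark(self, value, tid):
        """Mark the entry for an old value after an in-place update."""
        self.entries[(value, tid)].delete_marked = True

    def scan(self, value):
        """Return every index entry carrying the given value."""
        return [e for e in self.entries.values() if e.value == value]
```

With this toy model, inserting value 2 at some TID, delete-marking it
and inserting 3 for the same TID, and then setting the column back to
2 leaves exactly one entry for (2, TID), so a scan for 2 visits that
TID once rather than twice.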

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company