On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> Well, my original email did contain a discussion of the need for
>>> delete-marking.  I said that you couldn't do in-place updates when
>>> indexed columns were modified unless the index AMs had support for
>>> delete-marking, which is the same point you are making here.
>> Sorry, I had not read that part earlier, but now that I read it, I
>> think there is a slight difference in what I am saying.   I thought
>> along with delete-marking, we might need transaction visibility
>> information in the index as well.
> I think we need to avoid putting the visibility information in the
> index because that will make the index much bigger.

I agree that there will be an increase in index size, but it shouldn't
be much if we have transaction information (and that too only active
transactions) at page level instead of at tuple level.  I think the
number of concurrent writes on the index will be lesser as compared to
the heap.  There are a couple of benefits of having visibility
information in the index.

a. Heap and index could be independently cleaned up and in most cases
by foreground transactions only.  The work for vacuum will be quite
less as compared to now.  I think this will help us in avoiding the
bloat to a great degree.

b. Improved index-only scans, because now we don't need visibility map
of the heap to check tuple visibility.

c. Some speed improvements for index scans can also be expected
because with this we don't need to perform heap fetch for invisible
index tuples.

>  It would be a
> shame to get the visibility index out of the heap (as this design
> does) only to be forced to add it to the index.  Note that,
> percentage-wise, it's quite a bit worse to have visibility information
> in the index than in the heap, because the index tuples are going to
> be much narrower than the heap tuples, often just one column.  (The
> cost of having this information in the heap can be amortized across
> the whole width of the table.)

I think with proposed design there is a good chance that for many of
the index scans, there is a need for the extra hop to decide
visibility of index tuple. I think as of today, during index scan if
the corresponding heap page is not-all-visible, then we check the
corresponding heap tuple to decide about the visibility of index
tuple, whereas with proposed design, I think it first needs to check
heap page and then TPD, if there is more than one transaction
operating on page.

> I don't have the whole design for the delete-marking stuff worked out
> yet.  I'm thinking we could have a visibility map where the bit for
> each page is 1 if the page certainly has no pending UNDO and 0 if it
> might have some.  In addition, if a tuple is deleted or the indexed
> column value is changed, we delete-mark the associated index entries.
> If we later discover that the page has no current UNDO (i.e. is
> all-visible) and the tuple at a given TID matches our index entry, we
> can clear the delete-mark for that index entry.  So, an index-only
> scan rechecks the heap if the tuples is delete-marked OR the
> visibility-map bit for the page is not set; if neither is the case, it
> can assume the heap tuple is visible.

Such a design will work, but I think this is more work to mark the
tuple as Dead as compared to current system.

> Another option would be to get rid of the visibility map and rely only
> on the delete-marks.  If we did that, then tuples would have to be
> delete-marked when first inserted since they wouldn't be all-visible
> until sometime after the commit of the inserting transaction.

The previous idea sounds better.

>>>  That's OK, but we're in real trouble if
>>> step-3 inserted an additional index tuple pointing to that chain
>>> rather than simply noting that one already exists.  If it adds an
>>> additional one, then we'll visit two index tuples for that TID.  Each
>>> time, the visibility information in UNDO will cause us to return the
>>> correct tuple, but we've erred by returning it twice (once per index
>>> entry) rather than just once.
>> Why will scan traverse the UNDO chain if it starts after step-3 commit?
> Why wouldn't it?  I think that if an index scan hits a page with an
> UNDO chain, it always need to traverse the whole thing.

If you notice the scan has started after all the writes have
committed, so what is the guarantee that there will be a UNDO chain?
Yes, it could be there if there is some live transaction which started
before step, otherwise, I am not sure if we can guarantee that UNDO
chain will be present.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to