On 25 April 2015 at 01:12, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Sat, Apr 25, 2015 at 1:58 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
> >
> > On 4/23/15 10:40 PM, Amit Kapila wrote:
> >>
> >> I agree with you and what I think one of the major reasons of bloat is that
> >> Index segment doesn't have visibility information due to which clearing of
> >> Index needs to be tied along with heap.  Now if we can move transaction
> >> information at page level, then we can even think of having it in Index
> >> segment as well and then Index can delete/prune it's tuples on it's own
> >> which can reduce the bloat in index significantly and there is a benefit
> >> to Vacuum as well.
> >
> >
> > I don't see how putting visibility at the page level helps indexes at all.
> > We could already put XMIN in indexes if we wanted, but it won't help,
> > because...
> >
>
> We can do that by putting transaction info at tuple level in index as
> well but that will be huge increase in size of index unless we devise
> a way to have variable index tuple header rather than a fixed.
>
> >> Now this has some downsides as well like Delete
> >> needs to traverse Index segment as well to Delete mark the tuples, but
> >> I think the upsides of reducing bloat can certainly outweigh the downsides.
> >
> >
> > ... which isn't possible. You can not go from a heap tuple to an index tuple.
>
> We will have the access to index value during delete, so why do you
> think that we need linkage between heap and index tuple to perform
> Delete operation?  I think we need to think more to design Delete
> .. by CTID, but that should be doable.
>

I see some assumptions here that need to be challenged.

We can keep xmin and/or xmax on index entries. The above discussion assumes
that the information needs to be updated synchronously. We already store
visibility information on index entries using the lazily updated killtuple
mechanism, so I don't see much problem in setting the xmin in a similar
lazy manner. That way, when we use the index, if xmax is set we use it; if it
is not, we check the heap. (And then you get to freeze indexes as well ;-( )
Anyway, I have no objection to making the index AM pass visibility information
to indexes that wish to know it, as long as it is provided lazily.
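
As a toy illustration of the lazy scheme described above (not PostgreSQL
code; every name here is hypothetical, and it ignores aborted transactions
and XID wraparound), an index entry could carry an optional xmax hint that is
filled in only when a scan has had to visit the heap anyway:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int] = None  # None: no deleting transaction recorded


@dataclass
class IndexEntry:
    heap_slot: int
    xmax_hint: Optional[int] = None  # lazily copied from the heap, never updated eagerly


def entry_is_dead(entry: IndexEntry, heap: List[HeapTuple],
                  oldest_active_xid: int, stats: Dict[str, int]) -> bool:
    """True if the entry points at a tuple deleted before every active transaction."""
    if entry.xmax_hint is None:
        # Hint not set: fall back to the heap and remember what we learn.
        stats["heap_visits"] += 1
        entry.xmax_hint = heap[entry.heap_slot].xmax
    # Hint set: answer from the index entry alone.
    return entry.xmax_hint is not None and entry.xmax_hint < oldest_active_xid


heap = [HeapTuple(xmin=10, xmax=20), HeapTuple(xmin=11)]
deleted, live = IndexEntry(heap_slot=0), IndexEntry(heap_slot=1)
stats = {"heap_visits": 0}

first = entry_is_dead(deleted, heap, oldest_active_xid=30, stats=stats)
second = entry_is_dead(deleted, heap, oldest_active_xid=30, stats=stats)
still_live = entry_is_dead(live, heap, oldest_active_xid=30, stats=stats)
```

In this sketch the second lookup of the deleted entry is answered from the
hint without touching the heap, while the live entry keeps going to the heap
until some xmax is recorded there.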

The second assumption is that having visibility information in the index
would make a difference to bloat. Since, as I mentioned, we already do
have visibility information, I don't see that adding xmax or xmin would
make any difference at all to bloat. So -1 to adding it **for that reason**.


A much better idea is to work out how to avoid index bloat at its cause. If we
are running an UPDATE and we cannot get a cleanup lock, we give up and do a
non-HOT update, causing the index to bloat. It seems better to wait for a
short period to see if we can get the cleanup lock. The short period is
currently 0, so let's start there and vary the duration of the wait upwards
in proportion to how bloated the index is.
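
A minimal sketch of that policy, purely for illustration: the wait starts at
0 and grows linearly with an estimated bloat fraction up to a cap. The
function names, the 10 ms cap, and the use of a plain mutex as a stand-in for
a buffer cleanup lock are all assumptions, not anything PostgreSQL provides.

```python
import threading


def cleanup_lock_wait_ms(bloat_fraction: float, max_wait_ms: float = 10.0) -> float:
    """Wait time grows linearly with bloat; 0 bloat means no wait (today's behaviour)."""
    clamped = min(max(bloat_fraction, 0.0), 1.0)
    return max_wait_ms * clamped


def update_kind(lock: threading.Lock, bloat_fraction: float,
                max_wait_ms: float = 10.0) -> str:
    """Try for the (stand-in) cleanup lock, waiting longer as the index gets more bloated."""
    wait_s = cleanup_lock_wait_ms(bloat_fraction, max_wait_ms) / 1000.0
    if wait_s > 0:
        got_lock = lock.acquire(timeout=wait_s)
    else:
        got_lock = lock.acquire(blocking=False)  # zero wait: give up immediately
    if got_lock:
        lock.release()
        return "HOT"
    return "non-HOT"  # could not get the lock in time, so the index grows


free_lock = threading.Lock()
busy_lock = threading.Lock()
busy_lock.acquire()  # simulate another backend holding a pin on the page

uncontended = update_kind(free_lock, bloat_fraction=0.0)
contended = update_kind(busy_lock, bloat_fraction=0.5)
```

A linear ramp is only the simplest choice; the real question is what upper
bound on the wait is acceptable for UPDATE latency.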

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
