On Tue, Aug 2, 2016 at 5:51 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> Why do we need to add a record in all indexes if only the key
> corresponding to one of indexes is updated?  Basically, if the tuple
> can fit on same page, why can't we consider it as HOT (or HPT - heap
> partial tuple or something like that), unless it updates all the keys
> for all the indexes.  Now, we can't consider such tuple versions for
> pruning as we do for HOT.  The downside of this could be that we might
> need to retain some of the line pointers for more time (as we won't be
> able to reuse the line pointer till it is used in any one of the
> indexes and those could be reused once we make the next non-HOT update).
> However, this should allow us not to update the indexes for which the
> corresponding column in tuple is not updated.  I think it is a basic
> premise that if any index column is updated then the update will be
> considered as non-HOT, so there is a good chance that I might be
> missing something here.

Well, I think that the biggest advantage of a HOT update is the fact
that it enables HOT pruning.  In other words, we're not primarily
trying to minimize index traffic; we're trying to make cleanup of the
heap cheaper.  So this could certainly be done, but I'm not sure it
would buy us enough to be worth the engineering effort involved.
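
To make the distinction concrete, here is a rough toy model of the two
rules being compared: the existing premise (any indexed column changes,
or the new version does not fit on the same page, means non-HOT) and
the partial variant proposed in the quoted mail.  This is only a sketch
for illustration, not PostgreSQL source; the function names, the
dictionary representation of tuples and indexes, and the index names
are all assumptions made up for the example.

```python
# Toy model only: none of these names exist in PostgreSQL.
from typing import Dict, List, Set


def changed_columns(old: Dict[str, object], new: Dict[str, object]) -> Set[str]:
    """Columns whose value differs between the old and new tuple versions."""
    return {col for col in new if old.get(col) != new[col]}


def is_hot_update(changed: Set[str],
                  indexes: Dict[str, List[str]],
                  fits_on_page: bool) -> bool:
    """Existing premise: HOT only if the new version fits on the same
    page and no indexed column changed."""
    indexed = {col for cols in indexes.values() for col in cols}
    return fits_on_page and not (changed & indexed)


def indexes_needing_entries(changed: Set[str],
                            indexes: Dict[str, List[str]],
                            fits_on_page: bool) -> List[str]:
    """Proposed variant: on the same page, add entries only to indexes
    whose key columns changed; on a different page, to all of them."""
    if not fits_on_page:
        return sorted(indexes)
    return sorted(name for name, cols in indexes.items()
                  if changed & set(cols))


# Hypothetical table with two indexes; only "email" is updated.
indexes = {"pk": ["id"], "email_idx": ["email"]}
old = {"id": 1, "email": "a@example.com", "name": "x"}
new = {"id": 1, "email": "b@example.com", "name": "x"}
changed = changed_columns(old, new)

print(is_hot_update(changed, indexes, fits_on_page=True))
print(indexes_needing_entries(changed, indexes, fits_on_page=True))
```

In this model the update is not HOT under the existing rule, while the
proposed variant would add an entry only to email_idx.  What the model
leaves out is the cost discussed above: such tuple versions could not
be pruned the way HOT chains are.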

Personally, I think that incremental surgery on our current heap
format to try to fix this is not going to get very far.  If you look
at the history of this, 8.3 was a huge release for timely cleanup of
dead tuples.  There was also significant progress in 8.4 as a result of
5da9da71c44f27ba48fdad08ef263bf70e43e689.   As far as I can recall, we
then made no progress at all in 9.0 - 9.4.  We made a very small
improvement in 9.5 with 94028691609f8e148bd4ce72c46163f018832a5b, but
that's pretty niche.  In 9.6, we have "snapshot too old", which I'd
argue is potentially a large improvement, but it was big and invasive
and will no doubt pose code maintenance hazards in the years to come;
also, many people won't be able to use it or won't realize that they
should use it.  I think it is likely that further incremental
improvements here will be quite hard to find, and the amount of effort
will be large relative to the amount of benefit.  I think we need a
new storage format where the bloat is cleanly separated from the data
rather than intermingled with it; every other major RDBMS works that
way.  Perhaps this is a case of "the grass is greener on the other
side of the fence", but I don't think so.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
