Pavan Deolasee <> wrote:
One good thing is that the patch is ready and fully functional. So that
allows those who are keen to run real performance tests and see the actual
impact of the patch.

Very true.

I see your point. But I would like to think this way: does the technology
significantly help many common use cases, that are currently not addressed
by HOT? It probably won't help all workloads, that's given. Also, we don't
have any credible alternative while this patch has progressed quite a lot.
May be Robert will soon present the pluggable storage/UNDO patch and that
will cover everything and more that is currently covered by HOT/WARM. That
will probably make many other things redundant.

Well, I don't assume that it will; again, I just don't know. I agree
with your general assessment of things, which is that WARM, EDB's
Z-Heap/UNDO project, and things like IOTs have significant overlap in
terms of the high-level problems that they fix. While it's hard to say
just how much overlap exists, it's clearly more than a little. And, you
are right that we don't have a credible alternative in this general
category right now. The WARM patch is available today.

As you may have noticed, in recent weeks I've been very vocal about the
role of index bloat in cases where bloat has a big impact on production
workloads. I think that it has an under-appreciated role in workloads
that deteriorate over time, as bloat accumulates. Perhaps HOT made such
a big difference to workloads 10 years ago not just because it prevented
creating new index entries. It also reduced fragmentation of the
keyspace in indexes, by never inserting duplicates in the first place.

I have some rough ideas related to this, and to the general questions
you're addressing. I'd like to run these by you.

In-place index updates + HOT

Maybe we could improve things markedly in this general area by "chaining
together HOT chains", and updating index heap pointers in place, to
point to the start of the latest HOT chain in that chain of chains
(provided the index tuple was "logically unchanged" -- otherwise, you'd
need to have both sets of indexed values at once, of course). Index
tuples therefore always point to the latest HOT chain, favoring recent
MVCC snapshots over older ones.


HOT pruning is great because you can remove heap bloat without worrying
about there being index entries with heap item pointers pointing to what
is removed. But isn't that limitation as much about what is in the index
as it is about what is in the heap?

Under this scheme, you don't even have to keep around the old ItemId
stub when pruning, if it's a sufficiently old HOT chain that no index
points to the corresponding TID. That may not seem like a lot of bloat
to have to keep around, but it accumulates within a page until VACUUM
runs, ultimately limiting the effectiveness of pruning for certain

Old snapshots/row versions

Superseding HOT chains have their last heap tuple's t_tid point to the
start of the preceding/superseded HOT chain (not their own TID, as
today, which is redundant), which may or may not be on the same heap
page. That's how old snapshots go backwards to get old versions, without
needing their own "logically redundant" index entries. So with UPDATE
heavy workloads that are essentially HOT-safe today, performance doesn't
tank due to a long running transaction that obstructs pruning within a
heap page, and thus necessitates the insertion of new index tuples.
That's the main justification for this entire design.

It's also possible that pruning can be taught that since only one index
update was logically necessary when the to-be-pruned HOT chain was
created, it's worth doing a "retail index tuple deletion" against the
index tuple that was logically necessary, then completely obliterating
the HOT chain, stub item pointer and all.

Bloat and locality

README.HOT argues against HOT chains that span pages, which this is a
bit like, on the grounds that it's bad news that every recent snapshot
has to go through the old heap page. That makes sense, but only because
the temporal locality there is horrible, which would not be the case
here. README.HOT says that that cost is not worth the benefit of
preventing a new index write, but I think that it ought to take into
account that not all index writes are equal. There is an appreciable
difference between inserting a new tuple, and updating one in-place. We
can remove the cost (hurting new snapshots by making them go through old
heap pages) while preserving most of the benefits (no logically
unnecessary index bloat).

The benefit of HOT is clearly more bloat prevention than not having to
visit indexes at all. InnoDB secondary index updates update the index
twice: The first time, during the update itself, and the second time, by
the purge thread, once the xact commits. Clearly they care about doing
clean-up of indexes eagerly. Also, a key design goal of UNDO within the
original ARIES paper is to make deletion of index tuples make the space
reclaimable immediately, even before the transaction commits. While it
wouldn't be practical to get that to work for the general case on an
MVCC system, I think it can work for logically unchanged index tuples
through in-place index tuple updates. If nothing else, the priorities
for ARIES tell us something.

Obviously what I describe here is totally hand-wavy, and actually
undertaking this project would be incredibly difficult. If nothing else
it may be useful to you, or to others, to hear me slightly reframe the
benefits of HOT in this way. Moreover, a lot of what I'm describing here
has overlap with stuff that I presume that EDB will need for
Z-Heap/UNDO. For example, since it's clear that you cannot immediately
remove an updated secondary index tuple in UNDO, it still has to have
its own "out of band" lifetime. How is it ever going to get physically
deleted, otherwise? So maybe you end up updating that in-place, to point
into UNDO directly, rather than pointing to a heap TID that is
necessarily the most recent version, which could introduce ambiguity
(what happens when it is changed, then changed back?). That's actually
rather similar to what you could do with HOT + the existing heapam,
except that there is a clearer demarcation of "current" (heap) and
"pending garbage" (UNDO) within Robert's design.

Peter Geoghegan

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to