Pavan Deolasee <pavan.deola...@gmail.com> wrote:
One good thing is that the patch is ready and fully functional. So that allows those who are keen to run real performance tests and see the actual impact of the patch.
I see your point. But I would like to think this way: does the technology significantly help many common use cases that are currently not addressed by HOT? It probably won't help all workloads, that's a given. Also, we don't have any credible alternative while this patch has progressed quite a lot. Maybe Robert will soon present the pluggable storage/UNDO patch and that will cover everything and more that is currently covered by HOT/WARM. That will probably make many other things redundant.
Well, I don't assume that it will; again, I just don't know. I agree with your general assessment of things, which is that WARM, EDB's Z-Heap/UNDO project, and things like IOTs have significant overlap in terms of the high-level problems that they fix. While it's hard to say just how much overlap exists, it's clearly more than a little. And, you are right that we don't have a credible alternative in this general category right now. The WARM patch is available today.

As you may have noticed, in recent weeks I've been very vocal about the role of index bloat in cases where bloat has a big impact on production workloads. I think that it has an under-appreciated role in workloads that deteriorate over time, as bloat accumulates. Perhaps HOT made such a big difference to workloads 10 years ago not just because it prevented creating new index entries. It also reduced fragmentation of the keyspace in indexes, by never inserting duplicates in the first place.

I have some rough ideas related to this, and to the general questions you're addressing. I'd like to run these by you.

In-place index updates + HOT
============================

Maybe we could improve things markedly in this general area by "chaining together HOT chains", and updating index heap pointers in place, to point to the start of the latest HOT chain in that chain of chains (provided the index tuple was "logically unchanged" -- otherwise, you'd need to have both sets of indexed values at once, of course). Index tuples therefore always point to the latest HOT chain, favoring recent MVCC snapshots over older ones.

Pruning
-------

HOT pruning is great because you can remove heap bloat without worrying about there being index entries with heap item pointers pointing to what is removed. But isn't that limitation as much about what is in the index as it is about what is in the heap?

Under this scheme, you don't even have to keep around the old ItemId stub when pruning, if it's a sufficiently old HOT chain that no index points to the corresponding TID. That may not seem like a lot of bloat to have to keep around, but it accumulates within a page until VACUUM runs, ultimately limiting the effectiveness of pruning for certain workloads.

Old snapshots/row versions
--------------------------

Superseding HOT chains have their last heap tuple's t_tid point to the start of the preceding/superseded HOT chain (not their own TID, as today, which is redundant), which may or may not be on the same heap page. That's how old snapshots go backwards to get old versions, without needing their own "logically redundant" index entries. So with UPDATE heavy workloads that are essentially HOT-safe today, performance doesn't tank due to a long running transaction that obstructs pruning within a heap page, and thus necessitates the insertion of new index tuples. That's the main justification for this entire design.

It's also possible that pruning can be taught that since only one index update was logically necessary when the to-be-pruned HOT chain was created, it's worth doing a "retail index tuple deletion" against the index tuple that was logically necessary, then completely obliterating the HOT chain, stub item pointer and all.

Bloat and locality
------------------

README.HOT argues against HOT chains that span pages, which this is a bit like, on the grounds that it's bad news that every recent snapshot has to go through the old heap page. That makes sense, but only because the temporal locality there is horrible, which would not be the case here. README.HOT says that that cost is not worth the benefit of preventing a new index write, but I think that it ought to take into account that not all index writes are equal. There is an appreciable difference between inserting a new tuple, and updating one in-place.

We can remove the cost (hurting new snapshots by making them go through old heap pages) while preserving most of the benefits (no logically unnecessary index bloat). The benefit of HOT is clearly more bloat prevention than not having to visit indexes at all.

InnoDB secondary index updates update the index twice: The first time, during the update itself, and the second time, by the purge thread, once the xact commits. Clearly they care about doing clean-up of indexes eagerly. Also, a key design goal of UNDO within the original ARIES paper is to make deletion of index tuples make the space reclaimable immediately, even before the transaction commits. While it wouldn't be practical to get that to work for the general case on an MVCC system, I think it can work for logically unchanged index tuples through in-place index tuple updates. If nothing else, the priorities for ARIES tell us something.

Obviously what I describe here is totally hand-wavy, and actually undertaking this project would be incredibly difficult. If nothing else it may be useful to you, or to others, to hear me slightly reframe the benefits of HOT in this way.

Moreover, a lot of what I'm describing here has overlap with stuff that I presume that EDB will need for Z-Heap/UNDO. For example, since it's clear that you cannot immediately remove an updated secondary index tuple in UNDO, it still has to have its own "out of band" lifetime. How is it ever going to get physically deleted, otherwise? So maybe you end up updating that in-place, to point into UNDO directly, rather than pointing to a heap TID that is necessarily the most recent version, which could introduce ambiguity (what happens when it is changed, then changed back?). That's actually rather similar to what you could do with HOT + the existing heapam, except that there is a clearer demarcation of "current" (heap) and "pending garbage" (UNDO) within Robert's design.
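
To make the "chain of chains" idea a bit more concrete, here is a toy Python model of it. This is only a sketch under loud assumptions, not PostgreSQL code: HotChain, ToyIndex, start_new_chain, hot_update and fetch are all hypothetical names, visibility is reduced to "creating xid <= snapshot xid" with every transaction treated as committed, and there are no pages, locks, or WAL. It only shows that the index entry gets overwritten in place to point at the newest chain, while an old snapshot walks the back-links to reach superseded versions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HotChain:
    """One HOT chain: (creating xid, row value) pairs, oldest first."""
    versions: List[Tuple[int, str]]
    # Back-link to the superseded chain. This stands in for the proposed
    # reuse of the last tuple's t_tid; it is an assumption of the sketch.
    prev: Optional["HotChain"] = None


class ToyIndex:
    """Hypothetical index: one entry per key, pointing at the latest chain."""

    def __init__(self) -> None:
        self.entries: Dict[str, HotChain] = {}
        self.inserts = 0           # brand-new index tuples
        self.in_place_updates = 0  # existing tuples re-pointed in place

    def point_at(self, key: str, chain: HotChain) -> None:
        if key in self.entries:
            self.in_place_updates += 1
        else:
            self.inserts += 1
        self.entries[key] = chain


def start_new_chain(index: ToyIndex, key: str, xid: int, value: str) -> HotChain:
    """Begin a new chain for a logically unchanged key and re-point the index."""
    chain = HotChain(versions=[(xid, value)], prev=index.entries.get(key))
    index.point_at(key, chain)
    return chain


def hot_update(chain: HotChain, xid: int, value: str) -> None:
    """Ordinary HOT update: extend the current chain, index untouched."""
    chain.versions.append((xid, value))


def fetch(index: ToyIndex, key: str, snapshot_xid: int) -> Optional[str]:
    """Newest version visible to the snapshot, walking back through older chains."""
    chain = index.entries.get(key)
    while chain is not None:
        for xid, value in reversed(chain.versions):
            if xid <= snapshot_xid:
                return value
        chain = chain.prev  # nothing visible here: go to the superseded chain
    return None


index = ToyIndex()
first = start_new_chain(index, "k", 10, "v1")
hot_update(first, 11, "v2")
second = start_new_chain(index, "k", 20, "v3")  # e.g. no room left for a HOT update

recent = fetch(index, "k", 25)  # served from the latest chain directly
old = fetch(index, "k", 15)     # reached through the back-link
```

In this model the second chain costs one in-place index update rather than a second index tuple, and only the old snapshot pays for the extra hop.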

--
Peter Geoghegan