On Tue, Dec 6, 2016 at 9:35 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Dec 5, 2016 at 4:49 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> I can very well understand the reason for not doing so (IIUC, it is >> about complexity and time to get it right when we are already trying >> to solve one big and complex problem of the system), but saying most >> of the benefit can be realized by having simple optimization seems >> less convincing. I think having simple optimizations won't solve the >> multi-pass vacuum problem. However, having the visibility information >> in the index will solve that and avoid index bloat much aggressively. > > I don't think that multi-pass vacuum would exist in this system, > regardless of any of the details we're talking about here. If you are > doing in-place updates of tuples and changing the indexed columns, > then you can't rely on the current system for removing index entries > at all.
Sure. The point I was trying to make was that index entries can't be removed independently apart from the case where some previous index scan has marked it dead after consulting heap. In this new system, I think we can't remove undo entries of heap page till we clear corresponding index entries. I think we need to somehow collect the old values from undo corresponding to index and then scan the index remove the index entry and after that corresponding undo entry can be removed. > The indexed entries don't point to a "dead" line pointer; > they point to a tuple that no longer matches the index entry. > >>> 3. Otherwise, you need to look at the heap page. But if the heap >>> page's UNDO pointer is invalid, then the whole page is all-visible, so >>> you're done. (You can also set the visibility map bit and/or the >>> index tuple's definitely-all-visible bit to avoid having to recheck >>> the heap page in the future.) >>> >>> 4. Each tuple will have a bit indicating whether it's been recently >>> modified; that is, whether the transaction whose UNDO log is >>> referenced by the page's UNDO pointer did anything to that tuple; or >>> if the page points to a TPD entry, whether any of the transactions in >>> the TPD modified that tuple. If that bit is not set for that >>> particular tuple, it's all-visible and you're done. >> >> I think 4th optimization is especially good and can cover many cases, >> but I am not sure how do we unset it once it is set. For example, >> once the effect of transaction that has touched that row is >> all-visible, how will we unset it. It might be better to store the >> offset of transaction that has last modified the tuple, if the last >> transaction that has touched the row is visible to snapshot, then we >> don't need to process the UNDO. > > If the page has an invalid UNDO pointer and you change it to a valid > UNDO pointer, you unset all the bits at the same time, except of > course for the bit pertaining to the tuple you are modifying. If the > page's UNDO pointer is still valid when you update it, then you set > the bits for any tuples you are modifying for which it is not already > set. > Okay, so this optimization can work only after all the active transactions operating on a page are finished. If that is true, in some cases such a design can consume a lot of CPU traversing all the tuples in a page for un-setting the bit, especially when such tuples are less. >> One more thing that needs some thoughts is how will the rollback work >> if keep just one bit for delete marking. I mean for heap we can >> directly apply from UNDO, but for the index, I think we need to also >> take care of undoing it in some way. For example, search the tree to >> remove the deleting marking from old value and delete marking the new >> value. Now if that is what we need to do then first we need to >> traverse the tree twice which is okay considering rollback is a seldom >> operation, the second challenge would be how to make such an operation >> atomic with respect to WAL. > > We don't necessarily need UNDO to clean up the indexes, although it > might be a good idea. It would *work* to just periodically scan the > index for delete-marked items. For each one, visit the heap and see > if there's a version of that tuple present in the heap or current UNDO > that matches the index entry. If not, remove the index entry. > I think this is somewhat similar to how we clean the index now and seems to be a workable design. However, don't you think that it will have somewhat similar characteristics for index-bloat as we have now? OTOH for heap, it will help us to take away old versions away from the main heap, but still, we can't get rid of that space or reuse that space till we can clean corresponding index entries. > However, we could try to be more aggressive: when you UNDO an UPDATE > or DELETE in the heap, also go search the index and remove any > delete-marked entries that are no longer needed. The advantage of > this is that we'd be more likely to visit the index entries while they > are still in cache. The disadvantage is that it is complex and might > fail if the index is corrupted. We can't have the whole system melt > down in that case; UNDO has to always succeed. > Agreed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers