Re: POC: Cleaning up orphaned files using undo logs

Dilip Kumar Wed, 07 Aug 2019 03:53:58 -0700

On Tue, Jul 30, 2019 at 1:54 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, Jul 30, 2019 at 12:21 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> >
> > One data structure that could perhaps hold this would be
> > UndoLogTableEntry (the per-backend cache, indexed by undo log number,
> > with pretty fast lookups; used for things like
> > UndoLogNumberGetCategory()).  As long as you never want to have
> > inter-transaction compression, that should have the right scope to
> > give recovery per-undo log tracking.  If you ever wanted to do
> > compression between transactions too, maybe UndoLogSlot could work,
> > but that'd have more complications.
>
> I think this could be a good idea.  I had thought of keeping in the
> slot as my 3rd option but later I removed it thinking that we need to
> expose the compression field to the undo log layer.  I think keeping
> in the UndoLogTableEntry is a better idea than keeping in the slot.
> But, I still have the same problem that we need to expose undo
> record-level fields to undo log layer to compute the cache entry size.
>   OTOH, If we decide to get from the first record of the page (as I
> mentioned up thread) then I don't think there is any performance issue
> because we are inserting on the same page.  But, for doing that we
> need to unpack the complete undo record (guaranteed to be on one
> page).   And, UnpackUndoData will internally unpack the payload data
> as well which is not required in our case unless we change
> UnpackUndoData such that it unpacks only what the caller wants (one
> input parameter will do).
>
> I am not sure out of these two which idea is better?
>


I have one more problem related to compression of the command id
field.  Basically, the problem is that we don't set the command id in
the WAL and we will always store FirstCommandId in the undo[1].   So
suppose there were 2 operations under a different CID then during DO
time both the undo record will store the CID field in their respective
undo records but during REDO time, all the commands will store the
same CID(FirstCommandId) so as per the compression logic the
subsequent record for the same transaction will not store the CID
field.  I am not sure what is the best way to handle this but I have
few ideas.

1) Don't compress the CID field ever.
2) Write CID in WAL,  but just for compressing the CID field in undo
(which may not necessarily go to disk) we don't want to add extra 4
bytes in the WAL.

Any better idea to handle this?

[1] 
https://www.postgresql.org/message-id/CAFiTN-u2Ny2E-NgT8nmE65awJ7keOzePODZTEg98ceF%2BsNhRtw%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: POC: Cleaning up orphaned files using undo logs

Reply via email to