On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <a...@cybertec.at> wrote: > > Amit Kapila <amit.kapil...@gmail.com> wrote: > > > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <a...@cybertec.at> wrote: > > > > > > > > > No background undo > > > ------------------ > > > > > > Reduced complexity of the patch seems to be the priority at the moment. > > > Amit > > > suggested that cleanup of an orphaned relation file is simple enough to be > > > done on foreground and I agree. > > > > > > > Yeah, I think we should try and see if we can make it work but I > > noticed that there are few places like AbortOutOfAnyTransaction where > > we have the assumption that undo will be executed in the background. > > We need to deal with it. > > I think this is o.k. if we always check for unapplied undo during startup. >
Hmm, how it is ok to leave undo (and rely on startup) unless it is a PANIC error. IIRC, this path is invoked in non-panic errors as well. Basically, we won't be able to discard such an undo which doesn't seem like a good idea. > > > "undo worker" is still there, but it only processes undo requests after > > > server > > > restart because relation data can only be changed in a transaction - it > > > seems > > > cleaner to launch a background worker for this than to hack the startup > > > process. > > > > > > > But, I see there are still multiple undoworkers that are getting > > launched and I am not sure if that works correctly because a > > particular undoworker is connected to a database and then it starts > > processing all the pending undo. > > Each undo worker applies only transactions for its own database, see > ProcessExistingUndoRequests(): > > /* We can only process undo of the database we are connected to. */ > if (xact_hdr.dboid != MyDatabaseId) > continue; > > Nevertheless, as I've just mentioned in my response to Thomas, I admit that we > should try to live w/o the undo worker altogether. > Okay, but keep in mind that there could be a large amount of undo (unlike redo which has some limit as we can replay it from the last checkpoint) which needs to be processed but it might be okay to live with that for now. Another thing is that it seems we need to connect to the database to perform it which might appear a bit odd that we don't allow users to connect to the database but internally we are connecting it. These are just some points to consider while finalizing the solution to this. > > > Discarding > > > ---------- > > > > > > There's no discard worker for the URS infrastructure yet. I thought about > > > discarding the undo log during checkpoint, but checkpoint should probably > > > do > > > more straightforward tasks than the calculation of a new discard pointer > > > for > > > each undo log, so a background worker is needed. A few notes on that: > > > > > > * until the zheap AM gets added, only the transaction that creates the > > > undo > > > records needs to access them. This assumption should make the > > > discarding > > > algorithm a bit simpler. Note that with zheap, the other transactions > > > need > > > to look for old versions of tuples, so the concept of > > > oldestXidHavingUndo > > > variable is needed there. > > > > > > * it's rather simple to pass pointer the URS pointer to the discard > > > worker > > > when transaction either committed or the undo has been executed. > > > > > > > Why can't we have a separate discard worker which keeps on scanning > > the undorecords and discard accordingly? Giving the onus of foreground > > process might be tricky because say discard worker is not up to speed > > and we ran out of space to pass such information for each commit/abort > > request. > > Sure, there should be a discard worker. The question is how to make its work > efficient. The initial run after restart probably needs to scan everything > between 'discard' and 'insert' pointers, > Yeah, such an initial scan would be helpful to identify pending aborts and allow them to be processed. > but then it should process only the > parts created by individual transactions. > Yeah, it needs to process transaction-by-transaction to see which all we can discard. Also, note that in Single-User mode we need to discard undo after commit. I think we also need to maintain oldestXidHavingUndo for CLOG truncation and transaction-wraparound. We can't allow CLOG truncation for the transaction whose undo is not discarded as that could be required by some other transaction. For similar reasons, we can't allow transaction-wraparound and we need to integrate this into the existing xid-allocation mechanism. I have found one of the old patch (Allow-execution-and-discard-of-undo-by-background-wo) attached where all these concepts were implemented. Unless you have a reason why we don't these things, you might want to refer to the attached patch to either re-use or refer to these ideas. There are a few other things like undorequest and some undoworker mechanism which you can ignore. -- With Regards, Amit Kapila.