Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Tue, Oct 17, 2017 at 7:53 AM, Michael Paquier wrote:
> On Mon, Oct 16, 2017 at 9:50 PM, Amit Kapila wrote:
>> If the above analysis is correct, then I think we can say that the
>> row state in a page will be the same during recovery as it was when
>> the original operation was performed, if full_page_writes is enabled.
>> I am not sure how much this can help in the current heap format, but
>> this can help in zheap (undo-based heap).
>
> If I understood that correctly, that looks like a sane assumption. For
> REGBUF_NO_IMAGE you may need to be careful with undo operations,
> though.
>

Right, but as of now, we don't need to use this assumption with
REGBUF_NO_IMAGE.

>> In zheap, we are writing the complete tuple for a Delete operation in
>> undo so that we can reclaim the corresponding tuple space as soon as
>> the deleting transaction is committed. Now, during recovery, we have
>> to generate the complete undo record (which includes the entire
>> tuple), and for that, ideally, we should write the complete tuple in
>> WAL; but instead of that, I think we can regenerate it from the
>> original page. This is only applicable when full_page_writes is
>> enabled; otherwise, a complete tuple is required in WAL.
>
> Yeah, you should really try to support both modes as well.
>

I also think so.

> Fortunately, it is possible to know if full page writes are enforced
> at the moment a record is assembled and inserted, so you could rely on
> that.
>

Yeah, but actually we need to know whether full page writes are
enforced while forming the record (something like in log_heap_update).
Now, ideally, to read the flags XLogCtlInsert->fullPageWrites or
XLogCtlInsert->forcePageWrites, we need an insertion lock, as we do in
XLogInsertRecord. However, I think we don't need an insertion lock to
read the values for this purpose; rather, we can use
GetFullPageWriteInfo, which doesn't need a lock. The reason is that if
the value of doPageWrites is true while forming and assembling the WAL
record, then we will include the copy of the page even if the value
changes in XLogInsertRecord. OTOH, if it is false while forming and
assembling the WAL record, then we would anyway have to include the
undo tuple in the WAL record, which avoids the dependency on the
full-page image, so even if doPageWrites changes to true in
XLogInsertRecord, we don't need to worry.

>> I am not sure how much the above makes sense to anyone without a
>> detailed explanation, but I thought I should give some background on
>> why I asked this question. However, if anybody needs more explanation
>> or sees any fault in the above understanding, please let me know.
>
> Thanks for clarifying; I was wondering about the reason behind the
> question as well. This is the second time I have seen the word zheap
> on -hackers, and the first time was no more than 2 days ago, from
> Robert.
>

This is a big undertaking and will take time to reach a stage where
the whole project can be shared, but some of the important design
points, which are closely linked with existing technology, can be
discussed earlier to avoid making wrong assumptions. Thanks for having
a discussion on this topic.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
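The argument for reading doPageWrites without the insertion lock can be restated as a small model. This is a hypothetical sketch, not zheap or PostgreSQL code. The struct and function names are invented for illustration. The only assumption is the one made in the message above: the payload of the delete record is decided once, from the value seen while forming the record (for example, as reported by GetFullPageWriteInfo).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zheap delete record, reduced to what the argument needs. */
typedef struct
{
    bool fpw_snapshot;   /* doPageWrites as seen while forming the record */
    bool carries_tuple;  /* full deleted tuple included in WAL for undo */
} ZDeleteRecordModel;

/*
 * Decide the payload once, from the lock-free snapshot: if page writes
 * are on, rely on the page; otherwise put the tuple into the record.
 */
static ZDeleteRecordModel
model_form_delete_record(bool fpw_snapshot)
{
    ZDeleteRecordModel rec = { fpw_snapshot, !fpw_snapshot };
    return rec;
}

/*
 * Replay can rebuild the undo tuple if the record carries it, or if the
 * snapshot said page writes were on (so the page seen at replay matches
 * the original). The value at insertion time is deliberately ignored:
 * a later flip of the setting does not change what was assembled.
 */
static bool
model_replay_can_build_undo(ZDeleteRecordModel rec, bool fpw_at_insert)
{
    (void) fpw_at_insert;
    return rec.carries_tuple || rec.fpw_snapshot;
}
```

For all four combinations of "setting at assembly" and "setting at insertion", the model reports that replay has what it needs. That is the property the lock-free read relies on.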
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Mon, Oct 16, 2017 at 9:50 PM, Amit Kapila wrote:
> If the above analysis is correct, then I think we can say that the
> row state in a page will be the same during recovery as it was when
> the original operation was performed, if full_page_writes is enabled.
> I am not sure how much this can help in the current heap format, but
> this can help in zheap (undo-based heap).

If I understood that correctly, that looks like a sane assumption. For
REGBUF_NO_IMAGE you may need to be careful with undo operations,
though.

> In zheap, we are writing the complete tuple for a Delete operation in
> undo so that we can reclaim the corresponding tuple space as soon as
> the deleting transaction is committed. Now, during recovery, we have
> to generate the complete undo record (which includes the entire
> tuple), and for that, ideally, we should write the complete tuple in
> WAL; but instead of that, I think we can regenerate it from the
> original page. This is only applicable when full_page_writes is
> enabled; otherwise, a complete tuple is required in WAL.

Yeah, you should really try to support both modes as well.
Fortunately, it is possible to know if full page writes are enforced
at the moment a record is assembled and inserted, so you could rely on
that.

> I am not sure how much the above makes sense to anyone without a
> detailed explanation, but I thought I should give some background on
> why I asked this question. However, if anybody needs more explanation
> or sees any fault in the above understanding, please let me know.

Thanks for clarifying; I was wondering about the reason behind the
question as well. This is the second time I have seen the word zheap
on -hackers, and the first time was no more than 2 days ago, from
Robert.
--
Michael
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 11:57 AM, Amit Kapila wrote:
> On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier wrote:
>> On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila wrote:
>>> Today, I was trying to think about cases when we can return BLK_DONE
>>> in XLogReadBufferForRedoExtended. One thing that occurred to me is
>>> that it can happen during the replay of WAL if full_page_writes is
>>> off. Basically, when full_page_writes is on, crash recovery will
>>> always first restore the full-page image of the page and then apply
>>> the WAL corresponding to that page, so we will never hit the case
>>> where the LSN of the page is greater than the LSN of the WAL record.
>>>
>>> Are there other cases for which we can get the BLK_DONE state? Is it
>>> mentioned somewhere in code/comments which I am missing?
>>
>> Remember the thread about meta page optimization... Some index AMs use
>> XLogInitBufferForRedo() to redo their meta pages, and those don't have
>> an FPW, so during crash recovery you may end up not replaying a meta
>> page if the LSN in its page header is newer than what's being
>> replayed.
>>
>
> I think for metapage usage, it will apply the changes anyway. See
> _bt_restore_page.
>
>> That's another case where BLK_DONE can be reached, even if
>> full_page_writes is on.
>>
>
> Yeah, and probably if someone uses REGBUF_NO_IMAGE. However, I was
> mainly thinking about cases where the caller is not doing anything to
> explicitly prevent a full-page image.
>

If the above analysis is correct, then I think we can say that the row
state in a page will be the same during recovery as it was when the
original operation was performed, if full_page_writes is enabled. I am
not sure how much this can help in the current heap format, but this
can help in zheap (undo-based heap).

In zheap, we are writing the complete tuple for a Delete operation in
undo so that we can reclaim the corresponding tuple space as soon as
the deleting transaction is committed. Now, during recovery, we have
to generate the complete undo record (which includes the entire
tuple), and for that, ideally, we should write the complete tuple in
WAL; but instead of that, I think we can regenerate it from the
original page. This is only applicable when full_page_writes is
enabled; otherwise, a complete tuple is required in WAL.

I am not sure how much the above makes sense to anyone without a
detailed explanation, but I thought I should give some background on
why I asked this question. However, if anybody needs more explanation
or sees any fault in the above understanding, please let me know.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
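The recovery-side idea (rebuild the undo record from the page when the WAL record omits the tuple) can be sketched with a toy page that holds a single fixed-size tuple. All names here (PageModel, ZDeleteRedoRecord, model_redo_delete) are hypothetical and this is not zheap code. The sketch only shows the ordering that matters: the undo record is built before the delete is applied to the page. Copying from the page is valid only under the full_page_writes assumption discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_TUPLE_LEN 16

/* Toy page holding one tuple at the target offset (hypothetical). */
typedef struct
{
    char tuple[MODEL_TUPLE_LEN];
    bool deleted;
} PageModel;

/* Toy delete record: carries the tuple only when full_page_writes was off. */
typedef struct
{
    bool carries_tuple;
    char tuple[MODEL_TUPLE_LEN];  /* meaningful only if carries_tuple */
} ZDeleteRedoRecord;

/* The undo record must hold the entire deleted tuple. */
typedef struct
{
    char old_tuple[MODEL_TUPLE_LEN];
} UndoRecordModel;

/*
 * Replay a delete: first build the undo record, then apply the delete.
 * Without a tuple in the record, copy it from the page; this assumes the
 * page at replay is identical to the page the original delete saw.
 */
static UndoRecordModel
model_redo_delete(PageModel *page, const ZDeleteRedoRecord *rec)
{
    UndoRecordModel undo;
    const char *src = rec->carries_tuple ? rec->tuple : page->tuple;

    memcpy(undo.old_tuple, src, MODEL_TUPLE_LEN);
    page->deleted = true;  /* apply the delete only after copying */
    return undo;
}
```

Under that assumption, both modes yield the same undo record. One path takes the tuple from the page (full_page_writes on). The other takes it from WAL (full_page_writes off).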
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier wrote:
> On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila wrote:
>> Today, I was trying to think about cases when we can return BLK_DONE
>> in XLogReadBufferForRedoExtended. One thing that occurred to me is
>> that it can happen during the replay of WAL if full_page_writes is
>> off. Basically, when full_page_writes is on, crash recovery will
>> always first restore the full-page image of the page and then apply
>> the WAL corresponding to that page, so we will never hit the case
>> where the LSN of the page is greater than the LSN of the WAL record.
>>
>> Are there other cases for which we can get the BLK_DONE state? Is it
>> mentioned somewhere in code/comments which I am missing?
>
> Remember the thread about meta page optimization... Some index AMs use
> XLogInitBufferForRedo() to redo their meta pages, and those don't have
> an FPW, so during crash recovery you may end up not replaying a meta
> page if the LSN in its page header is newer than what's being
> replayed.
>

I think for metapage usage, it will apply the changes anyway. See
_bt_restore_page.

> That's another case where BLK_DONE can be reached, even if
> full_page_writes is on.
>

Yeah, and probably if someone uses REGBUF_NO_IMAGE. However, I was
mainly thinking about cases where the caller is not doing anything to
explicitly prevent a full-page image.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
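The REGBUF_NO_IMAGE case can be made concrete with a simplified model of how XLogRecordAssemble() in xloginsert.c decides whether a registered buffer gets a full-page image. This is a sketch, not the real code. The MODEL_* flag values and the function name are made up for this example, and details such as hole compression are ignored. A block registered with the no-image flag never carries an image. At replay nothing resets such a page to its pre-change state, so BLK_DONE becomes reachable even with full_page_writes on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrModel;  /* stand-in for XLogRecPtr */

/* Hypothetical flag values for this sketch only. */
#define MODEL_REGBUF_FORCE_IMAGE 0x01
#define MODEL_REGBUF_NO_IMAGE    0x02

/*
 * Simplified model: does this registered buffer get a full-page image?
 *   flags:          registration flags for the buffer
 *   do_page_writes: full_page_writes (or forced page writes) in effect
 *   redo_rec_ptr:   redo pointer of the latest checkpoint
 *   page_lsn:       LSN on the page before this modification
 */
static bool
model_needs_backup(unsigned flags, bool do_page_writes,
                   XLogRecPtrModel redo_rec_ptr, XLogRecPtrModel page_lsn)
{
    if (flags & MODEL_REGBUF_FORCE_IMAGE)
        return true;                /* caller insists on an image */
    if (flags & MODEL_REGBUF_NO_IMAGE)
        return false;               /* caller opted out of an image */
    if (!do_page_writes)
        return false;               /* full_page_writes = off */
    /* Image only for the first change since the checkpoint's redo point. */
    return page_lsn <= redo_rec_ptr;
}
```

A page whose LSN is already past the redo pointer gets no image. Replay still reaches that change correctly, because it starts from the image logged by the first change to the page in that checkpoint cycle.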
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila wrote:
> Today, I was trying to think about cases when we can return BLK_DONE
> in XLogReadBufferForRedoExtended. One thing that occurred to me is
> that it can happen during the replay of WAL if full_page_writes is
> off. Basically, when full_page_writes is on, crash recovery will
> always first restore the full-page image of the page and then apply
> the WAL corresponding to that page, so we will never hit the case
> where the LSN of the page is greater than the LSN of the WAL record.
>
> Are there other cases for which we can get the BLK_DONE state? Is it
> mentioned somewhere in code/comments which I am missing?

Remember the thread about meta page optimization... Some index AMs use
XLogInitBufferForRedo() to redo their meta pages, and those don't have
an FPW, so during crash recovery you may end up not replaying a meta
page if the LSN in its page header is newer than what's being
replayed. That's another case where BLK_DONE can be reached, even if
full_page_writes is on.
--
Michael
[HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
Today, I was trying to think about cases when we can return BLK_DONE
in XLogReadBufferForRedoExtended. One thing that occurred to me is
that it can happen during the replay of WAL if full_page_writes is
off. Basically, when full_page_writes is on, crash recovery will
always first restore the full-page image of the page and then apply
the WAL corresponding to that page, so we will never hit the case
where the LSN of the page is greater than the LSN of the WAL record.

Are there other cases for which we can get the BLK_DONE state? Is it
mentioned somewhere in code/comments which I am missing?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
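The per-block decision asked about here can be modeled in a few lines. This is a simplified sketch, not the PostgreSQL source. The enum reuses the XLogRedoAction value names for readability, while the function name and its parameters are hypothetical. It shows why, with full_page_writes on, an ordinary block does not take the BLK_DONE branch. The first change after a checkpoint carries an image, which is restored regardless of the page's LSN. Each later record then has an LSN above the one the previous redo stamped on the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtrModel;  /* stand-in for XLogRecPtr */

/* Value names mirror XLogRedoAction; this is a model, not PostgreSQL code. */
typedef enum
{
    BLK_NEEDS_REDO,  /* page is older than the record: apply the change */
    BLK_DONE,        /* page already reflects the record: skip it */
    BLK_RESTORED,    /* full-page image restored: nothing more to apply */
    BLK_NOTFOUND     /* block could not be read */
} RedoActionModel;

/*
 * Simplified decision for one block reference of a record.
 *   record_lsn:   LSN of the record being replayed (the value redo
 *                 stamps on the page after applying it)
 *   page_lsn:     LSN in the page header as found before replay
 *   apply_image:  record carries a full-page image to restore
 *   block_exists: the block could be read
 */
static RedoActionModel
model_redo_action(XLogRecPtrModel record_lsn, XLogRecPtrModel page_lsn,
                  bool apply_image, bool block_exists)
{
    if (apply_image)
        return BLK_RESTORED;  /* image overwrites the page, whatever its LSN */
    if (!block_exists)
        return BLK_NOTFOUND;
    if (record_lsn <= page_lsn)
        return BLK_DONE;      /* change is already on disk */
    return BLK_NEEDS_REDO;
}
```

In this model, BLK_DONE appears only when the page on disk was written after the record being replayed and no image resets it first. That happens, for example, with full_page_writes off.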