On Thu, Nov 29, 2018 at 3:44 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Fri, Aug 24, 2018 at 5:53 PM Alexander Korotkov
> > <a.korot...@postgrespro.ru> wrote:
> >
> > Given I've no feedback on this idea yet, I'll try to implement a PoC
> > patch for that. It doesn't look to be difficult. And we'll see how
> > does it work.
>
> Unfortunately, current version of the patch doesn't pass the tests and
> fails on initdb. But maybe you already have this PoC you were talking
> about, that will also incorporate the feedback from this thread? For now
> I'll move it to the next CF.

Finally, I managed to write a PoC. If you look at the list of problems I
enumerated in [1], this PoC is aimed at 1 and 3.

> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancel of read-only queries on standby even if hot_standby_feedback
> is on, caused by replication of AccessExclusiveLock.

2 is a fairly independent problem and could be addressed later.

Basically, this patch does the following:

1. Introduces a new flag, BM_DIRTY_BARRIER, which prevents a dirty buffer
from being written out.

2. Implements two-phase truncation of node buffers. The first phase runs
before the file truncation and marks dirty buffers past the truncation
point as BM_DIRTY_BARRIER. The second phase runs after the file
truncation and actually wipes out the buffers past the truncation point.

3. If an exception happens during file truncation, the BM_DIRTY_BARRIER
flag is cleared from the buffers, so no data corruption should happen
here. If the file truncation was partially completed, the file might be
extended again by a write of a dirty buffer. I'm not sure how likely that
is, but the extension could lead to errors again. Still, this shouldn't
cause data corruption.

4. Having too many buffers marked as BM_DIRTY_BARRIER would paralyze the
buffer manager. This is why we keep no more than NBuffers/2 buffers
marked as BM_DIRTY_BARRIER. If the limit is exceeded, dirty buffers are
simply written out during the first phase.

5. lazy_truncate_heap() now takes ExclusiveLock instead of
AccessExclusiveLock. This part is not really complete. At the very least,
we need to ensure that reads past the truncation point, caused by
read-only queries running concurrently with the truncation, don't lead to
real errors.

Any thoughts?

1. https://www.postgresql.org/message-id/CAPpHfdtD3U2DpGZQJNe21s9s1s-Va7NRNcP1isvdCuJxzYypcg%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
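
To make the two-phase scheme in points 1-4 above easier to follow, here is a
minimal standalone sketch in C. It is not taken from the patch and does not
use the real buffer manager: ToyBuffer, the toy_* functions and the bit
values of the flags are all hypothetical, and "writing out" a buffer is
modelled by simply clearing its dirty bit. Only the flag name
BM_DIRTY_BARRIER and the limit of half the pool come from the description
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits for the toy model; the values are assumptions, not PostgreSQL's. */
#define BM_VALID         (1u << 0)
#define BM_DIRTY         (1u << 1)
#define BM_DIRTY_BARRIER (1u << 2)  /* name from the patch description */

/* Hypothetical stand-in for a shared buffer descriptor. */
typedef struct ToyBuffer
{
    unsigned block;  /* block number within the relation */
    unsigned flags;  /* combination of the BM_* bits above */
} ToyBuffer;

/* A buffer may be written out only if it is dirty and not behind the barrier. */
static bool
toy_can_write(const ToyBuffer *buf)
{
    return (buf->flags & BM_DIRTY) && !(buf->flags & BM_DIRTY_BARRIER);
}

/*
 * Phase 1, before the file truncation: put the barrier on dirty buffers at or
 * past new_nblocks. At most n / 2 buffers get the barrier; any further dirty
 * buffers are "written out" instead (modelled by clearing BM_DIRTY).
 * Returns the number of buffers that received the barrier.
 */
static int
toy_phase1(ToyBuffer *pool, int n, unsigned new_nblocks)
{
    int barriers = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        ToyBuffer *buf = &pool[i];

        if (!(buf->flags & BM_VALID) || !(buf->flags & BM_DIRTY) ||
            buf->block < new_nblocks)
            continue;

        if (barriers < n / 2)
        {
            buf->flags |= BM_DIRTY_BARRIER;
            barriers++;
        }
        else
            buf->flags &= ~BM_DIRTY;  /* stand-in for flushing the page */
    }
    return barriers;
}

/*
 * Phase 2, after a successful file truncation: wipe out every valid buffer at
 * or past new_nblocks. Returns the number of buffers dropped.
 */
static int
toy_phase2(ToyBuffer *pool, int n, unsigned new_nblocks)
{
    int dropped = 0;

    for (int i = 0; i < n; i++)
    {
        if ((pool[i].flags & BM_VALID) && pool[i].block >= new_nblocks)
        {
            pool[i].flags = 0;
            dropped++;
        }
    }
    return dropped;
}

/*
 * Error path: the truncation failed, so release the barrier and let the
 * still-dirty buffers be written out normally. Returns the number released.
 */
static int
toy_abort(ToyBuffer *pool, int n)
{
    int released = 0;

    for (int i = 0; i < n; i++)
    {
        if (pool[i].flags & BM_DIRTY_BARRIER)
        {
            pool[i].flags &= ~BM_DIRTY_BARRIER;
            released++;
        }
    }
    return released;
}
```

A caller in this toy model would run toy_phase1(), attempt the file
truncation, and then call either toy_phase2() on success or toy_abort() on
failure. The point of the ordering is that pages past the truncation point
stay dirty in memory until the outcome of the truncation is known, so a
failed truncation does not lose them.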
0001-POC-fix-relation-truncation-1.patch
Description: Binary data