I accidentally sent this off-list, sending to the list now:

On Sun, Mar 26, 2017 at 10:38 PM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.ja...@gmail.com> wrote
> in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
> > On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.m...@gmail.com>
> > > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> > > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
> > > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
> > > > >> then can't leave the block as all visible or all frozen.)  I think the
> > > > >> issue here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this
> > > > >> correctly, that neither of those ever updates the FSM, regardless of FPI?
> > > > >
> > > > > Yes, updates to the FSM are never logged.  Forcing replay of
> > > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > > > I think I was missing something.  I imagined your situation is that an
> > > > FPI is replayed during crash recovery after the crashed server vacuumed
> > > > the page and marked it as all-frozen.  But that situation is also
> > > > resolved by that solution.
> > >
> > > # HEAP2_CLEAN is issued in lazy_vacuum_page
> > >
> > > It will work, but I'm not sure it is the right direction for
> > > HEAP2_FREEZE_PAGE to touch the FSM.
> > >
> > > As Masahiko said, the situation must be created by HEAP2_VISIBLE without
> > > a preceding HEAP2_CLEAN, or by HEAP2_CLEAN with an FPI.  I think only the
> > > latter can happen.  The comment in heap_xlog_clean below is generally
> > > right, but if a page filled with tuples becomes almost empty and
> > > freezable by this cleanup, a problematic situation like this occurs.
> >
> > I now think this is not the cause of the problem I am seeing.  I made the
> > replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> > did not fix it.  With frequent crashes, it still accumulated a lot of
> > frozen and empty (but full according to the FSM) pages.  I also set up
> > replica streaming and turned off crashing on the master, and the FSM of
> > the replica stays accurate, so the WAL stream and replay logic are doing
> > the right thing on the replica.
> >
> > I now think the dirtied FSM pages are somehow not getting marked as dirty,
> > or are getting marked as dirty but the checkpoint is somehow skipping
> > them.  It looks like MarkBufferDirtyHint does do some operations unlocked,
> > which could explain a lost update, but it seems unlikely that that would
> > happen often enough to account for the amount of lost updates I am seeing.
>
> Hmm.. clearing the dirty hint already seems to be protected by an exclusive
> lock.  And I think it can occur without a lock failure.
>
> Other than by FPI, the FSM update is also omitted when the record LSN is
> older than the page LSN.  If the heap page is evicted but the FSM page is
> not, after vacuuming and before the power cut, replaying HEAP2_CLEAN skips
> the FSM update even though no FPI is attached.  Of course this cannot occur
> on a standby.  One FSM page covers about 4k heap pages, so an FSM page can
> stay in memory far longer than the heap pages it covers.  This corresponds
> to the action == BLK_DONE case, right?
>
> ALL_FROZEN can also be set by records other than HEAP2_FREEZE_PAGE.  When a
> page is already empty on entering lazy_scan_heap, or a page of a
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is issued to
> set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to update
> the FSM (in addition to FREEZE_PAGE), or by doing the same in
> heap_xlog_clean.  (As mentioned in the previous mail, I prefer the latter.)

When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but not on
BLK_DONE), it solves the problem I was seeing.  Which still leaves me
wondering why the problem doesn't show up on the standby, because, unlike
BLK_DONE, BLK_RESTORED should have the same issue on a standby as it does on
a recovering master, shouldn't it?  Maybe the difference is that the
existence of a replication slot delays the cleanup in a way that causes a
different pattern of WAL records.
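To be concrete, the change I tried is roughly of the following shape.  This
is only a sketch of the tail of heap_xlog_clean() written from memory of the
current code (the locals action, buffer, rnode, and blkno already exist in
that function, and heapam.c already has the needed includes); the attached
fsm_clean.patch is what I actually ran, so where the sketch and the patch
disagree, trust the patch:

    Size        freespace = 0;

    /*
     * ... the BLK_NEEDS_REDO branch is unchanged: heap_page_prune_execute(),
     * PageSetLSN(), MarkBufferDirty() ...
     */

    /* Read the free space off the page while we still hold the buffer. */
    if (BufferIsValid(buffer))
    {
        freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
        UnlockReleaseBuffer(buffer);
    }

    /*
     * Update the FSM for BLK_NEEDS_REDO and BLK_RESTORED alike, rather than
     * only for BLK_NEEDS_REDO as now; BLK_DONE (and BLK_NOTFOUND) are left
     * alone.
     */
    if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
        XLogRecordPageWithFreeSpace(rnode, blkno, freespace);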
> > > > /*
> > > >  * Update the FSM as well.
> > > >  *
> > > >  * XXX: Don't do this if the page was restored from full page image. We
> > > >  * don't bother to update the FSM in that case, it doesn't need to be
> > > >  * totally accurate anyway.
> > > >  */
> >
> > What does that save us?  If we restored from an FPI, we already have the
> > block in memory (we don't need to see the old version, just the new one),
> > so it doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM pages.  It
> may be intended to avoid that.

But I think that would be no worse for BLK_RESTORED than it is for
BLK_NEEDS_REDO.  Why optimize only one of the cases, if it is worth
optimizing either one?

Cheers,

Jeff
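P.S. If anyone wants to try the other option Kyotaro mentions, doing the
same in heap_xlog_visible, the analogous change would look roughly like the
sketch below.  This is untested and from memory (I believe the heap page is
registered as block 1 of the XLOG_HEAP2_VISIBLE record), so treat it as an
illustration only:

    RelFileNode rnode;
    BlockNumber blkno;
    Buffer      buffer;
    Size        freespace = 0;
    XLogRedoAction action;

    /* the heap page; the visibility-map page (block 0) is handled later */
    XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
    action = XLogReadBufferForRedo(record, 1, &buffer);

    if (action == BLK_NEEDS_REDO)
    {
        /* existing redo work: PageSetAllVisible() + MarkBufferDirty() */
    }

    if (BufferIsValid(buffer))
    {
        freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
        UnlockReleaseBuffer(buffer);
    }

    /* new: refresh the FSM whenever we actually looked at the page */
    if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
        XLogRecordPageWithFreeSpace(rnode, blkno, freespace);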
fsm_clean.patch