At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.ja...@gmail.com> wrote in <CAMkU=1x3+dpsfsu+af7wazavugmehua2+jnf7sual-mskq+...@mail.gmail.com>
> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.m...@gmail.com> wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
> > > >> can't leave the block as all visible or all frozen). I think the issue is
> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
> > > >> that neither of those ever update the FSM, regardless of FPI?
> > > >
> > > > Yes, updates to the FSM are never logged. Forcing replay of
> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > >
> > > I think I was missing something. I imaged your situation is that FPI
> > > is replayed during crash recovery after the crashed server vacuums the
> > > page and marked it as all-frozen. But this situation is also resolved
> > > by that solution.
> >
> > # HEAP2_CLEAN is issued in lazy_vacuum_page
> >
> > It will work but I'm not sure it is right direction for
> > HEAP2_FREEZE_PAGE to touch FSM.
> >
> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> > think only the latter can happen. The comment in heap_xlog_clean
> > below is right generally but if a page filled with tuples becomes
> > almost empty and freezable by this cleanup, a problematic
> > situation like this occurs.
> >
>
> I now think this is not the cause of the problem I am seeing.
> I made the replay of FREEZE_PAGE update the FSM (both with and without
> FPI), but that did not fix it. With frequent crashes, it still
> accumulated a lot of frozen and empty (but full according to FSM) pages.
> I also set up replica streaming and turned off crashing on the master,
> and the FSM of the replica stays accurate, so the WAL stream and replay
> logic is doing the right thing on the replica.
>
> I now think the dirtied FSM pages are somehow not getting marked as dirty,
> or are getting marked as dirty but somehow the checkpoint is skipping
> them. It looks like MarkBufferDirtyHint does do some operations unlocked
> which could explain lost update, but it seems unlikely that that would
> happen often enough to see the amount of lost updates I am seeing.

Hmm.. clearing the dirty hint already seems to be protected by an
exclusive lock. And I think the problem can occur without any locking
failure.

Apart from the FPI case, the FSM update is also skipped when the record
LSN is older than the page LSN. If the heap page is evicted but the FSM
page is not, after vacuuming and before the power cut, replaying
HEAP2_CLEAN skips the FSM update even though no FPI is attached. Of
course this cannot occur on a standby. One FSM page covers about 4k
heap pages, so an FSM page can stay in the buffer pool far longer than
the heap pages it covers.

Also, ALL_FROZEN can be set by records other than HEAP2_FREEZE_PAGE.
When a page is already empty on entering lazy_scan_heap, or a page of a
non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is issued
to set ALL_FROZEN.

Perhaps the problem will be fixed by forcing heap_xlog_visible to update
the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in the previous mail, I prefer the
latter.)

> > > /*
> > >  * Update the FSM as well.
> > >  *
> > >  * XXX: Don't do this if the page was restored from full page image. We
> > >  * don't bother to update the FSM in that case, it doesn't need to be
> > >  * totally accurate anyway.
> > >  */
> >
>
> What does that save us? If we restored from FPI, we already have the block
> in memory (we don't need to see the old version, just the new one), so it
> doesn't save us a random read IO.

Updates to random heap pages can cause visits to many FSM pages that
are not loaded. The comment may be intended to avoid that. Or,
especially for INSERT, successive operations tend to hit the same heap
page, so the cost of recalculating the FSM each time would be
relatively high. After that the FSM wrongly claims that the page has
spare space, but that does no harm. But I think things are different
for operations that increase free space.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
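
P.S. To make the two skip cases above concrete, here is a minimal sketch of the redo-side decision as I understand it. It is a simplified model, not the actual PostgreSQL code: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the only things it assumes are what is described in this mail, namely that the FSM is updated only when the record is really re-applied to the heap page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LSN type for this sketch (a plain 64-bit counter). */
typedef uint64_t SketchLSN;

/* Hypothetical outcomes of looking at a heap block during redo. */
typedef enum
{
    SKETCH_NEEDS_REDO,  /* record must be re-applied to the page */
    SKETCH_DONE,        /* page is already at or past this record */
    SKETCH_RESTORED     /* page was overwritten from a full page image */
} SketchRedoAction;

/*
 * Decide what redo does with the heap block.
 *
 * If the record carries an FPI, the page is restored from it.
 * Otherwise, if the on-disk page LSN is not older than the record
 * LSN (the heap page was flushed before the crash), nothing is done.
 */
SketchRedoAction
sketch_redo_action(bool record_has_fpi, SketchLSN record_lsn,
                   SketchLSN page_lsn)
{
    if (record_has_fpi)
        return SKETCH_RESTORED;
    if (page_lsn >= record_lsn)
        return SKETCH_DONE;
    return SKETCH_NEEDS_REDO;
}

/*
 * In this model the FSM is touched only when the record is really
 * re-applied, so both SKETCH_RESTORED and SKETCH_DONE leave the FSM
 * entry stale.
 */
bool
sketch_fsm_updated(SketchRedoAction action)
{
    return action == SKETCH_NEEDS_REDO;
}
```

For example, sketch_redo_action(false, 100, 200) gives SKETCH_DONE: the heap page reached disk but the FSM page did not, and replay never corrects the FSM entry. That is the case I suspect here.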