Hi hackers!

heap_xlog_visible does not bump the heap page LSN when it sets the all-visible flag on the page.
There is a long comment explaining why:

        /*
         * We don't bump the LSN of the heap page when setting the visibility
         * map bit (unless checksums or wal_hint_bits is enabled, in which
         * case we must), because that would generate an unworkable volume of
         * full-page writes.  This exposes us to torn page hazards, but since
         * we're not inspecting the existing page contents in any way, we
         * don't care.
         *
         * However, all operations that clear the visibility map bit *do* bump
         * the LSN, and those operations will only be replayed if the XLOG LSN
         * follows the page LSN.  Thus, if the page LSN has advanced past our
         * XLOG record's LSN, we mustn't mark the page all-visible, because
         * the subsequent update won't be replayed to clear the flag.
         */
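To make the interlock described in that comment concrete, here is a toy model of the redo decision. It is only a sketch under stated assumptions: the names (ToyHeapPage, toy_needs_redo, toy_redo_visible, toy_redo_clear) are hypothetical, a page is reduced to its LSN and a flag, and the skip rule is assumed to be "apply a record only if its LSN is ahead of the page LSN", which is how I read the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only (hypothetical names, not PostgreSQL's API): a heap page
 * reduced to its page LSN and its PD_ALL_VISIBLE flag. */
typedef struct
{
    uint64_t lsn;         /* LSN of the last WAL record that stamped this page */
    bool     all_visible; /* stands in for PD_ALL_VISIBLE */
} ToyHeapPage;

/* Assumed redo rule: a record is applied only if its LSN is ahead of the
 * page LSN; otherwise the page already reflects it (or something later). */
static bool toy_needs_redo(const ToyHeapPage *page, uint64_t record_lsn)
{
    return page->lsn < record_lsn;
}

/* Replay of an insert/update/delete: clears the flag and, like every
 * operation that clears the VM bit, bumps the page LSN. */
static void toy_redo_clear(ToyHeapPage *page, uint64_t record_lsn)
{
    if (!toy_needs_redo(page, record_lsn))
        return;
    page->all_visible = false;
    page->lsn = record_lsn;
}

/* Replay of the "set all-visible" record as the comment describes it:
 * skipped if the page LSN has moved past the record; the page LSN is
 * bumped only when bump_lsn (a stand-in for checksums / hint-bit
 * logging being enabled) is true. */
static void toy_redo_visible(ToyHeapPage *page, uint64_t record_lsn,
                             bool bump_lsn)
{
    if (!toy_needs_redo(page, record_lsn))
        return;
    page->all_visible = true;
    if (bump_lsn)
        page->lsn = record_lsn;
}
```

For example, if the heap page reached disk after an update at LSN 300, replaying the all-visible record at LSN 200 is skipped, and so is the update, so the flag stays clear. If the page on disk is still at LSN 100, the all-visible record sets the flag without touching the LSN, and the update at LSN 300 is then replayed and clears it again.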

But it is still not clear to me that not bumping the LSN here is correct when wal_log_hints is set. In that case the VM page will have a larger LSN than the heap page, because visibilitymap_set bumps the LSN of the VM page. That means that, in theory, after recovery we could have a page marked as all-visible in the VM but without PD_ALL_VISIBLE set in its page header. And that violates the VM constraint:

 * When we *set* a visibility map during VACUUM, we must write WAL.  This may
 * seem counterintuitive, since the bit is basically a hint: if it is clear,
 * it may still be the case that every tuple on the page is visible to all
 * transactions; we just don't know that for certain.  The difficulty is that
 * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
 * on the page itself, and the visibility map bit.  If a crash occurs after the
 * visibility map page makes it to disk and before the updated heap page makes
 * it to disk, redo must set the bit on the heap page.  Otherwise, the next
 * insert, update, or delete on the heap page will fail to realize that the
 * visibility map bit must be cleared, possibly causing index-only scans to
 * return wrong answers.
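The constraint quoted above can be written as a simple check over a toy model. This is a hypothetical sketch, not PostgreSQL code: ToyHeapPageHeader, toy_vm_consistent and toy_redo_set_visible are made-up names, and the model keeps nothing but one VM bit and one PD_ALL_VISIBLE flag per block. The check is one-directional, as in the comment: a set VM bit must be backed by PD_ALL_VISIBLE, while a clear VM bit is just a missing hint.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not PostgreSQL's data structures): one VM bit
 * per heap block, plus the PD_ALL_VISIBLE flag of each heap page header. */
typedef struct
{
    bool pd_all_visible;
} ToyHeapPageHeader;

/* The constraint from the visibilitymap comment: a set VM bit must be
 * backed by PD_ALL_VISIBLE on the heap page; otherwise the next
 * insert/update/delete will not notice that the VM bit must be cleared. */
static bool toy_vm_consistent(const ToyHeapPageHeader *heap,
                              const bool *vm_bit, size_t nblocks)
{
    for (size_t blk = 0; blk < nblocks; blk++)
    {
        if (vm_bit[blk] && !heap[blk].pd_all_visible)
            return false; /* VM says all-visible, heap page does not */
    }
    return true;
}

/* Toy redo of the "set all-visible" record for one block: sets both bits,
 * which is what repairs the torn state after a crash. */
static void toy_redo_set_visible(ToyHeapPageHeader *heap, bool *vm_bit,
                                 size_t blk)
{
    heap[blk].pd_all_visible = true;
    vm_bit[blk] = true;
}
```

In the crash case the comment describes (VM page on disk with the bit set, heap page on disk without PD_ALL_VISIBLE), toy_vm_consistent returns false until toy_redo_set_visible has been applied to that block.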
