Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Hi Michael

On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier wrote:
>
> Did you really test WAL replay? This still ignores that PageGetLSN is
> as well taken in some code paths, like recovery, where actions on the
> page are guaranteed to be serialized, like during recovery, so this
> patch would cause the system to blow up. Note that pageinspect,
> amcheck and wal_consistency_checking also process on page copies. So
> the assertion failure of 0002 would trigger in those cases.
>

Indeed, the assertion tripped during WAL replay on the standby. This was caught by TAP tests under src/test/recovery. The assertion is now fixed so that WAL replay is exempt from the check. Please find the new patch attached. The tests now pass with the fix. I also manually verified that recovery works with "wal_consistency_checking=all".

Asim

Attachment: 0002-PageGetLSN-assert-that-locks-are-properly-held.patch
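
As an illustration of the exemption described in this message, here is a minimal, self-contained sketch in C. It is not the attached patch and does not use PostgreSQL's real definitions: every `toy_`/`TOY_` name is hypothetical. The message only states that WAL replay is exempt; treating a page outside the shared pool as a private copy that also skips the check is an assumption of this sketch, prompted by the page-copy cases (pageinspect, amcheck, wal_consistency_checking) raised in the quoted review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 8192
#define TOY_NBUFFERS  4

/* Hypothetical stand-in for a shared buffer pool. */
static char toy_pool[TOY_NBUFFERS * TOY_PAGE_SIZE];

/* True if the page pointer lies inside the toy shared pool. */
static bool
toy_page_is_in_pool(const char *page)
{
    uintptr_t p = (uintptr_t) page;
    uintptr_t start = (uintptr_t) toy_pool;

    return p >= start && p < start + sizeof(toy_pool);
}

/*
 * Decide whether the lock check applies at all.
 * Assumption: replay serializes page changes, and a page outside the
 * pool is a private copy that no other process can modify.
 */
static bool
toy_lsn_check_applies(const char *page, bool in_recovery)
{
    if (in_recovery)
        return false;
    if (!toy_page_is_in_pool(page))
        return false;
    return true;
}

/* The assertion fires only when the check applies and no lock is held. */
static void
toy_assert_locked_for_lsn(const char *page, bool in_recovery, bool lock_held)
{
    assert(!toy_lsn_check_applies(page, in_recovery) || lock_held);
}
```

With this shape, a caller reading the LSN of a local page copy, or any caller during replay, passes the assertion without holding a lock, while a normal backend reading a pooled page must hold one.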
Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Hi Michael

On Mon, Oct 2, 2017 at 6:48 PM, Michael Paquier wrote:
>
> Jacob, here are some ideas to make this thread move on. I would
> suggest to produce a set of patches that do things incrementally:
> 1) One patch that changes the calls of PageGetLSN to
> BufferGetLSNAtomic which are now not appropriate. You have spotted
> some of them in the btree and gist code, but not all based on my first
> lookup. There is still one in gistFindCorrectParent and one in btree
> _bt_split. The monitoring of the other calls (sequence.c and
> vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble
> that should be fixed I think.

Thank you for your suggestions. Please find the first patch attached as "0001-...". We checked both gistFindCorrectParent and _bt_split: in both, all calls to PageGetLSN are made with an exclusive lock held on the buffer contents.

> 2) A second patch that strengthens a bit checks around
> BufferGetLSNAtomic. One idea would be to use LWLockHeldByMe, as you
> are originally suggesting.
> A comment could be as well added in bufpage.h for PageGetLSN to let
> users know that it should be used carefully, in the vein of what is
> mentioned in src/backend/access/transam/README.

The second patch "0002-..." does the above. We have added a comment to AssertPageIsLockedForLSN, as suggested. The new assertion caught at least one code path where TestForOldSnapshot calls PageGetLSN without holding any lock: the snapshot_too_old test in "check-world" failed because the assertion tripped. This needs to be fixed; see the open question in the opening mail on this thread.

Asim and Jacob

Attachments:
0001-Change-incorrect-calls-to-PageGetLSN-to-BufferGetLSN.patch
0002-PageGetLSN-assert-that-locks-are-properly-held.patch
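
The locking rule that the two patches revolve around can be sketched in a few lines of standalone C. This is a toy model under stated assumptions, not the contents of either attachment: all `Toy`/`toy_` names are hypothetical, and the rule encoded here (a plain LSN read is safe under an exclusive content lock, while a reader holding only a shared lock must also take the buffer header lock, which is what an atomic read does) reflects our understanding of src/backend/access/transam/README rather than anything stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef enum
{
    TOY_LOCK_NONE,
    TOY_LOCK_SHARED,
    TOY_LOCK_EXCLUSIVE
} ToyLockMode;

typedef struct ToyBuffer
{
    ToyLockMode content_lock;   /* content lock mode held by the caller */
    bool        header_locked;  /* true while the header lock is held */
    uint64_t    page_lsn;       /* LSN stored in the page header */
} ToyBuffer;

/* The condition an assertion inside a plain LSN read could check. */
static bool
toy_lsn_read_is_safe(const ToyBuffer *buf)
{
    if (buf->content_lock == TOY_LOCK_EXCLUSIVE)
        return true;
    if (buf->content_lock == TOY_LOCK_SHARED && buf->header_locked)
        return true;
    return false;
}

/*
 * Toy analogue of an atomic LSN read: take the header lock around the
 * read so a caller holding only a shared content lock stays safe.
 */
static uint64_t
toy_get_lsn_atomic(ToyBuffer *buf)
{
    uint64_t lsn;

    buf->header_locked = true;
    assert(toy_lsn_read_is_safe(buf));
    lsn = buf->page_lsn;
    buf->header_locked = false;
    return lsn;
}
```

In this model, a call site that holds no lock at all fails the check in either path, which corresponds to the kind of case the assertion flagged in TestForOldSnapshot.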