On 2018-04-23 13:22:21 +0300, Heikki Linnakangas wrote: > On 13/04/18 13:08, Michael Paquier wrote: > > On Fri, Apr 13, 2018 at 02:15:35PM +0530, amul sul wrote: > > > I have looked into this and found that the issue is in heap_xlog_delete > > > -- we > > > have missed to set the correct offset number from the target_tid when > > > XLH_DELETE_IS_PARTITION_MOVE flag is set. > > > > Oh, this looks good to me. So when a row was moved across partitions > > this could have caused incorrect tuple references on a standby, which > > could have caused corruptions. > > Hmm. So, the problem was that HeapTupleHeaderSetMovedPartitions() only sets > the block number to InvalidBlockNumber, and leaves the offset number > unchanged. WAL replay didn't preserve the offset number, so the master and > the standby had a different offset number in the ctid.
Right. > Why does HeapTupleHeaderSetMovedPartitions() leave the offset number > unchanged? The old offset number is meaningless without the block number. > Also, bits and magic values in the tuple header are scarce. We're > squandering a whole range of values in the ctid, everything with > ip_blkid==InvalidBlockNumber, to mean "moved to different partition", when a > single value would suffice. Yes, I agree on that. > I kept using InvalidBlockNumber there, so ItemPointerIsValid() still > considers those item pointers as invalid. But my gut feeling is actually > that it would be better to use e.g. 0 as the block number, so that these > item pointers would appear valid. Again, to follow the precedent of > speculative insertion tokens. But I'm not sure if there was some > well-thought-out reason to make them appear invalid. A comment on that would > be nice, at least. That seems risky to me. We want something that stops EPQ style chasing without running into asserts for invalid offsets... Greetings, Andres Freund