On Thu, Mar 8, 2018 at 10:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deola...@gmail.com> writes:
>> I am actually very surprised that 0001-Invalidate-ip_blkid-v5.patch does
>> not do anything to deal with the fact that t_ctid may no longer point to
>> itself to mark end of the chain. I just can't see how that would work.
>> ...
>> I am actually worried that we're tinkering with ip_blkid to handle one
>> corner case of detecting partition key update. This is going to change
>> on-disk format and probably need more careful attention.
>
> You know, either one of those alone would be scary as hell. Both in
> one patch seem to me to be sufficient reason to reject it outright.
> Not only will it be an unending source of bugs, but it's chewing up
> far too much of what few remaining degrees-of-freedom we have in the
> on-disk format ... for a single purpose that hasn't even been sold as
> something we have to have.
I agree that it isn't clear that it's worth making a change to the on-disk format for this feature. When it was first proposed, I argued that we should just document that cross-partition updates would produce anomalies that don't occur otherwise. However, multiple people thought that it was worth burning one of our precious few remaining infomask bits in order to throw an error in that case rather than silently allowing an anomaly, and that's why this patch got written. It's not too late to decide that we'd rather not do that after all.

However, there's no such thing as a free lunch. We can't use the CTID field to point to a CTID in another table because there's no room in the field to include the identity of the other table. We can't widen it to make room, because that would break on-disk compatibility and bloat our already-too-big tuple headers. So we cannot make it work the way it does when the updates are confined to a single partition. Therefore, the only options are (1) ignore the problem, and let a cross-partition update look entirely like a delete+insert, (2) try to throw some error in the cases where this introduces user-visible anomalies that wouldn't be visible otherwise, or (3) revert update tuple routing entirely. I voted for (1), but the consensus was (2). I think that (3) would make a lot of people sad; it's a very good feature. If we want (2), then we need some way to mark a tuple that was deleted as part of a cross-partition update, and that requires a change to the on-disk format.

In short, the two things that you are claiming are prohibitively scary if done in the same patch look to me like they're actually just one thing, and that one thing absolutely has to be done in order to implement the design most community members favored in the original discussion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company