On Mon, May 29, 2017 at 5:26 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> But I think, we can also take step-by-step approach even for v11. If >> we agree that it is ok to silently do the updates as long as we >> document the behaviour, we can go ahead and do this, and then as a >> second step, implement error handling as a separate patch. If that >> patch does not materialize, we at least have the current behaviour >> documented. > > I think that is sensible approach if we find the second step involves > big or complicated changes.
I think it is definitely a good idea to separate the two patches. UPDATE tuple routing without any special handling for the EPQ issue is just a partitioning feature. The proposed handling for the EPQ issue is an *on-disk format change*. That turns a patch which is subject only to routine bugs into one which can eat your data permanently -- so having the "can eat your data permanently" separated out for both review and commit seems only prudent. For me, it's not a matter of which patch is big or complicated, but rather a matter of one of them being a whole lot riskier than the other. Even UPDATE tuple routing could mess things up pretty seriously if we end up with tuples in the wrong partition, of course, but the other thing is still worse. In terms of a development plan, I think we would need to have both patches before either could be committed. I believe that everyone other than me who has expressed an opinion on this issue has said that it's unacceptable to just ignore the issue, so it doesn't sound like there will be much appetite for having #1 go into the tree without #2. I'm still really concerned about that approach because we do not have very much bit space left and WARM wants to use quite a bit of it. I think it's quite possible that we'll be sad in the future if we find that we can't implement feature XYZ because of the bit-space consumed by this feature. However, I don't have the only vote here and I'm not going to try to shove this into the tree over multiple objections (unless there are a lot more votes the other way, but so far there's no sign of that). Greg/Amit's idea of using the CTID field rather than an infomask bit seems like a possibly promising approach. Not everything that needs bit-space can use the CTID field, so using it is a little less likely to conflict with something else we want to do in the future than using a precious infomask bit. However, I'm worried about this: /* Make sure there is no forward chain link in t_ctid */ tp.t_data->t_ctid = tp.t_self; The comment does not say *why* we need to make sure that there is no forward chain link, but it implies that some code somewhere in the system does or at one time did depend on no forward link existing. Any such code that still exists will need to be updated. Anybody know what code that might be, exactly? The other potential issue I see here is that I know the WARM code also tries to use the bit-space in the CTID field; in particular, it uses the CTID field of the last tuple in a HOT chain to point back to the root of the chain. That seems like it could conflict with the usage proposed here, but I'm not totally sure. Has anyone investigated this issue? Regarding the trigger issue, I can't claim to have a terribly strong opinion on this. I think that practically anything we do here might upset somebody, but probably any halfway-reasonable thing we choose to do will be OK for most people. However, there seems to be a discrepancy between the approach that got the most votes and the one that is implemented by the v8 patch, so that seems like something to fix. For what it's worth, in the future, I imagine that we might allow adding a trigger to a partitioned table and having that cascade down to all descendant tables. In that world, firing the BR UPDATE trigger for the old partition and the AR UPDATE trigger for the new partition will look a lot like the behavior the user would expect on an unpartitioned table, which could be viewed as a good thing. On the other hand, it's still going to be a DELETE+INSERT under the hood for the foreseeable future, so firing the delete triggers and then the insert triggers is also defensible. Is there any big difference between these appraoches in terms of how much code is required to make this work? In terms of the approach taken by the patch itself, it seems surprising to me that the patch only calls ExecSetupPartitionTupleRouting when an update fails the partition constraint. Note that in the insert case, we call that function at the start of execution; calling it in the middle seems to involve additional hazards; for example, is it really safe to add additional ResultRelInfos midway through the operation? Is it safe to take more locks midway through the operation? It seems like it might be a lot safer to decide at the beginning of the operation whether this is needed -- we can skip it if none of the columns involved in the partition key (or partition key expressions) are mentioned in the update. (There's also the issue of triggers, but I'm not sure that it's sensible to allow a trigger on an individual partition to reroute an update to another partition; what if we get an infinite loop?) + if (concurrently_deleted) + return NULL; I don't understand the motivation for this change, and there are no comments explaining it that I can see. Perhaps the concurrency-related (i.e. EPQ) behavior here could be tested via the isolation tester. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers