Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Mar 3, 2015 at 12:05 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
>> My experimental branch works just fine (with a variant jjanes_upsert with subxact looping), until I need to restart an update after a failed heap_update() that still returned HeapTupleMayBeUpdated (having super deleted within an ExecUpdate() call). There is no good way to do that for ExecUpdate() that I can see, because an existing, visible row is affected (unlike with ExecInsert()). Even if it was possible, it would be hugely invasive to already very complicated code paths.
>
> Ah, so we can't easily use super-deletion to back out an UPDATE. I had not considered that.

Yeah. When I got into considering making EvalPlanQualFetch() look at speculative tokens, it became abundantly clear that that code would never be committed, even if I could make it work.

>> I continue to believe that the best way forward is to incrementally commit the work by committing ON CONFLICT IGNORE first. That way, speculative tokens will remain strictly the concern of UPSERTers or sessions doing INSERT ... ON CONFLICT IGNORE.
>
> Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE, please?

Of course. I'll have that for you shortly.

Thanks
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 03/02/2015 11:21 PM, Peter Geoghegan wrote:
> On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
>> Hmm. I used a b-tree to estimate the effect that the locking would have in the UPSERT case, for UPSERT into a table with a b-tree index. But you're right that for the question of whether this is acceptable for the case of regular insert into a table with exclusion constraints, other indexams are more interesting. In a very quick test, the overhead with a single GiST index seems to be about 5%. IMHO that's still a noticeable slowdown, so let's try to find a way to eliminate it.
>
> Honestly, I don't know why you're so optimistic that this can be fixed, even with that heavyweight lock (for regular inserters + exclusion constraints).

We already concluded that it can be fixed, with the heavyweight lock. See http://www.postgresql.org/message-id/54f4a0e0.4070...@iki.fi. Do you see some new problem with that that hasn't been discussed yet? To eliminate the heavy-weight lock, we'll need some new ideas, but it doesn't seem that hard.

> My experimental branch, which I showed you privately, shows big problems when I simulate upserting with exclusion constraints (so we insert first, handle exclusion violations using the traditional upsert subxact pattern that does work with B-Trees). It's much harder to back out of a heap_update() than it is to back out of a heap_insert(), since we might well be updating a tuple visible to some other MVCC snapshot. Whereas we can super delete a tuple knowing that only a dirty snapshot might have seen it, which bounds the complexity (only wait sites for B-Trees + exclusion constraints need to worry about speculative insertion tokens and so on). My experimental branch works just fine (with a variant jjanes_upsert with subxact looping), until I need to restart an update after a failed heap_update() that still returned HeapTupleMayBeUpdated (having super deleted within an ExecUpdate() call). There is no good way to do that for ExecUpdate() that I can see, because an existing, visible row is affected (unlike with ExecInsert()). Even if it was possible, it would be hugely invasive to already very complicated code paths.

Ah, so we can't easily use super-deletion to back out an UPDATE. I had not considered that.

> I continue to believe that the best way forward is to incrementally commit the work by committing ON CONFLICT IGNORE first. That way, speculative tokens will remain strictly the concern of UPSERTers or sessions doing INSERT ... ON CONFLICT IGNORE.

Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE, please?

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
> Hmm. I used a b-tree to estimate the effect that the locking would have in the UPSERT case, for UPSERT into a table with a b-tree index. But you're right that for the question of whether this is acceptable for the case of regular insert into a table with exclusion constraints, other indexams are more interesting. In a very quick test, the overhead with a single GiST index seems to be about 5%. IMHO that's still a noticeable slowdown, so let's try to find a way to eliminate it.

Honestly, I don't know why you're so optimistic that this can be fixed, even with that heavyweight lock (for regular inserters + exclusion constraints). My experimental branch, which I showed you privately, shows big problems when I simulate upserting with exclusion constraints (so we insert first, handle exclusion violations using the traditional upsert subxact pattern that does work with B-Trees). It's much harder to back out of a heap_update() than it is to back out of a heap_insert(), since we might well be updating a tuple visible to some other MVCC snapshot. Whereas we can super delete a tuple knowing that only a dirty snapshot might have seen it, which bounds the complexity (only wait sites for B-Trees + exclusion constraints need to worry about speculative insertion tokens and so on). My experimental branch works just fine (with a variant jjanes_upsert with subxact looping), until I need to restart an update after a failed heap_update() that still returned HeapTupleMayBeUpdated (having super deleted within an ExecUpdate() call). There is no good way to do that for ExecUpdate() that I can see, because an existing, visible row is affected (unlike with ExecInsert()). Even if it was possible, it would be hugely invasive to already very complicated code paths.

I continue to believe that the best way forward is to incrementally commit the work by committing ON CONFLICT IGNORE first. That way, speculative tokens will remain strictly the concern of UPSERTers or sessions doing INSERT ... ON CONFLICT IGNORE. Unless you're happy to have UPDATEs still deadlock, and only fix unprincipled deadlocks for the INSERT case.

I don't know why you want to make the patch incremental by fixing unprincipled deadlocks in the first place, since you've said that you don't really care about it as a real world problem, and because it now appears to have significant additional difficulties that were not anticipated. Please, let's focus on getting ON CONFLICT IGNORE into a committable state - that's the best way of incrementally committing this work. Fixing unprincipled deadlocks is not a useful way of incrementally committing this work. I've spent several days producing a prototype that shows the exact nature of the problem. If you think I'm mistaken, and that fixing unprincipled deadlocks first is the right thing to do, please explain why with reference to that prototype.

AFAICT, doing things your way is going to add significant additional complexity for *no appreciable benefit*. You've already gotten exactly what you were looking for in every other regard. In particular, ON CONFLICT UPDATE could work with exclusion constraints without any of these problems, because of the way we do the auxiliary update there (we lock the row ahead of updating/qual evaluation). I've bent over backwards to make sure that is the case. Please, throw me a bone here.

Thank you
--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/21/2015 10:41 PM, Peter Geoghegan wrote:
> On Sat, Feb 21, 2015 at 11:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> What I had in mind is that the winning inserter waits on the other inserter's token, without super-deleting. Like all inserts do today. So the above scenario becomes:
>>
>> * Session 1 physically inserts, and then checks for a conflict.
>> * Session 2 physically inserts, and then checks for a conflict.
>> * Session 1 sees session 2's conflicting TID. Session 1's XID is older, so it wins. It waits for session 2's token, without super-deleting.
>> * Session 2 sees session 1's conflicting TID. It super deletes, releases token, and sleeps on session 1's token.
>> * Session 1 wakes up. It looks at session 2's tuple again and sees that it was super-deleted. There are no further conflicts, so the insertion is complete, and it releases the token.
>> * Session 2 wakes up. It looks at session 1's tuple again and sees that it's still there. It goes back to sleep, this time on session 1's XID.
>> * Session 1 commits. Session 2 wakes up, sees that the tuple is still there, and throws a constraint violation error.
>
> I think we're actually 100% in agreement, then. I just prefer to have the second last step (the check without a promise tuple visible to anyone made by the loser) occur as part of the pre-check that happens anyway with ON CONFLICT IGNORE. Otherwise, you'll end up with some much more complicated control flow that has to care about not doing that twice for ON CONFLICT IGNORE...and for the benefit of what? For regular inserters, that we don't actually care about fixing unprincipled deadlocks for?

Right. That will allow me to review and test the locking part of the patch separately.

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/17/2015 02:11 AM, Peter Geoghegan wrote:
>>> Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered.
>>
>> Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).
>
> Seems that way. But even if that wasn't true, it wouldn't matter, since I don't see that we have a choice.

I did some quick performance testing on this. For easy testing, I used a checkout of git master, and simply added LockAcquire + LockRelease calls to ExecInsert, around the heap_insert() call. The test case I used was:

psql -c "create table footest (id serial primary key);"
echo "insert into footest select from generate_series(1, 1);" > inserts.sql
pgbench -n -f inserts.sql postgres -T100 -c4

With the extra lock calls, I got 56 tps on my laptop. With unpatched git master, I got 60 tps. I also looked at the profile with perf, and indeed about 10% of the CPU time was spent in LockAcquire and LockRelease together. So the extra locking incurs about 10% overhead. I think this was pretty much a worst case scenario, but not a hugely unrealistic one - many real-world tables have only a few columns, and few indexes. With more CPUs you would probably start to see contention, in addition to just the extra overhead.

Are we OK with a 10% overhead, caused by the locking? That's probably acceptable if that's what it takes to get UPSERT. But it's not OK just to solve the deadlock issue with regular insertions into a table with exclusion constraints. Can we find a scheme to eliminate that overhead?

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Mar 2, 2015 at 11:20 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
> Are we OK with a 10% overhead, caused by the locking? That's probably acceptable if that's what it takes to get UPSERT. But it's not OK just to solve the deadlock issue with regular insertions into a table with exclusion constraints. Can we find a scheme to eliminate that overhead?

Looks like you tested a B-Tree index here. That doesn't seem particularly representative of what you'd see with exclusion constraints.
--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 03/02/2015 09:29 PM, Peter Geoghegan wrote:
> On Mon, Mar 2, 2015 at 11:20 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
>> Are we OK with a 10% overhead, caused by the locking? That's probably acceptable if that's what it takes to get UPSERT. But it's not OK just to solve the deadlock issue with regular insertions into a table with exclusion constraints. Can we find a scheme to eliminate that overhead?
>
> Looks like you tested a B-Tree index here. That doesn't seem particularly representative of what you'd see with exclusion constraints.

Hmm. I used a b-tree to estimate the effect that the locking would have in the UPSERT case, for UPSERT into a table with a b-tree index. But you're right that for the question of whether this is acceptable for the case of regular insert into a table with exclusion constraints, other indexams are more interesting. In a very quick test, the overhead with a single GiST index seems to be about 5%. IMHO that's still a noticeable slowdown, so let's try to find a way to eliminate it.

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/21/2015 12:15 AM, Peter Geoghegan wrote:
> On Fri, Feb 20, 2015 at 1:07 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> Then I refuse to believe that the livelock hazard exists, without the pre-check. If you have a livelock scenario in mind, it really shouldn't be that difficult to write down the list of steps.
>
> I just meant practical, recreatable steps - a test case. I should emphasize that what I'm saying is not that important. Even if I am wrong, I'm not suggesting that we do anything that we don't both agree is needed anyway. If I'm right, that is merely an impediment to incrementally committing the work by fixing exclusion constraints, AFAICT. Ultimately, that isn't all that important. Anyway, here is how I think livelocks could happen, in theory, with regular insertion into a table with exclusion constraints, with your patch [1] applied (which has no pre-check):
>
> * Session 1 physically inserts, and then checks for a conflict.
> * Session 2 physically inserts, and then checks for a conflict.
> * Session 1 sees session 2's conflicting TID, then super deletes and releases token.
> * Session 2 sees session 1's conflicting TID, then super deletes and releases token.
> * Session 1 waits or tries to wait on session 2's token. It isn't held anymore, or is only held for an instant.
> * Session 2 waits or tries to wait on session 1's token. It isn't held anymore, or is only held for an instant.
> * Session 1 restarts from scratch, having made no useful progress in respect of the slot being inserted.
> * Session 2 restarts from scratch, having made no useful progress in respect of the slot being inserted. (Livelock)
>
> If there is a tie-break on XID (you omitted this from your patch [1] but acknowledge it as an omission), then that doesn't really change anything (without adding a pre-check, too). That's because: What do we actually do or not do when we're the oldest XID, that gets to win?

Ah, ok, I can see the confusion now.

> Do we not wait on anything, and just declare that we're done? Then I think that breaks exclusion constraint enforcement, because we need to rescan the index to do that (i.e., goto retry). Do we wait on their token, as my most recent revision does, but *without* a pre-check, for regular inserters? Then I think that our old tuple could keep multiple other sessions spinning indefinitely.

What I had in mind is that the winning inserter waits on the other inserter's token, without super-deleting. Like all inserts do today. So the above scenario becomes:

* Session 1 physically inserts, and then checks for a conflict.
* Session 2 physically inserts, and then checks for a conflict.
* Session 1 sees session 2's conflicting TID. Session 1's XID is older, so it wins. It waits for session 2's token, without super-deleting.
* Session 2 sees session 1's conflicting TID. It super deletes, releases token, and sleeps on session 1's token.
* Session 1 wakes up. It looks at session 2's tuple again and sees that it was super-deleted. There are no further conflicts, so the insertion is complete, and it releases the token.
* Session 2 wakes up. It looks at session 1's tuple again and sees that it's still there. It goes back to sleep, this time on session 1's XID.
* Session 1 commits. Session 2 wakes up, sees that the tuple is still there, and throws a constraint violation error.

- Heikki
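The winning-inserter scheme sketched in the steps above can be modeled in a few lines. This is a toy, hand-rolled illustration (plain Python, not PostgreSQL code; the `Session` class and all names are invented for the sketch, and token waits are replaced by explicit sequential steps): the session with the older XID keeps its tuple, the younger one super-deletes and sleeps, and once the winner commits the loser can only report a constraint violation.

```python
# Toy model of the two-session scenario above. Invented names throughout;
# real sessions coordinate via speculative token locks, which this model
# replaces with an explicit sequential ordering of the steps.

class Session:
    def __init__(self, xid):
        self.xid = xid

def older_wins(a, b):
    """Tie-break: the session with the older (smaller) XID wins."""
    return (a, b) if a.xid < b.xid else (b, a)

table = []                      # physically inserted (uncommitted) tuples

s1, s2 = Session(100), Session(101)
table += [s1, s2]               # both sessions physically insert

winner, loser = older_wins(s1, s2)

# Winner waits on loser's token WITHOUT super-deleting; loser super
# deletes its tuple, releases its token, and sleeps on winner's token.
table.remove(loser)

# Winner wakes: loser's tuple is gone, no further conflicts -> done.
winner_done = loser not in table

# Winner commits; loser wakes, still sees winner's tuple -> error.
loser_outcome = "constraint violation" if winner in table else "retry"

print(winner.xid, winner_done, loser_outcome)
```

Note how the asymmetry introduced by the XID tie-break is what makes the outcome deterministic: exactly one session keeps its tuple across the wait.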
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Sat, Feb 21, 2015 at 11:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Ah, ok, I can see the confusion now.

Cool.

>> Do we not wait on anything, and just declare that we're done? Then I think that breaks exclusion constraint enforcement, because we need to rescan the index to do that (i.e., goto retry). Do we wait on their token, as my most recent revision does, but *without* a pre-check, for regular inserters? Then I think that our old tuple could keep multiple other sessions spinning indefinitely.
>
> What I had in mind is that the winning inserter waits on the other inserter's token, without super-deleting. Like all inserts do today. So the above scenario becomes:
>
> * Session 1 physically inserts, and then checks for a conflict.
> * Session 2 physically inserts, and then checks for a conflict.
> * Session 1 sees session 2's conflicting TID. Session 1's XID is older, so it wins. It waits for session 2's token, without super-deleting.
> * Session 2 sees session 1's conflicting TID. It super deletes, releases token, and sleeps on session 1's token.
> * Session 1 wakes up. It looks at session 2's tuple again and sees that it was super-deleted. There are no further conflicts, so the insertion is complete, and it releases the token.
> * Session 2 wakes up. It looks at session 1's tuple again and sees that it's still there. It goes back to sleep, this time on session 1's XID.
> * Session 1 commits. Session 2 wakes up, sees that the tuple is still there, and throws a constraint violation error.

I think we're actually 100% in agreement, then. I just prefer to have the second last step (the check without a promise tuple visible to anyone made by the loser) occur as part of the pre-check that happens anyway with ON CONFLICT IGNORE. Otherwise, you'll end up with some much more complicated control flow that has to care about not doing that twice for ON CONFLICT IGNORE...and for the benefit of what? For regular inserters, that we don't actually care about fixing unprincipled deadlocks for?

Are we on the same page now?
--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Fri, Feb 20, 2015 at 1:07 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Then I refuse to believe that the livelock hazard exists, without the pre-check. If you have a livelock scenario in mind, it really shouldn't be that difficult to write down the list of steps.

I just meant practical, recreatable steps - a test case. I should emphasize that what I'm saying is not that important. Even if I am wrong, I'm not suggesting that we do anything that we don't both agree is needed anyway. If I'm right, that is merely an impediment to incrementally committing the work by fixing exclusion constraints, AFAICT. Ultimately, that isn't all that important.

Anyway, here is how I think livelocks could happen, in theory, with regular insertion into a table with exclusion constraints, with your patch [1] applied (which has no pre-check):

* Session 1 physically inserts, and then checks for a conflict.
* Session 2 physically inserts, and then checks for a conflict.
* Session 1 sees session 2's conflicting TID, then super deletes and releases token.
* Session 2 sees session 1's conflicting TID, then super deletes and releases token.
* Session 1 waits or tries to wait on session 2's token. It isn't held anymore, or is only held for an instant.
* Session 2 waits or tries to wait on session 1's token. It isn't held anymore, or is only held for an instant.
* Session 1 restarts from scratch, having made no useful progress in respect of the slot being inserted.
* Session 2 restarts from scratch, having made no useful progress in respect of the slot being inserted. (Livelock)

If there is a tie-break on XID (you omitted this from your patch [1] but acknowledge it as an omission), then that doesn't really change anything (without adding a pre-check, too). That's because: What do we actually do or not do when we're the oldest XID, that gets to win? Do we not wait on anything, and just declare that we're done? Then I think that breaks exclusion constraint enforcement, because we need to rescan the index to do that (i.e., goto retry). Do we wait on their token, as my most recent revision does, but *without* a pre-check, for regular inserters? Then I think that our old tuple could keep multiple other sessions spinning indefinitely.

Only by checking for conflicts *first*, without a non-super-deleted physical index tuple, can these other sessions notice that there is a conflict *without creating more conflicts*, which is what I believe is really needed. At the very least it's something I'm much more comfortable with, and that seems like reason enough to do it that way, given that we don't actually care about unprincipled deadlocks with regular inserters with exclusion constraints. Why take the chance with livelocking such inserters, though?

I hope that we don't get bogged down on this, because, as I said, it doesn't matter too much. I'm tempted to concede the point for that reason, since the livelock is probably virtually impossible. I'm just giving you my opinion on how to make the handling of exclusion constraints as reliable as possible.

Thanks

[1] http://www.postgresql.org/message-id/54dfc6f8.5050...@vmware.com
--
Peter Geoghegan
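For what it's worth, the symmetric dance described in the numbered steps above is easy to model. A small round-based sketch (illustrative only; all names are invented, real sessions interleave nondeterministically, and token waits are abstracted away) shows that without any tie-break neither session's tuple ever survives a round, while an older-XID tie-break resolves the conflict on the first round:

```python
def run_rounds(tie_break, max_rounds=5):
    """Each round: both sessions physically insert, check for conflicts
    against the freshly inserted state, then (unless spared by the
    tie-break) super delete and restart. Returns the round number where
    some tuple survived, or None if no progress was ever made."""
    xids = [100, 101]
    for round_no in range(1, max_rounds + 1):
        table = set(xids)                       # both physically insert
        deleters = []
        for xid in xids:                        # decisions are simultaneous
            sees_conflict = any(o in table for o in xids if o != xid)
            if sees_conflict and not (tie_break and xid == min(xids)):
                deleters.append(xid)
        for xid in deleters:
            table.discard(xid)                  # super delete, will restart
        if table:                               # someone's tuple survived
            return round_no
    return None                                 # no progress: livelock

print(run_rounds(tie_break=False))  # None -> symmetric dance, livelock
print(run_rounds(tie_break=True))   # 1    -> older XID wins immediately
```

The model also makes Peter's caveat concrete: the tie-break alone only helps if the spared session has some way (the pre-check, in the real patch) to notice that its surviving tuple means it should not restart.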
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/20/2015 10:39 PM, Peter Geoghegan wrote:
> On Fri, Feb 20, 2015 at 11:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> So, um, are you agreeing that there is no problem? Or did I misunderstand? If you see a potential issue here, can you explain it as a simple list of steps, please.
>
> Yes. I'm saying that AFAICT, there is no livelock hazard provided other sessions must do the pre-check (as they must for ON CONFLICT IGNORE). So I continue to believe that they must pre-check, which you questioned.
> ...
> Hard to break down the problem into steps, since it isn't a problem that I was able to recreate (as a noticeable livelock).

Then I refuse to believe that the livelock hazard exists, without the pre-check. If you have a livelock scenario in mind, it really shouldn't be that difficult to write down the list of steps.

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Fri, Feb 20, 2015 at 11:34 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> So, um, are you agreeing that there is no problem? Or did I misunderstand? If you see a potential issue here, can you explain it as a simple list of steps, please.

Yes. I'm saying that AFAICT, there is no livelock hazard provided other sessions must do the pre-check (as they must for ON CONFLICT IGNORE). So I continue to believe that they must pre-check, which you questioned. The only real downside here (with not doing the pre-check for regular inserters with exclusion constraints) is that we can't fix unprincipled deadlocks for general exclusion constraint inserters (since we don't want to make them pre-check), but we don't actually care about that directly. But it also means that it's hard to see how we can incrementally commit such a fix to break down the patch into more manageable chunks, which is a little unfortunate.

Hard to break down the problem into steps, since it isn't a problem that I was able to recreate (as a noticeable livelock). But the point of what I was saying is that the first-phase pre-check ensures that the one session that got stuck waiting in the second phase can make progress (other conflicters notice its tuple, and so don't insert a new tuple this iteration). Conventional insertion with exclusion constraints inserts first and only then looks for conflicts (since there is no pre-check).

More concretely, if you're the transaction that does not break here, within check_exclusion_or_unique_constraint(), and so unexpectedly waits in the second phase:

+		/*
+		 * At this point we have either a conflict or a potential conflict. If
+		 * we're not supposed to raise error, just return the fact of the
+		 * potential conflict without waiting to see if it's real.
+		 */
+		if (violationOK && !wait)
+		{
+			/*
+			 * For unique indexes, detecting conflict is coupled with physical
+			 * index tuple insertion, so we won't be called for recheck
+			 */
+			Assert(!indexInfo->ii_Unique);
+
+			conflict = true;
+			if (conflictTid)
+				*conflictTid = tup->t_self;
+
+			/*
+			 * Livelock insurance.
+			 *
+			 * When doing a speculative insertion pre-check, we cannot have an
+			 * unprincipled deadlock with another session, fundamentally
+			 * because there is no possible mutual dependency, since we only
+			 * hold a lock on our token, without attempting to lock anything
+			 * else (maybe this is not the first iteration, but no matter;
+			 * we'll have super deleted and released insertion token lock if
+			 * so, and all locks needed are already held. Also, our XID lock
+			 * is irrelevant.)
+			 *
+			 * In the second phase, where there is a re-check for conflicts,
+			 * we can't deadlock either (we never lock another thing, since we
+			 * don't wait in that phase). However, a theoretical livelock
+			 * hazard exists: Two sessions could each see each other's
+			 * conflicting tuple, and each could go and delete, retrying
+			 * forever.
+			 *
+			 * To break the mutual dependency, we may wait on the other xact
+			 * here over our caller's request to not do so (in the second
+			 * phase). This does not imply the risk of unprincipled deadlocks
+			 * either, because if we end up unexpectedly waiting, the other
+			 * session will super delete its own tuple *before* releasing its
+			 * token lock and freeing us, and without attempting to wait on us
+			 * to release our token lock. We'll take another iteration here,
+			 * after waiting on the other session's token, not find a conflict
+			 * this time, and then proceed (assuming we're the oldest XID).
+			 *
+			 * N.B.: Unprincipled deadlocks are still theoretically possible
+			 * with non-speculative insertion with exclusion constraints, but
+			 * this seems inconsequential, since an error was inevitable for
+			 * one of the sessions anyway. We only worry about speculative
+			 * insertion's problems, since they're likely with idiomatic usage.
+			 */
+			if (TransactionIdPrecedes(xwait, GetCurrentTransactionId()))
+				break;	/* go and super delete/restart speculative insertion */
+		}

Then you must successfully insert when you finally goto retry and do another iteration within check_exclusion_or_unique_constraint(). The other conflicters can't have failed to notice your pre-existing tuple, and can't have failed to super delete their own tuples before you are woken (provided you really are the single oldest XID).

Is that any clearer?
--
Peter Geoghegan
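An aside on the TransactionIdPrecedes() call in the hunk above: it decides "oldest XID" on PostgreSQL's circular 32-bit XID space by computing the difference modulo 2^32 and checking its sign, so the comparison remains correct across XID wraparound (the real function additionally special-cases permanent XIDs with a plain less-than, which this sketch omits). A Python sketch of that logic:

```python
# Sketch of circular XID comparison, mirroring the signed 32-bit
# subtraction that PostgreSQL's TransactionIdPrecedes() uses for
# normal XIDs (permanent-XID special cases omitted).

def xid_precedes(id1: int, id2: int) -> bool:
    """True if id1 is older than id2 on the 32-bit XID circle."""
    diff = (id1 - id2) & 0xFFFFFFFF
    if diff >= 0x80000000:          # reinterpret as signed 32-bit
        diff -= 0x100000000
    return diff < 0

print(xid_precedes(100, 200))           # True: plainly older
print(xid_precedes(0xFFFFFF00, 0x100))  # True: older even across wraparound
print(xid_precedes(5, 5))               # False: equal XIDs
```

The wraparound case is why the tie-break cannot simply compare raw XID integers: an XID assigned just before wraparound must still count as older than one assigned just after.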
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/19/2015 10:09 PM, Peter Geoghegan wrote:
> On Thu, Feb 19, 2015 at 11:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> I fully agree with your summary here. However, why should we suppose that while we wait, the other backends don't both delete and then re-insert their tuple?
>
> They need the pre-check to know not to re-insert their tuple (seeing our tuple, immediately after we wake as the preferred backend with the older XID) in order to break the race. But today, exclusion constraints are optimistic in that the insert happens first, and only then the check. The pre-check turns that the other way around, in a limited though necessary sense.
>
>> I'm not sure I understand exactly what you're saying, but AFAICS the pre-check doesn't completely solve that either. It's entirely possible that the other backend deletes its tuple, our backend then performs the pre-check, and the other backend re-inserts its tuple again. Sure, the pre-check reduces the chances, but we're talking about a rare condition to begin with, so I don't think it makes sense to add much code just to reduce the chances further.
>
> But super deletion occurs *before* releasing the token lock, which is the last thing we do before looping around and starting again. So iff we're the oldest XID, the one that gets to win by unexpectedly waiting on another's token in our second phase (second call to check_exclusion_or_unique_constraint()), we will not, in fact, see anyone else's tuple, because they'll all be forced to go through the first phase and find our pre-existing, never-deleted tuple, so we can't see any new tuple from them. And, because they super delete before releasing their token, they'll definitely have super deleted when we're woken up, so we can't see any old/existing tuple either. We have our tuple inserted this whole time - ergo, we do, in fact, win reliably.

So, um, are you agreeing that there is no problem? Or did I misunderstand? If you see a potential issue here, can you explain it as a simple list of steps, please.

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Thu, Feb 19, 2015 at 11:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I fully agree with your summary here. However, why should we suppose that while we wait, the other backends don't both delete and then re-insert their tuple? They need the pre-check to know not to re-insert their tuple (seeing our tuple, immediately after we wake as the preferred backend with the older XID) in order to break the race. But today, exclusion constraints are optimistic in that the insert happens first, and only then the check. The pre-check turns that the other way around, in a limited though necessary sense. I'm not sure I understand exactly what you're saying, but AFAICS the pre-check doesn't completely solve that either. It's entirely possible that the other backend deletes its tuple, our backend then performs the pre-check, and the other backend re-inserts its tuple again. Sure, the pre-check reduces the chances, but we're talking about a rare condition to begin with, so I don't think it makes sense to add much code just to reduce the chances further. But super deletion occurs *before* releasing the token lock, which is the last thing we do before looping around and starting again. So iff we're the oldest XID, the one that gets to win by unexpectedly waiting on another's token in our second phase (second call to check_exclusion_or_unique_constraint()), we will not, in fact, see anyone else's tuple, because they'll all be forced to go through the first phase and find our pre-existing, never-deleted tuple, so we can't see any new tuple from them. And, because they super delete before releasing their token, they'll definitely have super deleted when we're woken up, so we can't see any old/existing tuple either. We have our tuple inserted this whole time - ergo, we do, in fact, win reliably. 
The fly in the ointment is regular inserters, perhaps, but we've agreed that they're not too important here, and even when that happens we're in deadlock land, not livelock land, which is obviously a nicer place to live. Does that make sense? -- Peter Geoghegan
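The ordering rule Peter describes above (the oldest XID keeps its never-deleted tuple and waits on another's token; everyone else super-deletes before releasing their token) can be sketched as a tiny decision function. This is a hypothetical illustration, not PostgreSQL source; the function and enum names are invented, and real code would use TransactionIdPrecedes() to compare XIDs modulo wraparound:

```c
#include <stdint.h>

typedef uint32_t TransactionId;

typedef enum
{
	KEEP_AND_WAIT,				/* oldest XID: keep tuple, wait on other's token */
	SUPER_DELETE_THEN_WAIT		/* everyone else: super-delete, then wait */
} ConflictAction;

/*
 * Livelock insurance, reduced to its core: on a speculative-insertion
 * conflict, the session with the older XID keeps its (never-deleted) tuple
 * and waits on the other session's token; all other sessions super-delete
 * before releasing their token, so the oldest XID reliably wins on its
 * second pass through check_exclusion_or_unique_constraint().
 */
static ConflictAction
livelock_insurance_action(TransactionId ourXid, TransactionId otherXid)
{
	return (ourXid < otherXid) ? KEEP_AND_WAIT : SUPER_DELETE_THEN_WAIT;
}
```

Because exactly one session takes the KEEP_AND_WAIT branch, the others are forced back through the first phase, where they find the winner's still-present tuple.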
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/18/2015 11:43 PM, Peter Geoghegan wrote: Heikki seemed to think that the deadlock problems were not really worth fixing independently of ON CONFLICT UPDATE support, but rather represented a useful way of committing code incrementally. Do I have that right? Yes. The way I chose to break the livelock (what I call "livelock insurance") does indeed involve comparing XIDs, which Heikki thought most promising. But it also involves having the oldest XID wait on another session's speculative token in the second phase, which ordinarily does not occur in the second phase/check_exclusion_or_unique_constraint() call. The idea is that one session is guaranteed to be the waiter that has a second iteration within its second-phase check_exclusion_or_unique_constraint() call, where, following the super deletion of conflict TIDs by the other conflicting session or sessions, it reliably finds that it can proceed (those other sessions are denied the opportunity to reach their second phase by our second-phase waiter's still-not-super-deleted tuple). However, it's difficult to see how to map this onto general exclusion constraint insertion + enforcement. In Heikki's recent sketch of this [1], there is no pre-check, since that is considered an UPSERT thing deferred until a later patch, and therefore my scheme here cannot work (recall that for plain inserts with exclusion constraints, we insert first and check last). I have a hard time justifying adding the pre-check for plain exclusion constraint inserters given the total lack of complaints from the field regarding unprincipled deadlocks, and given the fact that it would probably make the code more complicated than it needs to be. It is critical that there be a pre-check to prevent livelock, though, because otherwise the restarting sessions can go straight to their second phase (check_exclusion_or_unique_constraint() call), without ever realizing that they should not do so. Hmm. 
I haven't looked at your latest patch, but I don't think you need to pre-check for this to work. To recap, the situation is that two backends have already inserted the heap tuple, and then see that the other backend's tuple conflicts. To avoid a livelock, it's enough that one backend super-deletes its own tuple first, before waiting for the other to complete, while the other backend waits without super-deleting. No? It seems like the livelock insurance is pretty close to or actually free, and doesn't imply that a second-phase wait for a token needs to happen too often. With heavy contention on a small number of possible tuples (100), and 8 clients deleting and inserting, I can see it happening only once every couple of hundred milliseconds on my laptop. It's not hard to imagine why the code didn't obviously appear to be broken before now, as the window for an actual livelock must have been small. Also, deadlocks bring about more deadlocks (since the deadlock detector kicks in), whereas livelocks do not bring about more livelocks. It might be easier to provoke the livelocks with a GiST opclass that's unusually slow. I wrote the attached opclass for the purpose of testing this a while ago, but I haven't actually gotten around to doing much with it. It's called useless_gist, because it's a GiST opclass for integers, like btree_gist, but the penalty and picksplit functions are totally dumb. The result is that the tuples are put into the index in pretty much random order, and every scan has to scan the whole index. I'm posting it here, in the hope that it happens to be useful, but I don't really know if it is. - Heikki (attachment: useless_gist.tar.gz)
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Thu, Feb 19, 2015 at 5:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. I haven't looked at your latest patch, but I don't think you need to pre-check for this to work. To recap, the situation is that two backends have already inserted the heap tuple, and then see that the other backend's tuple conflicts. To avoid a livelock, it's enough that one backend super-deletes its own tuple first, before waiting for the other to complete, while the other backend waits without super-deleting. No? I fully agree with your summary here. However, why should we suppose that while we wait, the other backends don't both delete and then re-insert their tuple? They need the pre-check to know not to re-insert their tuple (seeing our tuple, immediately after we wake as the preferred backend with the older XID) in order to break the race. But today, exclusion constraints are optimistic in that the insert happens first, and only then the check. The pre-check turns that the other way around, in a limited though necessary sense. Granted, it's unlikely that we'd livelock due to one session always deleting and then nullifying that by re-inserting in time, but the theoretical risk seems real. Therefore, I'm not inclined to bother committing something without a pre-check, particularly since we're not really interested in fixing unprincipled deadlocks for ordinary exclusion constraint inserters (useful to know that you also think that doesn't matter, BTW). Does that make sense? This is explained within the "livelock insurance" comments (new to V2.3) in check_exclusion_or_unique_constraint(). (Not that I think that explanation is easier to follow than this one.) It might be easier to provoke the livelocks with a GiST opclass that's unusually slow. I wrote the attached opclass for the purpose of testing this a while ago, but I haven't actually gotten around to doing much with it. 
It's called useless_gist, because it's a GiST opclass for integers, like btree_gist, but the penalty and picksplit functions are totally dumb. The result is that the tuples are put into the index in pretty much random order, and every scan has to scan the whole index. I'm posting it here, in the hope that it happens to be useful, but I don't really know if it is. Thanks. I'll try and use this for testing. Haven't been able to break exclusion constraints with the jjanes_upsert test suite in a long time, now. -- Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/19/2015 08:16 PM, Peter Geoghegan wrote: On Thu, Feb 19, 2015 at 5:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. I haven't looked at your latest patch, but I don't think you need to pre-check for this to work. To recap, the situation is that two backends have already inserted the heap tuple, and then see that the other backend's tuple conflicts. To avoid a livelock, it's enough that one backend super-deletes its own tuple first, before waiting for the other to complete, while the other backend waits without super-deleting. No? I fully agree with your summary here. However, why should we suppose that while we wait, the other backends don't both delete and then re-insert their tuple? They need the pre-check to know not to re-insert their tuple (seeing our tuple, immediately after we wake as the preferred backend with the older XID) in order to break the race. But today, exclusion constraints are optimistic in that the insert happens first, and only then the check. The pre-check turns that the other way around, in a limited though necessary sense. I'm not sure I understand exactly what you're saying, but AFAICS the pre-check doesn't completely solve that either. It's entirely possible that the other backend deletes its tuple, our backend then performs the pre-check, and the other backend re-inserts its tuple again. Sure, the pre-check reduces the chances, but we're talking about a rare condition to begin with, so I don't think it makes sense to add much code just to reduce the chances further. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/16/2015 11:31 AM, Andres Freund wrote: On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote: I'm starting to think that we should bite the bullet and consume an infomask bit for this. The infomask bits are a scarce resource, but we should use them when it makes sense. It would be good for forensic purposes too, to leave a trace that a super-deletion happened. Well, we IIRC don't have any left right now. We could reuse MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd essentially use two infomask bits forever... t_infomask is all used, but t_infomask2 has two bits left:

/*
 * information stored in t_infomask2:
 */
#define HEAP_NATTS_MASK		0x07FF	/* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED	0x2000	/* tuple was updated and key cols
					 * modified, or tuple deleted */
#define HEAP_HOT_UPDATED	0x4000	/* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE		0x8000	/* this is heap-only tuple */
#define HEAP2_XACT_MASK		0xE000	/* visibility-related bits */

- Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Feb 16, 2015 at 12:00 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might fail. I don't like that. The point of having the command in the first place is to deal with concurrency issues. If it sometimes doesn't work, it's broken. I don't like it either, although I think it wouldn't come up very often with exclusion constraints - basically, it would rarely be noticed due to the different use cases. To be honest, in suggesting this idea I was hedging against us not figuring out a solution to that problem in time. You didn't like my suggestion of dropping IGNORE entirely, either. I'll do my best to come up with something, but I'm uncomfortable that at this late stage, ON CONFLICT IGNORE support for exclusion constraints seems like a risk to the entire project. I ask that if push comes to shove you show some flexibility here. I'll try my best to ensure that you don't have to even consider committing something with a notable omission. You don't have to give me an answer to this now. The idea of comparing the TIDs of the tuples as a tie-breaker seems most promising to me. If the conflicting tuple's TID is smaller than the inserted tuple's, super-delete and wait. Otherwise, wait without super-deletion. That's really simple. Do you see a problem with that? No. I'll work on that, and see how it stands up to stress testing. Come to think of it, that does seem most promising. BTW, it would be good to explain somewhere in comments or a README the term "unprincipled deadlock". It's been a useful concept, and hard to grasp. If you defined it once, with examples and everything, then you could just have "See .../README" in other places that need to refer to it. Agreed. I have described those in the revised executor README, in case you missed that. Do you think they ought to have their own section? Note that hash indexes have unprincipled deadlock issues, but no one has bothered to fix them. 
Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered. Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-). Seems that way. But even if that wasn't true, it wouldn't matter, since I don't see that we have a choice. I don't see how storing the promise token in heap tuples buys us not having to involve heavyweight locking of tokens. (I think you may have made a thinko in suggesting otherwise.) It wouldn't get rid of heavyweight locks, but it would allow getting rid of the procarray changes. The inserter could generate a token, then acquire the hw-lock on the token, and lastly insert the heap tuple with the correct token. Do you really think that's worth the disk overhead? I generally agree with the zero overhead principle, which is that anyone not using the feature should pay no price for it (or vanishingly close to no price). Can't say that I have a good sense of the added distributed cost (if any) to be paid by adding new fields to the PGPROC struct (since the PGXACT struct was added in 9.2). Is your only concern that the PGPROC array will now have more fields, making it more complicated? Surely that's a price worth paying to avoid making these heap tuples artificially somewhat larger. You're probably left with tuples that are at least 8 bytes larger, once alignment is taken into consideration. That doesn't seem great. That couldn't work without further HeapTupleSatisfiesDirty() logic. Why not? Just meant that it wasn't sufficient to check xmin == xmax, while allowing that something like that could work with extra work (e.g. the use of infomask bits)... Ok, so we can't unconditionally always ignore tuples with xmin==xmax. 
We would need an infomask bit to indicate super-deletion, and only ignore it if the bit is set. ...which is what you say here. I'm starting to think that we should bite the bullet and consume an infomask bit for this. The infomask bits are a scarce resource, but we should use them when it makes sense. It would be good for forensic purposes too, to leave a trace that a super-deletion happened. I too think the tqual.c changes are ugly. I can't see a way around using a token lock, though - I would only consider (and only consider) refining the interface by which a waiter becomes aware that it must wait on the outcome of the inserting xact's speculative insertion/value lock (and in particular, that it should not wait on xact outcome). We clearly need something to wait on that isn't an XID...heavyweight locking of a token value is the obvious way of doing that. Yeah. Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right...if that can be made to be reliable (always WAL-logged), it could be marginally better than setting xmin to invalidTransactionId.
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Feb 16, 2015 at 4:11 PM, Peter Geoghegan p...@heroku.com wrote: Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right...if that can be made to be reliable (always WAL-logged), it could be marginally better than setting xmin to invalidTransactionId. I'm not a big fan of that. The xmin-invalid bit is currently always just a hint. Well, Andres makes the point that that isn't quite so. Hmm. So the tqual.c routines actually check if (HeapTupleHeaderXminInvalid(tuple)). Which is:

#define HeapTupleHeaderXminInvalid(tup) \
( \
	((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
		HEAP_XMIN_INVALID \
)

What appears to happen if I try to only use the HEAP_XMIN_INVALID bit (and not explicitly set xmin to InvalidTransactionId, and not explicitly check that xmin isn't InvalidTransactionId in each HeapTupleSatisfies* routine) is that after a little while, Jeff Janes' tool shows up spurious unique violations, as if some tuple has become visible when it shouldn't have. I guess this is because the HEAP_XMIN_COMMITTED hint bit can still be set, which in effect invalidates the HEAP_XMIN_INVALID hint bit. It takes about 2 minutes before this happens for the first time when fsync = off, following a fresh initdb, which is unacceptable. I'm not sure if it's worth trying to figure out how HEAP_XMIN_COMMITTED gets set. Not that I'm 100% sure that that's what this really is; it just seems very likely. Attached broken patch (broken_visible.patch) shows the changes made to reveal this problem. Only the changes to tqual.c and heap_delete() should matter here, since I did not test recovery. I also thought about unifying the check for if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), InvalidTransactionId)) with the conventional HeapTupleHeaderXminInvalid() macro, and leaving everything else as-is. 
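The failure mode described here (HEAP_XMIN_COMMITTED drowning out HEAP_XMIN_INVALID) follows directly from the macro's two-bit test. Here is a minimal illustration, using the real bit values from PostgreSQL's header but a plain function on a bare infomask word instead of the actual macro:

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMIN_COMMITTED	0x0100	/* t_xmin committed */
#define HEAP_XMIN_INVALID	0x0200	/* t_xmin invalid/aborted */

/*
 * Equivalent of the HeapTupleHeaderXminInvalid() test quoted above, on a
 * bare infomask word: the INVALID bit only counts while COMMITTED is clear.
 * If anything sets HEAP_XMIN_COMMITTED on a super-deleted tuple, the
 * super-deletion becomes invisible to this test - hence the spurious
 * unique violations observed under stress.
 */
static bool
xmin_invalid(uint16_t infomask)
{
	return (infomask & (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID))
		== HEAP_XMIN_INVALID;
}
```

Setting both bits makes the tuple look committed again, which is why the INVALID bit alone is not a reliable super-deletion marker.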
This is no good either, and for similar reasons - control often won't reach the macro, which is behind a check of if (!HeapTupleHeaderXminCommitted(tuple)). The best I think we can hope for is having a dedicated macro, HeapTupleHeaderSuperDeleted(), that performs the if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), InvalidTransactionId)) test each time, and which does not need to be checked so often. A second attached patch (compact_tqual_works.patch - which is non-broken, AFAICT) shows how this is possible, while also moving the check further down within each tqual.c routine (which seems more in keeping with the fact that super deletion is a relatively obscure concept). I haven't been able to break this variant using my existing test suite, so this seems promising as a way of reducing tqual.c clutter. However, as I said the other day, this is basically just polish. -- Peter Geoghegan

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0aa3e57..b777c56 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2899,7 +2899,7 @@ l1:
 	}
 	else
 	{
-		HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId);
+		HeapTupleHeaderSetXminInvalid(tp.t_data);
 	}
 
 	/* Make sure there is no forward chain link in t_ctid */
@@ -7382,12 +7382,12 @@ heap_xlog_delete(XLogReaderState *record)
 	htup->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
 	htup->t_infomask2 &= ~HEAP_KEYS_UPDATED;
 	HeapTupleHeaderClearHotUpdated(htup);
+	if (xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)
+		HeapTupleHeaderSetXminInvalid(htup);
+
 	fix_infomask_from_infobits(xlrec->infobits_set,
 							   &htup->t_infomask, &htup->t_infomask2);
-	if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))
-		HeapTupleHeaderSetXmax(htup, xlrec->xmax);
-	else
-		HeapTupleHeaderSetXmin(htup, InvalidTransactionId);
+	HeapTupleHeaderSetXmax(htup, xlrec->xmax);
 	HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 
 	/* Mark the page as a candidate for pruning */
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 99bb417..fd857b1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -170,13 +170,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	Assert(ItemPointerIsValid(&htup->t_self));
 	Assert(htup->t_tableOid != InvalidOid);
 
-	/*
-	 * Never return super-deleted tuples
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-							InvalidTransactionId))
-		return false;
-
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
@@ -735,15 +728,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
 
-	/*
-	 * Never return super-deleted tuples
-	 *
-	 * XXX: Comment this code out and you'll get conflicts within
-	 * ExecLockUpdateTuple(), which result in an infinite loop.
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-							InvalidTransactionId))
-		return false;
 	if
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote: On 02/16/2015 02:44 AM, Peter Geoghegan wrote: Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered. Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-). I don't think it actually uses the fast path, does it? IIRC that's restricted to LOCKTAG_RELATION when done via LockAcquireExtended + open coded for the VirtualXactLock table. I'm not super worried atm either. Worth checking, but probably not worth obsessing over. Besides, why should one transaction be entitled to insert a conflicting value tuple just because a still running transaction deleted it (having also inserted the tuple itself)? Didn't one prototype version of value locking #2 have that as a bug [1]? In fact, originally, didn't the xmin set to invalid thing come immediately from realizing that that wasn't workable? Ah, right. So the problem was that some code might assume that if you insert a row, delete it in the same transaction, and then insert the same value again, the 2nd insertion will always succeed because the previous insertion effectively acquired a value-lock on the key. Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We would need an infomask bit to indicate super-deletion, and only ignore it if the bit is set. 
I'm starting to think that we should bite the bullet and consume an infomask bit for this. The infomask bits are a scarce resource, but we should use them when it makes sense. It would be good for forensic purposes too, to leave a trace that a super-deletion happened. Well, we IIRC don't have any left right now. We could reuse MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd essentially use two infomask bits forever... Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right...if that can be made to be reliable (always WAL-logged), it could be marginally better than setting xmin to invalidTransactionId. I'm not a big fan of that. The xmin-invalid bit is currently always just a hint. We already rely on XMIN_INVALID being set correctly for freezing. C.f. HeapTupleHeaderXminFrozen, HeapTupleHeaderXminInvalid, et al. So it'd not necessarily end up being that bad. And the super deletion could easily just set it in the course of its WAL logging. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/16/2015 02:44 AM, Peter Geoghegan wrote: On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I did not solve the potential for livelocks (discussed at http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). The patch always super-deletes the already-inserted tuple on conflict, and then waits for the other inserter. That would be nice to fix, using one of the ideas from that thread, or something else. How about we don't super-delete at all with exclusion constraints? I'd be willing to accept unprincipled deadlocks for exclusion constraints, because they already exist today for UPSERT/NOSERT type use cases, and with idiomatic usage seem far less likely for the IGNORE variant (especially with exclusion constraints). So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might fail. I don't like that. The point of having the command in the first place is to deal with concurrency issues. If it sometimes doesn't work, it's broken. I can see people using ON CONFLICT UPDATE where they'd almost or actually be better off using a plain UPDATE - that's quite a different situation. I find livelocks to be a *very* scary prospect, and I don't think the remediations that were mentioned are going to fly. It's just too messy, and too much of a niche use case. TBH I am skeptical of the idea that we can fix exclusion constraints properly in this way at all, at least not without the exponential backoff thing, which just seems horrible. The idea of comparing the TIDs of the tuples as a tie-breaker seems most promising to me. If the conflicting tuple's TID is smaller than the inserted tuple's, super-delete and wait. Otherwise, wait without super-deletion. That's really simple. Do you see a problem with that? 
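Heikki's TID tie-breaker is easy to state precisely. Here is a hypothetical sketch (illustrative struct and function names, not the actual executor code), relying on the fact that (block number, line pointer offset) gives a total order on heap TIDs:

```c
#include <stdbool.h>
#include <stdint.h>

/* A simplified stand-in for ItemPointerData: block number plus offset. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} TidSketch;

static bool
tid_precedes(TidSketch a, TidSketch b)
{
	return a.block < b.block ||
		(a.block == b.block && a.offset < b.offset);
}

/*
 * The proposed rule: if the conflicting tuple's TID sorts before our own
 * inserted tuple's TID, we super-delete our tuple and then wait; otherwise
 * we wait without super-deleting.  Because the order is total, exactly one
 * of any two conflicting inserters backs off, which breaks the livelock.
 */
static bool
must_super_delete(TidSketch conflictingTid, TidSketch ourTid)
{
	return tid_precedes(conflictingTid, ourTid);
}
```

The appeal of TIDs over XIDs here is that both conflicting tuples already exist and are visible to the dirty-snapshot scan, so no extra state needs to be published anywhere.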
Although you understood what I was on about when I first talked about unprincipled deadlocks, I think that acceptance of that idea came much later from other people, because it's damn complicated. BTW, it would be good to explain somewhere in comments or a README the term "unprincipled deadlock". It's been a useful concept, and hard to grasp. If you defined it once, with examples and everything, then you could just have "See .../README" in other places that need to refer to it. It's not worth it to add some weird Dining philosophers exponential backoff thing to make sure that the IGNORE variant when used with exclusion constraints can never deadlock in an unprincipled way, but if it is worth it then it seems unreasonable to suppose that this patch needs to solve that pre-existing problem. No? The point of solving the existing problem is that it allows us to split the patch, for easier review. Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered. Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-). I don't see how storing the promise token in heap tuples buys us not having to involve heavyweight locking of tokens. (I think you may have made a thinko in suggesting otherwise.) It wouldn't get rid of heavyweight locks, but it would allow getting rid of the procarray changes. 
The inserter could generate a token, then acquire the hw-lock on the token, and lastly insert the heap tuple with the correct token. Are we happy with the way super-deletion works? Currently, the xmin field is set to invalid to mark the tuple as super-deleted. That required checks in HeapTupleSatisfies* functions. One alternative would be to set xmax equal to xmin, and teach the code that currently calls XactLockTableWait() to check if xmax=xmin, and not consider the tuple as conflicting. That couldn't work without further HeapTupleSatisfiesDirty() logic. Why not? Besides, why should one transaction be entitled to insert a conflicting value tuple just because a still running transaction deleted it (having also inserted the tuple itself)? Didn't one prototype version of value locking #2 have that as a bug [1]? In fact, originally, didn't the xmin set to invalid thing come immediately from realizing that that wasn't workable? Ah, right. So the problem was that some code might assume that if you insert a row, delete it in the same transaction, and then insert the same value again, the 2nd insertion will always succeed because the previous insertion effectively acquired a value-lock on the key.
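The token scheme Heikki sketches could look roughly like this. Everything here is hypothetical and illustrative (in the actual patch the token lives in shared state, and the heavyweight lock tag would be built from the backend and token); the sketch only shows the token-generation side:

```c
#include <stdint.h>

/*
 * Per-backend speculative insertion token, as described above: a new token
 * is generated for each speculative insertion; the inserter takes a
 * heavyweight lock on the token *before* inserting the heap tuple, and a
 * conflicting session waits on that lock rather than on the whole
 * transaction (the XID), so the wait ends as soon as the speculative
 * insertion is confirmed or super-deleted.
 */
static uint32_t speculative_token = 0;

static uint32_t
next_speculative_token(void)
{
	/* Token 0 is reserved to mean "no speculative insertion in progress". */
	if (++speculative_token == 0)
		++speculative_token;
	return speculative_token;
}
```

Storing the token in the heap tuple rather than the proc array trades heap-tuple width for not having to publish per-backend state, which is the disk-overhead question Peter raises above.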
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks for taking a look at it. That's somewhat cleaned up in the attached patchseries - V2.2. This has been rebased to repair the minor bit-rot pointed out by Thom. I don't really have the energy to review this patch in one batch, so I'd really like to split this into two: I think we're all feeling worn out at this point, by this patch and by others. I do appreciate your making the effort. 1. Solve the existing problem with exclusion constraints that if two insertions happen concurrently, one of them might be aborted with a "deadlock detected" error, rather than a conflicting key violation error. That really wouldn't be worth fixing otherwise, but it happens to be a useful stepping stone for this upsert patch. 2. All the rest. I think we need a more pragmatic approach to dealing with the exclusion constraint problems. I took a stab at extracting the parts needed to do 1. See attached. I didn't modify ExecUpdate to use speculative insertions, so the issue remains for UPDATEs, but that's easy to add. Cool. I did not solve the potential for livelocks (discussed at http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). The patch always super-deletes the already-inserted tuple on conflict, and then waits for the other inserter. That would be nice to fix, using one of the ideas from that thread, or something else. How about we don't super-delete at all with exclusion constraints? I'd be willing to accept unprincipled deadlocks for exclusion constraints, because they already exist today for UPSERT/NOSERT type use cases, and with idiomatic usage seem far less likely for the IGNORE variant (especially with exclusion constraints). I can see people using ON CONFLICT UPDATE where they'd almost or actually be better off using a plain UPDATE - that's quite a different situation. 
I find livelocks to be a *very* scary prospect, and I don't think the remediations that were mentioned are going to fly. It's just too messy, and too much of a niche use case. TBH I am skeptical of the idea that we can fix exclusion constraints properly in this way at all, at least not without the exponential backoff thing, which just seems horrible. We never really debated the options for how to do the speculative insertion and super-deletion. This patch is still based on the quick & dirty demo patches I posted about a year ago, in response to issues you found with earlier versions. That might be OK - maybe I really hit the nail on the head in designing those things and got them right on the first try - but more likely there are better alternatives. Intuitively, it seems likely that you're right here. However, it was necessary to work through the approach to see what the problems are. For example, the need for modifications to tqual.c became apparent only through putting a full implementation of ON CONFLICT UPDATE through various tests. In general, I've emphasized that the problems with any given value locking implementation are non-obvious. Anyone who thinks that he can see all the problems with his approach to value locking without having a real implementation that is tested for problems like unprincipled deadlocks is probably wrong. That's sort of where I'm coming from with suggesting we allow unprincipled deadlocks with exclusion constraints. I'm not confident that we can find a perfect solution, and know that it's a perfect solution. It's too odd, and too niche a requirement. Although you understood what I was on about when I first talked about unprincipled deadlocks, I think that acceptance of that idea came much later from other people, because it's damn complicated. 
It's not worth it to add some weird Dining philosophers exponential backoff thing to make sure that the IGNORE variant when used with exclusion constraints can never deadlock in an unprincipled way, but if it is worth it then it seems unreasonable to suppose that this patch needs to solve that pre-existing problem. No? If we do something like an exponential backoff, which I think might work, I fear that that kind of yak-shaving will leave us with something impossible to justify committing; a total mess. Better the devil you know (possible unprincipled deadlocks with the IGNORE variant + exclusion constraints). Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered.
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Fri, Feb 13, 2015 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote: In patch 1, sepgsql is also affected by this commit. Note that this commit necessitates an initdb, since stored ACLs are broken. Doesn't that warrant bumping catversion? Yes, but traditionally that is left until the last minute - when the patch is committed. That's why it's called out in the commit message (it isn't otherwise obvious - it's not a common catversion necessitating change like an addition to a system catalog). Patch 2 + * When killing a speculatively-inserted tuple, we set xmin to invalid and +if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)) When doing this, should we also set the HEAP_XMIN_INVALID hint bit? <reads more of patch> Ok, I see we're not doing this because the check for a super deleted tuple is already cheap. Probably worth mentioning that in the comment... See my remarks to Heikki on this. I don't think it adds much. ExecInsert(): + * We don't suppress the effects (or, perhaps, side-effects) of + * BEFORE ROW INSERT triggers. This isn't ideal, but then we + * cannot proceed with even considering uniqueness violations until + * these triggers fire on the one hand, but on the other hand they + * have the ability to execute arbitrary user-defined code which + * may perform operations entirely outside the system's ability to + * nullify. I'm a bit confused as to why we're calling BEFORE triggers out here... hasn't this always been true for both BEFORE *and* AFTER triggers? The comment makes me wonder if there's some magic that happens with AFTER... Yes, but the difference here is that the UPDATE path could be taken (which is sort of like when a before row insert trigger returns NULL). What I'm calling out is the dependency on firing before row insert triggers to decide if the alternative path must be taken. Roughly speaking, we must perform part of the INSERT (firing of before row insert triggers) in order to decide to do or not do an INSERT.
This is, as I said, similar to when those triggers return NULL, and won't matter with idiomatic patterns of before trigger usage. Still feels worth calling out, because sometimes users do foolishly write before triggers with many external side-effects. ExecLockUpdateTuple(): + * Try to lock tuple for update as part of speculative insertion. If + * a qual originating from ON CONFLICT UPDATE is satisfied, update + * (but still lock row, even though it may not satisfy estate's + * snapshot). I find this confusing... which row is being locked? The previously inserted one? Perhaps a better wording would be "satisfied, update. Lock the original row even if it doesn't satisfy estate's snapshot." Take a look at the executor README. We're locking the only row that can be locked when an UPSERT non-conclusively thinks to take the UPDATE path: the row that was found during our pre-check. We can only UPDATE when we find such a tuple, and then lock it without finding a row-level conflict. infer_unique_index(): +/* + * We need not lock the relation since it was already locked, either by + * the rewriter or when expand_inherited_rtentry() added it to the query's + * rangetable. + */ +relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid; + +relation = heap_open(relationObjectId, NoLock); Seems like there should be an Assert here. Also, the comment should probably go before the heap_open call. An Assert() of what? Note that the similar function get_relation_info() does about the same thing here. heapam_xlog.h: +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */ I wish this would mention why it's safe to do this. Also, the comment mentions xl_heap_delete when the #define is for XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps: /* * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do * multi-inserts for INSERT ON CONFLICT.
*/ It's safe, as the comment indicates, because the former is only used for xl_heap_multi_insert records, while the latter is only used for xl_heap_delete records. There is no ambiguity, because naturally we're always able to establish what type of record is under consideration before checking the bit is set. The XLOG_HEAP_* format is used for other flags there, despite the fact that other flags (like XLOG_HEAP_PREFIX_FROM_OLD) can only appear in certain context-appropriate xl_heap_* records. AFAICT, all that I've done that's new here is rely on that, since a bunch of those bits were used up in the last year or two, and the need to even consider bit reuse here is a new problem. I'll review the remaining patches later. Thanks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/10/2015 02:21 AM, Peter Geoghegan wrote: On Fri, Feb 6, 2015 at 1:51 PM, Bruce Momjian br...@momjian.us wrote: Other than the locking part, the biggest part of this patch is adjusting things so that an INSERT can change into an UPDATE. Thanks for taking a look at it. That's somewhat cleaned up in the attached patchseries - V2.2. This has been rebased to repair the minor bit-rot pointed out by Thom. I don't really have the energy to review this patch in one batch, so I'd really like to split this into two: 1. Solve the existing problem with exclusion constraints that if two insertions happen concurrently, one of them might be aborted with a "deadlock detected" error, rather than a "conflicting key violation" error. That really wouldn't be worth fixing otherwise, but it happens to be a useful stepping stone for this upsert patch. 2. All the rest. I took a stab at extracting the parts needed to do 1. See attached. I didn't modify ExecUpdate to use speculative insertions, so the issue remains for UPDATEs, but that's easy to add. I did not solve the potential for livelocks (discussed at http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). The patch always super-deletes the already-inserted tuple on conflict, and then waits for the other inserter. That would be nice to fix, using one of the ideas from that thread, or something else. We never really debated the options for how to do the speculative insertion and super-deletion. This patch is still based on the quick & dirty demo patches I posted about a year ago, in response to issues you found with earlier versions. That might be OK - maybe I really hit the nail on designing those things and got them right on first try - but more likely there are better alternatives. Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array?
Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Are we happy with the way super-deletion works? Currently, the xmin field is set to invalid to mark the tuple as super-deleted. That required checks in HeapTupleSatisfies* functions. One alternative would be to set xmax equal to xmin, and teach the code that currently calls XactLockTableWait() to check if xmax=xmin, and not consider the tuple as conflicting. - Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 46060bc1..0aa3e575 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2048,6 +2048,9 @@ FreeBulkInsertState(BulkInsertState bistate)
  * This causes rows to be frozen, which is an MVCC violation and
  * requires explicit options chosen by user.
  *
+ * If HEAP_INSERT_SPECULATIVE is specified, the MyProc->specInsert fields
+ * are filled.
+ *
  * Note that these options will be applied when inserting into the heap's
  * TOAST table, too, if the tuple requires any out-of-line data.
  *
@@ -2196,6 +2199,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	END_CRIT_SECTION();
 
+	/*
+	 * Let others know that we speculatively inserted this tuple, before
+	 * releasing the buffer lock.
+	 */
+	if (options & HEAP_INSERT_SPECULATIVE)
+		SetSpeculativeInsertionTid(relation->rd_node, &heaptup->t_self);
+
 	UnlockReleaseBuffer(buffer);
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
@@ -2616,11 +2626,17 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  * (the last only for HeapTupleSelfUpdated, since we
  * cannot obtain cmax from a combocid generated by another transaction).
  * See comments for struct HeapUpdateFailureData for additional info.
+ *
+ * If 'killspeculative' is true, caller requires that we super-delete a tuple
+ * we just inserted in the same command. Instead of the normal visibility
+ * checks, we check that the tuple was inserted by the current transaction and
+ * given command id. Also, instead of setting its xmax, we set xmin to
+ * invalid, making it immediately appear as dead to everyone.
  */
 HTSU_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd)
+			HeapUpdateFailureData *hufd, bool killspeculative)
 {
 	HTSU_Result result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -2678,7 +2694,18 @@ heap_delete(Relation relation, ItemPointer tid,
 	tp.t_self = *tid;
 l1:
-	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
+	if (!killspeculative)
+	{
+		result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
+	}
+	else
+	{
+		if (tp.t_data->t_choice.t_heap.t_xmin != xid ||
+			tp.t_data->t_choice.t_heap.t_field3.t_cid != cid)
+			elog(ERROR, "attempted to super-delete a tuple from other CID");
+		result =
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 10, 2015 at 9:21 AM, Peter Geoghegan p...@heroku.com wrote: * There was some squashing of commits, since Andres felt that they weren't all useful as separate commits. I've still split out the RTE permissions commit, as well as the RLS commit (plus the documentation and test commits, FWIW). I hope that this will make it easier to review parts of the patch, without being generally excessive. When documentation and tests are left out, the entire patch series is left at: Patch moved to next CF. -- Michael
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 2/9/15 6:21 PM, Peter Geoghegan wrote: Thanks for taking a look at it. That's somewhat cleaned up in the attached patchseries - V2.2. In patch 1, sepgsql is also affected by this commit. Note that this commit necessitates an initdb, since stored ACLs are broken. Doesn't that warrant bumping catversion? Patch 2 + * When killing a speculatively-inserted tuple, we set xmin to invalid and +if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)) When doing this, should we also set the HEAP_XMIN_INVALID hint bit? <reads more of patch> Ok, I see we're not doing this because the check for a super deleted tuple is already cheap. Probably worth mentioning that in the comment... ExecInsert(): + * We don't suppress the effects (or, perhaps, side-effects) of + * BEFORE ROW INSERT triggers. This isn't ideal, but then we + * cannot proceed with even considering uniqueness violations until + * these triggers fire on the one hand, but on the other hand they + * have the ability to execute arbitrary user-defined code which + * may perform operations entirely outside the system's ability to + * nullify. I'm a bit confused as to why we're calling BEFORE triggers out here... hasn't this always been true for both BEFORE *and* AFTER triggers? The comment makes me wonder if there's some magic that happens with AFTER... +spec != SPEC_NONE? HEAP_INSERT_SPECULATIVE:0 Non-standard formatting. Given the size of the patch and work already put into it I'd just leave it for the next formatting run; I only mention it in case someone has some compelling reason not to. ExecLockUpdateTuple(): + * Try to lock tuple for update as part of speculative insertion. If + * a qual originating from ON CONFLICT UPDATE is satisfied, update + * (but still lock row, even though it may not satisfy estate's + * snapshot). I find this confusing... which row is being locked? The previously inserted one? Perhaps a better wording would be "satisfied, update. Lock the original row even if it doesn't satisfy estate's snapshot."
infer_unique_index(): +/* + * We need not lock the relation since it was already locked, either by + * the rewriter or when expand_inherited_rtentry() added it to the query's + * rangetable. + */ +relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid; + +relation = heap_open(relationObjectId, NoLock); Seems like there should be an Assert here. Also, the comment should probably go before the heap_open call. heapam_xlog.h: +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */ I wish this would mention why it's safe to do this. Also, the comment mentions xl_heap_delete when the #define is for XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps: /* * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do * multi-inserts for INSERT ON CONFLICT. */ I'll review the remaining patches later. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 2015-02-04 16:49:46 -0800, Peter Geoghegan wrote: On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund and...@2ndquadrant.com wrote: Generally the split into the individual commits doesn't seem to make much sense to me. I think trying to make that possible is a good idea in patches of this size. It e.g. seems entirely possible to structure the patchset so that the speculative lock infrastructure is added first and the rest later. I've not thought more about how to split it up further, but I'm pretty sure it's possible. The commits individually (except the first) aren't individually committable and aren't even meaningful. Splitting off the internal docs, tests and such actually just seems to make reviewing harder because you miss context. Splitting it so that individual pieces are committable and reviewable makes sense, but... I have no problem doing the user docs later. If you split off RLS support, you need to throw an error before it's implemented. I mostly agree. Basically, I did not intend for all of the patches to be individually committed. The mechanism by which EXCLUDED.* expressions are added is somewhat novel, and deserves to be independently *considered*. I'm trying to show how the parts fit together more so than breaking things down into smaller commits (as you picked up on, 0001 is the exception - that is genuinely intended to be committed early). Also, those commit messages give me the opportunity to put those parts in their appropriate context vis-a-vis our discussions. They refer to the Wiki, for example, or reasons why pg_stat_statements shouldn't care about ExcludedExpr. Obviously the final commit messages won't look that way. FWIW, I don't think anything here really should refer to the wiki... 0002: * Tentatively I'd say that killspeculative should be done via a separate function instead of heap_delete() Really?
I guess if that were to happen, it would entail refactoring heap_delete() to call a static function, which was also called by a new kill_speculative() function that does this. Otherwise, you'd have far too much duplication. I don't really think there actually is that much common inbetween those. It looks to me that most of the code in heap_delete isn't actually relevant for this case and could be cut short. My guess is that only the WAL logging would be separated out. * I doubt logical decoding works with the patch as it stands. I thought so. Perhaps you could suggest a better use of the available XLOG_HEAP_* bits. I knew I needed to consider that more carefully (hence the XXX comment), but didn't get around to it. I think you probably need to add test macros that make sure only the individual bits are sets, and not the combination and then only use those. * If a arbiter index is passed to ExecCheckIndexConstraints(), can't we abort the loop after checking it? Also, do we really have to iterate over indexes for that case? How about moving the loop contents to a separate function and using that separately for the arbiter cases? Well, the failure to do that implies very few extra cycles, but sure. It's not that much about the CPU cycles, but also about the mental ones. If you have to think what happens if there's more than one match... * ItemPointerIsValid What about it? Uh. Oh. No idea. I wrote this pretty late at night ;) * /* * This may occur when an instantaneously invisible tuple is blamed * as a conflict because multiple rows are inserted with the same * constrained values. How can this happen? We don't insert multiple rows with the same command id? This is a cardinality violation [1]. It can definitely happen - just try the examples you see on the Wiki. I don't care about the wiki from the point of code comments. This needs to be understandable in five years time. 
* Perhaps it has previously been discussed but I'm not convinced by the reasoning for not looking at opclasses in infer_unique_index(). This seems like it'd prohibit ever having e.g. case insensitive opclasses - something surely worthwhile. I don't think anyone gave that idea the thumbs-up. However, I really don't see the problem. Sure, we could have case insensitive opclasses in the future, and you may want to make a unique index using one. Then the problem suddenly becomes that previous choices of indexes/statements aren't possible anymore. It seems much better to introduce the syntax now and not have too much of a use case for it. * The whole speculative insert logic isn't really well documented. Why, for example, do we actually need the token? And why are there no issues with overflow? And where is it documented that a 0 means there's no token? ... Fair enough. Presumably it's okay that overflow theoretically could occur, because a race is all but impossible. The token represents a particular attempt by some backend at inserting a tuple, that needs to be waited on
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 10, 2015 at 12:04 AM, Andres Freund and...@2ndquadrant.com wrote: FWIW, I don't think anything here really should refer to the wiki... The Wiki pages have done a good job of summarizing things...it certainly didn't seem that hard for you to get up to speed here. Also, as you'll know from working on logical decoding, it's hard to know what is complex and esoteric and what is relatively accessible when you're this close to a big project. I recall that you said as much before. I'm focused on signposting so that reviewers can follow what each patch does with the minimum of effort (with reference to the Wiki or whatever). I see no reason to not do things that way until commit...it feels like there is less chance of the concepts going over people's head this way. I don't really think there actually is that much common inbetween those. It looks to me that most of the code in heap_delete isn't actually relevant for this case and could be cut short. My guess is that only the WAL logging would be separated out. I'll think about that some more. * I doubt logical decoding works with the patch as it stands. I thought so. Perhaps you could suggest a better use of the available XLOG_HEAP_* bits. I knew I needed to consider that more carefully (hence the XXX comment), but didn't get around to it. I think you probably need to add test macros that make sure only the individual bits are sets, and not the combination and then only use those. This too. * /* * This may occur when an instantaneously invisible tuple is blamed * as a conflict because multiple rows are inserted with the same * constrained values. How can this happen? We don't insert multiple rows with the same command id? This is a cardinality violation [1]. It can definitely happen - just try the examples you see on the Wiki. I don't care about the wiki from the point of code comments. This needs to be understandable in five years time. 
That wasn't clear before - you seemed to me to be questioning if this was even possible. There is a section in the INSERT SQL reference page about cardinality violations, so we're certainly talking about something that a code reader has likely heard of. Also, the nitty gritty showing various scenarios on the Wiki is the quickest way to know what is possible (but is much too long winded for user visible documentation or code comments). * Perhaps it has previously been discussed but I'm not convinced by the reasoning for not looking at opclasses in infer_unique_index(). This seems like it'd prohibit ever having e.g. case insensitive opclasses - something surely worthwhile. I don't think anyone gave that idea the thumbs-up. However, I really don't see the problem. Sure, we could have case insensitive opclasses in the future, and you may want to make a unique index using one. Then the problem suddenly becomes that previous choices of indexes/statements aren't possible anymore. It seems much better to introduce the syntax now and not have too much of a use case for it. The only way the lack of a way of specifying which opclass to use could bite us is if there were two *actually* defined unique indexes on the same column, each with different equality operators. That seems like kind of a funny scenario, even if that were quite possible (even if non-default opclasses existed that had non-identical equality operators, which is basically not the case today). I grant that it is a bit odd that we're talking about unique index definitions affecting semantics, but that is to a certain extent the nature of the beast.
As a compromise, I suggest having the inference specification optionally accept a named opclass per attribute, in the style of CREATE INDEX (I'm already reusing a bit of the raw parser support for CREATE INDEX, you see) - that'll make inference insist on that opclass, rather than make it a strict matter of costing available alternatives (not that any alternative is expected with idiomatic usage). That implies no additional parser overhead, really. If that's considered ugly, then at least it's an ugly thing that literally no one will ever use in the foreseeable future...and an ugly thing that is no more necessary in CREATE INDEX than here (and yet CREATE INDEX lives with the ugliness). It's really not about me understanding it right now, but about longer term. Sure. Can you add a UPSERT test for logical decoding? I doubt it'll work right now, even in the repost. Okay. * /* * Immediately VACUUM super-deleted tuples */ if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), InvalidTransactionId)) return HEAPTUPLE_DEAD; Is that branch really needed? Shouldn't it just be happening as a consequence of the already existing code? Same in SatisfiesMVCC. If you actually needed that block, it'd need to be done in SatisfiesSelf as well, no? You have a comment about a possible loop - but that seems wrong to me,
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 29 January 2015 at 23:38, Peter Geoghegan p...@heroku.com wrote: On Sat, Jan 17, 2015 at 6:48 PM, Peter Geoghegan p...@heroku.com wrote: I continued with this since posting V2.0. Attached version (V2.1) fixes bit-rot caused by the recent changes by Stephen (Fix column-privilege leak in error-message paths). More precisely, it is rebased on top of today's 17792b commit. Patch 0002 no longer applies due to a conflict in src/backend/executor/execUtils.c. Thom
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/03/2015 08:17 PM, Peter Geoghegan wrote: On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex != ((Oid) 0))", File: "nodeModifyTable.c", Line: 1619) Is that just because of the hack in parse_clause.c? Yes. I never built with assertions and so didn't see this, but it doesn't matter. With assertions disabled, count_upsert_exclusion.pl ran successfully to the end. I also tried running VACUUM FREEZE upsert_race_test in a loop in another session at the same time, but it didn't make a difference. How quickly do you see the errors? I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors. Did you build fully optimized, assertion-free code? I've been doing that. I found it necessary to recreate some of the bugs Jeff's tool caught. I also think that I might have needed an 8 core box to see it, but less sure about that. I had compiled with -O0, but without assertions. I tried now again with -O3. It's been running for about 10 minutes now, and I haven't seen any errors. Since you can reproduce this, it would be good if you could debug this. The error message where the alleged duplicate key actually had a different value is a bit scary. Makes me wonder if it might be a bug with exclusion constraints in general, or just with the patch. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I had compiled with -O0, but without assertions. I tried now again with -O3. It's been running for about 10 minutes now, and I haven't seen any errors. Did you run with an artificially high XID burn rate (i.e. did you also apply Jeff's modifications to Postgres, and specify a high burn rate using his custom GUC)? Maybe that was important. -- Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund and...@2ndquadrant.com wrote: A first (not actually that quick :() look through the patches to see what actually happened in the last months. I didn't keep up with the thread. So, let me get this out of the way: This is the first in-depth technical review that this work has had in a long time. Thank you for your help here. Generally the split into the individual commits doesn't seem to make much sense to me. The commits individually (except the first) aren't individually committable and aren't even meaningful. Splitting off the internal docs, tests and such actually just seems to make reviewing harder because you miss context. Splitting it so that individual pieces are committable and reviewable makes sense, but... I have no problem doing the user docs later. If you split off RLS support, you need to throw an error before it's implemented. I mostly agree. Basically, I did not intend for all of the patches to be individually committed. The mechanism by which EXCLUDED.* expressions are added is somewhat novel, and deserves to be independently *considered*. I'm trying to show how the parts fit together more so than breaking things down into smaller commits (as you picked up on, 0001 is the exception - that is genuinely intended to be committed early). Also, those commit messages give me the opportunity to put those parts in their appropriate context vis-a-vis our discussions. They refer to the Wiki, for example, or reasons why pg_stat_statements shouldn't care about ExcludedExpr. Obviously the final commit messages won't look that way. 0001: * References INSERT with ON CONFLICT UPDATE, can thus not be committed independently. I don't think those references really are needed. * I'm not a fan of the increased code duplication in ExecCheckRTEPerms(). Can you look into cleaning that up? * Also the comments inside the ACL_INSERT case still reference UPDATE.
Other than that I think we can just go ahead and commit this ahead of time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit message only. Cool. Attached revision makes those changes. 0007: * AMs alone isn't particularly unique. * Without the context of the discussion unprincipled deadlocks aren't well defined. * Too many words. * Waiting too long isn't defined. Neither is why that'd imply unprincipled deadlocks. Somewhat understandable with the context of the discussion, but surely not a couple years down the road. * As is I don't find the README entry super helpful. It should state what the reason for doing this is clearly, start at the higher level, and then move to the details. * Misses details about the speculative heavyweight locking of tuples. Fair points. I'll work through that feedback. Actually, I think we should memorialize that unprincipled deadlocks should be avoided in some more general way, since they are after all a general problem that we've seen elsewhere. I'm not sure about how to go about doing that, though. 0002: * Tentatively I'd say that killspeculative should be done via a separate function instead of heap_delete() Really? I guess if that were to happen, it would entail refactoring heap_delete() to call a static function, which was also called by a new kill_speculative() function that does this. Otherwise, you'd have far too much duplication. * I think we should, as you ponder in a comment, do the OCU specific stuff lazily and/or in a separate function from BuildIndexInfo(). That function is already quite visible in profiles, and the additional work isn't entirely trivial. Okay. * I doubt logical decoding works with the patch as it stands. I thought so. Perhaps you could suggest a better use of the available XLOG_HEAP_* bits. I knew I needed to consider that more carefully (hence the XXX comment), but didn't get around to it. * The added ereport (i.e. user facing) error message in ExecInsertIndexTuples won't help a user at all.
So, this:

 /* Skip this index-update if the predicate isn't satisfied */
 if (!ExecQual(predicate, econtext, false))
+{
+	if (arbiterIdx == indexRelation->rd_index->indexrelid)
+		ereport(ERROR,
+				(errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
+				 errmsg("partial arbiter unique index has predicate that does not cover tuple proposed for insertion"),
+				 errdetail("ON CONFLICT inference clause implies that the tuple proposed for insertion actually be covered by partial predicate for index \"%s\".",
+						   RelationGetRelationName(indexRelation)),
+				 errhint("ON CONFLICT inference clause must infer a unique index that covers the final tuple, after BEFORE ROW INSERT triggers fire."),
+				 errtableconstraint(heapRelation,
+									RelationGetRelationName(indexRelation;
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Wed, Feb 4, 2015 at 10:03 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I had compiled with -O0, but without assertions. I tried now again with -O3. It's been running for about 10 minutes now, and I haven't seen any errors. Did you run with an artificially high XID burn rate (i.e. did you also apply Jeff's modifications to Postgres, and specify a high burn rate using his custom GUC)? Maybe that was important. Excuse me: I now see that you specifically indicated that you did. But looks like your XID burn rate was quite a lot higher than mine (assuming that you were consistent in using jj_xid=1, although I'm not asserting that that was significant). I attach a log of output from an example session where exclusion constraints are shown to break (plus the corresponding server log, plus /proc/cpuinfo on the off chance that that's significant). As you can see from the fact that the span of time recorded in the log is only a couple of minutes, this is really easy for me to recreate... sometimes. I could not recreate the problem with only 4 clients (on this 8 core server) after a few dozen attempts, and then I couldn't recreate the issue at all, so clearly those details matter. It might have something to do with CPU scaling, which I've found can significantly affect outcomes for things like this (looks like my hosting provider changed settings in the system BIOS recently, such that I cannot set the CPU governor to "performance"). Perhaps you could take a crack at recreating this, Jeff? Thanks -- Peter Geoghegan Example run of jjanes_upsert tool: pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 [Thu Feb 5 04:38:36 2015] NOTICE: extension btree_gist already exists, skipping [Thu Feb 5 04:38:36 2015] init done at count_upsert_exclusion.pl line 106.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb 5 04:38:56 2015] NOTICE: extension "btree_gist" already exists, skipping
[Thu Feb 5 04:38:56 2015] init done at count_upsert_exclusion.pl line 106.
[Thu Feb 5 04:39:01 2015] sum is 528
[Thu Feb 5 04:39:01 2015] count is 8740
[Thu Feb 5 04:39:01 2015] For tuple with index value 1, 6 != 3 at count_upsert_exclusion.pl line 174.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb 5 04:39:08 2015] NOTICE: extension "btree_gist" already exists, skipping
[Thu Feb 5 04:39:08 2015] init done at count_upsert_exclusion.pl line 106.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb 5 04:39:20 2015] NOTICE: extension "btree_gist" already exists, skipping
[Thu Feb 5 04:39:20 2015] init done at count_upsert_exclusion.pl line 106.

Corresponding PostgreSQL server logs (note: this uses the PDT timezone, not UTC as above):

pg@gerbil:~/postgresql$ pg_ctl start -o "--fsync=off --JJ_xid=50 --JJ_vac=1" -w 2>&1 | tee postgres.log
waiting for server to start...
31541 2015-02-04 19:36:02 PST LOG: database system was shut down at 2015-02-04 19:35:12 PST
31541 2015-02-04 19:36:02 PST LOG: JJ transaction ID wrap limit is 2147484359, limited by database with OID 1
31540 2015-02-04 19:36:02 PST LOG: database system is ready to accept connections
done
server started
31706 2015-02-04 19:38:42 PST LOG: duration: 161.921 ms statement: insert into upsert_race_test (index, count) values ('4751','1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count
31697 2015-02-04 19:38:42 PST LOG: duration: 161.900 ms statement: insert into upsert_race_test (index, count) values ('6931','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count
31695 2015-02-04 19:38:42 PST LOG: duration: 161.966 ms statement: insert into upsert_race_test (index, count) values ('5679','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count
31707 2015-02-04 19:38:42 PST LOG: duration: 161.959 ms statement: insert into upsert_race_test (index, count) values ('5876','1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count
31705 2015-02-04 19:38:42 PST LOG: duration: 162.009 ms statement: insert into upsert_race_test (index, count) values ('1507','1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count
31704 2015-02-04 19:38:42 PST LOG: duration: 162.058 ms statement: insert into
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex != ((Oid) 0))", File: "nodeModifyTable.c", Line: 1619) Is that just because of the hack in parse_clause.c? Yes. I never built with assertions and so didn't see this, but it doesn't matter. With assertions disabled, count_upsert_exclusion.pl ran successfully to the end. I also tried running VACUUM FREEZE upsert_race_test in a loop in another session at the same time, but it didn't make a difference. How quickly do you see the errors? I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors. Did you build fully optimized, assertion-free code? I've been doing that. I found it necessary to recreate some of the bugs Jeff's tool caught. I also think that I might have needed an 8 core box to see it, but less sure about that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 01/30/2015 01:38 AM, Peter Geoghegan wrote: On the stress-testing front, I'm still running Jeff Janes' tool [1], while also continuing to use his Postgres modifications to artificially increase the XID burn rate. [1] https://github.com/petergeoghegan/jjanes_upsert I followed the instructions in README.md to reproduce this. I downloaded the tool, applied the upsert patchset, applied the hack to parse_clause.c as instructed in the README.md file, installed btree_gist, and ran count_upsert_exclusion.pl. It failed immediately with an assertion failure: TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex != ((Oid) 0))", File: "nodeModifyTable.c", Line: 1619) Is that just because of the hack in parse_clause.c? With assertions disabled, count_upsert_exclusion.pl ran successfully to the end. I also tried running VACUUM FREEZE upsert_race_test in a loop in another session at the same time, but it didn't make a difference. How quickly do you see the errors? I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set jj_xid=1 to increase the XID burn rate, but I'm still not seeing any errors. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Feb 02, 2015 at 4:48 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think that the fundamental, unfixable race condition here is the disconnect between index tuple insertion and checking for would-be exclusion violations that exclusion constraints naturally have here, that unique indexes naturally don't have [1] (note that I'm talking only about approach #2 to value locking here; approach #1 isn't in V2.0). I suspect that the feature is not technically feasible to make work correctly with exclusion constraints, end of story. VACUUM interlocking is probably also involved here, but the unfixable race condition seems like our fundamental problem. It's not a fundamental, unfixable race condition. In [1], I gave you three ideas straight off the top of my head on how that could be fixed. That was different - I tried to make it work by fixing some bugs there. However, I'm now finding myself up against these new bugs. I think that the underlying cause is the lack of any real locking (unlike with the B-Tree AM) in *both* cases, but I don't even know that for sure. The error messages you see are quite odd - why should a btree_gist-based exclusion constraint cause a violation when non-conflicting values are inserted? There is some other race condition here. This wasn't a livelock (or a deadlock), which is what your comments in early January apply to. I think that this has something to do with VACUUM interlocking. But with the B-Tree AM (which we're handling differently, by re-using infrastructure used for deferred unique constraints), things work quite well. The patch stands up to Jeff's vigorous stress-tests. I'm not fundamentally in disagreement with you about any of this. All I'm saying is that we should cut scope today. We're not precluding picking up an IGNORE feature that does support exclusion constraints in the future. Why should we insist upon having that in the first cut? 
It's both significantly harder and significantly less useful to users, so cutting it makes perfect sense AFAICT. As I've said many times, exclusion constraint support was only ever going to be useful to the IGNORE variant (I've tested exclusion constraints by contriving a case to make them do UPSERTs, but this is only for the benefit of the stress-test). Thanks -- Peter Geoghegan
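To make concrete why exclusion constraints pair naturally with IGNORE but not with the UPDATE variant, consider this hedged sketch (hypothetical table and columns, not from the patch itself):

```sql
-- An exclusion constraint generalizes uniqueness: a candidate row can be
-- rejected because it conflicts with SEVERAL existing rows at once, so
-- there is no single "conflicting row" for an ON CONFLICT UPDATE to target.
-- Skipping the insert (IGNORE) is the only conflict action that always
-- has well-defined semantics here.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)  -- no overlapping bookings
);

-- A new reservation spanning two existing bookings conflicts with both of
-- them simultaneously; an UPDATE arm would have no unique TARGET row.
```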
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 30 January 2015 at 21:58, Peter Geoghegan p...@heroku.com wrote: On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless pgsqlad...@geoff.dj wrote: I suppose there's no reason why we couldn't use a no-op ON CONFLICT UPDATE anyway Right. IGNORE isn't really all that compelling for that reason. Note that this will still lock the unmodified row, though. Mmmf. So I would have to make sure that my source tuples were unique before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a tuple would block any other)? That's potentially very slow :( When you say that you can't add exclusion constraints later, do you mean from a coding point of view or just because people would get confused whether exclusion constraints could be IGNOREd or not? Geoff
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 2 February 2015 at 14:32, Geoff Winkless pgsqlad...@geoff.dj wrote: Mmmf. So I would have to make sure that my source tuples were unique before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a tuple would block any other)? That's potentially very slow :( Replying to my own message, because it occurs to me I might be being stupid (surely not :) ) When you say this will still lock the unmodified row did you mean just that it's locked to _other_ processes until commit? That would be much less impactful. Geoff
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 01/18/2015 04:48 AM, Peter Geoghegan wrote: I think that the fundamental, unfixable race condition here is the disconnect between index tuple insertion and checking for would-be exclusion violations that exclusion constraints naturally have here, that unique indexes naturally don't have [1] (note that I'm talking only about approach #2 to value locking here; approach #1 isn't in V2.0). I suspect that the feature is not technically feasible to make work correctly with exclusion constraints, end of story. VACUUM interlocking is probably also involved here, but the unfixable race condition seems like our fundamental problem. It's not a fundamental, unfixable race condition. In [1], I gave you three ideas straight off the top of my head on how that could be fixed. Please work with me towards a committable patch. I'm trying... [1] http://www.postgresql.org/message-id/54a7c76d.3070...@vmware.com - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 01/30/2015 01:38 AM, Peter Geoghegan wrote: I have not addressed the recently described problems with exclusion constraints. I hope we can do so shortly. Simply removing IGNORE support until such time as we straighten that all out (9.6?) seems like the simplest solution. No need to block the progress of UPSERT, since exclusion constraint support was only ever going to be useful for the less compelling IGNORE variant. What do other people think? Do you agree with my view that we should shelve IGNORE support for now, Heikki? No, I don't agree. Let's fix it. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Hi, A first (not actually that quick :() look through the patches to see what actually happened in the last months. I didn't keep up with the thread.

Generally the split into the individual commits doesn't seem to make much sense to me. The commits individually (except the first) aren't independently committable and aren't even meaningful. Splitting off the internal docs, tests and such actually just seems to make reviewing harder, because you miss context. Splitting it so that individual pieces are committable and reviewable makes sense, but... I have no problem doing the user docs later. If you split off RLS support, you need to throw an error before it's implemented.

0001:
* References INSERT with ON CONFLICT UPDATE, can thus not be committed independently. I don't think those references really are needed.
* I'm not a fan of the increased code duplication in ExecCheckRTEPerms(). Can you look into cleaning that up?
* Also the comments inside the ACL_INSERT case still reference UPDATE.
Other than that I think we can just go ahead and commit this ahead of time, mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit message only.

0007:
* AMs alone isn't particularly unique.
* Without the context of the discussion unprincipled deadlocks aren't well defined.
* Too many words.
* Waiting too long isn't defined. Neither is why that'd imply unprincipled deadlocks. Somewhat understandable with the context of the discussion, but surely not a couple of years down the road.
* As is, I don't find the README entry super helpful. It should state the reason for doing this clearly, start at the higher level, and then move to the details.
* Misses details about the speculative heavyweight locking of tuples.

0002:
* Tentatively I'd say that killspeculative should be done via a separate function instead of heap_delete().
* I think we should, as you ponder in a comment, do the OCU-specific stuff lazily and/or in a separate function from BuildIndexInfo(). That function is already quite visible in profiles, and the additional work isn't entirely trivial.
* I doubt logical decoding works with the patch as it stands.
* The added ereport (i.e. user-facing) error message in ExecInsertIndexTuples won't help a user at all.
* Personally I don't care one iota for comments like "Get information from the result relation info structure.". Yes, one of these already exists, but ...
* If an arbiter index is passed to ExecCheckIndexConstraints(), can't we abort the loop after checking it? Also, do we really have to iterate over indexes for that case? How about moving the loop contents to a separate function and using that separately for the arbiter cases?
* Don't like the comment above check_exclusion_or_unique_constraint() much. Makes too much of a special case of OCU.
* ItemPointerIsValid
* ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed" sounds like you're talking with a child or so ;)
* ExecCheckHeapTupleVisible()'s errhint() sounds like an argument/excuse (actually like a code comment). That's not going to help a user at all.
* I find the modified control flow in ExecInsert() pretty darn ugly. I think this needs to be cleaned up. The speculative case should imo be a separate function or something.
* /* This may occur when an instantaneously invisible tuple is blamed as a conflict because multiple rows are inserted with the same constrained values. */ How can this happen? We don't insert multiple rows with the same command id?
* ExecLockUpdatedTuple() has (too?) many comments, but little overview of what/why it is doing what it does on a higher level.
* plan_speculative_use_index: "Use the planner to decide speculative insertion arbiter index" - Huh? "rel is the target to undergo ON CONFLICT UPDATE/IGNORE." - Which rel?
* Formulations like "fundamental nexus" are hard to understand imo.
* Perhaps it has previously been discussed, but I'm not convinced by the reasoning for not looking at opclasses in infer_unique_index(). This seems like it'd prohibit ever having e.g. case-insensitive opclasses - something surely worthwhile.
* Doesn't infer_unique_index() have to look for indisvalid? This isn't going to work well with an invalid (not to speak of a !ready) index.
* Is ->relation in the UpdateStmt generated in transformInsertStmt ever used for anything? If so, it'd possibly generate some possible nastiness due to repeated name lookups. Looks like it'll be used in transformUpdateStmt.
* What's the reason for the !pstate->p_parent_cte check? Also, why the parens?
  pstate->p_is_speculative = (pstate->parentParseState &&
                              !pstate->p_parent_cte &&
                              pstate->parentParseState->p_is_insert &&
                              pstate->parentParseState->p_is_speculative);
* Why did you need to make %nonassoc
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless pgsqlad...@geoff.dj wrote: I appreciate the work you're doing and (as a user rather than a pg-hacker) don't want to butt in but if it would be possible to allow support for IGNORE for unique but not exclusion constraints that would be really helpful for my own use cases, where being able to insert from a dataset into a table containing unique constraints without having to first check the dataset for uniqueness (within both itself and the target table) is very useful. It's possible that I've misunderstood anyway and that you mean purely that exclusion constraint IGNORE should be shelved until 9.6, in which case I apologise. Well, the issue is that we can't really add exclusion constraints to the IGNORE case later. So the fact that we cannot do exclusion constraints kind of implies that we can either shelve IGNORE and maybe look at it later, or accept that we'll never support exclusion constraints with IGNORE. We'd then include IGNORE without exclusion constraint support now and forever. I tend to think that we'll end up doing the latter anyway, but I really don't want to add additional risk of this not getting into 9.5 by arguing about that now. It doesn't matter that much. I suppose there's no reason why we couldn't use a no-op ON CONFLICT UPDATE anyway Right. IGNORE isn't really all that compelling for that reason. Note that this will still lock the unmodified row, though. -- Peter Geoghegan
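The no-op upsert being discussed, and its locking consequence, can be sketched like this (hypothetical table `t`; the commented-out form uses the draft syntax from this thread, the live statements use the syntax that eventually shipped in 9.5):

```sql
CREATE TABLE t (k int PRIMARY KEY, v text);

-- Draft syntax from this thread, for reference:
--   INSERT INTO t (k, v) VALUES (1, 'x')
--   ON CONFLICT UPDATE SET v = TARGET.v;

-- "No-op" upsert: assigns the conflicting row its own existing value.
-- No data changes, but the existing row is still row-locked until the
-- transaction ends, so concurrent upserts on the same key block.
INSERT INTO t (k, v) VALUES (1, 'x')
    ON CONFLICT (k) DO UPDATE SET v = t.v;

-- True IGNORE: the conflicting insert is simply skipped, and no row
-- lock is taken on the existing row.
INSERT INTO t (k, v) VALUES (1, 'x')
    ON CONFLICT (k) DO NOTHING;
```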
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Thu, Jan 29, 2015 at 11:38 PM, Peter Geoghegan pg(at)heroku(dot)com wrote: Simply removing IGNORE support until such time as we straighten that all out (9.6?) seems like the simplest solution. No need to block the progress of UPSERT, since exclusion constraint support was only ever going to be useful for the less compelling IGNORE variant. What do other people think? Do you agree with my view that we should shelve IGNORE support for now, Heikki? I appreciate the work you're doing and (as a user rather than a pg-hacker) don't want to butt in but if it would be possible to allow support for IGNORE for unique but not exclusion constraints that would be really helpful for my own use cases, where being able to insert from a dataset into a table containing unique constraints without having to first check the dataset for uniqueness (within both itself and the target table) is very useful. It's possible that I've misunderstood anyway and that you mean purely that exclusion constraint IGNORE should be shelved until 9.6, in which case I apologise. Of course if there's no way to allow unique constraint IGNORE without exclusion constraints then just ignore me; I (along I'm sure with all the others who are following this conversation from afar) will be incredibly grateful for the work you've done either way. I suppose there's no reason why we couldn't use a no-op ON CONFLICT UPDATE anyway, but that does seem rather messy and (I imagine) would involve rather more work (unless the optimizer were able to optimize away the update? I don't know enough to be able to say if it would). Thanks Geoff
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Sat, Jan 10, 2015 at 8:32 PM, Peter Geoghegan p...@heroku.com wrote: I also include various bugfixes to approach #2 to value locking (these were all previously separately posted, but are now integrated into the main ON CONFLICT commit). Specifically, these are fixes for the bugs that emerged thanks to Jeff Janes' great work on stress testing [4]. With these fixes, I have been unable to reproduce any problem with this patch with the test suite, even after many days of running the script on a quad-core server, with constant concurrent VACUUM runs, etc. I continued with this since posting V2.0. I've run this bash script, that invokes Jeff's script at various client counts, with runs of various duration (since each client does a fixed amount of work): https://github.com/petergeoghegan/jjanes_upsert/blob/master/run_test.sh As previously discussed, Jeff's script comprehensively verifies the correctness of the final values of a few thousand rows within a table after many concurrent upserts, within and across upserting sessions, and with concurrent deletions, too. When building Postgres for this stress test, I included Jeff's modifications that increase the XID burn rate artificially (I chose a burn rate of X50). This makes anti-wraparound VACUUMs much more frequent. I'm also looking out for outlier query execution durations, because in theory they could indicate an unknown lock starvation problem. I haven't seen any notable outliers after over a week of testing. I think that we still need to think about the issues that transpired with exclusion constraints, but since I couldn't find another problem with an adapted version of Jeff's tool that tested exclusion constraints, I'm inclined to think that it should be possible to support exclusion constraints for the IGNORE variant. Exclusion constraints were my focus with stress testing today. 
I performed equivalent verification of upserts using exclusion constraints (this is a hack; exclusion constraints are only intended to be used with the IGNORE variant, but I get better test coverage than I might otherwise this way). Unfortunately, even with the recent bugfixes, there are still problems. On this server (rather than my laptop), with 8 clients I can see errors like this before too long (note that this output includes custom instrumentation from Jeff):

6670 2015-01-17 18:02:54 PST LOG: JJ scan_all 1, relfrozenid -813636509
6670 2015-01-17 18:02:54 PST LOG: JJ freezeLimit -661025537
6670 2015-01-17 18:02:54 PST LOG: JJ freeze_min_age 5000 vacuum_freeze_table_age 15000 freeze_table_age 15000 ReadNew -611025384
6670 2015-01-17 18:02:54 PST LOG: JJ scan_all 1, relfrozenid -813636101
6670 2015-01-17 18:02:54 PST LOG: JJ transaction ID wrap limit is 1352632427, limited by database with OID 12746
6670 2015-01-17 18:02:54 PST LOG: autovacuum: done processing database "postgres" at recent Xid of 3683945176 recent mxid of 1
6668 2015-01-17 18:02:54 PST ERROR: conflicting key value violates exclusion constraint "upsert_race_test_index_excl"
6668 2015-01-17 18:02:54 PST DETAIL: Key (index)=(7142) conflicts with existing key (index)=(600).
6668 2015-01-17 18:02:54 PST STATEMENT: insert into upsert_race_test (index, count) values ('7142','1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning upsert_race_test.count

It's always an exclusion violation problem that I see here. As you can see, the query involved has no unique index inference specification, per the hack to make this work with exclusion constraints (the artificially much greater XID burn rate might have also increased the likelihood of this error dramatically).
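For context, the exclusion-constraint variant of the test table presumably looks something like the following (an assumed sketch, not taken from the test kit):

```sql
-- With btree_gist, EXCLUDE USING gist (index WITH =) enforces the same
-- "no two equal values" rule that a unique B-Tree index would, which is
-- why a DETAIL line reporting 7142 conflicting with 600 points to a race
-- rather than a genuine conflict.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE upsert_race_test (
    index int,
    count int,
    CONSTRAINT upsert_race_test_index_excl EXCLUDE USING gist (index WITH =)
);
```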
You'll also note that the DETAIL message seems to indicate that this btree_gist-based exclusion constraint doesn't behave like a unique constraint at all, because the conflicting new value (7142) is not at all the same as the existing value (600). But that's wrong -- it's supposed to be B-Tree-like. In short, there are further race conditions with exclusion constraints. I think that the fundamental, unfixable race condition here is the disconnect between index tuple insertion and checking for would-be exclusion violations that exclusion constraints naturally have here, that unique indexes naturally don't have [1] (note that I'm talking only about approach #2 to value locking here; approach #1 isn't in V2.0). I suspect that the feature is not technically feasible to make work correctly with exclusion constraints, end of story. VACUUM interlocking is probably also involved here, but the unfixable race condition seems like our fundamental problem. We could possibly spend a lot of time discussing whether or not I'm right about it being inherently impossible to make INSERT ... ON CONFLICT IGNORE work with exclusion constraints. However, I strongly suggest that we cut scope and at least leave them out of