On 14.10.2013 07:12, Peter Geoghegan wrote:
On Wed, Oct 9, 2013 at 1:11 PM, Peter Geoghegan <p...@heroku.com> wrote:
Unfortunately, I have a very busy schedule in the month ahead,
including travelling to Ireland and Japan, so I don't think I'm going
to get the opportunity to work on this too much. I'll try and produce
a V4 that formally proposes some variant of my ideas around visibility
of locked tuples.

V4 is attached.

Most notably, this adds the modifications to HeapTupleSatisfiesMVCC(),
though they're neater than in the snippet I sent earlier.

There is also some clean-up around row-level locking. That code has
been simplified. I also try and handle serialization failures in a
better way, though that really needs the attention of a subject matter
expert.

There are a few additional XXX comments highlighting areas of concern,
particularly around serializable behavior. I've deferred making higher
isolation levels care about wrongfully relying on the special
HeapTupleSatisfiesMVCC() exception (e.g. they won't throw a
serialization failure, mostly because I couldn't decide on where to do
the test in time before travelling tomorrow).

I've added code to do heap_prepare_insert before value locks are held.
Whatever our eventual value locking implementation, that's going to be
a useful optimization. Though unfortunately I ran out of time to give
this the scrutiny it really deserves, I suppose that it's something
that we can return to later.

I ask that reviewers continue to focus on concurrency issues and broad
design issues, and continue to defer discussion about an eventual
value locking implementation. I continue to think that that's the most
useful way of proceeding for the time being. My earlier points about
probable areas of concern [1] remain a good place for reviewers to
start.

I think it's important to recap the design goals of this. I don't think these have been listed before, so let me try:

* It should be usable and perform well for both large batch updates and small transactions.

* It should perform well both when there are no duplicates, and when there are lots of duplicates.

And from that follows some finer requirements:

* Performance when there are no duplicates should be close to raw INSERT performance.

* Performance when all rows are duplicates should be close to raw UPDATE performance.

* We should not leave behind large numbers of dead tuples in either case.

Anything else I'm missing?


What about exclusion constraints? I'd like to see this work for them as well. Currently, exclusion constraints are checked after the tuple is inserted, and you abort if the constraint was violated. We could still insert the heap and index tuples first, but instead of aborting on violation, we would kill the heap tuple we already inserted and retry. There are some complications there, like how to wake up any other backends that are waiting to grab a lock on the tuple we just killed, but it seems doable.
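In very rough pseudo-C, that retry loop might look something like this (speculative_insert, exclusion_conflict_exists, kill_speculative_tuple and wake_tuple_lock_waiters are all hypothetical stand-ins, not existing backend functions):

    /*
     * Sketch only: insert first, check the exclusion constraint afterwards,
     * and kill + retry on conflict.
     */
    static void
    exclusion_insert_retry(Relation rel, Relation indexRel, TupleTableSlot *slot)
    {
        for (;;)
        {
            ItemPointerData tid;

            /* optimistically insert the heap tuple and its index entries */
            speculative_insert(rel, slot, &tid);

            /* check the constraint only after the insertion */
            if (!exclusion_conflict_exists(rel, indexRel, slot, &tid))
                return;             /* no conflict, done */

            /* conflict: kill the tuple we just inserted ... */
            kill_speculative_tuple(rel, &tid);

            /* ... wake up backends waiting on its tuple lock, and retry */
            wake_tuple_lock_waiters(&tid);
        }
    }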

That would, however, perform badly and leave garbage behind if there are duplicates. A refinement of that would be to first check for constraint violations, then insert the tuple, and then check again. That would avoid the garbage in most cases, but would perform much more poorly when there are no duplicates, because it needs two index scans for every insertion. A further refinement would be to keep track of how many duplicates there have been recently, and switch between the two strategies based on that.
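Continuing the same sketch, the adaptive variant might look roughly like this (recent_conflict_ratio, PRECHECK_THRESHOLD and the helpers are again just made-up names):

    /*
     * Sketch of the adaptive refinement: if conflicts have been common
     * lately, pay for a pre-check scan to avoid creating garbage;
     * otherwise insert optimistically and scan only once.
     */
    static void
    exclusion_insert_adaptive(Relation rel, Relation indexRel, TupleTableSlot *slot)
    {
        bool        precheck = (recent_conflict_ratio > PRECHECK_THRESHOLD);
        ItemPointerData tid;

        if (precheck && exclusion_conflict_exists(rel, indexRel, slot, NULL))
        {
            handle_conflict(rel, slot);         /* no dead tuple left behind */
        }
        else
        {
            speculative_insert(rel, slot, &tid);
            if (exclusion_conflict_exists(rel, indexRel, slot, &tid))
            {
                kill_speculative_tuple(rel, &tid);  /* rare if precheck fired */
                handle_conflict(rel, slot);
            }
        }
        update_conflict_stats(precheck);        /* feed the duplicate counter */
    }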

The cost of doing two scans could be alleviated by using markpos/restrpos to do the second scan. That is presumably cheaper than starting a whole new scan with the same key. (markpos/restrpos don't currently work for non-MVCC snapshots, so that'd need to be fixed, though.)
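For reference, the generic wrappers that already exist in indexam.c are:

    void index_markpos(IndexScanDesc scan);   /* remember the current scan position */
    void index_restrpos(IndexScanDesc scan);  /* return to the marked position later */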

And that detour with exclusion constraints takes me back to the current patch :-). What if you implemented the unique check in a similar fashion too (when doing INSERT ON DUPLICATE KEY ...)? First, scan for a conflicting key, and mark the position. Then do the insertion at that position. If the insertion fails because of a duplicate key (which was inserted after we did the first scan), mark the heap tuple as dead, and start over.

The indexam changes would be quite similar to the changes you made in your patch, but instead of keeping the page locked, you'd only hold a pin on the target page (if even that). The first indexam call would check that the key doesn't exist, and remember the insert position. The second call would re-find the previous position, and insert the tuple, checking again that there really wasn't a duplicate key violation. The locking aspects would be less scary than your current patch.
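To make the two calls concrete, the API might look something like this - the names and the position struct are invented purely for illustration, none of them exist today:

    typedef struct UniqueCheckPos
    {
        BlockNumber  blkno;     /* leaf page the key belongs on */
        OffsetNumber offnum;    /* insert position within that page */
    } UniqueCheckPos;

    /*
     * First call: return false if a conflicting key already exists,
     * otherwise remember where the new entry would go.  Afterwards only
     * a pin (if even that) is kept on the target page.
     */
    bool index_check_unique(Relation indexRel, IndexTuple itup,
                            UniqueCheckPos *pos);

    /*
     * Second call: re-find the remembered position and insert, checking
     * again for a conflict that may have appeared in the meantime.  On a
     * late conflict, return false; the caller then marks the heap tuple
     * it already inserted as dead and starts over.
     */
    bool index_insert_at(Relation indexRel, IndexTuple itup,
                         ItemPointer heapTid, UniqueCheckPos *pos);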

I'm not sure if that would perform as well as your current patch. I must admit your current approach is pretty optimal performance-wise. But I'd like to see it, and that would be a solution for exclusion constraints in any case.

One limitation of your current approach is that the number of lwlocks you can hold simultaneously is capped (MAX_SIMUL_LWLOCKS == 100). Another is that the minimum allowed setting of shared_buffers is only 16. Neither of those is a serious problem in real applications - no one runs with shared_buffers=16, and no sane schema has a hundred unique indexes - but it's still something to consider.

- Heikki

