Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 8:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 You've proposed a framework and algorithm for something I'd really, really
 like to see. I don't think that it can work exactly as you proposed, so I
 roughly sketched out a solution I can see.
 I don't want my idea to win, I want an idea to win. I haven't fleshed out my
 idea to the point where I would consider it ready to implement.
 You're the patch author here whose plans are laid open to be scrutinized ;).
 If you think my idea has merit, use and adapt it to reality. If not, find
 another, better, solution.

 Even if our path to that goal is confrontational at times, the goal is to 
 find a good solution, not the argument itself.

Totally agreed.

 I haven't argued about INSERT ... DUPLICATE LOCK because the page locking 
 scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue 
 about the fancier version in that case.

Check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Robert Haas
On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote:
 After some thinking I don't think any solution primarily based on
 holding page level locks across other index operations is going to scale
 ok.

I'd like to chime in with a large +1 for this sentiment and pretty
much everything else Andres said further downthread.  The operations
across which you're proposing to hold buffer locks seem at least an
order of magnitude too complex to get away with something like that.
Concurrent readers will block in a non-interruptible wait if they try
to access a buffer, and that's a situation that will be intolerable
if, for example, it can persist across a disk I/O.  And I don't see
any way to avoid that.

One possible alternative to inserting promises into the index pages
themselves might be to use some kind of heavyweight lock.  The way
that SIREAD locks work is not entirely dissimilar to what's needed
here, I think.  Of course, the performance implications of checking
for lots of extra locks during inserts could be pretty bad, so you'd
probably need some way of avoiding that in common cases, which I don't
know exactly how to do, but maybe there's a way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Peter Geoghegan
On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote:
 Concurrent readers will block in a non-interruptible wait if they try
 to access a buffer, and that's a situation that will be intolerable
 if, for example, it can persist across a disk I/O.  And I don't see
 any way to avoid that.

Then I have some bad news for you - that's already quite possible.
_bt_insertonpg() is called with the very same buffer exclusive locked,
and is where we do btree page splits. The first thing that _bt_split
does is this:

/* Acquire a new page to split into */
rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE);

(Obviously this may ultimately result in the storage manager extending
the index relation).

Plus the split is WAL-logged immediately afterwards, which could
result in us blocking on someone else's I/O under contention (granted,
the XLogInsert scaling patch has now considerably ameliorated that
general problem). All the while holding an exclusive lock on the same
buffer. Note also that _bt_insertonpg() is called again recursively
after a page split. And of course we WAL-log btree index tuple
insertion proper all the time.

Let me be clear about something, though: I am not at all dismissive of
Andres' concerns. I was concerned about many of the same things before
I posted the patch.

I think that Andres and I ought to re-frame this discussion a little
bit. Right now, the presumption we seem to be making, perhaps without
even realizing it, is that this is about providing functionality equivalent
to MySQL's INSERT IGNORE; insert tuples proposed for insertion where
possible, otherwise do not. However, Andres and I, not to mention
almost every Postgres user, are actually much more interested in
something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE.

That's what this mechanism has to support, even if it is *technically*
possible to commit just INSERT...ON DUPLICATE KEY IGNORE and punt on
these other questions. It was silly of me not to do that up-front. So
I'm going to try and produce a patch that does this as well for my
next revision.
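
As a purely illustrative sketch, an upsert driven from the client side
might eventually look something like this. Neither the spelling nor the
semantics of ON DUPLICATE KEY LOCK FOR UPDATE or RETURNING REJECTS are
settled, and the table is the hypothetical tab(id int primary key, val
text) from my original example:

-- the LOCK FOR UPDATE variant is what would make the follow-up UPDATE
-- safe: the conflicting existing rows are locked before we touch them
WITH locked AS (
    INSERT INTO tab (id, val)
    VALUES (1, 'new')
    ON DUPLICATE KEY LOCK FOR UPDATE
    RETURNING REJECTS *
)
UPDATE tab t
SET    val = l.val
FROM   locked l
WHERE  t.id = l.id;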

Maybe this will enable Andres to refute my position that the buffer
locking approach to speculative insertion/value locking may actually
be acceptable. If that gets us closer to having the feature committed
in some form, then I welcome it. I fully expect to be held to this new
standard - it would be insane to do anything less. I don't want to
throw out an old IGNORE value locking mechanism and invent a whole new
one for upserting a little bit down the line.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Andres Freund
On 2013-09-04 15:01:57 -0700, Peter Geoghegan wrote:
 On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas robertmh...@gmail.com wrote:
  Concurrent readers will block in a non-interruptible wait if they try
  to access a buffer, and that's a situation that will be intolerable
  if, for example, it can persist across a disk I/O.  And I don't see
  any way to avoid that.
 
 Then I have some bad news for you - that's already quite possible.
 _bt_insertonpg() is called with the very same buffer exclusive locked,
 and is where we do btree page splits. The first thing that _bt_split
 does is this:
 
 /* Acquire a new page to split into */
 rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE);
 
 (Obviously this may ultimately result in the storage manager extending
 the index relation).

I don't think that's an argument for much, TBH. Those operations are much
less heavyweight than the ones you're proposing to hold the pages locked
over, and there actually is a guarantee of forward progress. And it's very
hard to avoid locking a page exclusively once you've decided that you need
to split it. You cannot just release the lock while you look for a victim
buffer.

 I think that Andres and I ought to re-frame this discussion a little
 bit. Right now, the presumption we seem to be making, perhaps without
 even realizing it, is that this is about providing functionality equivalent
 to MySQL's INSERT IGNORE; insert tuples proposed for insertion where
 possible, otherwise do not. However, Andres and I, not to mention
 almost every Postgres user, are actually much more interested in
 something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE.

Yes, the promises approach gets more advantageous if you think about
UPSERT because most of the work will be paid off when the UPDATE occurs.

 Maybe this will enable Andres to refute my position that the buffer
 locking approach to speculative insertion/value locking may actually
 be acceptable.

Sorry to be harsh here, but I don't think I need to do that. I've
explained most of the reasons I see why that approach won't work out,
and so far I don't see those refuted. And to me those issues seem to be
fatal for the approach. If you find a solution to the problems noted
above - great. So far it seems neither Robert nor I see how that is
possible, but that obviously doesn't mean it's impossible that you find
a way.
But why should I argue further until you prove me wrong (with a newer
patch or an explanation of the changed algorithm)? If you don't think my
arguments are valid, well, I've brought up the ones I see as relevant,
and that's it. I can't do much more.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Peter Geoghegan
On Wed, Sep 4, 2013 at 3:39 PM, Andres Freund and...@2ndquadrant.com wrote:
 Sorry to be harsh here, but I don't think I need to do that. I've
 explained most of the reasons I see why that approach won't work out,
 and so far I don't see those refuted. And to me those issues seem to be
 fatal for the approach. If you find a solution to the problems noted
 above - great. So far it seems neither Robert nor I see how that is
 possible, but that obviously doesn't mean it's impossible that you find
 a way.

It seems you've misunderstood. My position was that I don't think it's
terribly useful to have a discussion about approaches to value locking
without considering how that needs to fit in with row locking too. So
it makes sense as an immediate goal to introduce that into the patch.
Any scheme is going to be constrained by having to think about the
interplay between value locking and row locking going forward.

For example, in my scheme, I couldn't block on locking the row if that
meant that buffer locks would be held indefinitely. There are also
deadlock hazards there for either scheme that must be carefully
considered.

What possible objection could you have? Suppose it was the case that I
was dead set on using buffer locking like this, because I'm stubborn
or whatever. I've just made life harder for myself, while probably not
putting the same degree of burden on alternative proposals. Maybe
I am stubborn, but I don't think I'm stubborn about the basic approach
taken in this particular patch. I've merely been pointing out, as I
feel is my role as a participant in the community's adversarial system
of reaching agreement, the problems that exist with your proposal, and
some inconsistencies in your objections to mine.

Obviously the basic approach will remain the most difficult and
probably controversial part of this. Even if I threw my hands up and
immediately accepted everything you said, that would still be true. We
need to get all of the constraints in place sooner rather than later.

 But why should I argue further until you prove me wrong (with a newer
 patch or an explanation of the changed algorithm)?

I didn't ask you to. You shouldn't.

 If you don't think my arguments are
 valid, well, I've brought up the ones I see as relevant, and that's
 it. I can't do much more.

Uh, I just said that I thought your arguments were totally valid. I
couldn't have been clearer about that. Actually, I'm pretty surprised
that you haven't been the one insisting that I add a row locking
component from quite early on for exactly these reasons.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Andres Freund
Hi,

We seem to be miscommunicating a bit.

You've proposed a framework and algorithm for something I'd really, really
like to see. I don't think that it can work exactly as you proposed, so I
roughly sketched out a solution I can see.
I don't want my idea to win, I want an idea to win. I haven't fleshed out my
idea to the point where I would consider it ready to implement.
You're the patch author here whose plans are laid open to be scrutinized ;). If
you think my idea has merit, use and adapt it to reality. If not, find another,
better, solution.

Even if our path to that goal is confrontational at times, the goal is to find 
a good solution, not the argument itself.

I haven't argued about INSERT ... DUPLICATE LOCK because the page locking 
scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue 
about the fancier version in that case.

Regards,

Andres
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Peter Geoghegan
On Wed, Sep 4, 2013 at 5:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 I don't want my idea to win, I want an idea to win.

I know. I want the same thing.

 You're the patch author here whose plans are laid open to be scrutinized ;). 
 If you think my idea has merit, use and adapt it to reality. If not, find 
 another, better, solution.

Sure.

 Even if our path to that goal is confrontational at times, the goal is to 
 find a good solution, not the argument itself.

Agreed.

 I haven't argued about INSERT ... DUPLICATE LOCK because the page locking 
 scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue 
 about the fancier version in that case.

I see where you're coming from, but my point is precisely that adding
a row locking component *isn't* fancier. I've come to realize that
it's an integral part of the patch, and that my previous omission of
row locking - and the subsequent defence of that decision I made in
passing - was ridiculous. In a world where IGNORE/not locking is a
feature we support, it can only exist as an adjunct to ON DUPLICATE
KEY LOCK - certainly not the other way around.

The tail cannot be allowed to wag the dog.

In posting the patch with a row locking component, I'll only be asking
you to consider that aspect separately. You may find that seeing the
problems I encounter and how I handle them will make you (or others)
re-assess your thoughts on value locking in a direction that nobody
expects right now. Equally, I myself may reassess things.

Now, I don't guarantee that that's the case, but it certainly seems
very possible. And so even if I were to concede right now that the
buffer locking approach is not workable, I feel it would be a little
premature to seriously get down to talking about the alternatives in
detail.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-04 Thread Gavin Flower

On 05/09/13 08:26, Robert Haas wrote:

On Sat, Aug 31, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote:

After some thinking I don't think any solution primarily based on
holding page level locks across other index operations is going to scale
ok.

I'd like to chime in with a large +1 for this sentiment and pretty
much everything else Andres said further downthread.  The operations
across which you're proposing to hold buffer locks seem at least an
order of magnitude too complex to get away with something like that.
Concurrent readers will block in a non-interruptible wait if they try
to access a buffer, and that's a situation that will be intolerable
if, for example, it can persist across a disk I/O.  And I don't see
any way to avoid that.

One possible alternative to inserting promises into the index pages
themselves might be to use some kind of heavyweight lock.  The way
that SIREAD locks work is not entirely dissimilar to what's needed
here, I think.  Of course, the performance implications of checking
for lots of extra locks during inserts could be pretty bad, so you'd
probably need some way of avoiding that in common cases, which I don't
know exactly how to do, but maybe there's a way.

How about an 'Expensive bit' (of course, renamed to sound more 
professional and to better indicate what it does!) - if the bit is set, 
then do the expensive processing. This should have minimal impact for 
the common case, so extensive checking would only be required when lots 
of locks need to be checked.


I strongly suspect that the situation is way more complicated than I
imply above - but possibly a more sophisticated version of the above
might help?


Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Craig Ringer
On 09/03/2013 06:18 AM, Peter Geoghegan wrote:
 On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 It'll be yet another way for people to get upsert wrong, of course.
 They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
 w/o locking the table against writes first. Documenting this pitfall
 should be enough, though.
 
 My preferred solution is to actually provide a variant to lock the
 rows implicated in the would-be unique constraint violation. Obviously
 that's a harder problem to solve.

That'd certainly be the ideal, but as you say is far from simple, and
IIRC prior discussions here have suggested the SSI / predicate locking
stuff won't help.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Peter Geoghegan
On Tue, Sep 3, 2013 at 12:52 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 That'd certainly be the ideal, but as you say is far from simple, and
 IIRC prior discussions here have suggested the SSI / predicate locking
 stuff won't help.

I think that by far the strongest argument for what Andres suggested
(that has, incidentally, been suggested by a number of others), which
is to insert what he termed promise index tuples before a heap tuple,
is that there is no sensible way to lock the tuples while holding an
exclusive buffer lock on an index.

It could take basically any amount of time to acquire those locks (see
GetTupleForTrigger() to get a rough idea of how that ought to work),
and it would be totally unacceptable if that meant that lots of people
blocked on the page lock that an upserter held while it tried to
acquire locks on tuples (we could only unlock the buffer lock/locked
value when the rows were locked). So while I think what I've done here
is probably workable for the ON DUPLICATE KEY IGNORE case, perhaps I'm
being short-sighted in focusing on that. I'm surprised Andres didn't
call me out on this himself, actually.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Peter Geoghegan
On Tue, Sep 3, 2013 at 2:11 AM, Peter Geoghegan p...@heroku.com wrote:
 and it would be totally unacceptable if that meant that lots of people
 blocked on the page lock that an upserter held while it tried to
 acquire locks on tuples


By which I mean: it seems shaky that I'd then be assuming that locking
all of a row's unique index values amounts to locking the row itself,
and therefore that there was no need to worry about blocking on acquiring
an actual row lock while holding all those exclusive/write buffer locks.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Andres Freund
On 2013-09-02 21:59:42 -0700, Peter Geoghegan wrote:
  I can't really agree with this part. First off, a heap_insert()
  certainly isn't a lightweight operation. There are several things that can
  take extended time:
  * toasting of rows (writing a gigabyte worth of data...)
  * extension of the heap/toast table
  * writeout of a dirty victim buffer if the page isn't in s_b
  * reading in the target page if the page isn't in s_b
  * waiting for an exclusive lock on the target pages

 These are all valid concerns. I'll need to better assess just how bad
 this can get (hopefully with help from others). But some of this stuff
 can surely be done separately for my case. For example, surely we
 could make TOASTing occur after index insertion proper when locks are
 released, without too much work.

I don't think those are that easy to solve, but I'd like to be surprised.

E.g. the toasting cannot really happen after the main table insert since
the inserted row needs to refer to the toast id. And you don't even
know the final size of a row without doing the toasting, since how much
it will get compressed depends on the amount of free space on the
current target buffer.

  I'd expect that with the current implementation strategy if you
  e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
  actually hitting duplicates, you'd see a very, very noticeable hit in
 scalability. I wouldn't expect a single session to get much slower, but
  trying a dozen or two should show a very noticeable dip.

 I do not accept the premise of your questions regarding the patch's
 performance, which appears to be that it's fair to expect the same
 level of performance of INSERT ... ON DUPLICATE KEY IGNORE as it is to
 expect of regular INSERT. No scheme is going to get that, including
 your scheme, which necessitates WAL logging everything related to
 index tuple insertion twice for each unique index. We certainly didn't
 require SSI to perform as well as the current REPEATABLE READ, and
 certainly not as well as READ COMMITTED, and for good reasons.

I think it's ok that INSERT .. IGNORE itself is noticeably slower than a
normal INSERT. What I am concerned about is it negatively impacting
overall scalability.

 I'm not going to bother benchmarking the patch myself until I give it
 another whack. There is some really obvious low-hanging performance
 fruit. Obviously by this I mean doing as little as possible when
 holding the exclusive lock.

Sounds fair.

  I am pretty sure it would be better. We know that nbtree works pretty
  damn well under concurrency. And the promise logic wouldn't change
  anything of that.

 It would have multiple extra steps. Including WAL-logging of the
 initial insertion, and clean-up for the IGNORE case. And WAL-logging
 of the cleanup. Instead of doing exactly nothing extra in the IGNORE
 case, which, under my proposal, has an additional duration of the
 exclusive lock being held of zero (instead of the ereport() call we
 just immediately release the lock and report the duplicate). The
 IGNORE case matters too, and there is going to be considerably more
 exclusive locking for that case with your proposal, before we even
 talk about other overhead.

What I mean is that none of the individual proposed steps require
changing the locking semantics/granularity. I.e. we will never hold
locks on more pages at once than we do for a normal insert. Yes, one
page will get locked twice, but for a shorter time than a normal insert (all
you need to do is to rewrite the target of the item) and without having
any other pages locked.

  Furthermore, I am pretty sure that locking an index page *and* related
  heap pages exclusively at the same time will be a deadlock
  nightmare. Even in the case of only having a single unique index.

 It seems like what you're saying here is "I don't know, this
 extended index locking thing seems pretty sketchy to me." Which is a
 perfectly reasonable thing to say in the absence of better analysis
 from either of us - it is sketchy.

Yes, that's basically what I am saying.

I'd really like some input from others on this. Perhaps you could write
up an email/thread just about the locking semantics after the next
version of the patch (since that presumably will change the semantics)?

 I have taken a certain amount of reassurance from the fact that I was
 not able to observe any deadlocks when pushing the patch hard -
 perhaps that's naive. I mean hammering a table with inserts using
 pgbench for 3 hours, with 64 INSERT...DUPLICATE KEY IGNORE sessions.
 PK values were over a range of values from 115 million. This was
 with shared_buffers set to 512MB and assertions enabled. It stood up
 without any deadlocks or other manifestations of bugs. Granted, I did
 that before fixing the bug you've highlighted (but after you pointed
 it out), and so I haven't tested the multi-index case in the same way.

I can readily believe that there are no deadlocks with rather
straightforward schemas/queries 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Eisentraut
On Fri, 2013-08-30 at 15:09 -0700, Peter Geoghegan wrote:
 The attached WIP patch implements this for Postgres, with a few
 notable differences:
 
Your patch breaks the ECPG regression tests:

../../preproc/ecpg --regression -I./../../include -o insupd.c -I. insupd.pgc
insupd.pgc:22: ERROR: syntax error at or near "into"
make[3]: *** [insupd.c] Error 3





Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Geoghegan
On Sun, Sep 1, 2013 at 11:38 PM, Peter Eisentraut pete...@gmx.net wrote:
 ../../preproc/ecpg --regression -I./../../include -o insupd.c -I. insupd.pgc
 insupd.pgc:22: ERROR: syntax error at or near into
 make[3]: *** [insupd.c] Error 3

I'll work on fixing it then. Thanks.


-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Craig Ringer
On 08/31/2013 06:40 AM, Josh Berkus wrote:
  3) RETURNING is expanded - RETURNING REJECTS * is now possible where
  that makes sense.
 Oh, nifty!  OK, now I can *really* use this feature.

Absolutely; especially combined with COPY to a staging TEMPORARY or
UNLOGGED table.
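
For illustration, the kind of bulk-load flow I mean might look roughly
like this (table and file names are hypothetical, and ON DUPLICATE KEY
IGNORE is the clause proposed in the WIP patch):

-- load everything into a throwaway staging table first
CREATE UNLOGGED TABLE staging_tab (LIKE tab INCLUDING DEFAULTS);
COPY staging_tab FROM '/tmp/new_rows.csv' WITH (FORMAT csv);

-- then merge, silently skipping rows that would hit a unique violation
INSERT INTO tab
SELECT * FROM staging_tab
ON DUPLICATE KEY IGNORE;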

It'll be yet another way for people to get upsert wrong, of course.
They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
w/o locking the table against writes first. Documenting this pitfall
should be enough, though.
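
To make the pitfall concrete, a rough sketch of the racy pattern
(hypothetical tables; RETURNING REJECTS per the WIP patch) would be
something like:

-- UNSAFE without a prior lock: another session can insert or delete a
-- conflicting row between the INSERT's duplicate check and the UPDATE.
WITH rejects AS (
    INSERT INTO tab (id, val)
    SELECT id, val FROM staging_tab
    ON DUPLICATE KEY IGNORE
    RETURNING REJECTS *
)
UPDATE tab t
SET    val = r.val
FROM   rejects r
WHERE  t.id = r.id;

-- Locking the table against concurrent writes first avoids the race,
-- at the cost of serializing writers:
BEGIN;
LOCK TABLE tab IN SHARE ROW EXCLUSIVE MODE;
-- ... run the wCTE above ...
COMMIT;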

Speaking of upsert, I'm starting to think that to solve the upsert
problem without forcing full-table locking on people we need a lock type
that conflicts only with DELETE/INSERT and a way to prevent UPDATEs from
changing the key. Kind of like the table-level inverse of FOR KEY UPDATE
- a way to say you can change rows, you just cannot add to or remove from
the set of keys.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Geoghegan
On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 It'll be yet another way for people to get upsert wrong, of course.
 They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
 w/o locking the table against writes first. Documenting this pitfall
 should be enough, though.

My preferred solution is to actually provide a variant to lock the
rows implicated in the would-be unique constraint violation. Obviously
that's a harder problem to solve.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-02 Thread Peter Geoghegan
On Sun, Sep 1, 2013 at 4:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 We're looking for the first duplicate. So it would probably be okay
 for the IGNORE case to not bother retrying and re-locking if the other
 transaction committed (which, at least very broadly speaking, is the
 likely outcome).

 Hm. Could you explain this a bit more? Do you mean that we can stop
 trying to insert after the first violation (yes, sure if so)?

Yeah, that's all I mean. Nothing controversial there.

 I can't really agree with this part. First off, a heap_insert()
 certainly isn't a lightweight operation. There are several things that can
 take extended time:
 * toasting of rows (writing a gigabyte worth of data...)
 * extension of the heap/toast table
 * writeout of a dirty victim buffer if the page isn't in s_b
 * reading in the target page if the page isn't in s_b
 * waiting for an exclusive lock on the target pages

These are all valid concerns. I'll need to better assess just how bad
this can get (hopefully with help from others). But some of this stuff
can surely be done separately for my case. For example, surely we
could make TOASTing occur after index insertion proper when locks are
released, without too much work.

 Secondly, you seem to be saying that such a lock would only block other
 sessions from finishing off tuple insertions - but it will block normal
 index searches as well? I have a hard time accepting that one session
 using INSERT ... KEY IGNORE will block lookups of a range of values
 from the index in *other* sessions.

I'm sorry that I created the impression that I was denying an impact
on read queries as compared to regular insertion. Of course, regular
insertion also blocks read queries for very small intervals, and there
could be quite a bit of work to be done with that exclusive lock held
already. I am only proposing increasing that period for this case, and
usually by not terribly much.

 I'd expect that with the current implementation strategy if you
 e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
 actually hitting duplicates, you'd see a very, very noticeable hit in
 scalability. I wouldn't expect a single session to get much slower, but
 trying a dozen or two should show a very noticeable dip.

I do not accept the premise of your questions regarding the patch's
performance, which appears to be that it's fair to expect the same
level of performance of INSERT ... ON DUPLICATE KEY IGNORE as it is to
expect of regular INSERT. No scheme is going to get that, including
your scheme, which necessitates WAL logging everything related to
index tuple insertion twice for each unique index. We certainly didn't
require SSI to perform as well as the current REPEATABLE READ, and
certainly not as well as READ COMMITTED, and for good reasons.

That said, I'm not dismissive of these concerns. As I said already, by
all means, let us scrutinise the performance of the patch.

I'm not going to bother benchmarking the patch myself until I give it
another whack. There is some really obvious low-hanging performance
fruit. Obviously by this I mean doing as little as possible when
holding the exclusive lock.

 I think we could optimize away the second search with some
 cleverness. If we remember & pin the page from the first search we know
 that the promise will have to be either on that page or - after a page
 split - potentially to ones on the right. Which we should be able to find
 by repeatedly stepping right. Similar to logic of doing the
 _bt_moveright() in _bt_doinsert().

That may well be possible, but we'd still have to do a binary search
on the page. And under your proposal, that will happen twice - each
time with an exclusive buffer lock held. Since our intent would be to
update in-place, the entire binary search operation would need a full
write/exclusive lock the second time around as well. You might even
end up doing more work with an exclusive/write lock held than under my
proposal, before you even factor in the other increased costs
(granted, that's not that likely, but you're still doing much more
with exclusive locks held than the regular insertion case).

What you describe here is the kind of trick that could be very useful
to another existing common case -- blocking on an xid in the event
of a possible duplicate (as you know, it's okay to hold pins pretty
much indefinitely) -- and yet has remained unimplemented for many
years.

 I am pretty sure it would be better. We know that nbtree works pretty
 damn well under concurrency. And the promise logic wouldn't change
 anything of that.

It would have multiple extra steps. Including WAL-logging of the
initial insertion, and clean-up for the IGNORE case. And WAL-logging
of the cleanup. Instead of doing exactly nothing extra in the IGNORE
case, which, under my proposal, has an additional duration of the
exclusive lock being held of zero (instead of the ereport() call we
just immediately release the lock and report the duplicate). 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-01 Thread Peter Geoghegan
On Sat, Aug 31, 2013 at 7:53 PM, Peter Geoghegan p...@heroku.com wrote:
 With a table with many unique indexes and many reasons to decide
 to reject a tuple proposed for insertion by INSERT...ON DUPLICATE KEY
 IGNORE, it isn't hard to imagine them all becoming heavily bloated
 very quickly.

There may be no index tuple directly corresponding to some heap tuple
where you might expect that, due to transactions aborting (as with
traditional unique constraint violations) or because of HOT, but,
prior to now, never the other way around (heap tuples are created
first and physically killed last). That is an assumption that has
existed since the Berkeley days, and it is something that may be
treated as an invariant in some places. Autovacuum is one such place,
I'd say. Isn't it the case that the autovacuum launcher daemon might
not hear about all of this index-only bloat, even if it was extremely
serious? Isn't the threshold it follows only in respect of heap tuples
(even if the statistics collector does collect index tuple stats
separately)?
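
As a rough way to see the asymmetry I'm describing (standard statistics
views; 'foo' is a hypothetical table):

-- autovacuum's threshold is driven by dead heap tuples...
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'foo';

-- ...while index-only bloat shows up at best indirectly, e.g. as size
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE relname = 'foo';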

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-01 Thread Andres Freund
On 2013-08-31 19:38:49 -0700, Peter Geoghegan wrote:
 On Sat, Aug 31, 2013 at 11:34 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
   Won't S2 in this case exclusively lock a page in foo_a_key (the one
   where 2 will reside) for 3600s while it's waiting for S1 to finish
   during the speculative insertion into foo_b_key?
 
  Yes. The way the patch currently handles that case is unacceptable. It
  needs to be more concerned about holding an exclusive lock on
  earlier-locked indexes when locking the second and subsequent indexes
  iff it blocks on another transaction in the course of locking the
  second and subsequent indexes.

 I think that the fix here may be to do something analogous to what the
 code already does: release all buffer locks, wait for the other
 transaction to commit/abort, and try again. The difference being that
 we try again across all unique indexes rather than just this one. I
 have other obligations to take care of over the next couple of days,
 so I don't really have time to code that up and satisfy myself that
 it's robust; I'd prefer to have something more concrete to back this
 thought up, but that's my immediate reaction for what it's worth.

That could probably be made to work. I still have some concerns with the
fundamentals of your concept, which I voice later in the mail...

 We're looking for the first duplicate. So it would probably be okay
 for the IGNORE case to not bother retrying and re-locking if the other
 transaction committed (which, at least very broadly speaking, is the
 likely outcome).

Hm. Could you explain this a bit more? Do you mean that we can stop
trying to insert after the first violation (yes, sure if so)?

  After some thinking I don't think any solution primarily based on
  holding page level locks across other index operations is going to scale
  ok.

 Just to be clear: I'm certainly not suggesting that it's okay to have
 exclusive buffer locks held for any amount of time longer than
 instantaneous. Furthermore, it isn't necessary to do anything
 different to what we currently do with non-unique btree indexes during
 speculative insertion - I don't treat them any differently, and they
 only get locked in the usual way at the usual time, after heap
 insertion. Much of the time, there'll only be a primary key index to
 lock, and you'll have increased the window of the exclusive write-lock
 (compared to the existing code when doing a regular insertion) by only
 a very small amount - the duration of heap tuple insertion (the
 baseline here is the duration of finding an insertion location on page
 for index tuple, tuple insertion + possible page split). We're talking
 about a write-lock on a btree leaf page, which is only going to block
 other sessions from *finishing off* tuple insertion, and then only for
 a minority of such insertions much of the time.

I can't really agree with this part. First off, a heap_insert()
certainly isn't a lightweight operation. There are several things that can
take extended time:
* toasting of rows (writing a gigabyte worth of data...)
* extension of the heap/toast table
* writeout of a dirty victim buffer if the page isn't in s_b
* reading in the target page if the page isn't in s_b
* waiting for an exclusive lock on the target pages

Secondly, you seem to be saying that such a lock would only block other
sessions from finishing off tuple insertions - but it will block normal
index searches as well? I have a hard time accepting that one session
using INSERT ... KEY IGNORE will block lookups of a range of values
from the index in *other* sessions.

 The question must be asked: if you don't think this will scale okay,
 what are you comparing it to? The most useful basis of comparison is
 likely to be other systems. Asking about uniqueness across many unique
 indexes, and getting them all to commit to their unanimous position
 for a moment in time is inherently involved - I don't see a way to do
 it other than lock values, and I don't see any granularity at which
 that's possible other than the btree page granularity.

I'd expect that with the current implementation strategy if you
e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
actually hitting duplicates, you'd see a very, very noticeable hit in
scalability. I wouldn't expect a single session to get much slower, but
trying a dozen or two should show a very noticeable dip.
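
For example, a trivial custom script run by many pgbench clients
(e.g. pgbench -f insert.sql -c 32), with a hypothetical table and
sequences chosen so that no duplicate is ever actually hit, would
exercise exactly that path:

-- every client inserts fresh, non-conflicting keys
INSERT INTO foo (a, b)
VALUES (nextval('foo_a_seq'), nextval('foo_b_seq'))
ON DUPLICATE KEY IGNORE;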

  What I had in mind when playing around with this is the following trick:
  Just as in your patch split index insertion into two phases and perform
  one of them before inserting the heap tuple. In what you called
  speculative insertion don't stop at locking the page, but actually
  insert an index tuple in the index. As we haven't yet inserted the heap
  tuple we obviously can't insert an actual tuple. Instead set a flag on
 the item and reuse the item pointer to store the xid of the
  inserting process. I coined those items insertion promises...

 I did consider that sort 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-01 Thread Andres Freund
On 2013-08-31 23:02:46 -0700, Peter Geoghegan wrote:
 On Sat, Aug 31, 2013 at 7:53 PM, Peter Geoghegan p...@heroku.com wrote:
  With a table with many unique indexes and many reasons to decide
  to reject a tuple proposed for insertion by INSERT...ON DUPLICATE KEY
  IGNORE, it isn't hard to imagine them all becoming heavily bloated
  very quickly.
 
 There may be no index tuple directly corresponding to some heap tuple
 where you might expect that, due to transactions aborting (as with
 traditional unique constraint violations) or because of HOT, but,
 prior to now, never the other way around (heap tuples are created
 first and physically killed last). That is an assumption that has
 existed since the Berkeley days, and it is something that may be
 treated as an invariant in some places. Autovacuum is one such place,
 I'd say. Isn't it the case that the autovacuum launcher daemon might
 not hear about all of this index-only bloat, even if it was extremely
 serious? Isn't the threshold it follows only in respect of heap tuples
 (even if the statistics collector does collect index tuple stats
 separately)?

I don't think the amount of bloat should get that big since we should
immediately kill the promises when we've detected that we conflict on a
following unique index. The reason vacuum needs to care at all is only
that we might have crashed after inserting the promise but before
deleting it and thus haven't cleaned up. That itself isn't too bad since
that should happen infrequently, but if we then get an xid wraparound
later we could be waiting on the wrong transaction.

Sure, there would need to be some changes to that code. I think we can
trigger that fairly easily by just triggering relation vacuums via
relfrozenxid and enforcing an index bulkdelete on full-table vacuums even
if there are no dead tuples.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-31 Thread Andres Freund
Hi,

On 2013-08-30 19:38:24 -0700, Peter Geoghegan wrote:
 On Fri, Aug 30, 2013 at 5:47 PM, Andres Freund and...@2ndquadrant.com wrote:
  While I ponder on it some more, could you explain whether I understood
  something correctly? Consider the following scenario:
 
  CREATE TABLE foo(a int UNIQUE, b int UNIQUE);
 
  S1: INSERT INTO foo VALUES(0, 0);
  S1: BEGIN;
  S1: INSERT INTO foo VALUES(1, 1);
  S1: SELECT pg_sleep(3600);
  S2: BEGIN;
  S2: INSERT INTO foo VALUES(2, 1);
  S3: SELECT * FROM foo WHERE a = 0;
 
  Won't S2 in this case exclusively lock a page in foo_a_key (the one
  where 2 will reside) for 3600s while it's waiting for S1 to finish
  during the speculative insertion into foo_b_key?
 
 Yes. The way the patch currently handles that case is unacceptable. It
 needs to be more concerned about holding an exclusive lock on
 earlier-locked indexes when locking the second and subsequent indexes
 iff it blocks on another transaction in the course of locking the
 second and subsequent indexes.

After some thinking I don't think any solution primarily based on
holding page level locks across other index operations is going to scale
ok.

What I had in mind when playing around with this is the following trick:
Just as in your patch split index insertion into two phases and perform
one of them before inserting the heap tuple. In what you called
speculative insertion don't stop at locking the page, but actually
insert an index tuple in the index. As we haven't yet inserted the heap
tuple we obviously can't insert an actual tuple. Instead set a flag on
the item and reuse the item pointer to store the xid of the
inserting process. I coined those items insertion promises...

Index searches will ignore promises they see, but concurrent index
insertions will notice it's a promise and wait for the xid (or not, if
it has aborted). That fits pretty well into the existing btree code.  If
the insertion of an index promise worked for all unique indexes and the
heap tuple has been inserted, convert those promises into actual item
pointers pointing to the heap tuple. That probably requires a second
search and some cuteness to find the correct pointer because of
concurrent splits, but that's not too bad (possibly you can do away with
that if we continue to hold a pin).

The fact that now xids can be in the index creates some slight
complications because it now needs to be vacuumed - but that's
relatively easy to fit into the bulkdelete api and I don't think the
contortions to the vacuum code will be too bad.

Imo that solution is fairly elegant because it doesn't cause any heap
bloat, and it causes the same amount of index bloat
that insert-into-heap-first type solutions cause. It also has pretty
good concurrency behaviour because you never need to retain a page lock
for longer than a normal index insertion.
It's certainly a bit more complex than your solution tho...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-31 Thread Peter Geoghegan
On Sat, Aug 31, 2013 at 11:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  Won't S2 in this case exclusively lock a page in foo_a_key (the one
  where 2 will reside) for 3600s while it's waiting for S1 to finish
  during the speculative insertion into foo_b_key?

 Yes. The way the patch currently handles that case is unacceptable. It
 needs to be more concerned about holding an exclusive lock on
 earlier-locked indexes when locking the second and subsequent indexes
 iff it blocks on another transaction in the course of locking the
 second and subsequent indexes.

I think that the fix here may be to do something analogous to what the
code already does: release all buffer locks, wait for the other
transaction to commit/abort, and try again. The difference being that
we try again across all unique indexes rather than just this one. I
have other obligations to take care of over the next couple of days,
so I don't really have time to code that up and satisfy myself that
it's robust; I'd prefer to have something more concrete to back this
thought up, but that's my immediate reaction for what it's worth.

We're looking for the first duplicate. So it would probably be okay
for the IGNORE case to not bother retrying and re-locking if the other
transaction committed (which, at least very broadly speaking, is the
likely outcome).

 After some thinking I don't think any solution primarily based on
 holding page level locks across other index operations is going to scale
 ok.

Just to be clear: I'm certainly not suggesting that it's okay to have
exclusive buffer locks held for any amount of time longer than
instantaneous. Furthermore, it isn't necessary to do anything
different to what we currently do with non-unique btree indexes during
speculative insertion - I don't treat them any differently, and they
only get locked in the usual way at the usual time, after heap
insertion. Much of the time, there'll only be a primary key index to
lock, and you'll have increased the window of the exclusive write-lock
(compared to the existing code when doing a regular insertion) by only
a very small amount - the duration of heap tuple insertion (the
baseline here is the duration of finding an insertion location on page
for index tuple, tuple insertion + possible page split). We're talking
about a write-lock on a btree leaf page, which is only going to block
other sessions from *finishing off* tuple insertion, and then only for
a minority of such insertions much of the time.

It's probably worth doing as much work up-front as possible (in terms
of asking questions about if we even want to lock the indexes and so
on, within ExecLockIndexTuples()), so as to decrease the window that
those locks are held. That's just an obvious optimization.

As users add unique indexes to tables, things do of course get worse
and worse. However, since in order for any of this to matter, there
has to be lots of concurrent activity in clustered ranges of values,
it's pretty likely that you'd conclude that tuple insertion should not
go ahead early on, and not end up doing too much exclusive/write
locking per tuple.

The question must be asked: if you don't think this will scale okay,
what are you comparing it to? The most useful basis of comparison is
likely to be other systems. Asking about uniqueness across many unique
indexes, and getting them all to commit to their unanimous position
for a moment in time is inherently involved - I don't see a way to do
it other than lock values, and I don't see any granularity at which
that's possible other than the btree page granularity.

 What I had in mind when playing around with this is the following trick:
 Just as in your patch split index insertion into two phases and perform
 one of them before inserting the heap tuple. In what you called
 speculative insertion don't stop at locking the page, but actually
 insert an index tuple in the index. As we haven't yet inserted the heap
 tuple we obviously can't insert an actual tuple. Instead set a flag on
 the item and reuse the item pointer to store the xid of the
 inserting process. I coined those items insertion promises...

I did consider that sort of approach from an early stage - Simon
suggested it to me privately early last year. Actually, I even pointed
out in the 2012 developer meeting that there were unused IndexTuple
flag bits available for such purposes.

 Index searches will ignore promises they see, but concurrent index
 insertions will notice it's a promise and wait for the xid (or not, if
 it has aborted).

If inserters must block, they'll have to restart - I'm sure you'll
agree that they can't sit on even a shared/read buffer lock
indefinitely. So, have you really saved anything? Especially since
you'll have to search at least twice (more shared locks on *root*
pages) where my scheme only needs to search once. Also, are you really
sure that index searches should have license to ignore these promises?
What about dirty snapshots?


Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-31 Thread Peter Geoghegan
On Sat, Aug 31, 2013 at 7:38 PM, Peter Geoghegan p...@heroku.com wrote:
 Imo that solution is fairly elegant because it doesn't cause any heap
 bloat, and it causes the same amount of index bloat
 that insert-into-heap-first type solutions cause.

 I don't think that's a reasonable comparison. Bloating indexes with
 dead *duplicate* tuples is, in general, worse than doing the same with
 the heap. If you end up with a heavily bloated index with lots of
 duplicates, that will adversely affect performance worse than with
 non-duplicate index tuples (see [1] for example). That was one reason
 that HOT helped as much as it did, I believe.

Bear in mind also that one unique index could hardly ever have real
duplicates, but it would still get broken promise/bloat tuples
because another, later index said it had a duplicate. All of the
indexes get bloated (quite possibly) just because of a duplicate in
one. With a table with many unique indexes and many reasons to decide
to reject a tuple proposed for insertion by INSERT...ON DUPLICATE KEY
IGNORE, it isn't hard to imagine them all becoming heavily bloated
very quickly.

-- 
Peter Geoghegan



[HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
For many years now, MySQL has had a feature called INSERT IGNORE [1];
SQLite has INSERT ON CONFLICT IGNORE [2]; SQL Server has an option
called IGNORE_DUP_KEY and Oracle has a hint called
IGNORE_ROW_ON_DUPKEY_INDEX (they acknowledge that it's a bit odd that
a hint changes the semantics of a DML statement - of
course, MERGE has always been able to do something approximately equivalent).

Each of these features has its respective system simply not insert
certain tuples that would result in a duplicate key violation, while
potentially proceeding with the insertion of other, non-violating
tuples.
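
For reference, the two open-source variants are spelled roughly as
follows (hypothetical table; the exact behaviour differs per system and
version):

-- MySQL: duplicate-key errors become warnings and the rows are skipped
INSERT IGNORE INTO tab (id, val) VALUES (1, 'val');

-- SQLite: OR IGNORE is the statement-level form of ON CONFLICT IGNORE
INSERT OR IGNORE INTO tab (id, val) VALUES (1, 'val');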

The attached WIP patch implements this for Postgres, with a few
notable differences:

1) The patch is only interested in unique index violations
(specifically, violations of amcanunique access method unique
indexes); it will not do anything with NOT NULL constraint violations, as
the MySQL feature does, for example. It also doesn't care about
exclusion constraints, even though it's currently possible to defer
exclusion constraint enforcement in a way that is analogous to how
it's done for btree indexes.

2) The clause that I've proposed is, as you'll have already gathered,
spelt slightly differently to any of these systems. I'm not
particularly attached to how I've spelt it. I've spelt it this way so
as to avoid suggesting that it's fully compatible with the MySQL
feature. I don't think we want weird NULLness behavior, but I may be
mistaken. If we want that additional behavior, we probably want it as
an optional adjunct, or perhaps an entirely independent clause to what
I've specified here.

3) RETURNING is expanded - RETURNING REJECTS * is now possible where
that makes sense.

It is possible for clients to interrogate the wire protocol to see the
number of rows affected (this can be done with a PQcmdTuples() call by
libpq clients, for example), and see how that compares with the number
of tuples they tried to insert. In addition, a client can use
RETURNING to see what rows were successfully inserted. Here is a
simple example session that shows this usage:

postgres=# create table tab(id int4 primary key, val text);
CREATE TABLE
postgres=# insert into tab(id, val) values(1, 'val'), (3, 'val'), (4, 'val');
INSERT 0 3
postgres=# insert into tab(id, val) values(1, 'nosert'), (2,
'nosert'), (3, 'nosert') on duplicate key ignore returning id;
 id
----
  2
(1 row)

INSERT 0 1
postgres=# select xmin, * from tab;
  xmin  | id | val
--------+----+--------
 580843 |  1 | val
 580843 |  3 | val
 580843 |  4 | val
 580844 |  2 | nosert
(4 rows)

If this all happened in two separate, concurrent sessions, the
transaction with xid 580844 might well have blocked on the outcome of
580843 in the usual way (ultimately inserting or doing nothing as
appropriate for each tuple), just as you'd expect. Note, however, that
the xid of the noserter is only seen here for the tuple that it
actually successfully inserted - in effect, the noserter session does
not lock tuples that it did not originally insert. That's consistent
with MySQL's INSERT IGNORE, for example. However, some people might
prefer it if this did share lock rows that prevented tuples from being
inserted, or if that was at least an option. That would be something
quite a bit closer to a fully-fledged upsert - I imagine we'd have to
lock the old tuple using EvalPlanQual, and if a submission did that
correctly then I think we'd be well on our way to solving all the
problems.

RETURNING REJECTS is a way of getting the inverse of the affected tuples
of an ordinary INSERT...RETURNING statement - rather than returning only
rows successfully inserted (i.e. excluding tuples where either a BEFORE
trigger returned NULL or we didn't proceed with insertion), we return
only rows that were not successfully inserted. This is only
meaningful in the context of INSERT...ON DUPLICATE KEY IGNORE. Sure,
clients could figure this out for themselves using vanilla RETURNING
*, but I think that this addition is justified by the fact that it's
generally expected that the number of rejects will be relatively
small. People are going to want to use this feature in an ad-hoc
manner within a psql session, and supporting this will particularly
help there. It might be nice to find a way to indicate the exact
constraint violated per row returned, but I preferred to wait and see
if people thought it was a good idea generally.
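
Continuing the session above, a hypothetical illustration of that
ad-hoc usage (output is illustrative of the intended behaviour, not
captured from the patch):

postgres=# insert into tab(id, val) values(2, 'nosert'), (5, 'nosert')
postgres-# on duplicate key ignore returning rejects *;
 id |  val
----+--------
  2 | nosert
(1 row)

INSERT 0 1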

Use-cases
=

The use-cases for this feature are fairly obvious. It can be used as a
way of merging tuples easily, if clients can live with the present
limitations.

In addition to that, the feature may be of value to clients of the
logical change-set generation infrastructure (such as the proposed BDR
contrib module[3]), if and when that infrastructure is finally
committed. Currently, the BDR prototype simply balks at INSERT
conflicts. This is because the amount of subtransactions/xids
generated to do anything else is considered prohibitive. If Postgres
is ever going to solve the kinds of problems 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Josh Berkus
On 08/30/2013 03:09 PM, Peter Geoghegan wrote:
 The attached WIP patch implements this for Postgres, with a few
 notable differences:

Thank you for addressing this.  If nobody is going to hack out MERGE, we
could really use at least this feature.

 3) RETURNING is expanded - RETURNING REJECTS * is now possible where
 that makes sense.

Oh, nifty!  OK, now I can *really* use this feature.

 This patch is a spin-off from a broader effort to implement
 INSERT...ON DUPLICATE KEY UPDATE (upsert). During the 2012 developer

Yeah, I was wondering when we'd get to that.  Obviously there will be
users clamoring for it ...

 Unlike some other systems like DB2, we have always allowed BEFORE ROW
 triggers to execute arbitrary SQL. I've frequently thought this was a
 bit of a wart (e.g. [10]), and certainly not supportive of sensible,
 idiomatic trigger use, but there isn't much we can do about it at this
 stage. Right now, BEFORE ROW triggers will still fire when the new
 code decides to not do the insert. It certainly wouldn't be acceptable
 to allow before triggers to run *after* the first phase of speculative
 insertion, because they might try and access an index with an
 exclusive locked buffer, resulting in backend self-deadlock. Besides,
 we cannot really know *what* to lock until after the before triggers
 fire. That leaves us with two options, as far as I can tell:
 
 1. Just have users live with those semantics. This is what the patch
 does presently, and anyone who is using triggers in what I consider to
 be a sensible way doesn't have to care. For everyone else, it's a
 gotcha that they have to deal with, to be noted prominently.

+1.  We already allow BEFORE triggers to violate referential integrity.
 I don't think that allowing them to behave oddly around INSERT ...
IGNORE is a problem compared to that.  We just need to document it, so
that users know that their BEFORE code will fire even if the INSERT is
being ignored, and that a BEFORE trigger can cause an INSERT ... IGNORE
to error out.
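
For example, something like the following (a sketch against Peter's
tab table from upthread; I haven't run it against the patch) should
presumably fire for every proposed row, including the ones that end
up being ignored, and could abort the whole statement by raising an
error:

create function note_insert() returns trigger language plpgsql as $$
begin
  -- this fires before we know whether the row will be ignored
  raise notice 'BEFORE trigger fired for id %', new.id;
  return new;
end;
$$;

create trigger tab_note before insert on tab
  for each row execute procedure note_insert();

-- with the patch as described, this raises the notice for id 1 even
-- though that row is ignored as a duplicate:
insert into tab(id, val) values (1, 'dup'), (9, 'new')
  on duplicate key ignore;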

 I have done no performance testing to date. Reviewers will want to pay
 attention to the performance implications, particularly in the regular
 insert case; it's conceivable that I've regressed things, though I
 don't specifically suspect that I have.

Yeah, we'll also want to document the performance overhead for the bulk
loading case.  I know I'll want to use this syntax as a primitive form
of MERGE, and I'll want to know what it costs me.
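
(For me that would look something like the statement below, assuming
INSERT ... SELECT works with the new syntax and given a staging table
with matching columns:)

insert into tab (id, val)
select id, val from staging_tab
on duplicate key ignore;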

Does this work with multiple VALUES rows?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
On Fri, Aug 30, 2013 at 3:40 PM, Josh Berkus j...@agliodbs.com wrote:
 Does this work with multiple VALUES rows?

Yes.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Andres Freund
Hi,

This is awesome.

On 2013-08-30 15:09:59 -0700, Peter Geoghegan wrote:
 1) The patch is only interested in unique index violations
 (specifically, violations of amcanunique access method unique
 indexes); it will not do anything with NULL constraint violations, as
 the MySQL feature does, for example. It also doesn't care about
 exclusion constraints, even though it's currently possible to defer
 exclusion constraint enforcement in a way that is analogous to how
 it's done for btree indexes.

All that seems sane to me. I very, very much do not want it to deal with
NOT NULL violations.

 [syntax]

Haven't thought about syntax at all so far, that's food for another day.

 Use-cases
 =
 ...
 In addition to that, the feature may be of value to clients of the
 logical change-set generation infrastructure (such as the proposed BDR
 contrib module[3]), if and when that infrastructure is finally
 committed. Currently, the BDR prototype simply balks at INSERT
 conflicts. This is because the number of subtransactions/xids
 generated to do anything else is considered prohibitive. If Postgres
 is ever going to solve the kinds of problems that BDR proposes to
 solve well (except for the relatively narrow use-case where balking at
 INSERT conflicts is acceptable), it needs something like what I've
 proposed here. The only alternative is probably to replay a
 transaction without using subtransactions, and if that fails, remember
 that fact and wrap every single change in a subxact. This is really
 going to be painful for large transactions.

Exactly. In completely unscientific tests it reduces speed to less than
1/10 of the version without subxacts. Without a single conflict that is.

 Andres privately said some weeks back (prior to my sharing of this
 information, and even prior to writing most of the code posted here)
 that he'd like to see an INSERT...ON DUPLICATE KEY LOCK. I suppose
 that he must have meant that he'd like to see shared locks obtained on
 rows that blocked a noserter from actually inserting some of their
 proposed tuples, for the purposes of conflict resolution (which, as
 I've said, this doesn't do). Perhaps he'd like to comment on these
 issues here.

Ok, so what we would like to do is basically to follow a protocol
(simplified) like:

1) replay INSERT, using ON DUPLICATE IGNORE
2) if INSERT succeeded, be happy, no conflict (another INSERT with the
   same key might conflict though)
3) if the INSERT failed, lock tuple for UPDATE
4) if the tuple got deleted by another session, goto 1
5) check whether local tuple or the already inserted tuple wins conflict 
resolution
6) UPDATE or skip accordingly
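
In SQL terms that boils down to roughly the following (a hand-written
sketch for some table tbl(k int primary key, v int), assuming the
conflict is on k):

-- 1) + 2): replay the INSERT; a returned row means it went in cleanly
INSERT INTO tbl (k, v) VALUES (1, 1)
    ON DUPLICATE KEY IGNORE RETURNING k;
-- 3): if nothing came back, lock the existing conflicting row
SELECT * FROM tbl WHERE k = 1 FOR UPDATE;
-- 4): if that finds no row (deleted concurrently), start over at 1)
-- 5) + 6): otherwise apply conflict resolution and UPDATE or skip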

If there's a variant of INSERT ... ON DUPLICATE that gets a FOR UPDATE
lock on the existing row this gets simpler because there's no chance
anymore we need to loop or look for other versions of the row.

Makes sense?

 Mechanism
 =

 [...]

 From a high level, the patch works by adding something that is
 tentatively internally referred to as speculative insertion in
 lower-level code comments. The basic approach is:

 * Within ExecInsert, when inserting into a regular heap relation from
 a tuple table slot (after BEFORE triggers fire but before inserting
 the heap tuple), there is an extra function call to a new function,
 ExecLockIndexTuples().
 
 * For each applicable amcanunique unique index, ExecLockIndexTuples()
 calls a new am method, amlock. For each unique index, we end up
 acquiring a write buffer lock in a similar fashion to _bt_doinsert.
 When we return from the new amlock function, we report whether or not
 there was a duplicate without having inserted any index tuples or
 having otherwise done something that requires a heap tuple to be
 present. I like to call this point the point of no return, where
 individual unique indexes commit themselves to insertion (which is to
 say, the point after which a unique constraint violation becomes
 impossible but before actual index tuple insertion in _bt_doinsert
 [9]). Another duplicate key cannot be inserted by anyone else until
 that write lock is released, which it won't be until insertion or
 IGNORE, for the same reason why that's currently true once control
 reaches the point of no return. A pagesplit might need to happen
 before insertion proper of that index tuple, for example, and that's
 okay.
 
 * If we have total consensus among all unique indexes, we proceed with
 a heap insertion of the tuple formed from the tuple table slot
 ExecInsert() was called in respect of. Yes, this means that
 potentially multiple exclusive buffer locks are held for the duration
 of the heap tuple insertion. If not, we release the locks early and
 return NULL from ExecInsert() before heap insertion ever happens (and
 certainly before index tuple insertion ever happens).
 
 * If and when we get to ExecInsertIndexTuples(), and ultimately to
 _bt_doinsert, the new _bt_doinsert knows that for any unique indexes
 it sees, that the process has 

Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-08-30 Thread Peter Geoghegan
On Fri, Aug 30, 2013 at 5:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 This is awesome.

Thanks.

 All that seems sane to me. I very, very much do not want it to deal with
 NOT NULL violations.

Sure. But there's nothing stopping us from doing that as a totally
orthogonal thing. Not that I personally find it to be terribly
compelling or anything. It's an easy addition, but I'm certainly not
going to try and mandate it as integral to what I've done here if that
doesn't suit you.

 Ok, so what we would like to do is basically to follow a protocol
 (simplified) like:

 1) replay INSERT, using ON DUPLICATE IGNORE
 2) if INSERT succeeded, be happy, no conflict (another INSERT with the
same key might conflict though)
 3) if the INSERT failed, lock tuple for UPDATE
 4) if the tuple got deleted by another session, goto 1
 5) check whether local tuple or the already inserted tuple wins conflict 
 resolution
 6) UPDATE or skip accordingly

Right, I thought that's what you meant. I'll have to ponder it
further. However, I don't think locking the old tuple(s) is mandatory
to make this a useful feature - as I've noted, MySQL doesn't do that.
That said, I really want to support that, perhaps as another DUPLICATE
KEY variant. As I've noted, if we had that, we almost wouldn't need
upsert - loosely speaking it would be mere syntactic sugar on top of
what you require.
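
For instance (completely made-up syntax, just to illustrate the sugar
argument), a writable CTE on top of such a variant would get you most
of the way there:

-- hypothetical: assumes a DUPLICATE KEY LOCK variant that locks the
-- conflicting rows and can report the rejected proposed tuples
WITH rejected AS (
    INSERT INTO tab (id, val)
    VALUES (1, 'upserted'), (9, 'upserted')
    ON DUPLICATE KEY LOCK RETURNING REJECTS *
)
UPDATE tab SET val = rejected.val
FROM rejected
WHERE tab.id = rejected.id;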

 If there's a variant of INSERT ... ON DUPLICATE that gets a FOR UPDATE
 lock on the existing row this gets simpler because there's no chance
 anymore we need to loop or look for other versions of the row.

 Makes sense?

Perfect sense.

 Phew. It took me some time to even start to understand what you're doing
 here...

 While I ponder on it some more, could you explain whether I understood
 something correctly? Consider the following scenario:

 CREATE TABLE foo(a int UNIQUE, b int UNIQUE);

 S1: INSERT INTO foo(0, 0);
 S1: BEGIN;
 S1: INSERT INTO foo(1, 1);
 S1: SELECT pg_sleep(3600);
 S2: BEGIN;
 S2: INSERT INTO foo(2, 1);
 S3: SELECT * FROM foo WHERE a = 0;

 Won't S2 in this case exclusively lock a page in foo_a_key (the one
 where 2 will reside) for 3600s while it waits for S1 to finish during
 the speculative insertion into foo_b_key?

Yes. The way the patch currently handles that case is unacceptable. It
needs to be more careful about holding exclusive locks on
earlier-locked indexes in the case where it blocks on another
transaction in the course of locking the second and subsequent
indexes.

 ISTM we could use something like a new lock level to make that work more
 sensibly, basically something like LW_FOR_UPDATE which does not conflict
 with _SHARED but conflicts with _EXCLUSIVE.

I'll ponder it further. There are probably a number of options.

-- 
Peter Geoghegan

