Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-30 Thread Robert Haas
On Tue, Jul 29, 2014 at 1:28 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jul 29, 2014 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote:
 I think it would be advisable to separate the syntax from the
 implementation.  Presumably you can make your implementation use some
 reasonable syntax we can all agree on, and conversely my proposed
 syntax could be made to have a different set of semantics.  There's
 some connection between the syntax and semantics, of course, but it's
 not 100%.  I mention this because I was mostly concerned with getting
 to a reasonable syntax proposal, not so much the implementation
 details.  It may well be that your implementation details are perfect
 at this point; I don't know because I haven't looked, and I'm not an
 expert on that area of the code anyway.  But I have looked at your
 syntax, which I wasn't altogether keen on.

 Fair enough. I think the syntax should reflect the fact that upserts
 are driven by inserts, though. Users will get into trouble with a
 syntax that allows a predicate that is evaluated before any rows are
 locked.

You've convinced me that only indexable predicates can be sensibly
used here.  I'm not sure that's the same as saying that upserts are
driven by inserts, though.  I'd tend to interpret that to mean
insert-the-heap-tuple-first, but I think what you're doing is
check-the-indexes-first.  The terminology in this area is tricky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-30 Thread Peter Geoghegan
On Wed, Jul 30, 2014 at 10:36 AM, Robert Haas robertmh...@gmail.com wrote:
 You've convinced me that only indexable predicates can be sensibly
 used here.  I'm not sure that's the same as saying that upserts are
 driven by inserts, though.  I'd tend to interpret that to mean
 insert-the-heap-tuple-first, but I think what you're doing is
 check-the-indexes-first.  The terminology in this area is tricky.

I'm glad. Indeed, I am proposing checking/locking indexes first, but
that is actually pretty similar to what Heikki did in his prototype
(insert a heap tuple first), in that those heap tuples were
conceptually speculative/optimistic value locks. Heikki's need to
super delete in the event of a conflict (by setting the speculative
heap tuple's xmin to InvalidTransactionId) demonstrates that.
Furthermore, Heikki's last revision ended up having places like
_bt_check_unique() looking in the procarray to see if the xact to wait
on was still around and had a speculative token, so that we could
wait on the token to be released rather than for the whole xact to commit/abort
(avoiding unprincipled deadlocking hinges upon this):

+ /*
+  * If it's a speculative insertion, wait for it to finish (ie.
+  * to go ahead with the insertion, or kill the tuple). Otherwise
+  * wait for the transaction to finish as usual.
+  */
+ if (speculativeToken)
+ SpeculativeInsertionWait(xwait, speculativeToken);
+ else
+ XactLockTableWait(xwait);

So, as I said, what he ended up with was actually very close to what I
wanted to do. I felt it was more logical to just use an index page
lock, given the fact that they have been used before, and given the
limited code footprint (although I do accept that in general altering
the nbtree code is best avoided). But Heikki did ultimately understand
the considerations that informed my design, and so ultimately IMV the
disagreement there was more or less just on that one detail. (Not that
the rest of the patch was perfect or anything, but it was useful to
have my design *understood*). You can hopefully see why I'd
characterize both of those designs as insertion driven.

The terminology is very tricky here, though. It might even be the
trickiest aspect of the whole thing, because at times I've had a hard
time communicating what I mean effectively on list. Anyway, I'll be
sure to emphasize the distinction in future.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-30 Thread Bruce Momjian
On Mon, Jul 28, 2014 at 11:37:07AM -0400, Robert Haas wrote:
  Yes, but what if you don't see a conflict because it isn't visible to
  your snapshot, and then you insert, and only then (step 5), presumably
  with a dirty snapshot, you find a conflict? How does the loop
  terminate if that brings you back to step 1 with the same MVCC
  snapshot feeding the update?
 
 Good point.  Maybe the syntax should be something like:
 
 UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
 ...] | select_query }

One idea would be to allow UPSERT with constants (single row), and use
CTEs with a SELECT or INSERT/RETURNING for multi-row upserts.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-29 Thread Robert Haas
On Mon, Jul 28, 2014 at 1:43 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas robertmh...@gmail.com wrote:
 AFAIUI, this is because your implementation uses lwlocks in a way that
 Andres and I both find unacceptable.

 That's not the case. My implementation uses page-level heavyweight
 locks. The nbtree AM used to use them for other stuff. Plenty of other
 systems use index level locks managed by a heavyweight lock manager.

Oh, OK.  I think I missed that development somewhere.

 Good point.  Maybe the syntax should be something like:

 UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
 ...] | select_query }

 That would address both the concern about being able to pipe multiple
 tuples through it and the point you just raised.  We look for a row
 that matches each given tuple on the key columns; if one is found, we
 update it; if none is found, we insert.

 That basically is my design, except that (tangentially) yours risks
 bloat in exchange for not having to use a real locking mechanism, and
 has a different syntax.

I think it would be advisable to separate the syntax from the
implementation.  Presumably you can make your implementation use some
reasonable syntax we can all agree on, and conversely my proposed
syntax could be made to have a different set of semantics.  There's
some connection between the syntax and semantics, of course, but it's
not 100%.  I mention this because I was mostly concerned with getting
to a reasonable syntax proposal, not so much the implementation
details.  It may well be that your implementation details are perfect
at this point; I don't know because I haven't looked, and I'm not an
expert on that area of the code anyway.  But I have looked at your
syntax, which I wasn't altogether keen on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-29 Thread Peter Geoghegan
On Tue, Jul 29, 2014 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote:
 I think it would be advisable to separate the syntax from the
 implementation.  Presumably you can make your implementation use some
 reasonable syntax we can all agree on, and conversely my proposed
 syntax could be made to have a different set of semantics.  There's
 some connection between the syntax and semantics, of course, but it's
 not 100%.  I mention this because I was mostly concerned with getting
 to a reasonable syntax proposal, not so much the implementation
 details.  It may well be that your implementation details are perfect
 at this point; I don't know because I haven't looked, and I'm not an
 expert on that area of the code anyway.  But I have looked at your
 syntax, which I wasn't altogether keen on.


Fair enough. I think the syntax should reflect the fact that upserts
are driven by inserts, though. Users will get into trouble with a
syntax that allows a predicate that is evaluated before any rows are
locked.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Robert Haas
On Wed, Jul 23, 2014 at 7:32 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Because nobody wants an operation to either insert 1 tuple or update
 n>=1 tuples.  The intention is that the predicate should probably be
 something like WHERE unique_key = 'some_value', but you can use
 something else.  So it's kinda like saying which index you care about
 for a particular operation, except without having to explicitly name
 an index.  But in any event you should use a predicate that uniquely
 identifies the tuple you want to update.

 This seemed a nice idea when I first read it earlier today, but now I'm
 not so sure.  Are you saying that it wouldn't be allowed to use an
 UPSERT with some sort of join, such that each joined row would produce
 either one insert or one update?  To clarify: suppose I import some
 external data into a temp table, then run UPSERT USING that table so
 that the rows end up in a permanent table; some of the rows might be
 already in the permanent table, some others might not.  I would hope
 that by the time UPSERT is done, all the rows are in the permanent
 table.  Would that raise an error, with your proposed design?

Yeah, my syntax didn't have a mechanism for that.  I agree we should
have one.  I was just brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Robert Haas
On Wed, Jul 23, 2014 at 7:35 PM, Peter Geoghegan p...@heroku.com wrote:
 It's certainly arguable whether you should INSERT and then turn failures
 into an update or try to UPDATE and then turn failures into an INSERT;
 we might even want to have both options available, though that smells
 a little like airing too much of our dirty laundry.  But I think I
 generally favor optimizing for the UPDATE case for more or less the
 same reasons Kevin articulated.

 I don't see the connection between this and Kevin's remarks. And FWIW,
 I don't see a reason to favor inserts or updates. Fortunately, what I
 have balances both cases very well, and doesn't cause bloat. The work
 of descending the index to lock it isn't wasted if an update is
 required. My implementation decides to either insert or update at
 literally the latest possible moment.

AFAIUI, this is because your implementation uses lwlocks in a way that
Andres and I both find unacceptable.  My suspicion is that any version
of this that ends up getting committed is going to involve a risk of
bloat in cases involving retries, and I think it will be easier to
minimize bloat in an update-driven implementation.  But I suppose
that's speculative.

 Here you seem to be suggesting that I intended to propose your existing
 design rather than something else, which I didn't.  In this design,
 you find the conflict (at most one) by scanning for the tuple to be
 updated.

 Yes, but what if you don't see a conflict because it isn't visible to
 your snapshot, and then you insert, and only then (step 5), presumably
 with a dirty snapshot, you find a conflict? How does the loop
 terminate if that brings you back to step 1 with the same MVCC
 snapshot feeding the update?

Good point.  Maybe the syntax should be something like:

UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
...] | select_query }

That would address both the concern about being able to pipe multiple
tuples through it and the point you just raised.  We look for a row
that matches each given tuple on the key columns; if one is found, we
update it; if none is found, we insert.
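To make the proposed semantics concrete, here is a toy single-session sketch
of that key-column-driven behavior in Python. Concurrency and MVCC are
ignored entirely, and all of the names are illustrative, not anything from
the PostgreSQL sources:

```python
# Toy model of UPSERT table (keycol [, keycol] ...) VALUES ...:
# for each input row, match on the key columns; update the matching
# row if there is one, otherwise insert a new row.

def upsert(table, key_cols, rows):
    """table: list of dicts; key_cols: tuple of column names; rows: input tuples."""
    for new_row in rows:
        key = tuple(new_row[c] for c in key_cols)
        for existing in table:
            if tuple(existing[c] for c in key_cols) == key:
                existing.update(new_row)      # key match: update in place
                break
        else:
            table.append(dict(new_row))       # no key match: insert

cities = [{"city_id": 1, "name": "Toronto"}]
upsert(cities, ("city_id",),
       [{"city_id": 1, "name": "York"}, {"city_id": 2, "name": "Ottawa"}])
# cities now holds the updated row for city_id 1 and a new row for city_id 2
```

The point of the sketch is only the control flow: the key columns decide
between update and insert per input row, so multiple tuples can be piped
through a single command.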

 I agree that you want to uniquely identify each tuple. What I meant
 was, why should we not be able to upsert multiple rows in a single
 command? What's wrong with that?

Nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas robertmh...@gmail.com wrote:
 AFAIUI, this is because your implementation uses lwlocks in a way that
 Andres and I both find unacceptable.

That's not the case. My implementation uses page-level heavyweight
locks. The nbtree AM used to use them for other stuff. Plenty of other
systems use index level locks managed by a heavyweight lock manager.

 Here you seem to be suggesting that I intended to propose your existing
 design rather than something else, which I didn't.  In this design,
 you find the conflict (at most one) by scanning for the tuple to be
 updated.

 Yes, but what if you don't see a conflict because it isn't visible to
 your snapshot, and then you insert, and only then (step 5), presumably
 with a dirty snapshot, you find a conflict? How does the loop
 terminate if that brings you back to step 1 with the same MVCC
 snapshot feeding the update?

 Good point.  Maybe the syntax should be something like:

 UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
 ...] | select_query }

 That would address both the concern about being able to pipe multiple
 tuples through it and the point you just raised.  We look for a row
 that matches each given tuple on the key columns; if one is found, we
 update it; if none is found, we insert.

That basically is my design, except that (tangentially) yours risks
bloat in exchange for not having to use a real locking mechanism, and
has a different syntax. The parts of inserting into an index that
I stagger include an initial part that is more or less just an index
scan. With this design you'll have to set up things so that all
indexes can be directly accessed in the manner of ExecInsert() (get a
list of them from the relcache, open them in an order that avoids
possible deadlocks, etc). Why not just use ExecInsert()? I don't think
I'm alone in seeing things that way.

On a mostly unrelated note, I'll remind you of the reason that I felt
it was best to lock indexes. It wasn't so much about avoiding bloat as
it was about avoiding deadlocks. When I highlighted the issue,
Heikki's prototype, which did insert optimistically rather than
locking, was then made to go and physically super delete the
upsert-insert conflicting heap tuple (inserted optimistically before
its index tuples), before going to lock a row, in order to avoid
unprincipled deadlocking. In contrast, my design just used a callback
that released page level heavyweight locks before going to lock a row.
Heikki's prototype involved making it so that *even someone else's
dirty snapshot* didn't see our dead speculatively-inserted heap tuple.

Anyway, making all that happen is fairly invasive to a bunch of places
that are just as critical as the nbtree code. I'm not saying it can't
be done, or even that it definitely shouldn't be, but taking an
approach that produces bloat, rather than locking values the way other
systems do (and, to a limited extent, Postgres already does) is at
least way more invasive than it first appears. Plus,  I ask you to
consider that.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 10:43 AM, Peter Geoghegan p...@heroku.com wrote:
 Plus,  I ask you to
 consider that.

Excuse me. I meant Plus, you avoid bloat. I ask you to consider that.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 10:43 AM, Peter Geoghegan p...@heroku.com wrote:
 On a mostly unrelated note, I'll remind you of the reason that I felt
 it was best to lock indexes. It wasn't so much about avoiding bloat as
 it was about avoiding deadlocks. When I highlighted the issue,
 Heikki's prototype, which did insert optimistically rather than
 locking, was then made to go and physically super delete the
 upsert-insert conflicting heap tuple (inserted optimistically before
 its index tuples), before going to lock a row, in order to avoid
 unprincipled deadlocking. In contrast, my design just used a callback
 that released page level heavyweight locks before going to lock a row.
 Heikki's prototype involved making it so that *even someone else's
 dirty snapshot* didn't see our dead speculatively-inserted heap tuple.

 Anyway, making all that happen is fairly invasive to a bunch of places
 that are just as critical as the nbtree code.

I think I should be more concrete about why this is more complicated
than it first appears. Andres said at pgCon that he would still go
with a design where promise tuples are inserted into indexes ahead
of any heap tuple (which differs from Heikki's prototype, where heap
tuple insertion occurs first). Accounting for deadlocking issues could
be particularly problematic with that design, since we must kill
tuples from each index in turn before row locking. In any case the
need to efficiently *release* locks must be weighed carefully. It
isn't obvious, but we must release locks if there is a conflict.

After Heikki acknowledged the problem [1], he produced a revision
addressing it. The details of the workaround and a patch were posted
[2]. I think it's fair to say that this area is a lot messier than it
first appears. If anyone wants to propose an alternative design, they
are of course quite welcome to, but I ask that the very real
difficulties with those designs be acknowledged. AFAICT, only Heikki
has independently acknowledged these issues on list.

In case it isn't obvious, let me be clear about what I care about
here. I feel it's important to get something that is easy to reason
about - you write a DML statement, and within certain fairly
reasonable parameters Postgres does the rest. I think we should not
accept something that may even occasionally throw deadlock errors,
or unique violations, or RC-level serialization failures through no
fault of the user. That would be inferior to the plpgsql looping
pattern we promote that does the right thing and avoids all of this.
It would be awful to have to tell users who hit problems like this
that they should just stop doing so much upserting, or use the old
pattern.
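The looping pattern referred to here is, roughly, the merge_db example in the
PostgreSQL documentation's plpgsql error-trapping section: try the UPDATE
first, INSERT on a miss, and loop back to the UPDATE if the INSERT hits a
unique violation. A single-session Python sketch of that control flow
(illustrative names only; the dict stands in for the table):

```python
# Rough analogue of the documented plpgsql upsert loop. In the real pattern
# a concurrent session can insert between our UPDATE and INSERT, raising
# unique_violation, which is why the whole thing loops.

class UniqueViolation(Exception):
    pass

def insert(db, key, data):
    if key in db:                  # stands in for a unique_violation raised
        raise UniqueViolation      # by a concurrent insert winning the race
    db[key] = data

def merge_db(db, key, data):
    while True:
        if key in db:              # UPDATE db SET b = data WHERE a = key;
            db[key] = data         # IF found THEN RETURN
            return "updated"
        try:                       # INSERT INTO db(a, b) VALUES (key, data)
            insert(db, key, data)
            return "inserted"
        except UniqueViolation:
            pass                   # do nothing; loop to try the UPDATE again
```

This is the behavior an UPSERT command would need to match: no deadlock
errors, no spurious unique violations, just an eventual insert or update.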

[1] http://www.postgresql.org/message-id/52b4aaf0.5090...@vmware.com
[2] http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Robert Haas
On Sat, Jul 19, 2014 at 10:04 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Meh. An understandable syntax wouldn't require the pullups with a special
 scan node and such.

 Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed
 to match the qual in the query. Unless you're willing to give up on
 the idea of having a qual on the update (other than something like an
 ON CONFLICTS qual, obviously) I think you'll probably end up with some
 kind of pseudo-scan node anyway, if only for consistency with how
 things already work, to give you somewhere to show the Filter in
 explain output and so on. ExecScan() probably needs to ExecQual().
 Inserts don't have scan nodes, but updates do, and so it seems pretty
 odd to make the Insert node (a ModifyTable node) pullup from a
 result node (as always), and the Update node (also a ModifyTable
 node) treat the first ModifyTable node as its own pseudo-scan node,
 when in fact we are scanning the heap in a manner not unlike a
 conventional update. Or do you envision a second result node where an
 update qual might be evaluated? I am not enthused about either
 possibility.

 The idea of not having the ability to put a predicate on the update at
 all, much like the MySQL thing isn't completely outrageous, but it
 certainly isn't going to go down well those that have already
 expressed concerns about how much of a foot gun this could be.

This all seems completely to one side of Andres's point.  I think what
he's saying is: don't implement an SQL syntax of the form INSERT ON
CONFLICT and let people use that to implement upsert.  Instead,
directly implement an SQL command called UPSERT or similar.  That's
long been my intuition as well.  I think generality here is not your
friend.

I'd suggest something like:

UPSERT table SET col = value [, ...] WHERE predicate;

with semantics like this:

1. Search the table, using any type of scan you like, for a row
matching the given predicate.
2. If you find more than one tuple that is visible to your scan, error.
3. If you find exactly one tuple that is visible to your scan, update it.  Stop.
4. If you find no tuple that is visible to your scan, but you do find
at least one tuple whose xmin has committed and whose xmax has not
committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4c) update it, in violation of normal MVCC visibility rules.
5. Construct a new tuple using the col/value pairs from the SET clause
and try to insert it.  If this would fail with a unique index
violation, back out and go back to step 1.
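The five steps above can be sketched as a single-session Python model. MVCC,
snapshots, and step 4's not-yet-visible tuples are abstracted away; what the
sketch does capture is the control flow, including the "back out and go back
to step 1" retry. All names are illustrative, not PostgreSQL's:

```python
# Toy model of the proposed UPSERT ... SET ... WHERE semantics.

class UniqueViolation(Exception):
    pass

def upsert(table, unique_col, predicate, new_row, max_retries=10):
    for _ in range(max_retries):
        matches = [r for r in table if predicate(r)]           # step 1: scan
        if len(matches) > 1:                                   # step 2
            raise RuntimeError("predicate matched more than one row")
        if len(matches) == 1:                                  # step 3: update
            matches[0].update(new_row)
            return "updated"
        # step 4 (in-flight tuples) omitted: no concurrency in this model
        if any(r[unique_col] == new_row[unique_col] for r in table):
            continue                                           # step 5 failed:
        table.append(dict(new_row))                            # retry from 1
        return "inserted"                                      # step 5: insert
    raise RuntimeError("retry limit reached")
```

Note that if the predicate never matches the row that causes the unique
violation, the loop can only terminate via the retry cap, which is
essentially the non-termination hazard raised elsewhere in this thread.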

Having said all that, I believe the INSERT ON CONFLICT syntax is more
easily comprehensible than previous proposals.  But I still tend to
agree with Andres that an explicit UPSERT syntax or something like it,
that captures all of the MVCC games inside itself, is likely
preferable from a user standpoint, whatever the implementation ends up
looking like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas robertmh...@gmail.com wrote:
 This all seems completely to one side of Andres's point.  I think what
 he's saying is: don't implement an SQL syntax of the form INSERT ON
 CONFLICT and let people use that to implement upsert.  Instead,
 directly implement an SQL command called UPSERT or similar.  That's
 long been my intuition as well.  I think generality here is not your
 friend.

Sure, I was just making one fairly narrow point in relation to Andres'
concern about executor structuring of the UPSERT.

 I'd suggest something like:

 UPSERT table SET col = value [, ...] WHERE predicate;

Without dismissing the overall concern shared by you and Andres, that
particular update-driven syntax doesn't strike me as something that
should be pursued. I would like to be able to insert or update more
than a single row at a time, for one thing. For another, what exactly
appears in the predicate? Is it more or less the same as an UPDATE's
predicate?

 with semantics like this:

 1. Search the table, using any type of scan you like, for a row
 matching the given predicate.

Perhaps I've misunderstood, but this is fundamentally different to
what I'd always thought would need to happen. I always understood that
the UPSERT should be insert-driven, and so not really have an initial
scan at all in the same sense that every Insert lacks one. Moreover,
I've always thought that everyone agreed on that. We go to insert, and
then in the course of inserting index tuples do something with dirty
snapshots. That's how we get information on conflicts.

For one thing we cannot use any kind of scan unless there is a new
mechanism for seeing not-yet-visible tuples, like some kind of
non-MVCC snapshot that those scan nodes can selectively use. Now, I
guess that you intend that in this scenario you go straight to 5, and
then your insert finds the not-yet-visible conflict. But it's not
clear that when 5 fails, and we pick up from 1 that we then get a new
MVCC Snapshot or something else that gives us a hope of succeeding
this time around. And it seems quite wasteful to not use knowledge of
conflicts from the insert at all - that's one thing I'm trying to
address here; removing unnecessary index scans when we actually know
what our conflict TIDs are.

 2. If you find more than one tuple that is visible to your scan, error.

This point seems to concern making the UPSERT UPDATE only affect zero
or one rows. Is that right? If so, why do you think that's a useful
constraint to impose?

 3. If you find exactly one tuple that is visible to your scan, update it.  
 Stop.
 4. If you find no tuple that is visible to your scan, but you do find
 at least one tuple whose xmin has committed and whose xmax has not
 committed, then (4a) if the isolation level is REPEATABLE READ or
 higher, error; (4b) if there is more than one such tuple, error; else
 (4c) update it, in violation of normal MVCC visibility rules.

Point 4b (if there is more than one such tuple...) seems like it
needs some practical or theoretical justification - do you just not
want to follow an update chain? If so, why? What would the error
message actually be? And, to repeat myself: Why should an MVCC
snapshot see more than one such tuple in the ordinary course of
scanning a relation, since there is by definition a unique constraint
that prevents this? Or maybe I'm wrong; maybe you don't even require
that. That isn't clear.

As you know, I've always opposed any type of READ COMMITTED
serialization failure. If you're worried about lock starvation hazards
- although I don't think strong concerns here are justified - we can
always put in a fail-safe number of retries before throwing an error.
I'm comfortable with that because I think based on experimentation
that it's going to be virtually impossible to hit. Apparently other
snapshot isolation databases acquire a new snapshot, and re-do the
command rather than using an EvalPlanQual()-like mechanism and thereby
violating snapshot isolation. Those other systems have just such a
hard limit on retries before erroring, and it seems to work out okay
for them.

 5. Construct a new tuple using the col/value pairs from the SET clause
 and try to insert it.  If this would fail with a unique index
 violation, back out and go back to step 1.

Again, I can't see why you'd do this step last and not first, since
this is naturally the place to initially see into the future with a
dirty snapshot.

 Having said all that, I believe the INSERT ON CONFLICT syntax is more
 easily comprehensible than previous proposals.  But I still tend to
 agree with Andres that an explicit UPSERT syntax or something like it,
 that captures all of the MVCC games inside itself, is likely
 preferable from a user standpoint, whatever the implementation ends up
 looking like.

Okay then. If you both feel that way, I will come up with something
closer to what you sketch. But for now I still strongly feel it ought
to be driven by an insert. 

Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Greg Stark
On Wed, Jul 23, 2014 at 5:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I'd suggest something like:

 UPSERT table SET col = value [, ...] WHERE predicate;

I don't think this is actually good enough. Typical use cases are
things like "increment this counter" or "insert a new one starting at
0".

-- 
greg




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:

 I still strongly feel it ought to be driven by an insert

Could you clarify that?  Does this mean that you feel that we
should write to the heap before reading the index to see if the row
will be a duplicate?  If so, I think that is a bad idea, since this
will sometimes be used to apply a new data set which hasn't changed
much from the old, and that approach will perform poorly for this
use case, causing a lot of bloat.  It certainly would work well for
the case that most of the rows are expected to be INSERTs rather
than DELETEs, but I'm not sure that's justification for causing
extreme bloat in the other cases.

Also, just a reminder that I'm going to squawk loudly if the
implementation does not do something fairly predictable and sane
for the case that the table has more than one UNIQUE index and you
attempt to UPSERT a row that is a duplicate of one row on one of
the indexes and a different row on a different index.  The example
discussed during your PGCon talk was something like a city table
with two columns, each with a UNIQUE constraint, containing:

 city_id | city_name
---------+-----------
       1 | Toronto
       2 | Ottawa

... and an UPSERT comes through for (1, 'Ottawa').  We would all 
like for that never to happen, but it will.  There must be sane and
documented behavior in that case.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Robert Haas
On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas robertmh...@gmail.com wrote:
 This all seems completely to one side of Andres's point.  I think what
 he's saying is: don't implement an SQL syntax of the form INSERT ON
 CONFLICT and let people use that to implement upsert.  Instead,
 directly implement an SQL command called UPSERT or similar.  That's
 long been my intuition as well.  I think generality here is not your
 friend.

 Sure, I was just making one fairly narrow point in relation to Andres'
 concern about executor structuring of the UPSERT.

 I'd suggest something like:

 UPSERT table SET col = value [, ...] WHERE predicate;

 Without dismissing the overall concern shared by you and Andres, that
 particular update-driven syntax doesn't strike me as something that
 should be pursued. I would like to be able to insert or update more
 than a single row at a time, for one thing. For another, what exactly
 appears in the predicate? Is it more or less the same as an UPDATE's
 predicate?

To the last question, yes.  To the first point, I'm not bent on this
particular syntax, but I am +1 for the idea that the command is
something specifically upsert-like rather than something more generic
like an ON CONFLICT SELECT that lets you do arbitrary things with the
returned rows.

 with semantics like this:

 1. Search the table, using any type of scan you like, for a row
 matching the given predicate.

 Perhaps I've misunderstood, but this is fundamentally different to
 what I'd always thought would need to happen. I always understood that
 the UPSERT should be insert-driven, and so not really have an initial
 scan at all in the same sense that every Insert lacks one. Moreover,
 I've always thought that everyone agreed on that. We go to insert, and
 then in the course of inserting index tuples do something with dirty
 snapshots. That's how we get information on conflicts.

It's certainly arguable whether you should INSERT and then turn failures
into an update or try to UPDATE and then turn failures into an INSERT;
we might even want to have both options available, though that smells
a little like airing too much of our dirty laundry.  But I think I
generally favor optimizing for the UPDATE case for more or less the
same reasons Kevin articulated.

 For one thing we cannot use any kind of scan unless there is a new
 mechanism for seeing not-yet-visible tuples, like some kind of
 non-MVCC snapshot that those scan nodes can selectively use. Now, I
 guess that you intend that in this scenario you go straight to 5, and
 then your insert finds the not-yet-visible conflict. But it's not
 clear that when 5 fails, and we pick up from 1 that we then get a new
 MVCC Snapshot or something else that gives us a hope of succeeding
 this time around. And it seems quite wasteful to not use knowledge of
 conflicts from the insert at all - that's one thing I'm trying to
 address here; removing unnecessary index scans when we actually know
 what our conflict TIDs are.

Here you seem to be suggesting that I intended to propose your existing
design rather than something else, which I didn't.  In this design,
you find the conflict (at most one) by scanning for the tuple to be
updated.

 2. If you find more than one tuple that is visible to your scan, error.

 This point seems to concern making the UPSERT UPDATE only affect zero
 or one rows. Is that right? If so, why do you think that's a useful
 constraint to impose?

Because nobody wants an operation to either insert 1 tuple or update
n>=1 tuples.  The intention is that the predicate should probably be
something like WHERE unique_key = 'some_value', but you can use
something else.  So it's kinda like saying which index you care about
for a particular operation, except without having to explicitly name
an index.  But in any event you should use a predicate that uniquely
identifies the tuple you want to update.

 3. If you find exactly one tuple that is visible to your scan, update it.  
 Stop.
 4. If you find no tuple that is visible to your scan, but you do find
 at least one tuple whose xmin has committed and whose xmax has not
 committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4c) update it, in violation of normal MVCC visibility rules.

 Point 4b (if there is more than one such tuple...) seems like it
 needs some practical or theoretical justification - do you just not
 want to follow an update chain? If so, why? What would the error
 message actually be? And, to repeat myself: Why should an MVCC
 snapshot see more than one such tuple in the ordinary course of
 scanning a relation, since there is by definition a unique constraint
 that prevents this? Or maybe I'm wrong; maybe you don't even require
 that. That isn't clear.

In case (4b), like in case (2), you've done something like UPSERT tab
SET ... WHERE non_unique_index = 'value_appearing_more_than_once'.
You shouldn't do that, because you have only one replacement tuple
(embodied in the SET clause).
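The scan-driven semantics sketched in steps 1-4 above can be paraphrased as a toy Python routine. Visibility is collapsed to a boolean flag, real snapshot/xmin/xmax rules are far subtler, and step 5 (falling back to an insert) is inferred from the surrounding discussion rather than spelled out here:

```python
def upsert(table, predicate, new_values, isolation="read committed"):
    # Step 1: scan for rows matching the predicate.
    visible = [r for r in table if r["visible"] and predicate(r)]
    if len(visible) > 1:
        raise RuntimeError("more than one visible matching row")   # step 2
    if len(visible) == 1:
        visible[0].update(new_values)                              # step 3
        return "updated"
    # Step 4: no visible match, but a not-yet-visible row matches.
    pending = [r for r in table if not r["visible"] and predicate(r)]
    if pending:
        if isolation != "read committed":
            raise RuntimeError("serialization failure")            # 4a
        if len(pending) > 1:
            raise RuntimeError("more than one such row")           # 4b
        pending[0].update(new_values)                              # 4c
        return "updated invisible row"
    table.append({"visible": True, **new_values})                  # step 5
    return "inserted"

t = [{"visible": True, "k": 1, "v": "a"}]
upsert(t, lambda r: r["k"] == 1, {"v": "b"})          # update path
upsert(t, lambda r: r["k"] == 2, {"k": 2, "v": "c"})  # insert path
```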

Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread David G Johnston
Peter Geoghegan-3 wrote
 with semantics like this:

 1. Search the table, using any type of scan you like, for a row
 matching the given predicate.
 
 Perhaps I've misunderstood, but this is fundamentally different to
 what I'd always thought would need to happen. I always understood that
 the UPSERT should be insert-driven, and so not really have an initial
 scan at all in the same sense that every Insert lacks one. Moreover,
 I've always thought that everyone agreed on that. We go to insert, and
 then in the course of inserting index tuples do something with dirty
 snapshots. That's how we get information on conflicts.

SQL standard notwithstanding, there really is no way to pick one
implementation and not have it be sub-optimal in some situations.  Unless
there is a high barrier, why not introduce syntax:

SCAN FIRST; INSERT FIRST

that allows the user to specify the behavior that they expect would be most
efficient given their existing/new data ratio?


 Having said all that, I believe the INSERT ON CONFLICT syntax is more
 easily comprehensible than previous proposals.  But I still tend to
 agree with Andres that an explicit UPSERT syntax or something like it,
 that captures all of the MVCC games inside itself, is likely
 preferable from a user standpoint, whatever the implementation ends up
 looking like.
 
 Okay then. If you both feel that way, I will come up with something
 closer to what you sketch. But for now I still strongly feel it ought
 to be driven by an insert. Perhaps I've misunderstood you entirely,
 though.

Getting a little syntax crazy here but having all of:

UPSERT [SCAN|INSERT] FIRST
INSERT ON CONFLICT UPDATE - same as INSERT FIRST
UPDATE ON MISSING INSERT - same as SCAN FIRST

with the corresponding two implementations would make the user interface
slightly more complicated, but able to conform to the actual data that
the user has.


You could basically perform a two-phase pass where you run the
user-requested algorithm and then for all failures attempt the alternate
algorithm and then error if both fail.
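That two-phase fallback idea reduces to something very small in pseudo-form. A toy Python rendering, where `try_update`/`try_insert` stand in for whichever real implementation each strategy would use (all names are illustrative):

```python
def try_update(table, key, val):
    # "SCAN FIRST" primitive: succeed only if the key already exists.
    if key in table:
        table[key] = val
        return True
    return False

def try_insert(table, key, val):
    # "INSERT FIRST" primitive: succeed only if the key is absent.
    if key not in table:
        table[key] = val
        return True
    return False

def upsert_rows(table, rows, strategy="SCAN FIRST"):
    # Run the user-requested algorithm first; for each failure, attempt
    # the alternate algorithm; error only if both fail.
    first, second = ((try_update, try_insert) if strategy == "SCAN FIRST"
                     else (try_insert, try_update))
    for key, val in rows:
        if not (first(table, key, val) or second(table, key, val)):
            raise RuntimeError(f"upsert failed for key {key}")

t = {1: "a"}
upsert_rows(t, [(1, "A"), (2, "b")], strategy="SCAN FIRST")
```

In a real implementation the second phase could of course still fail under concurrency, which is where the retry and visibility questions debated elsewhere in this thread come back in.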

I am not at all fluent on the concurrency issues here, and the MVCC
violations and re-tries that might be considered, but at a high-level there
is disagreement here simply because both answers are correct and ideally
both can be provided to the user.

David J.










Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2014 at 3:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Could you clarify that?  Does this mean that you feel that we
 should write to the heap before reading the index to see if the row
 will be a duplicate?  If so, I think that is a bad idea, since this
 will sometimes be used to apply a new data set which hasn't changed
 much from the old, and that approach will perform poorly for this
 use case, causing a lot of bloat.  It certainly would work well for
 the case that most of the rows are expected to be INSERTs rather
 than DELETEs, but I'm not sure that's justification for causing
 extreme bloat in the other cases.

No, I think we should stagger ordinary index insertion in a way that
locks indexes - we lock indexes, then if successful insert a heap
tuple before finally inserting index tuples using the existing
heavyweight page-level index locks. My design doesn't cause bloat
under any circumstances. Heikki's design, which he sketched with an
actual POC implementation, involved possible bloating in the event of a
conflict. He also had to go and delete the promise tuple (from within
ExecInsert()) in the event of the conflict before row locking in order
to prevent unprincipled deadlocking. Andres wanted to do something
else similarly involving promise tuples, where the xid of the
inserter was stored in indexes with a special flag. That could also
cause bloat. I think that could be particularly bad when conflicts
necessitate visiting indexes one by one to kill promise tuples, as
opposed to just killing one heap tuple as in the case of Heikki's
design.
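The ordering described above - take a value lock on the unique key first, and only write the heap tuple once the lock is held, so a conflict is detected before anything is written - can be cartooned as follows. A single mutex stands in for the heavyweight page-level value locks, and all the real subtleties (multiple indexes, WAL, visibility) are abstracted away; this is a sketch of the idea, not of the patch:

```python
import threading

class Table:
    def __init__(self):
        self.unique_index = {}          # key -> heap tid
        self.heap = []                  # list of (key, val) tuples
        self._lock = threading.Lock()   # stands in for page-level value locks

    def insert_or_conflict(self, key, val):
        with self._lock:                # "lock the index" for this key
            tid = self.unique_index.get(key)
            if tid is not None:
                # Conflict found before any write: no dead heap tuple,
                # no promise tuple to clean up afterwards - i.e. no bloat.
                return ("conflict", tid)
            self.heap.append((key, val))     # heap tuple written under lock
            self.unique_index[key] = len(self.heap) - 1
            return ("inserted", len(self.heap) - 1)
```

By contrast, a promise-tuple design would have inserted something (an index or heap entry) before discovering the conflict, leaving cleanup work behind on the conflict path.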

Anyway, both of those designs, and my own design are insert-driven.
The main difference between the design that Heikki sketched and my own
is that mine does not cause bloat, but is more invasive to the nbtree
code (but less invasive to a lot of other places to make the
deadlocking-ultimately-conflicting tuple killing work). But I believe
that Heikki's design is identical to my own in terms of user-visible
semantics. That said, his design was just a sketch and it wouldn't
be fair to hold him to it.

 Also, just a reminder that I'm going to squawk loudly if the
 implementation does not do something fairly predictable and sane
 for the case that the table has more than one UNIQUE index and you
 attempt to UPSERT a row that is a duplicate of one row on one of
 the indexes and a different row on a different index.

Duly noted.  :-)

I think that it's going to have to support that one way or the other.
It might be the case that I'll want to make the choice of unique index
optionally implicit, but it's clear that we want to be able to
specify a specific unique index in one form or another. Actually, I've
already added that. It's just optional right now. I haven't found a
better way than by just specifying the name of the unique index in
DML, which is ugly, which is the main reason I want to make it
optional. Perhaps we can overcome this.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan p...@heroku.com wrote:
  On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas robertmh...@gmail.com wrote:

  2. If you find more than one tuple that is visible to your scan, error.
 
  This point seems to concern making the UPSERT UPDATE only affect zero
  or one rows. Is that right? If so, why do you think that's a useful
  constraint to impose?
 
 Because nobody wants an operation to either insert 1 tuple or update
 n>=1 tuples.  The intention is that the predicate should probably be
 something like WHERE unique_key = 'some_value', but you can use
 something else.  So it's kinda like saying which index you care about
 for a particular operation, except without having to explicitly name
 an index.  But in any event you should use a predicate that uniquely
 identifies the tuple you want to update.

This seemed a nice idea when I first read it earlier today, but now I'm
not so sure.  Are you saying that it wouldn't be allowed to use an
UPSERT with some sort of join, such that each joined row would produce
either one insert or one update?  To clarify: suppose I import some
external data into a temp table, then run UPSERT USING that table so
that the rows end up in a permanent table; some of the rows might be
already in the permanent table, some others might not.  I would hope
that by the time UPSERT is done, all the rows are in the permanent
table.  Would that raise an error, with your proposed design?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2014 at 3:27 PM, Robert Haas robertmh...@gmail.com wrote:
 To the last question, yes.  To the first point, I'm not bent on this
 particular syntax, but I am +1 for the idea that the command is
 something specifically upsert-like rather than something more generic
 like an ON CONFLICT SELECT that lets you do arbitrary things with the
 returned rows.

FWIW I wasn't proposing that you'd be able to do arbitrary things.
There'd be a few restrictions, such as forbidding unexpected DML
commands, and requiring that only a special update reference the
special rejected-projecting CTE. They just wouldn't be arbitrary
restrictions. But that's not that interesting to me anymore; you and
Andres want a dedicated DML command with the update part built in,
that isn't as flexible. Okay, fine. I don't think it makes the
implementation any easier, but that's what I'll do.

 It's certainly arguable whether you should INSERT and then turn failures
 into an update or try to UPDATE and then turn failures into an INSERT;
 we might even want to have both options available, though that smells
 a little like airing too much of our dirty laundry.  But I think I
 generally favor optimizing for the UPDATE case for more or less the
 same reasons Kevin articulated.

I don't see the connection between this and Kevin's remarks. And FWIW,
I don't see a reason to favor inserts or updates. Fortunately, what I
have balances both cases very well, and doesn't cause bloat. The work
of descending the index to lock it isn't wasted if an update is
required. My implementation decides to either insert or update at
literally the latest possible moment.

 For one thing we cannot use any kind of scan unless there is a new
 mechanism for seeing not-yet-visible tuples, like some kind of
 non-MVCC snapshot that those scan nodes can selectively use. Now, I
 guess that you intend that in this scenario you go straight to 5, and
 then your insert finds the not-yet-visible conflict. But it's not
 clear that when 5 fails, and we pick up from 1 that we then get a new
 MVCC Snapshot or something else that gives us a hope of succeeding
 this time around. And it seems quite wasteful to not use knowledge of
 conflicts from the insert at all - that's one thing I'm trying to
 address here; removing unnecessary index scans when we actually know
 what our conflict TIDs are.

 Here you seem to be suggesting that I intended to propose your existing
 design rather than something else, which I didn't.  In this design,
 you find the conflict (at most one) by scanning for the tuple to be
 updated.

Yes, but what if you don't see a conflict because it isn't visible to
your snapshot, and then you insert, and only then (step 5), presumably
with a dirty snapshot, you find a conflict? How does the loop
terminate if that brings you back to step 1 with the same MVCC
snapshot feeding the update?

 2. If you find more than one tuple that is visible to your scan, error.

 This point seems to concern making the UPSERT UPDATE only affect zero
 or one rows. Is that right? If so, why do you think that's a useful
 constraint to impose?

 Because nobody wants an operation to either insert 1 tuple or update
 n>=1 tuples.  The intention is that the predicate should probably be
 something like WHERE unique_key = 'some_value', but you can use
 something else.  So it's kinda like saying which index you care about
 for a particular operation, except without having to explicitly name
 an index.  But in any event you should use a predicate that uniquely
 identifies the tuple you want to update.

I agree that you want to uniquely identify each tuple. What I meant
was, why should we not be able to upsert multiple rows in a single
command? What's wrong with that?

 Point 4b (if there is more than one such tuple...) seems like it
 needs some practical or theoretical justification - do you just not
 want to follow an update chain? If so, why? What would the error
 message actually be? And, to repeat myself: Why should an MVCC
 snapshot see more than one such tuple in the ordinary course of
 scanning a relation, since there is by definition a unique constraint
 that prevents this? Or maybe I'm wrong; maybe you don't even require
 that. That isn't clear.

 In case (4b), like in case (2), you've done something like UPSERT tab
 SET ... WHERE non_unique_index = 'value_appearing_more_than_once'.
 You shouldn't do that, because you have only one replacement tuple
 (embodied in the SET clause).

Oh, I totally agree that we should throw an error if any one row is
affected more than once by the same command. Indeed, SQL MERGE
requires this, since to do otherwise would leave the final value of
the row unspecified. It seemed to me from your original explanation of
point 4b that you were saying something much stronger. But if that's
all you meant, then I agree.

-- 
Peter Geoghegan



Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas robertmh...@gmail.com wrote:
 I'd suggest something like:

 UPSERT table SET col = value [, ...] WHERE predicate;

I think I see another problem with this. The UPDATE must provide a
WHERE clause Var on a unique indexed column (let's say it's
constrained to provide a (Var [unique-indexed opclass support
function 3 op] Const) qual during parse analysis because you asked
for UPSERT). But it can also affect 0 rows for other reasons, since
this UPDATE qual can have arbitrary other expressions. So in general
you have any number of reasons for not updating, which implies that
you must insert, which might not be possible because there might even
still be duplicate key violation in a not-yet-visible-to-you row (even
just in the unique index you intended to merge on). Whereas, when
inserting, there is exactly one reason (per row) to go and update - a
conflict (i.e. a would-be duplicate violation). And this is a conflict
that is meaningful to every transaction, regardless of their snapshot,
since it represents an objective fact about the physical presence of
an index tuple.

So, do you make the UPDATE differentiate between different reasons for
the UPDATE failing, only inserting when an UPSERT would have happened
had you omitted the extra stuff in your UPSERT predicate? Can you
really differentiate anyway? And isn't the choice to do the insert
based on what your MVCC snapshot happens to see - rather than
something that there is necessarily objective truth on, the physical
presence of a duplicate tuple in an index - rather arbitrary? It's a
different situation to my implementation, where you do an insert-like
thing, and then find a conflict row to lock and then update, because
at that point the row is successfully locked, and the WHERE clause may
be evaluated against rejects and the *most recent version* of the
locked, conflict-causing row. The most recent, locked version is not
arbitrary at all. That's the version you ought to evaluate a predicate
against before finally deciding to UPDATE.
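The point about predicate timing - lock the conflicting row first, then evaluate the WHERE clause against its most recent version rather than against whatever an older snapshot happened to see - can be sketched like so. All names here are illustrative stand-ins, with a version list modeling the row's update chain:

```python
import threading

def upsert_decide(row_versions, row_lock, predicate, new_values):
    # The conflict row is locked *before* the predicate is evaluated,
    # so the decision is made against the current version, not a stale one.
    with row_lock:
        latest = row_versions[-1]            # most recent version of the row
        if predicate(latest):                # WHERE clause sees locked version
            row_versions.append({**latest, **new_values})
            return "updated"
        return "no-op"                       # predicate failed on current row

versions = [{"key": 1, "count": 1}]
lock = threading.Lock()
upsert_decide(versions, lock, lambda r: r["count"] < 5, {"count": 2})
```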

I assume you agree with my view that UPSERT should always insert or
update. This kind of business sounds closer to SQL MERGE, where an
insert may not drive things, and you accept these kinds of anomalies
in exchange for a lot more flexibility in not having inserts drive
things. Did you happen to read my post on that?
-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-19 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Meh. An understandable syntax wouldn't require the pullups with a special
 scan node and such.

Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed
to match the qual in the query. Unless you're willing to give up on
the idea of having a qual on the update (other than something like an
ON CONFLICTS qual, obviously) I think you'll probably end up with some
kind of pseudo-scan node anyway, if only for consistency with how
things already work, to give you somewhere to show the Filter in
explain output and so on. ExecScan() probably needs to ExecQual().
Inserts don't have scan nodes, but updates do, and so it seems pretty
odd to make the Insert node (a ModifyTable node) pullup from a
result node (as always), and the Update node (also a ModifyTable
node) treat the first ModifyTable node as its own pseudo-scan node,
when in fact we are scanning the heap in a manner not unlike a
conventional update. Or do you envision a second result node where an
update qual might be evaluated? I am not enthused about either
possibility.

The idea of not having the ability to put a predicate on the update at
all, much like the MySQL thing isn't completely outrageous, but it
certainly isn't going to go down well those that have already
expressed concerns about how much of a foot gun this could be.
-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
Hi,

On 2014-07-17 18:18:41 -0700, Peter Geoghegan wrote:
 I'm working on UPSERT again. I think that in order to make useful
 progress, I'll have to find a better way of overcoming the visibility
 issues (i.e. the problem of what to do about a
 still-in-progress-to-our-snapshot row being locked at READ COMMITTED
 isolation level [1][2]).

Agreed.

 I've made some tentative additions to my earlier patch to change the syntax:
 
 postgres=# explain analyze
 with c as (
   insert into ints(key, val) values (1, 'a'), (2, 'b')
   on conflict
   select * for update)
   update ints set val = c.val from c conflicts;

I still don't agree that this is a sane syntax for upsert. Avoiding it
would reduce the scope of the problems you have measureably. We should
provide a syntax that does the UPSERT itself, instead of delegating that
to the user as you're proposing above.  Afaics nearly none of the
problems you're debating in your email would exist if upsert were
entirely done internally.

I think the things that use weird visibility semantics are pretty much
all doing that internally (things being EvalPlanQual stuff for
INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
see sufficient reason why we should break away from that here. Yes,
there's stuff that cannot be done when it's done internally, but we can
live with those not being possible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think the things that use weird visibility semantics are pretty much
 all doing that internally (things being EvalPlanQual stuff for
 INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
 see sufficient reason why we should break away from that here. Yes,
 there's stuff that cannot be done when it's done internally, but we can
 live with those not being possible.

The whole point of what I was proposing was that those semantics would
only apply to a special tid scan node. Perhaps I missed something, but
I'm not sure why you'd consider that I was breaking away from that
here at all.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
On 2014-07-18 10:53:36 -0700, Peter Geoghegan wrote:
 On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think the things that use weird visibility semantics are pretty much
  all doing that internally (things being EvalPlanQual stuff for
  INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
  see sufficient reason why we should break away from that here. Yes,
  there's stuff that cannot be done when it's done internally, but we can
  live with those not being possible.
 
 The whole point of what I was proposing was that those semantics would
 only apply to a special tid scan node. Perhaps I missed something, but
 I'm not sure why you'd consider that I was breaking away from that
 here at all.

I don't see why you'd need such a node at all if we had a fully builtin
UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
barely anybody will understand, let alone use correctly in queries they
write themselves.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't see why you'd need such a node at all if we had a fully builtin
 UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
 UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
 barely anybody will understand, let alone use correctly in queries they
 write themselves.

I accept that there will be a need for certain restrictions. Most
obviously, if you update the target table referencing a CTE like this,
not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
error. And as I mentioned, you may only join the projected duplicates
to the UPDATE ModifyTable - an attempt to join any more relations is
an error. In short, this *is* a fully built-in upsert.


-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Andres Freund
On 2014-07-18 11:14:34 -0700, Peter Geoghegan wrote:
 On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I don't see why you'd need such a node at all if we had a fully builtin
  UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
  UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
  barely anybody will understand, let alone use correctly in queries they
  write themselves.
 
 I accept that there will be a need for certain restrictions. Most
 obviously, if you update the target table referencing a CTE like this,
 not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
 error. And as I mentioned, you may only join the projected duplicates
 to the UPDATE ModifyTable - an attempt to join any more relations is
 an error. In short, this *is* a fully built-in upsert.

Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such. I think you're attempting a sort of genericity
that's making your (important!) goal much harder to reach.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Meh. An understandable syntax wouldn't require the pullups with a special
 scan node and such. I think you're attempting a sort of genericity
 that's making your (important!) goal much harder to reach.

Well, maybe. If the genericity of this syntax isn't what people want,
I may go with something else. At the suggestion of Robert I did start
a dedicated thread on that question months back (explicitly putting
aside the nitty-gritty details of the implementation), but that didn't
go anywhere.

-- 
Peter Geoghegan




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2014 at 11:32 AM, Peter Geoghegan p...@heroku.com wrote:
 Well, maybe. If the genericity of this syntax isn't what people want,
 I may go with something else.

By the way, I didn't mention that there is now also the optional
ability to specify a merge on unique index directly in DML. It would
be much nicer to specify a sort key instead, but that might be tricky
in the general case. I imagine that other systems solve the problem by
being very restrictive in what will be (implicitly) accepted as a
merge-on index. Seemingly there are problems with all major SQL MERGE
implementations, so I don't necessarily expect to be able to draw
lessons from them in any way here.

-- 
Peter Geoghegan




[HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-17 Thread Peter Geoghegan
I'm working on UPSERT again. I think that in order to make useful
progress, I'll have to find a better way of overcoming the visibility
issues (i.e. the problem of what to do about a
still-in-progress-to-our-snapshot row being locked at READ COMMITTED
isolation level [1][2]).

I've made some tentative additions to my earlier patch to change the syntax:

postgres=# explain analyze
with c as (
  insert into ints(key, val) values (1, 'a'), (2, 'b')
  on conflict
  select * for update)
  update ints set val = c.val from c conflicts;
  QUERY PLAN
--
 Update on ints  (cost=0.03..62.34 rows=2 width=104) (actual time=0.095..0.095 rows=0 loops=1)
   CTE c
     ->  Insert on ints ints_1  (cost=0.00..0.03 rows=2 width=36) (actual time=0.039..0.049 rows=2 loops=1)
           ->  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..62.31 rows=2 width=104) (actual time=0.063..0.080 rows=2 loops=1)
         Join Filter: (ints.ctid = c.ctid)
         Rows Removed by Join Filter: 6
         ->  CTE Scan on c  (cost=0.00..0.04 rows=2 width=100) (actual time=0.044..0.055 rows=2 loops=1)
         ->  Materialize  (cost=0.00..28.45 rows=1230 width=10) (actual time=0.005..0.006 rows=4 loops=2)
               ->  Seq Scan on ints  (cost=0.00..22.30 rows=1230 width=10) (actual time=0.009..0.009 rows=4 loops=1)
 Planning time: 0.132 ms
 Execution time: 0.159 ms
(12 rows)

This works, as far as it goes. Parse analysis adds the ctid join to
the top-level UPDATE based on the fact that a magical UPDATE statement
(magical due to having a CONFLICTS clause) references a CTE in its
FromList (this is mandated by CONFLICTS during parse analysis), a CTE
containing an INSERT ON CONFLICT, and itself projecting a magical set
of rejected rows sufficient to get a locked c.ctid implicitly. (FWIW,
DELETEs may also accept a CONFLICTS clause and be used like this).

The optimizer does not support joins involving ctid, which is why
there is a seqscan plan - even setting enable_seqscan = off does not
alter the plan right now.
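That is, the usual escape hatch gets us nothing here:

```sql
set enable_seqscan = off;
-- Re-running the EXPLAIN ANALYZE above still produces the same
-- Seq Scan plan, because no tid join path exists to fall back on.
```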

Apart from the obvious fact that we're doing an unnecessary Seq Scan
on the target table, which is unacceptable on performance grounds
(since we do in fact already know exactly what list of tids to pass to
the top-level ModifyTable node to UPDATE), there is another problem:
Since I cannot use a tid scan, I cannot play tricks with visibility
within tidscans for the benefit of solving the basic problem of doing
the right thing for READ COMMITTED. Clearly I need to add a new MVCC
violation in the spirit of EvalPlanQual() to avoid this problem
(since READ COMMITTED serialization failures are unacceptable to
users), but I'd like this new MVCC violation to be as selective as
possible.

Let me take a step back, though. Presently UPDATE ... WHERE CURRENT OF
does something that isn't a million miles from what I have in mind to
address the problem. That feature exists for "sensitive" cursors,
where we lock rows (since FOR UPDATE was specified with the DECLARE
CURSOR statement) as they're fetched from the cursor. The upshot is
that in general, rows can change out from under us before they're
locked, including in ways that might result in a new row version
not satisfying the original SELECT FOR UPDATE query predicate -
clearly users don't want to have to deal with that when they
subsequently go to UPDATE ... WHERE CURRENT OF. Special handling is
required: purely because WHERE CURRENT OF was specified, that update
must use a tid scan. There is a little bit of special machinery within
the optimizer (within cost_tidscan()) to force this for the
isCurrentOf case.
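To make the comparison concrete, the existing behavior described above is the one exercised by a session like this (the table and cursor names are just illustrative):

```sql
-- A "sensitive" cursor: FOR UPDATE locks each row as it is fetched.
begin;
declare cur cursor for select * from ints where val = 'a' for update;
fetch next from cur;
-- WHERE CURRENT OF forces the tid scan path discussed above:
update ints set val = 'z' where current of cur;
commit;
```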

The tid scan code within the executor also has special UPDATE WHERE
CURRENT OF handling. TidListCreate() specially handles the case by
asking execCurrentOf() to look up the cursor's portal, and to get a
tid from its QueryDesc for the current cursor position. TidNext()
specially fetches the most recent version visible to our estate's
Snapshot when we UPDATE WHERE CURRENT OF, without considering the
original SELECT FOR UPDATE predicate (and whether or not it would in
any sense continue to be satisfied). Each tid is then fetched from the
heap and returned. I think I ought to be able to do something similar
with this new CONFLICTS clause, by similarly marshaling some list of
tids ultimately originating from the ModifyTable node that locked a
potentially-not-yet-visible row (or maybe the optimizer recognizes the
internal special case and creates the tid scan node "visibility
credulous"). I'd then return heap tuples by using a dirty snapshot.
Since this can only happen for the UPDATE's tid scan, and there must
be a tid scan, it should be correct (on its own fairly reasonable
terms, for READ COMMITTED). After all, at 

Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-17 Thread Peter Geoghegan
On Thu, Jul 17, 2014 at 6:18 PM, Peter Geoghegan p...@heroku.com wrote:
 This appears to be a Simple Matter of Programming (at least
 for someone that happens to already have a good understanding of the
 optimizer), and is anticipated by this comment within tidpath.c:

  * There is currently no special support for joins involving CTID; in
  * particular nothing corresponding to best_inner_indexscan().  Since it's
  * not very useful to store TIDs of one table in another table, there
  * doesn't seem to be enough use-case to justify adding a lot of code
  * for that.

By the way, this comment is obsolete since best_inner_indexscan() was
removed in 2012 by the parameterized paths patch.

-- 
Peter Geoghegan

