subject:"Re\: \[HACKERS\] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE"

On Sat, Jan 18, 2014 at 5:28 AM, Robert Haas robertmh...@gmail.com wrote:
 I was wondering if that might cause deadlocks if an existing index is
 changed from unique to non-unique, or vice versa, as the ordering would
 change. But we don't have a DDL command to change that, so the question is
 moot.

 It's not hard to imagine someone wanting to add such a DDL command.

Perhaps, but the burden of solving that problem ought to rest with
whoever eventually propose the command. Certainly, if someone did so
today, I would object on the grounds that their patch precluded us
from ever prioritizing unique indexes, to get them out of the way
during insertion, so I am not actually making such an effort more
difficult than it already is. Moreover, avoiding entirely predictable
index bloat is more important than making supporting this yet to be
proposed feature's implementation easier. I was surprised when I
learned that things didn't already work this way.

Attached patch, broken off from my patch has relcache sort indexes by
(!indisunique, relindexid).

-- 
Peter Geoghegan
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*** typedef struct relidcacheent
*** 108,113 
--- 108,125 
  	Relation	reldesc;
  } RelIdCacheEnt;
  
+ /*
+  *		Representation of indexes for sorting purposes
+  *
+  *		We use this to sort indexes globally by a specific sort order, per
+  *		RelationGetIndexList().
+  */
+ typedef struct relidunq
+ {
+ 	bool		indisunique;
+ 	Oid			relindexid;
+ } relidunq;
+ 
  static HTAB *RelationIdCache;
  
  /*
*** static TupleDesc GetPgClassDescriptor(vo
*** 246,252 
  static TupleDesc GetPgIndexDescriptor(void);
  static void AttrDefaultFetch(Relation relation);
  static void CheckConstraintFetch(Relation relation);
! static List *insert_ordered_oid(List *list, Oid datum);
  static void IndexSupportInitialize(oidvector *indclass,
  	   RegProcedure *indexSupport,
  	   Oid *opFamily,
--- 258,264 
  static TupleDesc GetPgIndexDescriptor(void);
  static void AttrDefaultFetch(Relation relation);
  static void CheckConstraintFetch(Relation relation);
! static int relidunq_cmp(const void *a, const void *b);
  static void IndexSupportInitialize(oidvector *indclass,
  	   RegProcedure *indexSupport,
  	   Oid *opFamily,
*** CheckConstraintFetch(Relation relation)
*** 3445,3455 
   * Such indexes are expected to be dropped momentarily, and should not be
   * touched at all by any caller of this function.
   *
!  * The returned list is guaranteed to be sorted in order by OID.  This is
!  * needed by the executor, since for index types that we obtain exclusive
!  * locks on when updating the index, all backends must lock the indexes in
!  * the same order or we will get deadlocks (see ExecOpenIndices()).  Any
!  * consistent ordering would do, but ordering by OID is easy.
   *
   * Since shared cache inval causes the relcache's copy of the list to go away,
   * we return a copy of the list palloc'd in the caller's context.  The caller
--- 3457,3471 
   * Such indexes are expected to be dropped momentarily, and should not be
   * touched at all by any caller of this function.
   *
!  * The returned list is guaranteed to be in (!indisunique, OID) order.  This is
!  * needed by the executor, since for index types that we obtain exclusive locks
!  * on when updating the index, all backends must lock the indexes in the same
!  * order or we will get deadlocks (see ExecOpenIndices()).  For most purposes
!  * any consistent ordering would do, but there is further consideration, which
!  * is why we put unique indexes first: it is generally useful to get insertion
!  * into unique indexes out of the way, since unique violations are the cause of
!  * many aborted transactions.  We can always avoid bloating non-unique indexes
!  * of the same slot.
   *
   * Since shared cache inval causes the relcache's copy of the list to go away,
   * we return a copy of the list palloc'd in the caller's context.  The caller
*** RelationGetIndexList(Relation relation)
*** 3469,3475 
  	SysScanDesc indscan;
  	ScanKeyData skey;
  	HeapTuple	htup;
! 	List	   *result;
  	char		replident = relation-rd_rel-relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
--- 3485,3495 
  	SysScanDesc indscan;
  	ScanKeyData skey;
  	HeapTuple	htup;
! 	relidunq   *indexTypes;
! 	int			nIndexType;
! 	int			i;
! 	Size		szIndexTypes;
! 	List	   *result = NIL;
  	char		replident = relation-rd_rel-relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
*** RelationGetIndexList(Relation relation)
*** 3486,3494 
  	 * list into the relcache entry.  This avoids cache-context memory leakage
  	 * if we get some sort of error partway through.
  	 */
- 	result = NIL;
  	oidIndex = InvalidOid;
  
  	/* Prepare to scan pg_index for entries having indrelid = this rel. */

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Thu, Jan 16, 2014 at 6:31 PM, Peter Geoghegan p...@heroku.com wrote:
 I think we need to give this some more thought. I have not addressed
 the implications for MVCC snapshots here.

So I gave this some more thought, and this is what I came up with:

+ static bool
+ ExecLockHeapTupleForUpdateSpec(EState *estate,
+  ResultRelInfo 
*relinfo,
+  ItemPointer tid)
+ {
+   Relationrelation = 
relinfo-ri_RelationDesc;
+   HeapTupleData   tuple;
+   HeapUpdateFailureData   hufd;
+   HTSU_Result test;
+   Buffer  buffer;
+
+   Assert(ItemPointerIsValid(tid));
+
+   /* Lock tuple for update */
+   tuple.t_self = *tid;
+   test = heap_lock_tuple(relation, tuple,
+  estate-es_output_cid,
+  LockTupleExclusive, false, 
/* wait */
+  true, buffer, hufd);
+   ReleaseBuffer(buffer);
+
+   switch (test)
+   {
+   case HeapTupleInvisible:
+   /*
+* Tuple may have originated from this transaction, in 
which case
+* it's already locked.  However, to avoid having to 
consider the
+* case where the user locked an instantaneously 
invisible row
+* inserted in the same command, throw an error.
+*/
+   if 
(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+   ereport(ERROR,
+   
(errcode(ERRCODE_UNIQUE_VIOLATION),
+errmsg(could not lock 
instantaneously invisible tuple
inserted in same transaction),
+errhint(Ensure that no rows 
proposed for insertion in the
same command have constrained values that duplicate each other.)));
+   if (IsolationUsesXactSnapshot())
+   ereport(ERROR,
+   
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+errmsg(could not serialize 
access due to concurrent update)));
+   /* Tuple became invisible due to concurrent update; try 
again */
+   return false;
+   case HeapTupleSelfUpdated:
+   /*

I'm just throwing an error when locking the tuple returns
HeapTupleInvisible, and the xmin of the tuple is our xid.

It's sufficient to just check
TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)),
because there is no way that _bt_check_unique() could consider the
tuple dirty visible + conclusively fit for a lock attempt if it came
from our xact, while at the same time for the same tuple
HeapTupleSatisfiesUpdate() indicated invisibility, unless the tuple
originated from the same command. Checking against subxacts or
ancestor xacts is at worst redundant.

I am happy with this. ISTM that it'd be hard to argue that any
reasonable and well-informed person would ever thank us for trying
harder here, although it took me a while to reach that position. To
understand what I mean, consider what MySQL does when in a similar
position. I didn't actually check, but based on the fact that their
docs don't consider this question I guess MySQL would go update the
tuple inserted by that same INSERTON DUPLICATE KEY UPDATE
command. Most of the time the conflicting tuples proposed for
insertion by the user are in *some* way different (i.e. if the table
was initially empty and you did a regular insert, inserting those same
tuples would cause a unique constraint violation all on their own, but
without there being any fully identical tuples among these
hypothetical tuples proposed for insertion). It seems obvious that the
order in which each tuple is evaluated for insert-or-update on MySQL
is more or less undefined. And so by allowing this, they arguably
allow their users to miss something they should not: they don't end up
doing anything useful with the datums originally inserted in the
command, but then subsequently updated over with something else in the
same command.

MySQL users are not notified that this happened, and are probably
blissfully unaware that there has been a limited form of data loss. So
it's The Right Thing to say to Postgres users: if you inserted these
rows into the table when it was empty, there'd *still* definitely be a
unique constraint violation, and you need to sort that out before
asking Postgres to handle conflicts with concurrent sessions and
existing data, where rows that come from earlier commands in your xact

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Sat, Jan 18, 2014 at 6:17 PM, Peter Geoghegan p...@heroku.com wrote:
 MySQL users are not notified that this happened, and are probably
 blissfully unaware that there has been a limited form of data loss. So
 it's The Right Thing to say to Postgres users: if you inserted these
 rows into the table when it was empty, there'd *still* definitely be a
 unique constraint violation, and you need to sort that out before
 asking Postgres to handle conflicts with concurrent sessions and
 existing data, where rows that come from earlier commands in your xact
 counts as existing data.

I Googled and found evidence indicating that a number of popular
proprietary system's SQL MERGE implementations do much the same thing.
You may get an attempt to UPDATE the same row twice error on both
SQL Server and Oracle. I wouldn't like to speculate if the standard
requires this of MERGE, but to require it seems very sensible.

 The only problem I can see with that is that
 we cannot complain consistently for practical reasons, as when we lock
 *some other* xact's tuple rather than inserting in the same command
 two or more times.

Actually, maybe it would be practical to complain that the same UPSERT
command attempted to lock a row twice with at least *almost* total
accuracy, and not just for the particularly problematic case where
tuple visibility is not assured.

Personally, I favor just making case HeapTupleSelfUpdated: within
the patch's ExecLockHeapTupleForUpdateSpec() function complain when
hufd.cmax == estate-es_output_cid) (currently there is a separate
complaint, but only when those two variables are unequal). That's
probably almost perfect in practice.

If we wanted perfection, which would be to always complain when two
rows were locked by the same UPSERT command, it would be a matter of
having heap_lock_tuple indicate to the patch's
ExecLockHeapTupleForUpdateSpec() caller that the row was already
locked, so that it could complain in a special way for the
locked-not-updated case. But that is hard, because there is no way for
it to know if the current *command* locked the tuple, and that's the
only case that we are justified in raising an error for.

But now that I think about it some more, maybe always complaining when
we lock but have not yet updated is not just not worth the trouble,
but is in fact bogus. It's not obvious what precise behavior is
correct here. I was worried about someone updating something twice,
but maybe it's fully sufficient to do what I've already proposed,
while in addition documenting that you cannot on-duplicate-key-lock a
tuple that has already been inserted or updated within the same
command. It will be very rare for anyone to trip up over that in
practice (e.g. by locking twice and spuriously updating the same row
twice or more in a later command). Users learn to not try this kind of
thing by having it break immediately; the fact that it doesn't break
with 100% reliability is good enough (plus it doesn't *really* fail to
break when it should because of how things are documented).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Sat, Jan 18, 2014 at 7:49 PM, Peter Geoghegan p...@heroku.com wrote:
 Personally, I favor just making case HeapTupleSelfUpdated: within
 the patch's ExecLockHeapTupleForUpdateSpec() function complain when
 hufd.cmax == estate-es_output_cid) (currently there is a separate
 complaint, but only when those two variables are unequal). That's
 probably almost perfect in practice.

Actually, there isn't really a need to do so, since I believe in
practice the tuple locked will always be instantaneously invisible
(when we have the scope to avoid this updated the tuple twice in the
same command problem by forbidding it in the style of SQL MERGE).
However, I think I'm going to propose that we still do something in
the ExecLockHeapTupleForUpdateSpec() HeapTupleSelfUpdated handler (in
addition to HeapTupleInvisible), because that'll still be illustrative
dead code.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Heikki Linnakangas


On 01/16/2014 03:25 AM, Peter Geoghegan wrote:

I think you should consider breaking off the relcache parts of my
patch and committing them, because they're independently useful. If we
are going to have a lot of conflicts that need to be handled by a
heap_delete(), there is no point in inserting non-unique index tuples
for what is not yet conclusively a proper (non-promise) tuple. Those
should always come last. And even without upsert, strictly inserting
into unique indexes first seems like a useful thing relative to the
cost. Unique violations are the cause of many aborted transactions,
and there is no need to ever bloat non-unique indexes of the same slot
when that happens.


Makes sense. Can you extract that into a separate patch, please?

I was wondering if that might cause deadlocks if an existing index is 
changed from unique to non-unique, or vice versa, as the ordering would 
change. But we don't have a DDL command to change that, so the question 
is moot.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Peter Geoghegan

On Thu, Jan 16, 2014 at 12:35 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Makes sense. Can you extract that into a separate patch, please?

Okay.

On an unrelated note, here are results for a benchmark that compares
the two patches for an insert heavy workload:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert-heavy-cmp/

I should point out that this is a sympathetic case for the exclusion
approach; there is only one unique index involved, and the heap tuples
were relatively wide:

pg@gerbil:~/pgbench-tools/tests$ cat tpc-b-upsert.sql
\set nbranches 10
\set naccounts 10
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom delta -5000 5000
with rej as(insert into pgbench_accounts(aid, bid, abalance, filler)
values(:aid, :bid, :delta, 'filler') on duplicate key lock for update
returning rejects aid, abalance) update pgbench_accounts set abalance
= pgbench_accounts.abalance + rej.abalance from rej where
pgbench_accounts.aid = rej.aid;

(This benchmark used an unlogged table, if only because to do
otherwise would severely starve this particular server of I/O).
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Peter Geoghegan

On Wed, Jan 15, 2014 at 11:02 PM, Peter Geoghegan p...@heroku.com wrote:
 It might just be a matter of:

 @@ -186,6 +186,13 @@ ExecLockHeapTupleForUpdateSpec(EState *estate,
 switch (test)
 {
 case HeapTupleInvisible:
 +   /*
 +* Tuple may have originated from this command, in 
 which case it's
 +* already locked
 +*/
 +   if 
 (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple.t_data))
 
 +   HeapTupleHeaderGetCmin(tuple.t_data) == 
 estate-es_output_cid)
 +   return true;
 /* Tuple became invisible;  try again */
 if (IsolationUsesXactSnapshot())
 ereport(ERROR,

I think we need to give this some more thought. I have not addressed
the implications for MVCC snapshots here. I think that I'll need to
raise a WARNING along the lines of your snapshot isn't going to
consider the locked tuple visible because the same command inserted
it, or perhaps even raise an ERROR regardless of isolation level
(although note that I'm not suggesting that we raise an ERROR in the
event of receiving HeapTupleInvisible from heap_lock_tuple()/HTSU()
for other reasons, which *is* possible, nor am I suggesting that later
commands of the same xact would ever see this ERROR). I'm comfortable
with the idea of what you might loosely describe as a READ COMMITTED
mode serialization failure here, because this case is so much more
narrow than the other case I've proposed making a special exception to
the general semantics of MVCC snapshots to accommodate (i.e. the case
where a tuple is locked from an xact logically still-in-progress to
our snapshot in RC mode).

I think I'll be happy to declare that usage of the feature that hits
this issue is somewhere between questionable and wrong. It probably
isn't worth making another, similar HTSMVCC exception for this case.
But ISTM that we still have to do *something* other than simply credit
users with taking care to avoid tripping up on this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan

On Tue, Jan 14, 2014 at 3:25 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Attached is a patch doing that, to again demonstrate what I mean. I'm not
 sure if setting xmin to invalid is really the best way to mark the tuple
 dead; I don't think a tuple's xmin can currently be InvalidTransaction under
 any other circumstances, so there might be some code out there that's not
 prepared for it. So using an infomask bit might indeed be better. Or
 something else entirely.

Have you thought about the implications for other snapshot types (or
other tqual.c routines)? My concern is that a client of that
infrastructure (either current or future) could spuriously conclude
that a heap tuple satisfied it, when in fact only a promise tuple
satisfied it. It wouldn't necessarily follow that the promise would be
fulfilled, nor that there would be some other proper heap tuple
equivalent to that fulfilled promise tuple as far as those clients are
concerned.

heap_delete() will not call HeapTupleSatisfiesUpdate() when you're
deleting a promise tuple, which on the face of it is fine - it's
always going to technically be instantaneously invisible, because it's
always created by the same command id (i.e. HeapTupleSatisfiesUpdate()
would just return HeapTupleInvisible if called). So far so good, but
we are technically doing something else quite new - deleting a
would-be instantaneously invisible tuple. So like your concern about
setting xmin to invalid, my concern is that code may exist that treats
cmin  cmax as an invariant. Now, you might think that that would be a
manageable concern, and to be fair a look at the ComboCids code that
mostly arbitrates that stuff seems to indicate that it's okay, but
it's still worth noting.

I think you should consider breaking off the relcache parts of my
patch and committing them, because they're independently useful. If we
are going to have a lot of conflicts that need to be handled by a
heap_delete(), there is no point in inserting non-unique index tuples
for what is not yet conclusively a proper (non-promise) tuple. Those
should always come last. And even without upsert, strictly inserting
into unique indexes first seems like a useful thing relative to the
cost. Unique violations are the cause of many aborted transactions,
and there is no need to ever bloat non-unique indexes of the same slot
when that happens.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan

On Tue, Jan 14, 2014 at 3:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Right, but with your approach, can you really be sure that you have
 the right rejecting tuple ctid (not reject)? In other words, as you
 wait for the exclusion constraint to conclusively indicate that there
 is a conflict, minutes may have passed in which time other conflicts
 may emerge in earlier unique indexes. Whereas with an approach where
 values are locked, you are guaranteed that earlier unique indexes have
 no conflicting values. Maintaining that property seems useful, since
 we check in a well-defined order, and we're still projecting a ctid.
 Unlike when row locking is involved, we can make no assumptions or
 generalizations around where conflicts will occur. Although that may
 also be a general concern with your approach when row locking, for
 multi-master replication use-cases. There may be some value in knowing
 it cannot have been earlier unique indexes (and so the existing values
 for those unique indexes in the locked row should stay the same -
 don't many conflict resolution policies work that way?).

 I don't understand what you're saying. Can you give an example?

 In the use case I was envisioning above, ie. you insert N rows, and if any
 of them violate constraint, you still want to insert the non-violating
 instead of rolling back the whole transaction, you don't care. You don't
 care what existing rows the new rows conflicted with.

 Even if you want to know what you conflicted with, I can't make sense of
 what you're saying. In the btreelock approach, the value locks are
 immediately released once you discover that there's conflict. So by the time
 you get to do anything with the ctid of the existing tuple you conflicted
 with, new conflicting tuples might've appeared.

That's true, but at least the timeframe in which an additional
conflict may occur on just-locked index values in bound to more or
less an instant. In any case how important this is is an interesting
question, and perhaps one that Andres can weigh in on as someone that
knows a lot about multi-master replication. This issue is particularly
interesting because this testcase appears to make both patches
livelock, for reasons that I believe are related:

https://github.com/petergeoghegan/upsert/blob/master/torture.sh

I have an idea of what I could do to fix this, but I don't have time
to make sure that my hunch is correct. I'm travelling tomorrow to give
a talk at PDX pug, so I'll have limited access to e-mail.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan

On Wed, Jan 15, 2014 at 8:23 PM, Peter Geoghegan p...@heroku.com wrote:
 I have an idea of what I could do to fix this, but I don't have time
 to make sure that my hunch is correct.

It might just be a matter of:

@@ -186,6 +186,13 @@ ExecLockHeapTupleForUpdateSpec(EState *estate,
switch (test)
{
case HeapTupleInvisible:
+   /*
+* Tuple may have originated from this command, in 
which case it's
+* already locked
+*/
+   if 
(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple.t_data))

+   HeapTupleHeaderGetCmin(tuple.t_data) == 
estate-es_output_cid)
+   return true;
/* Tuple became invisible;  try again */
if (IsolationUsesXactSnapshot())
ereport(ERROR,

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan

On Mon, Jan 13, 2014 at 6:45 PM, Peter Geoghegan p...@heroku.com wrote:
+ uint32
+ SpeculativeInsertionIsInProgress(TransactionId xid, RelFileNode rel,
ItemPointer tid)
+ {

For the purposes of preventing unprincipled deadlocking, commenting
out the following (the only caller of the above) has no immediately
discernible effect with any of the test-cases that I've published:

/* XXX shouldn't we fall through to look at xmax? */
+ /* XXX why? or is that now covered by the above check?
*/
+ snapshot-speculativeToken =
+
SpeculativeInsertionIsInProgress(HeapTupleHeaderGetRawXmin(tuple),
+
rnode,
+
htup-t_self);
+
+ snapshot-xmin = HeapTupleHeaderGetRawXmin(tuple);
return true;/* in insertion by other */

I think that the prevention of unprincipled deadlocking is all down to
this immediately prior piece of code, at least in those test cases:

! /*
!* in insertion by other.
!*
!* Before returning true, check for the special case
that the
!* tuple was deleted by the same transaction that
inserted it.
!* Such a tuple will never be visible to anyone else,
whether
!* the transaction commits or aborts.
!*/
! if (!(tuple-t_infomask HEAP_XMAX_INVALID)
! !(tuple-t_infomask HEAP_XMAX_COMMITTED)
! !(tuple-t_infomask HEAP_XMAX_IS_MULTI)
! !HEAP_XMAX_IS_LOCKED_ONLY(tuple-t_infomask)
! HeapTupleHeaderGetRawXmax(tuple) ==
HeapTupleHeaderGetRawXmin(tuple))
! {
! return false;
! }

But why should it be acceptable to change the semantics of dirty
snapshots like this, which previously always returned true when
control reached here? It is a departure from their traditional
behavior, not limited to clients of this new promise tuple
infrastructure. Now, it becomes entirely a matter of whether we tried
to insert before or after the deleting xact's deletion (of a tuple it
originally inserted) as to whether or not we block. So in general we
don't get to keep our old value locks until xact end when we update
or delete. Even if you don't consider this a bug for existing dirty
snapshot clients (I myself do - we can't rely on deleting a row and
re-inserting the same values now, which could be particularly
undesirable for updates), I have already described how we can take
advantage of deleting tuples while still holding on to their value
locks [1] to Andres. I think it'll be very important for multi-master
conflict resolution. I've already described this useful property of
dirty snapshots numerous times on this thread in relation to different
aspects, as it happens. It's essential.

Anyway, I guess you're going to need an infomask bit to fix this, so
you can differentiate between 'promise' tuples and 'proper' tuples.
Those are in short supply. I still think this problem is more or less
down to a modularity violation, and I suspect that this is not the
last problem that will be found along these lines if we continue to
pursue this approach.

[1]
http://www.postgresql.org/message-id/CAM3SWZQpLSGPS2Kd=-n6hvyiqkf_mcxmx-q72ar9upzq-x6...@mail.gmail.com
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/14/2014 12:20 PM, Peter Geoghegan wrote:

I think that the prevention of unprincipled deadlocking is all down to
this immediately prior piece of code, at least in those test cases:



!   /*
!* in insertion by other.
!*
!* Before returning true, check for the special case 
that the
!* tuple was deleted by the same transaction that 
inserted it.
!* Such a tuple will never be visible to anyone else, 
whether
!* the transaction commits or aborts.
!*/
!   if (!(tuple-t_infomask  HEAP_XMAX_INVALID) 
!   !(tuple-t_infomask  HEAP_XMAX_COMMITTED) 
!   !(tuple-t_infomask  HEAP_XMAX_IS_MULTI) 
!   !HEAP_XMAX_IS_LOCKED_ONLY(tuple-t_infomask) 
!   HeapTupleHeaderGetRawXmax(tuple) == 
HeapTupleHeaderGetRawXmin(tuple))
!   {
!   return false;
!   }

But why should it be acceptable to change the semantics of dirty
snapshots like this, which previously always returned true when
control reached here? It is a departure from their traditional
behavior, not limited to clients of this new promise tuple
infrastructure. Now, it becomes entirely a matter of whether we tried
to insert before or after the deleting xact's deletion (of a tuple it
originally inserted) as to whether or not we block. So in general we
don't get to keep our old value locks until xact end when we update
or delete.


Hmm. So the scenario would be that a process inserts a tuple, but kills 
it again later in the transaction, and then re-inserts the same value. 
The expectation is that because it inserted the value once already, 
inserting it again will not block. Ie. inserting and deleting a tuple 
effectively acquires a value-lock on the inserted values.



Even if you don't consider this a bug for existing dirty
snapshot clients (I myself do - we can't rely on deleting a row and
re-inserting the same values now, which could be particularly
undesirable for updates),


Yeah, it would be bad if updates start failing because of this. We could 
add a check for that, and return true if the tuple was updated rather 
than deleted.



I have already described how we can take
advantage of deleting tuples while still holding on to their value
locks [1] to Andres. I think it'll be very important for multi-master
conflict resolution. I've already described this useful property of
dirty snapshots numerous times on this thread in relation to different
aspects, as it happens. It's essential.


I didn't understand that description.


Anyway, I guess you're going to need an infomask bit to fix this, so
you can differentiate between 'promise' tuples and 'proper' tuples.


Yeah, that's one way. Or you could set xmin to invalid, to make the 
killed tuple look thoroughly dead to everyone.



Those are in short supply. I still think this problem is more or less
down to a modularity violation, and I suspect that this is not the
last problem that will be found along these lines if we continue to
pursue this approach.


You have suspected that many times throughout this thread, and every 
time there's been a relatively simple solutions to the issues you've 
raised. I suspect that's also going to be true for whatever mundane next 
issue you come up with.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/14/2014 12:44 AM, Peter Geoghegan wrote:

On Mon, Jan 13, 2014 at 12:58 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Well, even if you don't agree that locking all the conflicting rows for
update is sensible, it's still perfectly sensible to return the rejected
rows to the user. For example, you're inserting N rows, and if some of them
violate a constraint, you still want to insert the non-conflicting rows
instead of rolling back the whole transaction.


Right, but with your approach, can you really be sure that you have
the right rejecting tuple ctid (not reject)? In other words, as you
wait for the exclusion constraint to conclusively indicate that there
is a conflict, minutes may have passed in which time other conflicts
may emerge in earlier unique indexes. Whereas with an approach where
values are locked, you are guaranteed that earlier unique indexes have
no conflicting values. Maintaining that property seems useful, since
we check in a well-defined order, and we're still projecting a ctid.
Unlike when row locking is involved, we can make no assumptions or
generalizations around where conflicts will occur. Although that may
also be a general concern with your approach when row locking, for
multi-master replication use-cases. There may be some value in knowing
it cannot have been earlier unique indexes (and so the existing values
for those unique indexes in the locked row should stay the same -
don't many conflict resolution policies work that way?).


I don't understand what you're saying. Can you give an example?

In the use case I was envisioning above, ie. you insert N rows, and if 
any of them violate constraint, you still want to insert the 
non-violating instead of rolling back the whole transaction, you don't 
care. You don't care what existing rows the new rows conflicted with.


Even if you want to know what you conflicted with, I can't make sense of 
what you're saying. In the btreelock approach, the value locks are 
immediately released once you discover that there's conflict. So by the 
time you get to do anything with the ctid of the existing tuple you 
conflicted with, new conflicting tuples might've appeared.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan

On Tue, Jan 14, 2014 at 2:43 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. So the scenario would be that a process inserts a tuple, but kills it
 again later in the transaction, and then re-inserts the same value. The
 expectation is that because it inserted the value once already, inserting it
 again will not block. Ie. inserting and deleting a tuple effectively
 acquires a value-lock on the inserted values.

Right.

 Yeah, it would be bad if updates start failing because of this. We could add
 a check for that, and return true if the tuple was updated rather than
 deleted.

Why would you fix it that way?

 I have already described how we can take
 advantage of deleting tuples while still holding on to their value
 locks [1] to Andres. I think it'll be very important for multi-master
 conflict resolution. I've already described this useful property of
 dirty snapshots numerous times on this thread in relation to different
 aspects, as it happens. It's essential.

 I didn't understand that description.

I was describing how deleting existing locked rows, and re-inserting,
could deal with multiple conflicts for multi-master replication
use-cases. It hardly matters much though, because it's not as if the
usefulness and necessity of this property of dirty snapshots is in
question.

 Anyway, I guess you're going to need an infomask bit to fix this, so
 you can differentiate between 'promise' tuples and 'proper' tuples.

 Yeah, that's one way. Or you could set xmin to invalid, to make the killed
 tuple look thoroughly dead to everyone.

I'm think you'll have to use an infomask bit so everyone knows that
this is a promise tuple from the start. Otherwise, I suspect that
there are race conditions. The problem was that
inserted-then-deleted-in-same-xact tuples (both regular and promise)
were invisible to all xacts' dirty snapshots, when they should have
only been invisible to the deleting xact's dirty snapshot. So it isn't
obvious to me how you interlock things such that another xact doesn't
incorrectly decide that it has to wait on what is really a promise
tuple's xact for the full duration of that xact, having found no
speculative insertion token to ShareLock (which implies unprincipled
deadlocking), while simultaneously having other sessions not fail to
see as dirty-visible a same-xact-inserted-deleted non-promise tuple
(thereby ensuring those other sessions correctly conclude that it is
necessary to wait for the end of the xmin/xmax xact). If you set the
xmin to invalid too late, it doesn't help any existing waiters.

Even if setting xmin to invalid is workable, it's a strike against the
performance of your approach, because it's another heap buffer
exclusive lock.

 You have suspected that many times throughout this thread, and every time
 there's been a relatively simple solutions to the issues you've raised. I
 suspect that's also going to be true for whatever mundane next issue you
 come up with.

I don't think it's a mundane issue. But in any case, you haven't
addressed why you think your proposal is more or less better than my
proposal, which is the pertinent question. You haven't given me so
much as a high level summary of whatever misgivings you may have about
it, even though I've asked you to comment on my approach to value
locking several times. You haven't pointed out that it has any
specific bug (which is not to suppose that that's because there are
none). The point is that it is not my contention that what you're
proposing is totally unworkable. Rather, I think that the original
proposal will probably ultimately perform better in all cases, is
easier to reason about and is certainly far more modular. It appears
to me to be the more conservative of the two proposals. In all
sincerity, I simply don't know what factors you're weighing here. In
saying that, I really don't mean to imply that you're assigning weight
to things in a way that I am in disagreement with. I simply don't
understand what is important to you here, and why your proposal
preserves or enhances the things that you believe are important. Would
you please explain your position along those lines?

Now, I'll concede that it will be harder to make the IGNORE syntax
work with exclusion constraints with what I've done, which would be
nice. However, in my opinion that should be given far less weight than
these other issues. It's ON DUPLICATE KEY...; no one could reasonably
assume that exclusion constraints were covered. Also, upserting with
exclusion constraints is a non-starter. It's only applicable to the
case where you're using exclusion constraints exactly as you would use
unique constraints, which has to be very rare. It will cause much more
confusion than anything else.

INSERT IGNORE in MySQL works with NOT NULL constraints, unique
constraints, and all other constraints. FWIW I think that it would be
kind of arbitrary to make IGNORE work with exclusion constraints and
not other types of constraints, whereas when it's

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/14/2014 11:22 PM, Peter Geoghegan wrote:

On Tue, Jan 14, 2014 at 2:43 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

You have suspected that many times throughout this thread, and every time
there's been a relatively simple solutions to the issues you've raised. I
suspect that's also going to be true for whatever mundane next issue you
come up with.


I don't think it's a mundane issue. But in any case, you haven't
addressed why you think your proposal is more or less better than my
proposal, which is the pertinent question.


1. It's simpler.

2. Works for exclusion constraints.


You haven't given me so
much as a high level summary of whatever misgivings you may have about
it, even though I've asked you to comment on my approach to value
locking several times. You haven't pointed out that it has any
specific bug (which is not to suppose that that's because there are
none). The point is that it is not my contention that what you're
proposing is totally unworkable. Rather, I think that the original
proposal will probably ultimately perform better in all cases, is
easier to reason about and is certainly far more modular. It appears
to me to be the more conservative of the two proposals. In all
sincerity, I simply don't know what factors you're weighing here. In
saying that, I really don't mean to imply that you're assigning weight
to things in a way that I am in disagreement with. I simply don't
understand what is important to you here, and why your proposal
preserves or enhances the things that you believe are important. Would
you please explain your position along those lines?


I guess that simplicity is in the eye of the beholder, but please take a 
look at git diff --stat:


 41 files changed, 1224 insertions(+), 107 deletions(-)

vs.

 50 files changed, 2215 insertions(+), 240 deletions(-)

Admittedly, some of the difference comes from the fact that you've spent 
a lot more time commenting and polishing the btreelock patch. But mostly 
I dislike additional complexity required in b-tree for this.


I don't think B-tree locking is more conservative. The 
insert-and-then-check approach is already used by exclusion constraints, 
I'm just extending it to not abort on conflict, but do something else.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/14/2014 11:22 PM, Peter Geoghegan wrote:

The problem was that
inserted-then-deleted-in-same-xact tuples (both regular and promise)
were invisible to all xacts' dirty snapshots, when they should have
only been invisible to the deleting xact's dirty snapshot.


Right.


So it isn't
obvious to me how you interlock things such that another xact doesn't
incorrectly decide that it has to wait on what is really a promise
tuple's xact for the full duration of that xact, having found no
speculative insertion token to ShareLock (which implies unprincipled
deadlocking), while simultaneously having other sessions not fail to
see as dirty-visible a same-xact-inserted-deleted non-promise tuple
(thereby ensuring those other sessions correctly conclude that it is
necessary to wait for the end of the xmin/xmax xact). If you set the
xmin to invalid too late, it doesn't help any existing waiters.


If a backend finds no speculative insertion token to ShareLock, then it 
really isn't a speculative insertion, and the process should sleep on 
the xid as usual.


Once we remove the modification in HeapTupleSatisfiesDirty() that made 
it return false when xmin == xmax, the problem that arises is that 
another backend that sees the killed tuple incorrectly determines that 
it has to wait for it that transaction to finish, even though it was a 
speculatively inserted tuple that was killed, and hence can be ignored. 
We can avoid that problem by setting xmin to invalid, or otherwise 
marking the tuple as dead.


Attached is a patch doing that, to again demonstrate what I mean. I'm 
not sure if setting xmin to invalid is really the best way to mark the 
tuple dead; I don't think a tuple's xmin can currently be 
InvalidTransaction under any other circumstances, so there might be some 
code out there that's not prepared for it. So using an infomask bit 
might indeed be better. Or something else entirely.


- Heikki


speculative-insertions-revisions.2014_01_15.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan

On Tue, Jan 14, 2014 at 2:16 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I don't think it's a mundane issue. But in any case, you haven't
 addressed why you think your proposal is more or less better than my
 proposal, which is the pertinent question.

 1. It's simpler.

 2. Works for exclusion constraints.

Thank you for clarifying where you're coming from.

 I guess that simplicity is in the eye of the beholder, but please take a
 look at git diff --stat:

  41 files changed, 1224 insertions(+), 107 deletions(-)

 vs.

  50 files changed, 2215 insertions(+), 240 deletions(-)

 Admittedly, some of the difference comes from the fact that you've spent a
 lot more time commenting and polishing the btreelock patch. But mostly I
 dislike additional complexity required in b-tree for this.

It's very much down to differences in how well commented and
documented each patch is. I have a fully formed amendment to the AM
interface, complete with documentation of the AM and btree aspects,
and detailed comments around how the parts fit together. But you've
already explored doing something similar to what I do, to similarly
avoid having to refind the page (less the heavyweight locking), which
seems almost equivalent to what I propose in terms of its impact on
btree, before we consider anything else.

 I don't think B-tree locking is more conservative. The insert-and-then-check
 approach is already used by exclusion constraints, I'm just extending it to
 not abort on conflict, but do something else.

If you examine what I actually do, you'll see that it's pretty much
equivalent to how the extant value locking of unique btree indexes has
always worked. It's just that the process is staggered at an exact
point, the point where traditionally we hold no buffer locks, only a
buffer pin (although we do additionally verify that the index gives
the go-ahead before getting to later indexes, to get consensus to
proceed with insertion).

The suggestion that mine is the conservative approach is also based on
the fact that database systems have made use of page level exclusive
locks on indexes, managed by the lock manager, persisting over complex
operations in many different contexts for many years.  This includes
Postgres, where for many years relcache takes precautions again
deadlocking in such AMs by ordering the list of indexes associated
with a relation by pg_index.indexrelid. Currently this may not be
necessary, but the principle stands.

The insert-then-check approach of exclusion constraints is quite
different to what is proposed here, because exclusion constraints only
ever have to abort the xact if things don't work out. There is no
value locking. That's far easier to pin down. You definitely don't
have to do anything new with visibility.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas


On 01/11/2014 12:40 AM, Peter Geoghegan wrote:

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.


Exclusion constraints can be used to implement uniqueness checks with 
SP-GiST or GiST indexes. For example, if you want to enforce that there 
are no two tuples with the same x and y coordinates, ie. use a point as 
the key. You could add a b-tree index just to enforce the constraint, 
but it's better if you don't have to. In general, it's just always 
better if features don't have implementation-specific limitations like this.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas


On 01/11/2014 12:39 PM, Peter Geoghegan wrote:

In any case, my patch is bound to win decisively for the other
extreme, the insert-only case, because the overhead of doing an index
scan first is always wasted there with your approach, and the overhead
of extended btree leaf page locking has been shown to be quite low.


Quite possibly. Run the benchmark, and we'll see how big a difference 
we're talking about.



In
the past you've spoken of avoiding that overhead through an adaptive
strategy based on statistics, but I think you'll have a hard time
beating a strategy where the decision comes as late as possible, and
is informed by highly localized page-level metadata already available.
My implementation can abort an attempt to just read an existing
would-be duplicate very inexpensively (with no strong locks), going
back to just after the _bt_search() to get a heavyweight lock if just
reading doesn't work out (if there is no duplicate found), so as to
not waste all of its prior work. Doing one of the two extremes of
insert-mostly or update-only well is relatively easy; dynamically
adapting to one or the other is much harder. Especially if it's a
consistent mix of inserts and updates, where general observations
aren't terribly useful.


Another way to optimize it is to keep the b-tree page pinned after doing 
the pre-check. Then you don't need to descend the tree again when doing 
the insert. That would require small indexam API changes, but wouldn't 
be too invasive, I think.



All other concerns of mine still remain, including the concern over
the extra locking of the proc array - I'm concerned about the
performance impact of that on other parts of the system not exercised
by this test.


Yeah, I'm not thrilled about that part either. Fortunately there are 
other ways to implement that. In fact, I think you could just not bother 
taking the ProcArrayLock when setting the fields. The danger is that 
another backend sees a mixed state of the fields, but that's OK. The 
worst that can happen is that it will do an unnecessary lock/release on 
the heavy-weight lock. And to reduce the overhead when reading the 
fields, you could merge the SpeculativeInsertionIsInProgress() check 
into TransactionIdIsInProgress(). The call site in tqual.c always calls 
it together with TransactionIdIsInProgress(), which scans the proc array 
anyway, while holding the lock.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Exclusion constraints can be used to implement uniqueness checks with
 SP-GiST or GiST indexes. For example, if you want to enforce that there are
 no two tuples with the same x and y coordinates, ie. use a point as the key.
 You could add a b-tree index just to enforce the constraint, but it's better
 if you don't have to. In general, it's just always better if features don't
 have implementation-specific limitations like this.

That seems rather narrow. Among other things, I worry about the
baggage for users in documenting supporting SP-GiST/GiST. We support
it, but it only really works for the case where you're using exclusion
constraints as unique constraints, something that might make sense in
certain narrow contexts, contrary to our earlier general statement
that a unique index should be preferred there. We catalog amcanunique
methods as the way that we support unique indexes. I really do feel
that that's the appropriate level to support the feature at, and I
have not precluded other amcanunique implementations from doing the
same, having documented the intended value locking interface/contract
for the benefit of any future amcanunique AM author. It's ON DUPLICATE
KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of
exclusion constraints and their arbitrary commutative operators.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Robert Haas

On Mon, Jan 13, 2014 at 1:53 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Exclusion constraints can be used to implement uniqueness checks with
 SP-GiST or GiST indexes. For example, if you want to enforce that there are
 no two tuples with the same x and y coordinates, ie. use a point as the key.
 You could add a b-tree index just to enforce the constraint, but it's better
 if you don't have to. In general, it's just always better if features don't
 have implementation-specific limitations like this.

 That seems rather narrow. Among other things, I worry about the
 baggage for users in documenting supporting SP-GiST/GiST. We support
 it, but it only really works for the case where you're using exclusion
 constraints as unique constraints, something that might make sense in
 certain narrow contexts, contrary to our earlier general statement
 that a unique index should be preferred there. We catalog amcanunique
 methods as the way that we support unique indexes. I really do feel
 that that's the appropriate level to support the feature at, and I
 have not precluded other amcanunique implementations from doing the
 same, having documented the intended value locking interface/contract
 for the benefit of any future amcanunique AM author. It's ON DUPLICATE
 KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of
 exclusion constraints and their arbitrary commutative operators.

For what it's worth, I agree with Heikki.  There's probably nothing
sensible an upsert can do if it conflicts with more than one tuple,
but if it conflicts with just exactly one, it oughta be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 For what it's worth, I agree with Heikki.  There's probably nothing
 sensible an upsert can do if it conflicts with more than one tuple,
 but if it conflicts with just exactly one, it oughta be OK.

If there is exactly one, *and* the existing value is exactly the same
as the value proposed for insertion (or, I suppose, a subset of the
existing value, but that's so narrow that it might as well not apply).
In short, when you're using an exclusion constraint as a unique
constraint. Which is very narrow indeed. Weighing the costs and the
benefits, that seems like far more cost than benefit, before we even
consider anything beyond simply explaining the applicability and
limitations of upserting with exclusion constraints. It's generally
far cleaner to define speculative insertion as something that happens
with unique indexes only.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas


On 01/13/2014 10:53 PM, Peter Geoghegan wrote:

On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:

For what it's worth, I agree with Heikki.  There's probably nothing
sensible an upsert can do if it conflicts with more than one tuple,
but if it conflicts with just exactly one, it oughta be OK.


If there is exactly one, *and* the existing value is exactly the same
as the value proposed for insertion (or, I suppose, a subset of the
existing value, but that's so narrow that it might as well not apply).
In short, when you're using an exclusion constraint as a unique
constraint. Which is very narrow indeed. Weighing the costs and the
benefits, that seems like far more cost than benefit, before we even
consider anything beyond simply explaining the applicability and
limitations of upserting with exclusion constraints. It's generally
far cleaner to define speculative insertion as something that happens
with unique indexes only.


Well, even if you don't agree that locking all the conflicting rows for 
update is sensible, it's still perfectly sensible to return the rejected 
rows to the user. For example, you're inserting N rows, and if some of 
them violate a constraint, you still want to insert the non-conflicting 
rows instead of rolling back the whole transaction.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Mon, Jan 13, 2014 at 12:58 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Well, even if you don't agree that locking all the conflicting rows for
 update is sensible, it's still perfectly sensible to return the rejected
 rows to the user. For example, you're inserting N rows, and if some of them
 violate a constraint, you still want to insert the non-conflicting rows
 instead of rolling back the whole transaction.

Right, but with your approach, can you really be sure that you have
the right rejecting tuple ctid (not reject)? In other words, as you
wait for the exclusion constraint to conclusively indicate that there
is a conflict, minutes may have passed in which time other conflicts
may emerge in earlier unique indexes. Whereas with an approach where
values are locked, you are guaranteed that earlier unique indexes have
no conflicting values. Maintaining that property seems useful, since
we check in a well-defined order, and we're still projecting a ctid.
Unlike when row locking is involved, we can make no assumptions or
generalizations around where conflicts will occur. Although that may
also be a general concern with your approach when row locking, for
multi-master replication use-cases. There may be some value in knowing
it cannot have been earlier unique indexes (and so the existing values
for those unique indexes in the locked row should stay the same -
don't many conflict resolution policies work that way?).


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Mon, Jan 13, 2014 at 12:49 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 In any case, my patch is bound to win decisively for the other
 extreme, the insert-only case, because the overhead of doing an index
 scan first is always wasted there with your approach, and the overhead
 of extended btree leaf page locking has been shown to be quite low.

 Quite possibly. Run the benchmark, and we'll see how big a difference we're
 talking about.

I'll come up with something and let you know.

 Another way to optimize it is to keep the b-tree page pinned after doing the
 pre-check. Then you don't need to descend the tree again when doing the
 insert. That would require small indexam API changes, but wouldn't be too
 invasive, I think.

You'll still need a callback to drop the pin when it transpires that
there is a conflict in a later unique index, and state to pass a bt
stack back, at which point you've already made exactly the same
changes to the AM interface as in my proposal. The only difference is
that the core code doesn't rely on the value locks being released
after an instant, but that isn't something that you take advantage of.
Furthermore, AFAIK there is no reason to think that anything other
than btree will benefit, which makes it a bit unfortunate that the AM
has to support it generally.

So, again, it's kind of a modularity violation, and it may not even
actually be possible, since _bt_search() is only callable with an
insertion scankey, which is the context in which the existing
guarantee around releasing locks and re-searching from that point
applies, for reasons that seem to me to be very subtle. At the very
least you need to pass a btstack to _bt_doinsert() to save the work of
re-scanning, as I do.

 All other concerns of mine still remain, including the concern over
 the extra locking of the proc array - I'm concerned about the
 performance impact of that on other parts of the system not exercised
 by this test.

 Yeah, I'm not thrilled about that part either. Fortunately there are other
 ways to implement that. In fact, I think you could just not bother taking
 the ProcArrayLock when setting the fields. The danger is that another
 backend sees a mixed state of the fields, but that's OK. The worst that can
 happen is that it will do an unnecessary lock/release on the heavy-weight
 lock. And to reduce the overhead when reading the fields, you could merge
 the SpeculativeInsertionIsInProgress() check into
 TransactionIdIsInProgress(). The call site in tqual.c always calls it
 together with TransactionIdIsInProgress(), which scans the proc array
 anyway, while holding the lock.

Currently in your patch all insertions do
SpeculativeInsertionLockAcquire(GetCurrentTransactionId()) -
presumably this is not something you intend to keep. Also, you should
not do this for regular insertion:

if (options  HEAP_INSERT_SPECULATIVE)
SetSpeculativeInsertion(relation-rd_node, heaptup-t_self);

Can you explain the following, please?:

+ /*
+  * Returns a speculative insertion token for waiting for the insertion to
+  * finish.
+  */
+ uint32
+ SpeculativeInsertionIsInProgress(TransactionId xid, RelFileNode rel,
ItemPointer tid)
+ {
+   uint32  result = 0;
+   ProcArrayStruct *arrayP = procArray;
+   int index;

Why is this optimization correct? Presently it allows your patch to
avoid getting a shared ProcArrayLock from HeapTupleSatisfiesDirty().

+   if (TransactionIdPrecedes(xid, TransactionXmin))
+   return false;

So from HeapTupleSatisfiesDirty(), you're checking if xid (the
passed tuple's xmin) precedes our transaction's xmin (well, that of
our last snapshot updated by GetSnapshotData()). This is set within
GetSnapshotData(), but we're dealing with a dirty snapshot with no
xmin, so TransactionXmin pertains to our MVCC snapshot, not our dirty
snapshot.

It isn't really true that TransactionIdIsInProgress() gets the same
shared ProcArrayLock in a similar fashion, for a full linear search; I
think that the various fast-paths make it far less likely than it is
for SpeculativeInsertionIsInProgress() (or, perhaps, should be). Here
is what that other routine does in around the same place:

/*
 * Don't bother checking a transaction older than RecentXmin; it could 
not
 * possibly still be running.  (Note: in particular, this guarantees 
that
 * we reject InvalidTransactionId, FrozenTransactionId, etc as not
 * running.)
 */
if (TransactionIdPrecedes(xid, RecentXmin))
{
xc_by_recent_xmin_inc();
return false;
}

This extant code checks against RecentXmin, *not* TransactionXmin.  It
also caches things quite effectively, but that caching isn't very
useful to you here. It checks latestCompletedXid before doing a linear
search through the proc array too.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-11 Thread Peter Geoghegan

On Fri, Jan 10, 2014 at 7:59 PM, Peter Geoghegan p...@heroku.com wrote:
 *shrug*. I'm not too concerned about performance during contention. But
 let's see how this fixed version performs. Could you repeat the tests you
 did with this?

 Why would you not be too concerned about the performance with
 contention? It's a very important aspect. But even if you don't, if
 you look at the transaction throughput with only a single client in
 the update-heavy benchmark [1] (with one client there is a fair mix of
 inserts and updates), my approach still comes out far ahead.
 Transaction throughput is almost 100% higher, with the *difference*
 exceeding 150% at 8 clients but never reaching too much higher. I
 think that the problem isn't so much with contention between clients
 as much as with contention between inserts and updates, which affects
 everyone to approximately the same degree. And the average max latency
 across runs for one client is 130.447 ms, as opposed to 0.705 ms with
 my patch - that's less than 1%. Whatever way you cut it, the
 performance of my approach is far superior. Although we should
 certainly investigate the impact of your most recent revision, and I
 intend to, how can you not consider those differences to be extremely
 significant?

So I re-ran the same old benchmark, where we're almost exclusively
updating. Results for your latest revision were very similar to my
patch:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/exclusion-no-deadlock/

This suggests that the main problem encountered was lock contention
among old, broken promise tuples. Note that this benchmark doesn't
involve any checkpointing, and everything fits in memory.
Opportunistic pruning is possible, which I'd imagine helps a lot with
the bloat, at least in this benchmark - there are only every 100,000
live tuples. That might not always be true, of course.

In any case, my patch is bound to win decisively for the other
extreme, the insert-only case, because the overhead of doing an index
scan first is always wasted there with your approach, and the overhead
of extended btree leaf page locking has been shown to be quite low. In
the past you've spoken of avoiding that overhead through an adaptive
strategy based on statistics, but I think you'll have a hard time
beating a strategy where the decision comes as late as possible, and
is informed by highly localized page-level metadata already available.
My implementation can abort an attempt to just read an existing
would-be duplicate very inexpensively (with no strong locks), going
back to just after the _bt_search() to get a heavyweight lock if just
reading doesn't work out (if there is no duplicate found), so as to
not waste all of its prior work. Doing one of the two extremes of
insert-mostly or update-only well is relatively easy; dynamically
adapting to one or the other is much harder. Especially if it's a
consistent mix of inserts and updates, where general observations
aren't terribly useful.

All other concerns of mine still remain, including the concern over
the extra locking of the proc array - I'm concerned about the
performance impact of that on other parts of the system not exercised
by this test.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-11 Thread Peter Geoghegan

On Sat, Jan 11, 2014 at 2:39 AM, Peter Geoghegan p...@heroku.com wrote:
 So I re-ran the same old benchmark, where we're almost exclusively
 updating. Results for your latest revision were very similar to my
 patch:

 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/exclusion-no-deadlock/

To put that in context, here is a previously unpublished repeat of the
same benchmark on the slightly improved second most recently submitted
revision of mine, v6:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-3/

(recall that I improved things a bit by remember row-locking
conflicts, not just conflicts when we try value locking - that made a
small additional difference, reflected here but not in /upsert-cmp-2/
).

The numbers for each patch are virtually identical. I guess I could
improve my patch by not always getting a heavyweight lock on the first
insert attempt, based on the general observation that we have
previously always updated. My concern would be that that would happen
at the expense of the other case.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/10/2014 05:36 AM, Peter Geoghegan wrote:

While I realize that everyone is busy, I'm concerned about the lack of
discussing here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.


Nah, that's nothing. I have a patch in the January commitfest that was 
already posted for the previous commitfest. It received zero review back 
then, and still has no reviewer signed up, let alone anyone actually 
reviewing it. And arguably it's a bug fix!


http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

Wink wink, if you're looking for patches to review... ;-)


  The alternative exclusion* patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical.


Here's an updated patch. Hope it works now... This is based on an older 
version, and doesn't include any fixes from your latest 
btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common 
parts, and copy over any relevant changes.


The fix for the deadlocking issue consists of a few parts. First, 
there's a new heavy-weight lock type, a speculative insertion lock, 
which works somewhat similarly to XactLockTableWait(), but is only held 
for the duration of a single speculative insertion. When a backend is 
about to begin a speculative insertion, it first acquires the 
speculative insertion lock. When it's done with the insertion, meaning 
it has either cancelled it by killing the already-inserted tuple or 
decided that it's going to go ahead with it, the lock is released.


The speculative insertion lock is keyed by Xid, and token. The lock can 
be taken many times in the same transaction, and token's purpose is to 
distinguish which insertion is currently in progress. The token is 
simply a backend-local counter, incremented each time the lock is taken.


In addition to the heavy-weight lock, there are new fields in PGPROC to 
indicate which tuple the backend is currently inserting. When the tuple 
is inserted, the backend fills in the relation's relfilenode and item 
pointer in MyProc-specInsert* fields, while still holding the buffer 
lock. The current speculative insertion token is also stored there.


With that mechanism, when another backend sees a tuple whose xmin is 
still in progress, it can check if the insertion is a speculative 
insertion. To do that, scan the proc array, and find the backend with 
the given xid. Then, check that the relfilenode and itempointer in that 
backend's PGPROC slot match the tuple, and make note of the token the 
backend had advertised.


HeapTupleSatisfiesDirty() does the proc array check, and returns the 
token in the snapshot, alongside snapshot-xmin. The caller can then use 
that information in place of XactLockTableWait().



There would be other ways to skin the cat, but this seemed like the 
quickest to implement. One more straightforward approach would be to use 
the tuple's TID directly in the speculative insertion lock's key, 
instead of Xid+token, but then the inserter would have to grab the 
heavy-weight lock while holding the buffer lock, which seems dangerous. 
Another alternative would be to store token in the heap tuple header, 
instead of PGPROC; a tuple that's still being speculatively inserted has 
no xmax, so it could be placed in that field. Or ctid.



More importantly, I have to question
whether we should continue to pursue that alternative approach, giving
what we now know about its performance characteristics.


Yes.


It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and*  it's
already looking for would-be duplicates*first*). I'm trying to be as
fair as possible, and yet the difference is huge.


*shrug*. I'm not too concerned about performance during contention. But 
let's see how this fixed version performs. Could you repeat the tests 
you did with this?


Any guesses what the bottleneck is? At a quick glance at a profile of a 
pgbench run with this patch, I didn't see anything out of ordinary, so 
I'm guessing it's lock contention somewhere.


- Heikki


speculative-insertions-2014_01_10.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/08/2014 06:46 AM, Peter Geoghegan wrote:

A new revision of my patch is attached.


I'm getting deadlocks with this patch, using the test script you posted 
earlier in 
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. 
Am doing something wrong, or is that a regression?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'm getting deadlocks with this patch, using the test script you posted
 earlier in
 http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
 Am doing something wrong, or is that a regression?

Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.

This testcase was the one that showed up *unprincipled* deadlocking:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

I'd focus on that test case.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/10/2014 08:37 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I'm getting deadlocks with this patch, using the test script you posted
earlier in
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
Am doing something wrong, or is that a regression?


Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.


Oh, ok. Interesting. With the patch version I posted today, I'm not 
getting deadlocks. I'm not getting duplicates in the table either, so it 
looks like the promise tuple approach somehow avoids the deadlocks, 
while the btreelock patch does not.


Why does it deadlock with the btreelock patch? I don't see why it 
should. If you have two backends inserting a single tuple, and they 
conflict, one of them should succeed to insert, and the other one should 
update.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Why does it deadlock with the btreelock patch? I don't see why it should. If
 you have two backends inserting a single tuple, and they conflict, one of
 them should succeed to insert, and the other one should update.

Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.

Although I have not yet properly considered your most recent revision,
I can't imagine why the same would not apply there, since the row
locking component is (probably) still identical. Granted, that
distinction between row locking and value locking is a bit fuzzy in
your approach, but if you happened to not insert any rows in any
previous iterations (i.e. there were no unfilled promise tuples), and
you happened to perform conflict handling first, it could still
happen, albeit with lower probability, no?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE


On 01/10/2014 10:00 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Why does it deadlock with the btreelock patch? I don't see why it should. If
you have two backends inserting a single tuple, and they conflict, one of
them should succeed to insert, and the other one should update.


Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.


I see. Yeah, I also get deadlocks when I change update statement to use 
foo.b = rej.b instead of foo.a = rej.a. I think it's down to the 
indexes are processed, ie. which conflict you see first.


This is pretty much the same issue we discussed wrt. exclusion 
contraints. If the tuple being inserted conflicts with several existing 
tuples, what to do? I think the best answer would be to return and lock 
them all. It could still deadlock, but it's nevertheless less surprising 
behavior than returning one of the tuples in random. Actually, we could 
even avoid the deadlock by always locking the tuples in a certain order, 
although I'm not sure if it's worth the trouble.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 1:25 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 This is pretty much the same issue we discussed wrt. exclusion contraints.
 If the tuple being inserted conflicts with several existing tuples, what to
 do? I think the best answer would be to return and lock them all. It could
 still deadlock, but it's nevertheless less surprising behavior than
 returning one of the tuples in random. Actually, we could even avoid the
 deadlock by always locking the tuples in a certain order, although I'm not
 sure if it's worth the trouble.

I understand and accept that as long as we're intent on locking more
than one row per transaction, that action could deadlock with another
session doing something similar. Actually, I've even encountered
people giving advice in relation to proprietary systems along the
lines of: if your big SQL MERGE statement is deadlocking excessively,
you might try hinting to make sure a nested loop join is used. I
think that this kind of ugly compromise is unavoidable in those
scenarios (in reality the most popular strategy is probably cross
your fingers). But as everyone agrees, the common case where an xact
only upserts one row should never deadlock with another, similar xact.
So *that* isn't a problem I have with making row locking work for
exclusion constraints.

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Jim Nasby


On 1/10/14, 4:40 PM, Peter Geoghegan wrote:

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.


Well, the usual example for exclusion constraints is resource scheduling (ie: 
scheduling what room a class will be held in). In that context is it hard to 
believe that you might want to MERGE a set of new classroom assignments in?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasby j...@nasby.net wrote:
Well, the usual example for exclusion constraints is resource scheduling
(ie: scheduling what room a class will be held in). In that context is it
hard to believe that you might want to MERGE a set of new classroom
assignments in?

So you schedule a class that clashes with 3 other classes, and you
want to update all 3 rows/classes with details from your one row
proposed for insertion? That makes no sense, unless the classes were
in fixed time slots, in which case you could use a unique constraint
to begin with. You can't change the rows to have the same time range
for all 3. So you have to delete two first, and update the range of
one. Which two? And you can't really rely on having locked existing
rows operating as a kind of persistent value lock, as I do, because
you've locked a row with a different range to the one you care about -
someone can still insert another row that doesn't block on that one
but blocks on your range. So you really do need a sophisticated, fully
formed value locking infrastructure to make it work, for a feature of
marginal utility at best. I'm having a hard time imagining any user
actually wanting to do any of this, and I'm having a harder time still
imagining anyone putting in the work to make it possible, if indeed it
is possible.

No one has ever implemented fully formed predicate locking in a
commercial database system, because it's an NP-complete problem [1],
[2]. Only very limited special cases are practicable, and I'm pretty
sure this isn't one of them.

[1] http://library.riphah.edu.pk/acm/disk_1/text/1-2/SIGMOD79/P127.PDF

[2]
http://books.google.com/books?id=wV5Ran71zNoCpg=PA284lpg=PA284dq=predicate+locking+np+completesource=blots=PgNJ5H3L8Vsig=fOZ2Wr4fIxj0eFQD0tCGPLTsfY0hl=ensa=Xei=PpTQUquoBMfFsATtw4CADAved=0CDIQ6AEwAQ#v=onepageq=predicate%20locking%20np%20completef=false
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Jim Nasby


On 1/10/14, 6:51 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasbyj...@nasby.net  wrote:

Well, the usual example for exclusion constraints is resource scheduling
(ie: scheduling what room a class will be held in). In that context is it
hard to believe that you might want to MERGE a set of new classroom
assignments in?

So you schedule a class that clashes with 3 other classes, and you
want to update all 3 rows/classes with details from your one row
proposed for insertion?


Nuts, I was misunderstanding the scenario. I thought this was simply going to 
violate exclusion constraints.

I see what you're saying now, and I'm not coming up with a scenario either. 
Perhaps Jeff Davis could, since he created them... if he can't then I'd say 
we're safe ignoring that aspect.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 7:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Nah, that's nothing. I have a patch in the January commitfest that was
 already posted for the previous commitfest. It received zero review back
 then, and still has no reviewer signed up, let alone anyone actually
 reviewing it. And arguably it's a bug fix!

 http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

 Wink wink, if you're looking for patches to review... ;-)

Yeah, I did intend to take a closer look at that one (I've looked at
it but have nothing to share yet). I've been a little busy with other
things. That patch is more of the kind where it's a matter of
determining if what you've done is exactly correct (no one would
disagree with the substance of what you propose), whereas there is
uncertainty about whether I've gotten the semantics right and so on.
But that's no excuse. :-)

   The alternative exclusion* patch still deadlocks in an unprincipled
 fashion

 Here's an updated patch. Hope it works now... This is based on an older
 version, and doesn't include any fixes from your latest
 btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common parts,
 and copy over any relevant changes.

Okay, attached is a revision with some of my fixes for other parts of
the code merged (in particular, for the grammer, ecpg and some aspects
of row locking and visibility).

Some quick observations on your patch - maybe this is obvious, and you
have work-arounds in mind, but this is just my first impression:

* You're always passing HEAP_INSERT_SPECULATIVE to heap_insert, and
therefore in the event of any sort of insertion always getting an
exclusive lock on the procArray. I guess the fact that this always
happens, and not just when upserting is an oversight (I know you just
wanted to throw together a POC), but even still that seems kind of
questionable. Everyone knows that contention during GetSnapshotData is
still a big problem for us. Taking an exclusive ProcArrayLock perhaps
as frequently as more than once per slot seems like a really bad idea,
even if it's limited to speculative inserters.

* It seems questionable that you don't at least have a shared
ProcArrayLock when you set the token value in
SetSpeculativeInsertionToken() (as you know, MyPgXact-xmin can be set
with such a shared lock, so doing something similar here might be
okay, but it's far from obvious that no lock is okay). Now, I guess
you can point towards MinimumActiveBackends() as a kind of precedent,
but that seems substantially less scary than what you've done, because
that's just reading if a field is zero or non-zero. Obviously the
implications of actually doing this are that things get even worse for
performance. And even a shared lock might not be good enough - I'd
have to think about it some more to give a firmer opinion.

 The fix for the deadlocking issue consists of a few parts. First, there's a
 new heavy-weight lock type, a speculative insertion lock, which works
 somewhat similarly to XactLockTableWait(), but is only held for the duration
 of a single speculative insertion. When a backend is about to begin a
 speculative insertion, it first acquires the speculative insertion lock.
 When it's done with the insertion, meaning it has either cancelled it by
 killing the already-inserted tuple or decided that it's going to go ahead
 with it, the lock is released.

I'm afraid I must reiterate my earlier objection to the general thrust
of what you're doing, which is that it is evidently unnecessary to
spread knowledge of value locking around the system, as opposed to
localizing knowledge of it to one module, in this case nbtinsert.c.
While it's true that the idea of the AM abstraction is already perhaps
a little strained, this seems like a big expansion on that problem.
Why should this approach make sense for every conceivable AM that
supports some notion of a constraint? Heavyweight exclusive locks on
indexes (at the page level typically), persisting across complex
operations are not a new thing for Postgres.

 HeapTupleSatisfiesDirty() does the proc array check, and returns the token
 in the snapshot, alongside snapshot-xmin. The caller can then use that
 information in place of XactLockTableWait().

That seems like a modularity violation too. The HeapTupleSatisfiesMVCC
changes reflect a genuine need to make every MVCC snapshot care about
the special visibility exception, whereas only one or two
HeapTupleSatisfiesDirty() callers will ever care about speculative
insertion. Even if you're unmoved by the modularity/aesthetic argument
(which is not to suppose that you actually are), the fact that you're
calling SpeculativeInsertionIsInProgress(), which acquires a shared
ProcArrayLock much of the time from within HeapTupleSatisfiesDirty(),
may have seriously regressed foreign key enforcement, for example.
You're going to need something like a new type of snapshot, basically,
and we probably already have too many of those. But then,

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Greg Stark

On Sat, Jan 11, 2014 at 12:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasby j...@nasby.net wrote:
 Well, the usual example for exclusion constraints is resource scheduling
 (ie: scheduling what room a class will be held in). In that context is it
 hard to believe that you might want to MERGE a set of new classroom
 assignments in?

 So you schedule a class that clashes with 3 other classes, and you
 want to update all 3 rows/classes with details from your one row
 proposed for insertion?


Well, perhaps you want to mark the events as conflicting with your new event?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Fri, Jan 10, 2014 at 10:01 PM, Greg Stark st...@mit.edu wrote:
 So you schedule a class that clashes with 3 other classes, and you
 want to update all 3 rows/classes with details from your one row
 proposed for insertion?


 Well, perhaps you want to mark the events as conflicting with your new event?

But short of a sophisticated persistent value locking implementation
(which I'm pretty skeptical of the feasibility of), more conflicting
events could be added at any moment. I doubt that you're appreciably
any better off than if you were to simply check with a select query,
even though that approach is obviously broken. In general, making row
locking work for exclusion constraints, so you can treat them in a way
that allows you to merge on arbitrary operators seems to me like a tar
pit.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-09 Thread Peter Geoghegan

On Tue, Jan 7, 2014 at 8:46 PM, Peter Geoghegan p...@heroku.com wrote:
 I've worked on a simple set of tests, written quickly in bash, that I
 think exercise interesting cases:

 https://github.com/petergeoghegan/upsert

 Perhaps most notably, these tests make comparisons between the
 performance of ordinary inserts with a serial primary key table, and
 effectively equivalent upserts that always insert.

While I realize that everyone is busy, I'm concerned about the lack of
discussing here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.

Here is a summary of the situation, at least as I understand it:

* My patch has been shown to perform much better than the alternative
promise tuples proposal. The benchmark previously published,
referred to above makes this evident for workloads with lots of
contention [1].

Now, to cover everything, I've gone on to benchmark inserts into a
table foo(serial, int4, text) that lock the row using the new
infrastructure. The SERIAL column is the primary key. I'm trying to
characterize the overhead of the extended value locking here, by
showing the same case (a worst case) with and without the overhead.
Here are the results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/vs-vanilla-insert/

(asynchronous commits, logged table)

With both extremes covered, the data suggests that my patch performs
very well by *any* standard. But if we consider how things compare to
the alternative proposal, all indications are that performance is far
superior (at least for representative cases without too many unique
indexes, not that I suspect things are much different with many).
Previous concerns about the cost of extended leaf page locking ought
to be almost put to rest by this benchmark, because inserting a
sequence of btree index tuple integers in succession is a particularly
bad case, and yet in practice the implementation does very well. (With
my patch, we're testing the same statement with an ON DUPLICATE KEY
LOCK FOR UPDATE part, but there are naturally no conflicts on the
SERIAL PK - on master we're testing the same INSERT statement without
that, inserting sequence values just as before, only without the
worst-case value locking overhead).

* The alternative exclusion* patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical. More importantly, I have to question
whether we should continue to pursue that alternative approach, giving
what we now know about its performance characteristics. It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and* it's
already looking for would-be duplicates *first*). I'm trying to be as
fair as possible, and yet the difference is huge. It's going to be
really hard to beat something where the decision to try to see if we
should insert or update comes so late: the decision is made as late as
possible, is based on strong indicators of likely outcome, while the
cost of making the wrong choice is very low. With shared buffer locks
held calling _bt_check_unique(), we still lock out concurrent would-be
duplicate insertion, and so don't need to restart from scratch (to
insert instead) in the same way as with the alternative proposal's
largely AM-naive approach.

* I am not aware that anyone considers that there are any open items
yet. I've addressed all of those now. Progress here is entirely
blocked on waiting for review feedback.

With the new priorConflict lock strength optimization, my patch is in
some ways similar to what Heikki proposed (in the exclusion* patch).
It's as if the first phase, the locking operation is an index scan
with an identity crisis. It can decide to continue to be an index
scan (albeit an oddball one with an insertion scankey that using
shared buffer locks prevents concurrent duplicate insertion, for very
efficient uniqueness checks), or it can decide to actually insert, at
the last possible moment. The second phase is picked up with much of
the work already complete from the first, so the amount of wasted work
is very close to zero in all cases. How can anything beat that?

If the main argument for the exclusion approach is that it works with
exclusion constraints, then I can still go and make what I've done
work there too (for the IGNORE case, which I maintain is the only
exclusion constraint variant of this that is useful to users). In any
case I think making anything work for exclusion constraints should be
a relatively low priority.

I'd like to hear more opinions on what I've done here, if anyone has
bandwidth to spare.

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Eisentraut

This patch doesn't apply anymore.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Geoghegan

On Fri, Jan 3, 2014 at 7:39 AM, Peter Eisentraut pete...@gmx.net wrote:
 This patch doesn't apply anymore.

Yes, there was some bit-rot. I previous deferred dealing with a
shift/reduce conflict implied by commit
1b4f7f93b4693858cb983af3cd557f6097dab67b. I've fixed that problem now
using non operator precedence, and performed a clean rebase on master.
I've also fixed the basis of your much earlier complaint about
breakage of ecpg's regression tests (without adding support for the
feature to ecpg). All make check-world tests pass. Patch is attached.
I have yet to figure out how to make REJECTS a non-reserved keyword,
or even just a type_func_name_keyword, though intuitively I have a
sense that the latter ought to be possible.

This is the same basic patch as benchmarked above, with various tricks
to avoid stronger lock acquisition when that's likely profitable (we
can even do _bt_check_unique() with only a shared lock and no hwlock
much of the time, on the well-informed suspicion that it won't be
necessary to insert, but only to return a TID). There has also been
some clean-up to aspects of serializable behavior, but that needs
further attention and scrutiny from a subject matter expert, hopefully
Heikki. Though it's probably also true that I should find time to
think about transaction isolation some more.

I've since had another idea relating to performance optimization,
which was to hint that the last attempt to insert a key was
unsuccessful, so the next one (after the conflicting transaction's
commit/abort) of that same value will very likely conflict too, making
lock avoidance profitable on average. This appears to be much more
effective than the previous woolly heuristic (never published, just
benchmarked), which I've left in as an additional reason to avoid
heavyweight locking, if only for discussion. This benchmark now shows
my approach winning convincingly with this additional priorConflict
optimization:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/

If someone had time to independently recreate the benchmark I have
here, or perhaps to benchmark the patch in some other way, that would
be useful (for full details see my recent e-mail about the prior
benchmark, where the exact details are described - this is the same,
but with one more run for the priorConflict optimization).

Subtleties of visibility also obviously deserve closer inspection, but
perhaps I shouldn't be so hasty: No consensus on the way forward looks
even close to emerging. How do people feel about my approach now?

-- 
Peter Geoghegan


btreelock_insert_on_dup.v6.2014_01_03.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Geoghegan

On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, so far as the syntax goes, I'm quite distressed by having to make
 REJECTS into a fully-reserved word.  It's not reserved according to the
 standard, and it seems pretty likely to be something that apps might be
 using as a table or column name.

I've been looking at this, but I'm having a hard time figuring out a
way to eliminate shift/reduce conflicts while not maintaining REJECTS
as a fully reserved keyword - I'm pretty sure it's impossible with an
LALR parser. I'm not totally enamored with the exact syntax proposed
-- I appreciate the flexibility on the one hand, but on the other hand
I suppose that REJECTS could just as easily be any number of other
words.

One possible compromise would be to use a synonym that is not imagined
to be in use very widely, although I looked up reject in a thesaurus
and didn't feel too great about that idea afterwards. Another idea
would be to have a REJECTING keyword, as the sort of complement of
RETURNING (currently you can still ask for RETURNING, without REJECTS
but with ON DUPLICATE KEY LOCK FOR UPDATE if that happens to make
sense). I think that would work fine, and might actually be more
elegant. Now, REJECTING will probably have to be a reserved keyword,
but that seems less problematic, particularly as RETURNING is itself a
reserved keyword not described by the standard. In my opinion
REJECTING would reinforce the notion of projecting the complement of
what RETURNING would project in the same context.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Andres Freund

On 2013-12-27 14:11:44 -0800, Peter Geoghegan wrote:
 On Fri, Dec 27, 2013 at 12:57 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I don't think the current syntax the feature implements can be used as
  the sole argument what the feature should be able to support.
 
  If you think from the angle of a async MM replication solution
  replicating a table with multiple unique keys, not having to specify a
  single index we to expect conflicts from, is surely helpful.
 
 Well, you're not totally on your own for something like that with this
 feature. You can project the conflicter's tid, and possibly do a more
 sophisticated recovery, like inspecting the locked row and iterating.

Yea, but in that case I *do* conflict with more than one index and old
values need to stay locked. Otherwise anything resembling
forward-progress guarantee is lost.

 That's probably not at all ideal, but then I can't even imagine what
 the best interface for what you describe here looks like. If there are
 multiple conflicts, do you delete or update some or all of them? How
 do you express that concept from a DML statement?

For my usecases just getting the tid back is fine - it's in C
anyway. But I'd rather be in a position to do it from SQL as well...

If there are multiple conflicts the conflicting row should be
updated. If we didn't release the value locks on the individual indexes,
we can know beforehand whether only one row is going to be affected. If
there really are more than one, error out.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Thu, Jan 2, 2014 at 1:49 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, you're not totally on your own for something like that with this
 feature. You can project the conflicter's tid, and possibly do a more
 sophisticated recovery, like inspecting the locked row and iterating.

 Yea, but in that case I *do* conflict with more than one index and old
 values need to stay locked. Otherwise anything resembling
 forward-progress guarantee is lost.

I'm not sure I understand. In a very real sense they do stay locked.
What is insufficient about locking the definitively visible row with
the value, rather than the value itself? What distinction are you
making? On the first conflict you can delete the row you locked, and
then re-try, possibly further merging some stuff from the just-deleted
row when you next upsert.

It's possible that an earlier unique index value that is unlocked
before row locking proceeds will get a new would-be duplicate after
you're returned a locked row, but it's not obvious that that's a
problem for your use-case (a problem that can't be worked around), or
that promise tuples get you anything better.

 That's probably not at all ideal, but then I can't even imagine what
 the best interface for what you describe here looks like. If there are
 multiple conflicts, do you delete or update some or all of them? How
 do you express that concept from a DML statement?

 For my usecases just getting the tid back is fine - it's in C
 anyway. But I'd rather be in a position to do it from SQL as well...

I believe you can.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Andres Freund

On 2014-01-02 02:20:02 -0800, Peter Geoghegan wrote:
 On Thu, Jan 2, 2014 at 1:49 AM, Andres Freund and...@2ndquadrant.com wrote:
  Well, you're not totally on your own for something like that with this
  feature. You can project the conflicter's tid, and possibly do a more
  sophisticated recovery, like inspecting the locked row and iterating.
 
  Yea, but in that case I *do* conflict with more than one index and old
  values need to stay locked. Otherwise anything resembling
  forward-progress guarantee is lost.
 
 I'm not sure I understand. In a very real sense they do stay locked.
 What is insufficient about locking the definitively visible row with
 the value, rather than the value itself?

Locking the definitely visible row only works if there's a row matching
the index's columns. If the values of the new row don't have
corresponding values in all the indexes you have the same old race
conditions again.
I think to be useful for many cases you really need to be able to ask
for a potentially conflicting row and be sure that if there's none you
are able to insert the row separately.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Robert Haas

On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 1. PromiseTupleInsertionLockAcquire(my xid)
 2. Insert heap tuple
 3. Insert index tuples
 4. Check if conflict happened. Kill the already-inserted tuple on conflict.
 5. PromiseTupleInsertionLockRelease(my xid)

 IOW, the only change to the current patch is that you acquire the new kind
 of lock before starting the insertion, and you release it after you've
 killed the tuple, or you know you're not going to kill it.

 Where does row locking fit in there? - you may need to retry when that
 part is incorporated, of course. What if you have multiple promise
 tuples from a contended attempt to insert a single slot, or multiple
 broken promise tuples across multiple slots or even multiple commands
 in the same xact?

Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
the tuple, rather than the XID.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Heikki Linnakangas


On 01/02/2014 02:53 PM, Robert Haas wrote:

On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:

On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

1. PromiseTupleInsertionLockAcquire(my xid)
2. Insert heap tuple
3. Insert index tuples
4. Check if conflict happened. Kill the already-inserted tuple on conflict.
5. PromiseTupleInsertionLockRelease(my xid)

IOW, the only change to the current patch is that you acquire the new kind
of lock before starting the insertion, and you release it after you've
killed the tuple, or you know you're not going to kill it.


Where does row locking fit in there? - you may need to retry when that
part is incorporated, of course. What if you have multiple promise
tuples from a contended attempt to insert a single slot, or multiple
broken promise tuples across multiple slots or even multiple commands
in the same xact?


You can only have one speculative insertion in progress at a time. After 
you've done all the index insertions and checked that you really didn't 
conflict with anyone, you're not going to go back and kill the tuple 
anymore. After that point, the insertion is not speculation anymore.



Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
the tuple, rather than the XID.


Well, that would be ideal, because we already have tuple locks. It would 
be nice to use the same concept for this. It's a bit tricky, however. I 
guess the most straightforward way to do it would be to grab a 
heavy-weight lock after you've inserted the tuple, but before releasing 
the buffer lock. I don't immediately see a problem with that, although 
it's a bit scary to acquire a heavy-weight lock while holding a buffer lock.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Robert Haas

On Thu, Jan 2, 2014 at 11:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/02/2014 02:53 PM, Robert Haas wrote:
 On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 1. PromiseTupleInsertionLockAcquire(my xid)
 2. Insert heap tuple
 3. Insert index tuples
 4. Check if conflict happened. Kill the already-inserted tuple on
 conflict.
 5. PromiseTupleInsertionLockRelease(my xid)

 IOW, the only change to the current patch is that you acquire the new
 kind
 of lock before starting the insertion, and you release it after you've
 killed the tuple, or you know you're not going to kill it.


 Where does row locking fit in there? - you may need to retry when that
 part is incorporated, of course. What if you have multiple promise
 tuples from a contended attempt to insert a single slot, or multiple
 broken promise tuples across multiple slots or even multiple commands
 in the same xact?

 You can only have one speculative insertion in progress at a time. After
 you've done all the index insertions and checked that you really didn't
 conflict with anyone, you're not going to go back and kill the tuple
 anymore. After that point, the insertion is not speculation anymore.

Yeah... but how does someone examining the tuple know that?  We need
to avoid having them block on the promise-tuple insertion lock if
we've reacquired it meanwhile for a new speculative insertion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

I decided to make at least a cursory attempt to measure or
characterize the performance of each of our approaches to value
locking. Being fair here is a non-trivial matter, because of the fact
that upserts can behave quite differently based on the need to insert
or update, lock contention and so on. Also, I knew that anything I
came up with would not be comparing like with like: as things stand,
the btree locking code is more or less correct, and the alternative
exclusion constraint supporting implementation is more or less
incorrect (of course, you may yet describe a way to fix the
unprincipled deadlocking previously revealed by my testcase [1], but
it is far from clear what impact this fix will have on performance).
Still, something is better than nothing.

This was run on a Linux server (Linux 3.8.0-31-generic
#46~precise1-Ubuntu) with these specifications:
https://www.hetzner.de/en/hosting/produkte_rootserver/ex40 .
Everything fits in shared_buffers, but the I/O system is probably the
weakest link here.

To be 100% clear, I am comparing
btreelock_insert_on_dup.v5.2013_12_28.patch.gz [2] with
exclusion_insert_on_dup.2013_12_19.patch.gz [3]. I'm also testing a
third approach, involving avoidance of exclusive buffer locks and
heavyweight locks for upserts in the first phase of speculative
insertion. That patch is unposted, but shows a modest improvement over
[2].

I ran this against the table foo:

pgbench=# \d+ foo
  Table public.foo
 Column |  Type   | Modifiers | Storage  | Stats target | Description
+-+---+--+--+-
 a  | integer | not null  | plain|  |
 b  | integer |   | plain|  |
 c  | text|   | extended |  |
Indexes:
foo_pkey PRIMARY KEY, btree (a)
Has OIDs: no

My custom script was:

\setrandom rec 1 :scale
with rej as(insert into foo(a, b, c) values(:rec, :rec, 'insert') on
duplicate key lock for update returning rejects *) update foo set c =
'update' from rej where foo.a = rej.a;

I specified that each pgbench client in each run should last for 200k
upserts (with 100k possible distinct key values), not that it should
last some number of seconds. The idea is that there is a reasonable
mix of inserts and updates initially, for lower client counts, but
exactly the same number of queries are run for each patch, so as to
normalize the effects of contention across both runs (this sure is
hand-wavy, but likely better than nothing). I'm just looking for
approximate numbers here, and I'm sure that you could find more than
one way to benchmark this feature, with varying degrees of sympathy
towards each of our two approaches to value locking. This benchmark
isn't sympathetic to btree locking at all, because there is a huge
amount of contention for the higher client counts, with 100% of
possible rows updated by the time we're done at 16 clients, for
example.

To compensate somewhat for the relatively low duration of each run, I
take an average-of-5, rather than an average-of-3 as representative
for each client count + run/patch combination.

Full report of results are here:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp/

My executive summary is that the exclusion patch performs about the
same on lower client counts, presumably due to not having the
additional window of btree lock contention. By 8 clients, the
exclusion patch does noticeably better, but it's a fairly modest
improvement.

Forgive me if I'm belaboring the point, but even though I'm
benchmarking the simplest possible upsert statements, had I chosen
small pgbench scale factors (e.g. scales that would see 100 - 1000
possible distinct key values in total) the btree locking
implementation would surely win very convincingly, just because the
alternative implementation would spend almost all of its time
deadlocked, waiting for the deadlock detector to free clients in one
second deadlock_timeout cycles. My goal here was just to put a rough
number on how these two approaches compare, while trying to be as fair
as possible.

I have to wonder about the extent to which the exclusion approach
benefits from holding old value locks. So even if the unprincipled
deadlocking issue can be fixed without much additional cost, it might
be that the simple fact that that approach holds those pseudo value
locks (i.e. old dead rows from previous iterations on the same tuple
slot) indefinitely helps performance, and losing that property alone
will hurt performance, even though it's necessary.

For those that wonder what the effect on multiple unique index would
be, that isn't really all that relevant, since contention on multiple
unique indexes isn't expected with idiomatic usage (though I suppose
an upsert's non-HOT update would have to compete).

[1] 
http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

[2]

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Thu, Jan 2, 2014 at 8:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
 the tuple, rather than the XID.

 Well, that would be ideal, because we already have tuple locks. It would be
 nice to use the same concept for this. It's a bit tricky, however. I guess
 the most straightforward way to do it would be to grab a heavy-weight lock
 after you've inserted the tuple, but before releasing the buffer lock. I
 don't immediately see a problem with that, although it's a bit scary to
 acquire a heavy-weight lock while holding a buffer lock.

That's a really big modularity violation. Everything after
RelationPutHeapTuple() but before the buffer unlock in heap_insert()
is currently critical section. I'm not saying that it can't be done,
but it certainly is scary.

We also have heavyweight page locks, currently used by hash indexes.
That approach does not require us to contort the row locking code, and
certainly does not require us to acquire heavyweight locks with buffer
locks already held. I could understand your initial disinclination to
doing things this way, particularly when the unprincipled deadlocking
problem was not well understood, but I think that this must tip the
balance in favor of the approach I advocate. What I've done with
heavyweight locks is a modest, localized, logical expansion on the
existing mechanism, that is easy to reason about, with room for
further optimization in the future, that still has reasonable
performance characteristics today, including I believe better
worst-case latency. Heavyweight locks on btree pages are very well
precedented, if you look beyond Postgres.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

On Thu, Jan 2, 2014 at 2:37 AM, Andres Freund and...@2ndquadrant.com wrote:
 Locking the definitely visible row only works if there's a row matching
 the index's columns. If the values of the new row don't have
 corresponding values in all the indexes you have the same old race
 conditions again.

I still don't get it - perhaps you should break down exactly what you
mean with an example. I'm talking about potentially doing multiple
upserts per row proposed for insertion to handle multiple conflicts,
perhaps with some deletes between upserts, not just one upsert with a
single update part.

 I think to be useful for many cases you really need to be able to ask
 for a potentially conflicting row and be sure that if there's none you
 are able to insert the row separately.

Why? What work do you need to perform after reserving the right to
insert but before inserting? Can't you just upsert resulting in
insert, and then perform that work, potentially deleting the row
inserted if and when you change your mind? Is there any real
difference between what that does for you, and what any particular
variety of promise tuple might do for you?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE