Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-04-17 Thread Bruce Momjian
On Mon, Feb 10, 2014 at 06:40:30PM +0000, Peter Geoghegan wrote:
 On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan p...@heroku.com wrote:
  I'm just throwing an error when locking the tuple returns
  HeapTupleInvisible, and the xmin of the tuple is our xid.
 
 I would like some feedback on this point. We need to consider how
 exactly to avoid updating the same tuple inserted by our command.
 Updating a tuple we inserted cannot be allowed to happen, not least
 because to do so causes livelock.
 
 A related consideration that I raised in mid to late January that
 hasn't been commented on is avoiding updating the same tuple twice,
 and where we come down on that with respect to where our
 responsibility to the user starts and ends. For example, SQL MERGE
 officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE
 seems not to, probably due to implementation considerations.

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 9:52 AM, Bruce Momjian br...@momjian.us wrote:
 Where are we on this?

My hope is that I can get agreement on a way forward during pgCon. Or,
at the very least, explain the issues as I see them in a relatively
accessible and succinct way to those interested.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-02-10 Thread Heikki Linnakangas

On 02/07/2014 01:27 PM, Peter Geoghegan wrote:

On Thu, Jan 16, 2014 at 12:35 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I think you should consider breaking off the relcache parts of my
patch and committing them, because they're independently useful.


Makes sense. Can you extract that into a separate patch, please?


Perhaps you can take a look at this again, when you get a chance.


The relcache parts? I don't think a separate patch ever appeared that 
could be reviewed.


Looking again at the last emails in this whole thread, I don't have 
anything to add. At this point, I think it's pretty clear this won't 
make it into 9.4, so I'm going to mark this as returned with feedback. 
If someone else thinks this still has a chance and is willing to review 
this and beat it into shape, please resurrect it quickly.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-02-10 Thread Peter Geoghegan
On Mon, Feb 10, 2014 at 11:57 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 The relcache parts? I don't think a separate patch ever appeared that could
 be reviewed.


I posted the patch on January 18th:
http://www.postgresql.org/message-id/cam3swzth4vkesot7dcrwbprn7zzhnz-wa6zmvo1ff7gbnoj...@mail.gmail.com

I was under the impression that you agreed that this was independently
valuable, regardless of the outcome here.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-02-10 Thread Peter Geoghegan
On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan p...@heroku.com wrote:
 I'm just throwing an error when locking the tuple returns
 HeapTupleInvisible, and the xmin of the tuple is our xid.

I would like some feedback on this point. We need to consider how
exactly to avoid updating the same tuple inserted by our command.
Updating a tuple we inserted cannot be allowed to happen, not least
because to do so causes livelock.

A related consideration that I raised in mid to late January that
hasn't been commented on is avoiding updating the same tuple twice,
and where we come down on that with respect to where our
responsibility to the user starts and ends. For example, SQL MERGE
officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE
seems not to, probably due to implementation considerations.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-02-07 Thread Peter Geoghegan
On Thu, Jan 16, 2014 at 12:35 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I think you should consider breaking off the relcache parts of my
 patch and committing them, because they're independently useful.

 Makes sense. Can you extract that into a separate patch, please?

Perhaps you can take a look at this again, when you get a chance.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-18 Thread Robert Haas
On Thu, Jan 16, 2014 at 3:35 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Makes sense. Can you extract that into a separate patch, please?

 I was wondering if that might cause deadlocks if an existing index is
 changed from unique to non-unique, or vice versa, as the ordering would
 change. But we don't have a DDL command to change that, so the question is
 moot.

It's not hard to imagine someone wanting to add such a DDL command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-18 Thread Peter Geoghegan
On Sat, Jan 18, 2014 at 5:28 AM, Robert Haas robertmh...@gmail.com wrote:
 I was wondering if that might cause deadlocks if an existing index is
 changed from unique to non-unique, or vice versa, as the ordering would
 change. But we don't have a DDL command to change that, so the question is
 moot.

 It's not hard to imagine someone wanting to add such a DDL command.

Perhaps, but the burden of solving that problem ought to rest with
whoever eventually proposes the command. Certainly, if someone did so
today, I would object on the grounds that their patch precluded us
from ever prioritizing unique indexes to get them out of the way
during insertion; so I am not actually making such an effort any more
difficult than it already is. Moreover, avoiding entirely predictable
index bloat is more important than easing the implementation of this
yet-to-be-proposed feature. I was surprised when I learned that
things didn't already work this way.

The attached patch, broken off from my main patch, has the relcache
sort indexes by (!indisunique, relindexid).

-- 
Peter Geoghegan
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*** typedef struct relidcacheent
*** 108,113 
--- 108,125 
  	Relation	reldesc;
  } RelIdCacheEnt;
  
+ /*
+  *		Representation of indexes for sorting purposes
+  *
+  *		We use this to sort indexes globally by a specific sort order, per
+  *		RelationGetIndexList().
+  */
+ typedef struct relidunq
+ {
+ 	bool		indisunique;
+ 	Oid			relindexid;
+ } relidunq;
+ 
  static HTAB *RelationIdCache;
  
  /*
*** static TupleDesc GetPgClassDescriptor(vo
*** 246,252 
  static TupleDesc GetPgIndexDescriptor(void);
  static void AttrDefaultFetch(Relation relation);
  static void CheckConstraintFetch(Relation relation);
! static List *insert_ordered_oid(List *list, Oid datum);
  static void IndexSupportInitialize(oidvector *indclass,
  	   RegProcedure *indexSupport,
  	   Oid *opFamily,
--- 258,264 
  static TupleDesc GetPgIndexDescriptor(void);
  static void AttrDefaultFetch(Relation relation);
  static void CheckConstraintFetch(Relation relation);
! static int relidunq_cmp(const void *a, const void *b);
  static void IndexSupportInitialize(oidvector *indclass,
  	   RegProcedure *indexSupport,
  	   Oid *opFamily,
*** CheckConstraintFetch(Relation relation)
*** 3445,3455 
   * Such indexes are expected to be dropped momentarily, and should not be
   * touched at all by any caller of this function.
   *
!  * The returned list is guaranteed to be sorted in order by OID.  This is
!  * needed by the executor, since for index types that we obtain exclusive
!  * locks on when updating the index, all backends must lock the indexes in
!  * the same order or we will get deadlocks (see ExecOpenIndices()).  Any
!  * consistent ordering would do, but ordering by OID is easy.
   *
   * Since shared cache inval causes the relcache's copy of the list to go away,
   * we return a copy of the list palloc'd in the caller's context.  The caller
--- 3457,3471 
   * Such indexes are expected to be dropped momentarily, and should not be
   * touched at all by any caller of this function.
   *
!  * The returned list is guaranteed to be in (!indisunique, OID) order.  This is
!  * needed by the executor, since for index types that we obtain exclusive locks
!  * on when updating the index, all backends must lock the indexes in the same
!  * order or we will get deadlocks (see ExecOpenIndices()).  For most purposes
!  * any consistent ordering would do, but there is further consideration, which
!  * is why we put unique indexes first: it is generally useful to get insertion
!  * into unique indexes out of the way, since unique violations are the cause of
!  * many aborted transactions.  We can always avoid bloating non-unique indexes
!  * of the same slot.
   *
   * Since shared cache inval causes the relcache's copy of the list to go away,
   * we return a copy of the list palloc'd in the caller's context.  The caller
*** RelationGetIndexList(Relation relation)
*** 3469,3475 
  	SysScanDesc indscan;
  	ScanKeyData skey;
  	HeapTuple	htup;
! 	List	   *result;
  	char		replident = relation->rd_rel->relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
--- 3485,3495 
  	SysScanDesc indscan;
  	ScanKeyData skey;
  	HeapTuple	htup;
! 	relidunq   *indexTypes;
! 	int			nIndexType;
! 	int			i;
! 	Size		szIndexTypes;
! 	List	   *result = NIL;
  	char		replident = relation->rd_rel->relreplident;
  	Oid			oidIndex = InvalidOid;
  	Oid			pkeyIndex = InvalidOid;
*** RelationGetIndexList(Relation relation)
*** 3486,3494 
  	 * list into the relcache entry.  This avoids cache-context memory leakage
  	 * if we get some sort of error partway through.
  	 */
- 	result = NIL;
  	oidIndex = InvalidOid;
  
  	/* Prepare to scan pg_index for entries having indrelid = this rel. */
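
The hunk is truncated here; in particular the body of relidunq_cmp() is
not shown. As a sketch only (not necessarily the patch's actual code), a
qsort() comparator consistent with the declared relidunq struct and the
(!indisunique, OID) ordering described above might look like:

static int
relidunq_cmp(const void *a, const void *b)
{
	const relidunq *lhs = (const relidunq *) a;
	const relidunq *rhs = (const relidunq *) b;

	/* unique indexes sort first */
	if (lhs->indisunique != rhs->indisunique)
		return lhs->indisunique ? -1 : 1;

	/* then ascending OID, giving all backends one consistent ordering */
	if (lhs->relindexid < rhs->relindexid)
		return -1;
	if (lhs->relindexid > rhs->relindexid)
		return 1;
	return 0;
}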
  	

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-18 Thread Peter Geoghegan
On Thu, Jan 16, 2014 at 6:31 PM, Peter Geoghegan p...@heroku.com wrote:
 I think we need to give this some more thought. I have not addressed
 the implications for MVCC snapshots here.

So I gave this some more thought, and this is what I came up with:

+ static bool
+ ExecLockHeapTupleForUpdateSpec(EState *estate,
+                                ResultRelInfo *relinfo,
+                                ItemPointer tid)
+ {
+     Relation                relation = relinfo->ri_RelationDesc;
+     HeapTupleData           tuple;
+     HeapUpdateFailureData   hufd;
+     HTSU_Result             test;
+     Buffer                  buffer;
+ 
+     Assert(ItemPointerIsValid(tid));
+ 
+     /* Lock tuple for update */
+     tuple.t_self = *tid;
+     test = heap_lock_tuple(relation, &tuple,
+                            estate->es_output_cid,
+                            LockTupleExclusive, false, /* wait */
+                            true, &buffer, &hufd);
+     ReleaseBuffer(buffer);
+ 
+     switch (test)
+     {
+         case HeapTupleInvisible:
+ 
+             /*
+              * Tuple may have originated from this transaction, in which case
+              * it's already locked.  However, to avoid having to consider the
+              * case where the user locked an instantaneously invisible row
+              * inserted in the same command, throw an error.
+              */
+             if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+                 ereport(ERROR,
+                         (errcode(ERRCODE_UNIQUE_VIOLATION),
+                          errmsg("could not lock instantaneously invisible tuple inserted in same transaction"),
+                          errhint("Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.")));
+             if (IsolationUsesXactSnapshot())
+                 ereport(ERROR,
+                         (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                          errmsg("could not serialize access due to concurrent update")));
+             /* Tuple became invisible due to concurrent update; try again */
+             return false;
+         case HeapTupleSelfUpdated:
+ 
+             /*

I'm just throwing an error when locking the tuple returns
HeapTupleInvisible, and the xmin of the tuple is our xid.

It's sufficient to just check
TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)),
because there is no way that _bt_check_unique() could consider the
tuple dirty-visible and conclusively fit for a lock attempt if it came
from our xact while HeapTupleSatisfiesUpdate() indicated invisibility
for that same tuple, unless the tuple originated from the same
command. Checking against subxacts or ancestor xacts is at worst
redundant.

I am happy with this. ISTM that it'd be hard to argue that any
reasonable and well-informed person would ever thank us for trying
harder here, although it took me a while to reach that position. To
understand what I mean, consider what MySQL does when in a similar
position. I didn't actually check, but given that their docs don't
consider this question, I guess MySQL would go ahead and update the
tuple inserted by that same INSERT...ON DUPLICATE KEY UPDATE
command. Most of the time the conflicting tuples proposed for
insertion by the user are in *some* way different (i.e. if the table
was initially empty and you did a regular insert, inserting those same
tuples would cause a unique constraint violation all on their own, but
without there being any fully identical tuples among these
hypothetical tuples proposed for insertion). It seems obvious that the
order in which each tuple is evaluated for insert-or-update on MySQL
is more or less undefined. And so by allowing this, they arguably
allow their users to miss something they should not: they don't end up
doing anything useful with the datums originally inserted in the
command, but then subsequently updated over with something else in the
same command.

MySQL users are not notified that this happened, and are probably
blissfully unaware that there has been a limited form of data loss. So
it's The Right Thing to say to Postgres users: if you inserted these
rows into the table when it was empty, there'd *still* definitely be a
unique constraint violation, and you need to sort that out before
asking Postgres to handle conflicts with concurrent sessions and
existing data, where rows that come from earlier commands in your xact
counts as existing data.

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-18 Thread Peter Geoghegan
On Sat, Jan 18, 2014 at 6:17 PM, Peter Geoghegan p...@heroku.com wrote:
 MySQL users are not notified that this happened, and are probably
 blissfully unaware that there has been a limited form of data loss. So
 it's The Right Thing to say to Postgres users: if you inserted these
 rows into the table when it was empty, there'd *still* definitely be a
 unique constraint violation, and you need to sort that out before
 asking Postgres to handle conflicts with concurrent sessions and
 existing data, where rows that come from earlier commands in your xact
 counts as existing data.

I Googled and found evidence indicating that a number of popular
proprietary systems' SQL MERGE implementations do much the same thing.
You may get an "attempt to UPDATE the same row twice" error on both
SQL Server and Oracle. I wouldn't like to speculate whether the standard
requires this of MERGE, but requiring it seems very sensible.

 The only problem I can see with that is that
 we cannot complain consistently for practical reasons, as when we lock
 *some other* xact's tuple rather than inserting in the same command
 two or more times.

Actually, maybe it would be practical to complain that the same UPSERT
command attempted to lock a row twice with at least *almost* total
accuracy, and not just for the particularly problematic case where
tuple visibility is not assured.

Personally, I favor just making the case HeapTupleSelfUpdated: branch
within the patch's ExecLockHeapTupleForUpdateSpec() function complain
when hufd.cmax == estate->es_output_cid (currently there is a separate
complaint, but only when those two variables are unequal). That's
probably almost perfect in practice.
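
To illustrate the shape this might take (a sketch only, assuming
hufd.cmax is filled in by heap_lock_tuple() the way it is today; this
is not the patch's actual code):

    case HeapTupleSelfUpdated:
        /*
         * Tuple was already updated or locked by our own transaction.
         * If that happened within this very command, the same UPSERT
         * is trying to lock one row twice; complain.
         */
        if (hufd.cmax == estate->es_output_cid)
            ereport(ERROR,
                    (errcode(ERRCODE_CARDINALITY_VIOLATION),
                     errmsg("command attempted to lock the same row twice")));
        return false;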

If we wanted perfection, which would be to always complain when two
rows were locked by the same UPSERT command, it would be a matter of
having heap_lock_tuple indicate to the patch's
ExecLockHeapTupleForUpdateSpec() caller that the row was already
locked, so that it could complain in a special way for the
locked-not-updated case. But that is hard, because there is no way for
it to know if the current *command* locked the tuple, and that's the
only case that we are justified in raising an error for.

But now that I think about it some more, maybe always complaining when
we lock but have not yet updated is not just not worth the trouble,
but is in fact bogus. It's not obvious what precise behavior is
correct here. I was worried about someone updating something twice,
but maybe it's fully sufficient to do what I've already proposed,
while in addition documenting that you cannot on-duplicate-key-lock a
tuple that has already been inserted or updated within the same
command. It will be very rare for anyone to trip up over that in
practice (e.g. by locking twice and spuriously updating the same row
twice or more in a later command). Users learn to not try this kind of
thing by having it break immediately; the fact that it doesn't break
with 100% reliability is good enough (plus it doesn't *really* fail to
break when it should because of how things are documented).

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-18 Thread Peter Geoghegan
On Sat, Jan 18, 2014 at 7:49 PM, Peter Geoghegan p...@heroku.com wrote:
 Personally, I favor just making the case HeapTupleSelfUpdated: branch
 within the patch's ExecLockHeapTupleForUpdateSpec() function complain
 when hufd.cmax == estate->es_output_cid (currently there is a separate
 complaint, but only when those two variables are unequal). That's
 probably almost perfect in practice.

Actually, there isn't really a need to do so, since I believe in
practice the tuple locked will always be instantaneously invisible
(when we have the scope to avoid this "updated the tuple twice in the
same command" problem by forbidding it in the style of SQL MERGE).
However, I think I'm going to propose that we still do something in
the ExecLockHeapTupleForUpdateSpec() HeapTupleSelfUpdated handler (in
addition to HeapTupleInvisible), because that'll still be illustrative
dead code.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Heikki Linnakangas

On 01/16/2014 03:25 AM, Peter Geoghegan wrote:

I think you should consider breaking off the relcache parts of my
patch and committing them, because they're independently useful. If we
are going to have a lot of conflicts that need to be handled by a
heap_delete(), there is no point in inserting non-unique index tuples
for what is not yet conclusively a proper (non-promise) tuple. Those
should always come last. And even without upsert, strictly inserting
into unique indexes first seems like a useful thing relative to the
cost. Unique violations are the cause of many aborted transactions,
and there is no need to ever bloat non-unique indexes of the same slot
when that happens.


Makes sense. Can you extract that into a separate patch, please?

I was wondering if that might cause deadlocks if an existing index is 
changed from unique to non-unique, or vice versa, as the ordering would 
change. But we don't have a DDL command to change that, so the question 
is moot.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Peter Geoghegan
On Thu, Jan 16, 2014 at 12:35 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Makes sense. Can you extract that into a separate patch, please?

Okay.

On an unrelated note, here are results for a benchmark that compares
the two patches for an insert heavy workload:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert-heavy-cmp/

I should point out that this is a sympathetic case for the exclusion
approach; there is only one unique index involved, and the heap tuples
were relatively wide:

pg@gerbil:~/pgbench-tools/tests$ cat tpc-b-upsert.sql
\set nbranches 10
\set naccounts 10
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom delta -5000 5000
with rej as (
  insert into pgbench_accounts(aid, bid, abalance, filler)
  values(:aid, :bid, :delta, 'filler')
  on duplicate key lock for update
  returning rejects aid, abalance)
update pgbench_accounts
set abalance = pgbench_accounts.abalance + rej.abalance
from rej
where pgbench_accounts.aid = rej.aid;

(This benchmark used an unlogged table, if only because to do
otherwise would severely starve this particular server of I/O).
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Peter Geoghegan
On Wed, Jan 15, 2014 at 11:02 PM, Peter Geoghegan p...@heroku.com wrote:
 It might just be a matter of:

 @@ -186,6 +186,13 @@ ExecLockHeapTupleForUpdateSpec(EState *estate,
         switch (test)
         {
             case HeapTupleInvisible:
 +               /*
 +                * Tuple may have originated from this command, in which case
 +                * it's already locked
 +                */
 +               if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple.t_data)) &&
 +                   HeapTupleHeaderGetCmin(tuple.t_data) == estate->es_output_cid)
 +                   return true;
                 /* Tuple became invisible;  try again */
                 if (IsolationUsesXactSnapshot())
                     ereport(ERROR,

I think we need to give this some more thought. I have not addressed
the implications for MVCC snapshots here. I think that I'll need to
raise a WARNING along the lines of "your snapshot isn't going to
consider the locked tuple visible because the same command inserted
it", or perhaps even raise an ERROR regardless of isolation level
(although note that I'm not suggesting that we raise an ERROR in the
event of receiving HeapTupleInvisible from heap_lock_tuple()/HTSU()
for other reasons, which *is* possible, nor am I suggesting that later
commands of the same xact would ever see this ERROR). I'm comfortable
with the idea of what you might loosely describe as a "READ COMMITTED
mode serialization failure" here, because this case is much narrower
than the other case I've proposed making a special exception to the
general semantics of MVCC snapshots to accommodate (i.e. the case
where a tuple is locked from an xact logically still-in-progress to
our snapshot in RC mode).

I think I'll be happy to declare that usage of the feature that hits
this issue is somewhere between questionable and wrong. It probably
isn't worth making another, similar HTSMVCC exception for this case.
But ISTM that we still have to do *something* other than simply credit
users with taking care to avoid tripping up on this.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan
On Tue, Jan 14, 2014 at 3:25 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Attached is a patch doing that, to again demonstrate what I mean. I'm not
 sure if setting xmin to invalid is really the best way to mark the tuple
 dead; I don't think a tuple's xmin can currently be InvalidTransaction under
 any other circumstances, so there might be some code out there that's not
 prepared for it. So using an infomask bit might indeed be better. Or
 something else entirely.

Have you thought about the implications for other snapshot types (or
other tqual.c routines)? My concern is that a client of that
infrastructure (either current or future) could spuriously conclude
that a heap tuple satisfied it, when in fact only a promise tuple
satisfied it. It wouldn't necessarily follow that the promise would be
fulfilled, nor that there would be some other proper heap tuple
equivalent to that fulfilled promise tuple as far as those clients are
concerned.

heap_delete() will not call HeapTupleSatisfiesUpdate() when you're
deleting a promise tuple, which on the face of it is fine - it's
always going to technically be instantaneously invisible, because it's
always created by the same command id (i.e. HeapTupleSatisfiesUpdate()
would just return HeapTupleInvisible if called). So far so good, but
we are technically doing something else quite new - deleting a
would-be instantaneously invisible tuple. So like your concern about
setting xmin to invalid, my concern is that code may exist that treats
cmin < cmax as an invariant. Now, you might think that that would be a
manageable concern, and to be fair a look at the ComboCids code that
mostly arbitrates that stuff seems to indicate that it's okay, but
it's still worth noting.

I think you should consider breaking off the relcache parts of my
patch and committing them, because they're independently useful. If we
are going to have a lot of conflicts that need to be handled by a
heap_delete(), there is no point in inserting non-unique index tuples
for what is not yet conclusively a proper (non-promise) tuple. Those
should always come last. And even without upsert, strictly inserting
into unique indexes first seems like a useful thing relative to the
cost. Unique violations are the cause of many aborted transactions,
and there is no need to ever bloat non-unique indexes of the same slot
when that happens.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan
On Tue, Jan 14, 2014 at 3:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Right, but with your approach, can you really be sure that you have
 the right rejecting tuple ctid (not reject)? In other words, as you
 wait for the exclusion constraint to conclusively indicate that there
 is a conflict, minutes may have passed in which time other conflicts
 may emerge in earlier unique indexes. Whereas with an approach where
 values are locked, you are guaranteed that earlier unique indexes have
 no conflicting values. Maintaining that property seems useful, since
 we check in a well-defined order, and we're still projecting a ctid.
 Unlike when row locking is involved, we can make no assumptions or
 generalizations around where conflicts will occur. Although that may
 also be a general concern with your approach when row locking, for
 multi-master replication use-cases. There may be some value in knowing
 it cannot have been earlier unique indexes (and so the existing values
 for those unique indexes in the locked row should stay the same -
 don't many conflict resolution policies work that way?).

 I don't understand what you're saying. Can you give an example?

 In the use case I was envisioning above, ie. you insert N rows, and if any
 of them violate constraint, you still want to insert the non-violating
 instead of rolling back the whole transaction, you don't care. You don't
 care what existing rows the new rows conflicted with.

 Even if you want to know what you conflicted with, I can't make sense of
 what you're saying. In the btreelock approach, the value locks are
 immediately released once you discover that there's conflict. So by the time
 you get to do anything with the ctid of the existing tuple you conflicted
 with, new conflicting tuples might've appeared.

That's true, but at least the timeframe in which an additional
conflict may occur on just-locked index values is bound to more or
less an instant. In any case, how important this is is an interesting
question, and perhaps one that Andres can weigh in on as someone that
knows a lot about multi-master replication. This issue is particularly
interesting because this testcase appears to make both patches
livelock, for reasons that I believe are related:

https://github.com/petergeoghegan/upsert/blob/master/torture.sh

I have an idea of what I could do to fix this, but I don't have time
to make sure that my hunch is correct. I'm travelling tomorrow to give
a talk at PDX pug, so I'll have limited access to e-mail.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-15 Thread Peter Geoghegan
On Wed, Jan 15, 2014 at 8:23 PM, Peter Geoghegan p...@heroku.com wrote:
 I have an idea of what I could do to fix this, but I don't have time
 to make sure that my hunch is correct.

It might just be a matter of:

@@ -186,6 +186,13 @@ ExecLockHeapTupleForUpdateSpec(EState *estate,
        switch (test)
        {
            case HeapTupleInvisible:
+               /*
+                * Tuple may have originated from this command, in which case
+                * it's already locked
+                */
+               if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple.t_data)) &&
+                   HeapTupleHeaderGetCmin(tuple.t_data) == estate->es_output_cid)
+                   return true;
                /* Tuple became invisible;  try again */
                if (IsolationUsesXactSnapshot())
                    ereport(ERROR,

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan
On Mon, Jan 13, 2014 at 6:45 PM, Peter Geoghegan p...@heroku.com wrote:
 + uint32
 + SpeculativeInsertionIsInProgress(TransactionId xid, RelFileNode rel,
 +                                  ItemPointer tid)
 + {

For the purposes of preventing unprincipled deadlocking, commenting
out the following (the only caller of the above) has no immediately
discernible effect with any of the test-cases that I've published:

            /* XXX shouldn't we fall through to look at xmax? */
+           /* XXX why? or is that now covered by the above check? */
+           snapshot->speculativeToken =
+               SpeculativeInsertionIsInProgress(HeapTupleHeaderGetRawXmin(tuple),
+                                                rnode,
+                                                htup->t_self);
+
+           snapshot->xmin = HeapTupleHeaderGetRawXmin(tuple);
            return true;        /* in insertion by other */

I think that the prevention of unprincipled deadlocking is all down to
this immediately prior piece of code, at least in those test cases:

!           /*
!            * in insertion by other.
!            *
!            * Before returning true, check for the special case that the
!            * tuple was deleted by the same transaction that inserted it.
!            * Such a tuple will never be visible to anyone else, whether
!            * the transaction commits or aborts.
!            */
!           if (!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
!               !(tuple->t_infomask & HEAP_XMAX_COMMITTED) &&
!               !(tuple->t_infomask & HEAP_XMAX_IS_MULTI) &&
!               !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
!               HeapTupleHeaderGetRawXmax(tuple) == HeapTupleHeaderGetRawXmin(tuple))
!           {
!               return false;
!           }

But why should it be acceptable to change the semantics of dirty
snapshots like this, which previously always returned true when
control reached here? It is a departure from their traditional
behavior, not limited to clients of this new promise tuple
infrastructure. Now, it becomes entirely a matter of whether we tried
to insert before or after the deleting xact's deletion (of a tuple it
originally inserted) as to whether or not we block. So in general we
don't get to keep our old value locks until xact end when we update
or delete. Even if you don't consider this a bug for existing dirty
snapshot clients (I myself do - we can't rely on deleting a row and
re-inserting the same values now, which could be particularly
undesirable for updates), I have already described how we can take
advantage of deleting tuples while still holding on to their value
locks [1] to Andres. I think it'll be very important for multi-master
conflict resolution. I've already described this useful property of
dirty snapshots numerous times on this thread in relation to different
aspects, as it happens. It's essential.

Anyway, I guess you're going to need an infomask bit to fix this, so
you can differentiate between 'promise' tuples and 'proper' tuples.
Those are in short supply. I still think this problem is more or less
down to a modularity violation, and I suspect that this is not the
last problem that will be found along these lines if we continue to
pursue this approach.

[1] 
http://www.postgresql.org/message-id/CAM3SWZQpLSGPS2Kd=-n6hvyiqkf_mcxmx-q72ar9upzq-x6...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Heikki Linnakangas

On 01/14/2014 12:20 PM, Peter Geoghegan wrote:

I think that the prevention of unprincipled deadlocking is all down to
this immediately prior piece of code, at least in those test cases:



!           /*
!            * in insertion by other.
!            *
!            * Before returning true, check for the special case that the
!            * tuple was deleted by the same transaction that inserted it.
!            * Such a tuple will never be visible to anyone else, whether
!            * the transaction commits or aborts.
!            */
!           if (!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
!               !(tuple->t_infomask & HEAP_XMAX_COMMITTED) &&
!               !(tuple->t_infomask & HEAP_XMAX_IS_MULTI) &&
!               !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
!               HeapTupleHeaderGetRawXmax(tuple) == HeapTupleHeaderGetRawXmin(tuple))
!           {
!               return false;
!           }

But why should it be acceptable to change the semantics of dirty
snapshots like this, which previously always returned true when
control reached here? It is a departure from their traditional
behavior, not limited to clients of this new promise tuple
infrastructure. Now, it becomes entirely a matter of whether we tried
to insert before or after the deleting xact's deletion (of a tuple it
originally inserted) as to whether or not we block. So in general we
don't get to keep our old value locks until xact end when we update
or delete.


Hmm. So the scenario would be that a process inserts a tuple, but kills 
it again later in the transaction, and then re-inserts the same value. 
The expectation is that because it inserted the value once already, 
inserting it again will not block. Ie. inserting and deleting a tuple 
effectively acquires a value-lock on the inserted values.



Even if you don't consider this a bug for existing dirty
snapshot clients (I myself do - we can't rely on deleting a row and
re-inserting the same values now, which could be particularly
undesirable for updates),


Yeah, it would be bad if updates start failing because of this. We could 
add a check for that, and return true if the tuple was updated rather 
than deleted.



I have already described how we can take
advantage of deleting tuples while still holding on to their value
locks [1] to Andres. I think it'll be very important for multi-master
conflict resolution. I've already described this useful property of
dirty snapshots numerous times on this thread in relation to different
aspects, as it happens. It's essential.


I didn't understand that description.


Anyway, I guess you're going to need an infomask bit to fix this, so
you can differentiate between 'promise' tuples and 'proper' tuples.


Yeah, that's one way. Or you could set xmin to invalid, to make the 
killed tuple look thoroughly dead to everyone.
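
For illustration only, marking the killed tuple dead that way might
amount to something like the following (assuming the buffer is already
exclusively locked, and glossing over WAL-logging):

	/* Make the killed speculative tuple look thoroughly dead to everyone. */
	HeapTupleHeaderSetXmin(tuple->t_data, InvalidTransactionId);
	MarkBufferDirty(buffer);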



Those are in short supply. I still think this problem is more or less
down to a modularity violation, and I suspect that this is not the
last problem that will be found along these lines if we continue to
pursue this approach.


You have suspected that many times throughout this thread, and every 
time there's been a relatively simple solutions to the issues you've 
raised. I suspect that's also going to be true for whatever mundane next 
issue you come up with.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Heikki Linnakangas

On 01/14/2014 12:44 AM, Peter Geoghegan wrote:

On Mon, Jan 13, 2014 at 12:58 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Well, even if you don't agree that locking all the conflicting rows for
update is sensible, it's still perfectly sensible to return the rejected
rows to the user. For example, you're inserting N rows, and if some of them
violate a constraint, you still want to insert the non-conflicting rows
instead of rolling back the whole transaction.


Right, but with your approach, can you really be sure that you have
the right rejecting tuple ctid (not reject)? In other words, as you
wait for the exclusion constraint to conclusively indicate that there
is a conflict, minutes may have passed in which time other conflicts
may emerge in earlier unique indexes. Whereas with an approach where
values are locked, you are guaranteed that earlier unique indexes have
no conflicting values. Maintaining that property seems useful, since
we check in a well-defined order, and we're still projecting a ctid.
Unlike when row locking is involved, we can make no assumptions or
generalizations around where conflicts will occur. Although that may
also be a general concern with your approach when row locking, for
multi-master replication use-cases. There may be some value in knowing
it cannot have been earlier unique indexes (and so the existing values
for those unique indexes in the locked row should stay the same -
don't many conflict resolution policies work that way?).


I don't understand what you're saying. Can you give an example?

In the use case I was envisioning above, ie. you insert N rows, and if 
any of them violate constraint, you still want to insert the 
non-violating instead of rolling back the whole transaction, you don't 
care. You don't care what existing rows the new rows conflicted with.


Even if you want to know what you conflicted with, I can't make sense of 
what you're saying. In the btreelock approach, the value locks are 
immediately released once you discover that there's conflict. So by the 
time you get to do anything with the ctid of the existing tuple you 
conflicted with, new conflicting tuples might've appeared.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan
On Tue, Jan 14, 2014 at 2:43 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. So the scenario would be that a process inserts a tuple, but kills it
 again later in the transaction, and then re-inserts the same value. The
 expectation is that because it inserted the value once already, inserting it
 again will not block. Ie. inserting and deleting a tuple effectively
 acquires a value-lock on the inserted values.

Right.

 Yeah, it would be bad if updates start failing because of this. We could add
 a check for that, and return true if the tuple was updated rather than
 deleted.

Why would you fix it that way?

 I have already described how we can take
 advantage of deleting tuples while still holding on to their value
 locks [1] to Andres. I think it'll be very important for multi-master
 conflict resolution. I've already described this useful property of
 dirty snapshots numerous times on this thread in relation to different
 aspects, as it happens. It's essential.

 I didn't understand that description.

I was describing how deleting existing locked rows, and re-inserting,
could deal with multiple conflicts for multi-master replication
use-cases. It hardly matters much though, because it's not as if the
usefulness and necessity of this property of dirty snapshots is in
question.

 Anyway, I guess you're going to need an infomask bit to fix this, so
 you can differentiate between 'promise' tuples and 'proper' tuples.

 Yeah, that's one way. Or you could set xmin to invalid, to make the killed
 tuple look thoroughly dead to everyone.

I think you'll have to use an infomask bit so everyone knows that
this is a promise tuple from the start. Otherwise, I suspect that
there are race conditions. The problem was that
inserted-then-deleted-in-same-xact tuples (both regular and promise)
were invisible to all xacts' dirty snapshots, when they should have
only been invisible to the deleting xact's dirty snapshot. So it isn't
obvious to me how you interlock things such that another xact doesn't
incorrectly decide that it has to wait on what is really a promise
tuple's xact for the full duration of that xact, having found no
speculative insertion token to ShareLock (which implies unprincipled
deadlocking), while simultaneously having other sessions not fail to
see as dirty-visible a same-xact-inserted-deleted non-promise tuple
(thereby ensuring those other sessions correctly conclude that it is
necessary to wait for the end of the xmin/xmax xact). If you set the
xmin to invalid too late, it doesn't help any existing waiters.

Even if setting xmin to invalid is workable, it's a strike against the
performance of your approach, because it's another heap buffer
exclusive lock.

 You have suspected that many times throughout this thread, and every time
 there's been a relatively simple solutions to the issues you've raised. I
 suspect that's also going to be true for whatever mundane next issue you
 come up with.

I don't think it's a mundane issue. But in any case, you haven't
addressed why you think your proposal is more or less better than my
proposal, which is the pertinent question. You haven't given me so
much as a high level summary of whatever misgivings you may have about
it, even though I've asked you to comment on my approach to value
locking several times. You haven't pointed out that it has any
specific bug (which is not to suppose that that's because there are
none). The point is that it is not my contention that what you're
proposing is totally unworkable. Rather, I think that the original
proposal will probably ultimately perform better in all cases, is
easier to reason about and is certainly far more modular. It appears
to me to be the more conservative of the two proposals. In all
sincerity, I simply don't know what factors you're weighing here. In
saying that, I really don't mean to imply that you're assigning weight
to things in a way that I am in disagreement with. I simply don't
understand what is important to you here, and why your proposal
preserves or enhances the things that you believe are important. Would
you please explain your position along those lines?

Now, I'll concede that it will be harder to make the IGNORE syntax
work with exclusion constraints with what I've done, which would be
nice. However, in my opinion that should be given far less weight than
these other issues. It's ON DUPLICATE KEY...; no one could reasonably
assume that exclusion constraints were covered. Also, upserting with
exclusion constraints is a non-starter. It's only applicable to the
case where you're using exclusion constraints exactly as you would use
unique constraints, which has to be very rare. It will cause much more
confusion than anything else.

INSERT IGNORE in MySQL works with NOT NULL constraints, unique
constraints, and all other constraints. FWIW I think that it would be
kind of arbitrary to make IGNORE work with exclusion constraints and
not other types of constraints, whereas when it's 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Heikki Linnakangas

On 01/14/2014 11:22 PM, Peter Geoghegan wrote:

On Tue, Jan 14, 2014 at 2:43 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

You have suspected that many times throughout this thread, and every time
there's been a relatively simple solutions to the issues you've raised. I
suspect that's also going to be true for whatever mundane next issue you
come up with.


I don't think it's a mundane issue. But in any case, you haven't
addressed why you think your proposal is more or less better than my
proposal, which is the pertinent question.


1. It's simpler.

2. Works for exclusion constraints.


You haven't given me so
much as a high level summary of whatever misgivings you may have about
it, even though I've asked you to comment on my approach to value
locking several times. You haven't pointed out that it has any
specific bug (which is not to suppose that that's because there are
none). The point is that it is not my contention that what you're
proposing is totally unworkable. Rather, I think that the original
proposal will probably ultimately perform better in all cases, is
easier to reason about and is certainly far more modular. It appears
to me to be the more conservative of the two proposals. In all
sincerity, I simply don't know what factors you're weighing here. In
saying that, I really don't mean to imply that you're assigning weight
to things in a way that I am in disagreement with. I simply don't
understand what is important to you here, and why your proposal
preserves or enhances the things that you believe are important. Would
you please explain your position along those lines?


I guess that simplicity is in the eye of the beholder, but please take a 
look at git diff --stat:


 41 files changed, 1224 insertions(+), 107 deletions(-)

vs.

 50 files changed, 2215 insertions(+), 240 deletions(-)

Admittedly, some of the difference comes from the fact that you've spent 
a lot more time commenting and polishing the btreelock patch. But mostly 
I dislike additional complexity required in b-tree for this.


I don't think B-tree locking is more conservative. The 
insert-and-then-check approach is already used by exclusion constraints, 
I'm just extending it to not abort on conflict, but do something else.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Heikki Linnakangas

On 01/14/2014 11:22 PM, Peter Geoghegan wrote:

The problem was that
inserted-then-deleted-in-same-xact tuples (both regular and promise)
were invisible to all xacts' dirty snapshots, when they should have
only been invisible to the deleting xact's dirty snapshot.


Right.


So it isn't
obvious to me how you interlock things such that another xact doesn't
incorrectly decide that it has to wait on what is really a promise
tuple's xact for the full duration of that xact, having found no
speculative insertion token to ShareLock (which implies unprincipled
deadlocking), while simultaneously having other sessions not fail to
see as dirty-visible a same-xact-inserted-deleted non-promise tuple
(thereby ensuring those other sessions correctly conclude that it is
necessary to wait for the end of the xmin/xmax xact). If you set the
xmin to invalid too late, it doesn't help any existing waiters.


If a backend finds no speculative insertion token to ShareLock, then it 
really isn't a speculative insertion, and the process should sleep on 
the xid as usual.


Once we remove the modification in HeapTupleSatisfiesDirty() that made 
it return false when xmin == xmax, the problem that arises is that 
another backend that sees the killed tuple incorrectly determines that 
it has to wait for that transaction to finish, even though it was a 
speculatively inserted tuple that was killed, and hence can be ignored. 
We can avoid that problem by setting xmin to invalid, or otherwise 
marking the tuple as dead.


Attached is a patch doing that, to again demonstrate what I mean. I'm 
not sure if setting xmin to invalid is really the best way to mark the 
tuple dead; I don't think a tuple's xmin can currently be 
InvalidTransaction under any other circumstances, so there might be some 
code out there that's not prepared for it. So using an infomask bit 
might indeed be better. Or something else entirely.


- Heikki


speculative-insertions-revisions.2014_01_15.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-14 Thread Peter Geoghegan
On Tue, Jan 14, 2014 at 2:16 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I don't think it's a mundane issue. But in any case, you haven't
 addressed why you think your proposal is more or less better than my
 proposal, which is the pertinent question.

 1. It's simpler.

 2. Works for exclusion constraints.

Thank you for clarifying where you're coming from.

 I guess that simplicity is in the eye of the beholder, but please take a
 look at git diff --stat:

  41 files changed, 1224 insertions(+), 107 deletions(-)

 vs.

  50 files changed, 2215 insertions(+), 240 deletions(-)

 Admittedly, some of the difference comes from the fact that you've spent a
 lot more time commenting and polishing the btreelock patch. But mostly I
 dislike additional complexity required in b-tree for this.

It's very much down to differences in how well commented and
documented each patch is. I have a fully formed amendment to the AM
interface, complete with documentation of the AM and btree aspects,
and detailed comments around how the parts fit together. But you've
already explored doing something similar to what I do, to similarly
avoid having to refind the page (less the heavyweight locking), which
seems almost equivalent to what I propose in terms of its impact on
btree, before we consider anything else.

 I don't think B-tree locking is more conservative. The insert-and-then-check
 approach is already used by exclusion constraints, I'm just extending it to
 not abort on conflict, but do something else.

If you examine what I actually do, you'll see that it's pretty much
equivalent to how the extant value locking of unique btree indexes has
always worked. It's just that the process is staggered at an exact
point, the point where traditionally we hold no buffer locks, only a
buffer pin (although we do additionally verify that the index gives
the go-ahead before getting to later indexes, to get consensus to
proceed with insertion).

The suggestion that mine is the conservative approach is also based on
the fact that database systems have made use of page-level exclusive
locks on indexes, managed by the lock manager and persisting over
complex operations, in many different contexts for many years.  This
includes Postgres, where for many years the relcache has taken
precautions against deadlocking in such AMs by ordering the list of
indexes associated with a relation by pg_index.indexrelid. Currently
this may not be necessary, but the principle stands.

The insert-then-check approach of exclusion constraints is quite
different to what is proposed here, because exclusion constraints only
ever have to abort the xact if things don't work out. There is no
value locking. That's far easier to pin down. You definitely don't
have to do anything new with visibility.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas

On 01/11/2014 12:40 AM, Peter Geoghegan wrote:

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.


Exclusion constraints can be used to implement uniqueness checks with 
SP-GiST or GiST indexes. For example, if you want to enforce that there 
are no two tuples with the same x and y coordinates, ie. use a point as 
the key. You could add a b-tree index just to enforce the constraint, 
but it's better if you don't have to. In general, it's just always 
better if features don't have implementation-specific limitations like this.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas

On 01/11/2014 12:39 PM, Peter Geoghegan wrote:

In any case, my patch is bound to win decisively for the other
extreme, the insert-only case, because the overhead of doing an index
scan first is always wasted there with your approach, and the overhead
of extended btree leaf page locking has been shown to be quite low.


Quite possibly. Run the benchmark, and we'll see how big a difference 
we're talking about.



In
the past you've spoken of avoiding that overhead through an adaptive
strategy based on statistics, but I think you'll have a hard time
beating a strategy where the decision comes as late as possible, and
is informed by highly localized page-level metadata already available.
My implementation can abort an attempt to just read an existing
would-be duplicate very inexpensively (with no strong locks), going
back to just after the _bt_search() to get a heavyweight lock if just
reading doesn't work out (if there is no duplicate found), so as to
not waste all of its prior work. Doing one of the two extremes of
insert-mostly or update-only well is relatively easy; dynamically
adapting to one or the other is much harder. Especially if it's a
consistent mix of inserts and updates, where general observations
aren't terribly useful.


Another way to optimize it is to keep the b-tree page pinned after doing 
the pre-check. Then you don't need to descend the tree again when doing 
the insert. That would require small indexam API changes, but wouldn't 
be too invasive, I think.
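
As a sketch of the idea (the function names index_unique_precheck and
index_insert_pinned are invented here, purely for illustration, and
not an actual API proposal):

	Buffer		leafbuf;	/* leaf page pin held across both calls */

	if (!index_unique_precheck(indexRel, itup, &leafbuf))
		index_insert_pinned(indexRel, itup, leafbuf);	/* no second descent */
	else
		ReleaseBuffer(leafbuf);							/* conflict: drop pin */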



All other concerns of mine still remain, including the concern over
the extra locking of the proc array - I'm concerned about the
performance impact of that on other parts of the system not exercised
by this test.


Yeah, I'm not thrilled about that part either. Fortunately there are 
other ways to implement that. In fact, I think you could just not bother 
taking the ProcArrayLock when setting the fields. The danger is that 
another backend sees a mixed state of the fields, but that's OK. The 
worst that can happen is that it will do an unnecessary lock/release on 
the heavy-weight lock. And to reduce the overhead when reading the 
fields, you could merge the SpeculativeInsertionIsInProgress() check 
into TransactionIdIsInProgress(). The call site in tqual.c always calls 
it together with TransactionIdIsInProgress(), which scans the proc array 
anyway, while holding the lock.
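
A sketch of what publishing the fields without the lock could look like
(the PGPROC field names here are assumptions, not the patch's actual
names):

	/*
	 * No ProcArrayLock: a concurrent reader may observe these two stores
	 * in either order.  The worst consequence is one unnecessary
	 * acquire/release of the heavyweight speculative-insertion lock,
	 * which is harmless.
	 */
	MyProc->specInsertRel = relation->rd_node;
	MyProc->specInsertTid = *tid;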


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Peter Geoghegan
On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Exclusion constraints can be used to implement uniqueness checks with
 SP-GiST or GiST indexes. For example, if you want to enforce that there are
 no two tuples with the same x and y coordinates, ie. use a point as the key.
 You could add a b-tree index just to enforce the constraint, but it's better
 if you don't have to. In general, it's just always better if features don't
 have implementation-specific limitations like this.

That seems rather narrow. Among other things, I worry about the
baggage for users in documenting supporting SP-GiST/GiST. We support
it, but it only really works for the case where you're using exclusion
constraints as unique constraints, something that might make sense in
certain narrow contexts, contrary to our earlier general statement
that a unique index should be preferred there. We catalog amcanunique
methods as the way that we support unique indexes. I really do feel
that that's the appropriate level to support the feature at, and I
have not precluded other amcanunique implementations from doing the
same, having documented the intended value locking interface/contract
for the benefit of any future amcanunique AM author. It's ON DUPLICATE
KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of
exclusion constraints and their arbitrary commutative operators.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Robert Haas
On Mon, Jan 13, 2014 at 1:53 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Exclusion constraints can be used to implement uniqueness checks with
 SP-GiST or GiST indexes. For example, if you want to enforce that there are
 no two tuples with the same x and y coordinates, ie. use a point as the key.
 You could add a b-tree index just to enforce the constraint, but it's better
 if you don't have to. In general, it's just always better if features don't
 have implementation-specific limitations like this.

 That seems rather narrow. Among other things, I worry about the
 baggage for users in documenting supporting SP-GiST/GiST. We support
 it, but it only really works for the case where you're using exclusion
 constraints as unique constraints, something that might make sense in
 certain narrow contexts, contrary to our earlier general statement
 that a unique index should be preferred there. We catalog amcanunique
 methods as the way that we support unique indexes. I really do feel
 that that's the appropriate level to support the feature at, and I
 have not precluded other amcanunique implementations from doing the
 same, having documented the intended value locking interface/contract
 for the benefit of any future amcanunique AM author. It's ON DUPLICATE
 KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of
 exclusion constraints and their arbitrary commutative operators.

For what it's worth, I agree with Heikki.  There's probably nothing
sensible an upsert can do if it conflicts with more than one tuple,
but if it conflicts with just exactly one, it oughta be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Peter Geoghegan
On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 For what it's worth, I agree with Heikki.  There's probably nothing
 sensible an upsert can do if it conflicts with more than one tuple,
 but if it conflicts with just exactly one, it oughta be OK.

If there is exactly one, *and* the existing value is exactly the same
as the value proposed for insertion (or, I suppose, a subset of the
existing value, but that's so narrow that it might as well not apply).
In short, when you're using an exclusion constraint as a unique
constraint. Which is very narrow indeed. Weighing the costs and the
benefits, that seems like far more cost than benefit, before we even
consider anything beyond simply explaining the applicability and
limitations of upserting with exclusion constraints. It's generally
far cleaner to define speculative insertion as something that happens
with unique indexes only.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Heikki Linnakangas

On 01/13/2014 10:53 PM, Peter Geoghegan wrote:

On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:

For what it's worth, I agree with Heikki.  There's probably nothing
sensible an upsert can do if it conflicts with more than one tuple,
but if it conflicts with just exactly one, it oughta be OK.


If there is exactly one, *and* the existing value is exactly the same
as the value proposed for insertion (or, I suppose, a subset of the
existing value, but that's so narrow that it might as well not apply).
In short, when you're using an exclusion constraint as a unique
constraint. Which is very narrow indeed. Weighing the costs and the
benefits, that seems like far more cost than benefit, before we even
consider anything beyond simply explaining the applicability and
limitations of upserting with exclusion constraints. It's generally
far cleaner to define speculative insertion as something that happens
with unique indexes only.


Well, even if you don't agree that locking all the conflicting rows for 
update is sensible, it's still perfectly sensible to return the rejected 
rows to the user. For example, you're inserting N rows, and if some of 
them violate a constraint, you still want to insert the non-conflicting 
rows instead of rolling back the whole transaction.
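
With the syntax proposed in this thread, that usage might look roughly
like the following (a sketch, assuming RETURNING REJECTS combines with
ON DUPLICATE KEY IGNORE):

INSERT INTO tab (k, v)
VALUES (1, 'x'), (2, 'y'), (3, 'z')
ON DUPLICATE KEY IGNORE
RETURNING REJECTS *;
-- the non-conflicting rows are inserted; the rejected ones are
-- projected back to the client instead of aborting the whole statement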


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Peter Geoghegan
On Mon, Jan 13, 2014 at 12:58 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Well, even if you don't agree that locking all the conflicting rows for
 update is sensible, it's still perfectly sensible to return the rejected
 rows to the user. For example, you're inserting N rows, and if some of them
 violate a constraint, you still want to insert the non-conflicting rows
 instead of rolling back the whole transaction.

Right, but with your approach, can you really be sure that you have
the right rejecting tuple ctid (not reject)? In other words, as you
wait for the exclusion constraint to conclusively indicate that there
is a conflict, minutes may have passed in which time other conflicts
may emerge in earlier unique indexes. Whereas with an approach where
values are locked, you are guaranteed that earlier unique indexes have
no conflicting values. Maintaining that property seems useful, since
we check in a well-defined order, and we're still projecting a ctid.
Unlike when row locking is involved, we can make no assumptions or
generalizations around where conflicts will occur. Although that may
also be a general concern with your approach when row locking, for
multi-master replication use-cases. There may be some value in knowing
it cannot have been earlier unique indexes (and so the existing values
for those unique indexes in the locked row should stay the same -
don't many conflict resolution policies work that way?).


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-13 Thread Peter Geoghegan
On Mon, Jan 13, 2014 at 12:49 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 In any case, my patch is bound to win decisively for the other
 extreme, the insert-only case, because the overhead of doing an index
 scan first is always wasted there with your approach, and the overhead
 of extended btree leaf page locking has been shown to be quite low.

 Quite possibly. Run the benchmark, and we'll see how big a difference we're
 talking about.

I'll come up with something and let you know.

 Another way to optimize it is to keep the b-tree page pinned after doing the
 pre-check. Then you don't need to descend the tree again when doing the
 insert. That would require small indexam API changes, but wouldn't be too
 invasive, I think.

You'll still need a callback to drop the pin when it transpires that
there is a conflict in a later unique index, and state to pass a bt
stack back, at which point you've already made exactly the same
changes to the AM interface as in my proposal. The only difference is
that the core code doesn't rely on the value locks being released
after an instant, but that isn't something that you take advantage of.
Furthermore, AFAIK there is no reason to think that anything other
than btree will benefit, which makes it a bit unfortunate that the AM
has to support it generally.

So, again, it's kind of a modularity violation, and it may not even
actually be possible, since _bt_search() is only callable with an
insertion scankey, which is the context in which the existing
guarantee around releasing locks and re-searching from that point
applies, for reasons that seem to me to be very subtle. At the very
least you need to pass a btstack to _bt_doinsert() to save the work of
re-scanning, as I do.

 All other concerns of mine still remain, including the concern over
 the extra locking of the proc array - I'm concerned about the
 performance impact of that on other parts of the system not exercised
 by this test.

 Yeah, I'm not thrilled about that part either. Fortunately there are other
 ways to implement that. In fact, I think you could just not bother taking
 the ProcArrayLock when setting the fields. The danger is that another
 backend sees a mixed state of the fields, but that's OK. The worst that can
 happen is that it will do an unnecessary lock/release on the heavy-weight
 lock. And to reduce the overhead when reading the fields, you could merge
 the SpeculativeInsertionIsInProgress() check into
 TransactionIdIsInProgress(). The call site in tqual.c always calls it
 together with TransactionIdIsInProgress(), which scans the proc array
 anyway, while holding the lock.

Currently in your patch all insertions do
SpeculativeInsertionLockAcquire(GetCurrentTransactionId()) -
presumably this is not something you intend to keep. Also, you should
not do this for regular insertion:

if (options & HEAP_INSERT_SPECULATIVE)
    SetSpeculativeInsertion(relation->rd_node, heaptup->t_self);

Can you explain the following, please?:

+ /*
+  * Returns a speculative insertion token for waiting for the insertion to
+  * finish.
+  */
+ uint32
+ SpeculativeInsertionIsInProgress(TransactionId xid, RelFileNode rel,
+                                  ItemPointer tid)
+ {
+   uint32  result = 0;
+   ProcArrayStruct *arrayP = procArray;
+   int index;

Why is this optimization correct? Presently it allows your patch to
avoid getting a shared ProcArrayLock from HeapTupleSatisfiesDirty().

+   if (TransactionIdPrecedes(xid, TransactionXmin))
+   return false;

So from HeapTupleSatisfiesDirty(), you're checking if xid (the
passed tuple's xmin) precedes our transaction's xmin (well, that of
our last snapshot updated by GetSnapshotData()). This is set within
GetSnapshotData(), but we're dealing with a dirty snapshot with no
xmin, so TransactionXmin pertains to our MVCC snapshot, not our dirty
snapshot.

It isn't really true that TransactionIdIsInProgress() gets the same
shared ProcArrayLock in a similar fashion, for a full linear search; I
think that the various fast-paths make it far less likely than it is
for SpeculativeInsertionIsInProgress() (or, perhaps, should be). Here
is what that other routine does in around the same place:

    /*
     * Don't bother checking a transaction older than RecentXmin; it could not
     * possibly still be running.  (Note: in particular, this guarantees that
     * we reject InvalidTransactionId, FrozenTransactionId, etc as not
     * running.)
     */
    if (TransactionIdPrecedes(xid, RecentXmin))
    {
        xc_by_recent_xmin_inc();
        return false;
    }

This extant code checks against RecentXmin, *not* TransactionXmin.  It
also caches things quite effectively, but that caching isn't very
useful to you here. It checks latestCompletedXid before doing a linear
search through the proc array too.

-- 
Peter Geoghegan



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-11 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 7:59 PM, Peter Geoghegan p...@heroku.com wrote:
 *shrug*. I'm not too concerned about performance during contention. But
 let's see how this fixed version performs. Could you repeat the tests you
 did with this?

 Why would you not be too concerned about the performance with
 contention? It's a very important aspect. But even if you don't, if
 you look at the transaction throughput with only a single client in
 the update-heavy benchmark [1] (with one client there is a fair mix of
 inserts and updates), my approach still comes out far ahead.
 Transaction throughput is almost 100% higher, with the *difference*
 exceeding 150% at 8 clients but never reaching too much higher. I
 think that the problem isn't so much with contention between clients
 as much as with contention between inserts and updates, which affects
 everyone to approximately the same degree. And the average max latency
 across runs for one client is 130.447 ms, as opposed to 0.705 ms with
 my patch - that's less than 1%. Whatever way you cut it, the
 performance of my approach is far superior. Although we should
 certainly investigate the impact of your most recent revision, and I
 intend to, how can you not consider those differences to be extremely
 significant?

So I re-ran the same old benchmark, where we're almost exclusively
updating. Results for your latest revision were very similar to my
patch:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/exclusion-no-deadlock/

This suggests that the main problem encountered was lock contention
among old, broken promise tuples. Note that this benchmark doesn't
involve any checkpointing, and everything fits in memory.
Opportunistic pruning is possible, which I'd imagine helps a lot with
the bloat, at least in this benchmark - there are only ever 100,000
live tuples. That might not always be true, of course.

In any case, my patch is bound to win decisively for the other
extreme, the insert-only case, because the overhead of doing an index
scan first is always wasted there with your approach, and the overhead
of extended btree leaf page locking has been shown to be quite low. In
the past you've spoken of avoiding that overhead through an adaptive
strategy based on statistics, but I think you'll have a hard time
beating a strategy where the decision comes as late as possible, and
is informed by highly localized page-level metadata already available.
My implementation can abort an attempt to just read an existing
would-be duplicate very inexpensively (with no strong locks), going
back to just after the _bt_search() to get a heavyweight lock if just
reading doesn't work out (if there is no duplicate found), so as to
not waste all of its prior work. Doing one of the two extremes of
insert-mostly or update-only well is relatively easy; dynamically
adapting to one or the other is much harder. Especially if it's a
consistent mix of inserts and updates, where general observations
aren't terribly useful.

All other concerns of mine still remain, including the concern over
the extra locking of the proc array - I'm concerned about the
performance impact of that on other parts of the system not exercised
by this test.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-11 Thread Peter Geoghegan
On Sat, Jan 11, 2014 at 2:39 AM, Peter Geoghegan p...@heroku.com wrote:
 So I re-ran the same old benchmark, where we're almost exclusively
 updating. Results for your latest revision were very similar to my
 patch:

 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/exclusion-no-deadlock/

To put that in context, here is a previously unpublished repeat of the
same benchmark on the slightly improved second most recently submitted
revision of mine, v6:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-3/

(recall that I improved things a bit by remembering row-locking
conflicts, not just conflicts when we try value locking - that made a
small additional difference, reflected here but not in /upsert-cmp-2/
).

The numbers for each patch are virtually identical. I guess I could
improve my patch by not always getting a heavyweight lock on the first
insert attempt, based on the general observation that we have
previously always updated. My concern would be that that would happen
at the expense of the other case.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 05:36 AM, Peter Geoghegan wrote:

While I realize that everyone is busy, I'm concerned about the lack of
discussion here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.


Nah, that's nothing. I have a patch in the January commitfest that was 
already posted for the previous commitfest. It received zero review back 
then, and still has no reviewer signed up, let alone anyone actually 
reviewing it. And arguably it's a bug fix!


http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

Wink wink, if you're looking for patches to review... ;-)


  The alternative exclusion* patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical.


Here's an updated patch. Hope it works now... This is based on an older 
version, and doesn't include any fixes from your latest 
btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common 
parts, and copy over any relevant changes.


The fix for the deadlocking issue consists of a few parts. First, 
there's a new heavy-weight lock type, a speculative insertion lock, 
which works somewhat similarly to XactLockTableWait(), but is only held 
for the duration of a single speculative insertion. When a backend is 
about to begin a speculative insertion, it first acquires the 
speculative insertion lock. When it's done with the insertion, meaning 
it has either cancelled it by killing the already-inserted tuple or 
decided that it's going to go ahead with it, the lock is released.


The speculative insertion lock is keyed by Xid, and token. The lock can 
be taken many times in the same transaction, and token's purpose is to 
distinguish which insertion is currently in progress. The token is 
simply a backend-local counter, incremented each time the lock is taken.


In addition to the heavy-weight lock, there are new fields in PGPROC to 
indicate which tuple the backend is currently inserting. When the tuple 
is inserted, the backend fills in the relation's relfilenode and item 
pointer in MyProc->specInsert* fields, while still holding the buffer 
lock. The current speculative insertion token is also stored there.


With that mechanism, when another backend sees a tuple whose xmin is 
still in progress, it can check if the insertion is a speculative 
insertion. To do that, scan the proc array, and find the backend with 
the given xid. Then, check that the relfilenode and itempointer in that 
backend's PGPROC slot match the tuple, and make note of the token the 
backend had advertised.


HeapTupleSatisfiesDirty() does the proc array check, and returns the 
token in the snapshot, alongside snapshot->xmin. The caller can then use 
that information in place of XactLockTableWait().



There would be other ways to skin the cat, but this seemed like the 
quickest to implement. One more straightforward approach would be to use 
the tuple's TID directly in the speculative insertion lock's key, 
instead of Xid+token, but then the inserter would have to grab the 
heavy-weight lock while holding the buffer lock, which seems dangerous. 
Another alternative would be to store token in the heap tuple header, 
instead of PGPROC; a tuple that's still being speculatively inserted has 
no xmax, so it could be placed in that field. Or ctid.



More importantly, I have to question
whether we should continue to pursue that alternative approach, giving
what we now know about its performance characteristics.


Yes.


It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and*  it's
already looking for would-be duplicates*first*). I'm trying to be as
fair as possible, and yet the difference is huge.


*shrug*. I'm not too concerned about performance during contention. But 
let's see how this fixed version performs. Could you repeat the tests 
you did with this?


Any guesses what the bottleneck is? At a quick glance at a profile of a 
pgbench run with this patch, I didn't see anything out of the ordinary, so 
I'm guessing it's lock contention somewhere.


- Heikki


speculative-insertions-2014_01_10.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/08/2014 06:46 AM, Peter Geoghegan wrote:

A new revision of my patch is attached.


I'm getting deadlocks with this patch, using the test script you posted 
earlier in 
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. 
Am I doing something wrong, or is that a regression?


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'm getting deadlocks with this patch, using the test script you posted
 earlier in
 http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
 Am I doing something wrong, or is that a regression?

Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.

This testcase was the one that showed up *unprincipled* deadlocking:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

I'd focus on that test case.
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 08:37 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I'm getting deadlocks with this patch, using the test script you posted
earlier in
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
Am I doing something wrong, or is that a regression?


Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.


Oh, ok. Interesting. With the patch version I posted today, I'm not 
getting deadlocks. I'm not getting duplicates in the table either, so it 
looks like the promise tuple approach somehow avoids the deadlocks, 
while the btreelock patch does not.


Why does it deadlock with the btreelock patch? I don't see why it 
should. If you have two backends inserting a single tuple, and they 
conflict, one of them should succeed to insert, and the other one should 
update.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Why does it deadlock with the btreelock patch? I don't see why it should. If
 you have two backends inserting a single tuple, and they conflict, one of
 them should succeed to insert, and the other one should update.

Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.
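
The shape of that test case is roughly the following (a sketch, not the
actual script):

CREATE TABLE foo (a int UNIQUE, b int UNIQUE, c text);

-- Joining on "a" assumes the conflict arose on that index; when it is
-- really on "b", the wrong row (one holding no relevant lock) gets
-- updated, and two sessions can each lock one row while waiting on the
-- other.
WITH rej AS (
    INSERT INTO foo (a, b, c) VALUES (1, 2, 'x')
    ON DUPLICATE KEY LOCK FOR UPDATE
    RETURNING REJECTS *
)
UPDATE foo SET c = rej.c FROM rej WHERE foo.a = rej.a;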

Although I have not yet properly considered your most recent revision,
I can't imagine why the same would not apply there, since the row
locking component is (probably) still identical. Granted, that
distinction between row locking and value locking is a bit fuzzy in
your approach, but if you happened to not insert any rows in any
previous iterations (i.e. there were no unfilled promise tuples), and
you happened to perform conflict handling first, it could still
happen, albeit with lower probability, no?

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 10:00 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Why does it deadlock with the btreelock patch? I don't see why it should. If
you have two backends inserting a single tuple, and they conflict, one of
them should succeed to insert, and the other one should update.


Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.


I see. Yeah, I also get deadlocks when I change the update statement to use 
foo.b = rej.b instead of foo.a = rej.a. I think it's down to the order in 
which the indexes are processed, i.e. which conflict you see first.


This is pretty much the same issue we discussed wrt. exclusion 
constraints. If the tuple being inserted conflicts with several existing 
tuples, what to do? I think the best answer would be to return and lock 
them all. It could still deadlock, but it's nevertheless less surprising 
behavior than returning one of the tuples at random. Actually, we could 
even avoid the deadlock by always locking the tuples in a certain order, 
although I'm not sure if it's worth the trouble.
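
The same principle is familiar at the SQL level, where a deterministic
lock order avoids lock-ordering deadlocks (a sketch):

-- Two transactions that must lock overlapping sets of rows cannot
-- deadlock on those row locks if both acquire them in the same order.
SELECT * FROM foo
WHERE a IN (1, 2, 3)
ORDER BY a
FOR UPDATE;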


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 1:25 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 This is pretty much the same issue we discussed wrt. exclusion constraints.
 If the tuple being inserted conflicts with several existing tuples, what to
 do? I think the best answer would be to return and lock them all. It could
 still deadlock, but it's nevertheless less surprising behavior than
 returning one of the tuples at random. Actually, we could even avoid the
 deadlock by always locking the tuples in a certain order, although I'm not
 sure if it's worth the trouble.

I understand and accept that as long as we're intent on locking more
than one row per transaction, that action could deadlock with another
session doing something similar. Actually, I've even encountered
people giving advice in relation to proprietary systems along the
lines of: if your big SQL MERGE statement is deadlocking excessively,
you might try hinting to make sure a nested loop join is used. I
think that this kind of ugly compromise is unavoidable in those
scenarios (in reality the most popular strategy is probably cross
your fingers). But as everyone agrees, the common case where an xact
only upserts one row should never deadlock with another, similar xact.
So *that* isn't a problem I have with making row locking work for
exclusion constraints.

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Jim Nasby

On 1/10/14, 4:40 PM, Peter Geoghegan wrote:

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.


Well, the usual example for exclusion constraints is resource scheduling (ie: 
scheduling what room a class will be held in). In that context is it hard to 
believe that you might want to MERGE a set of new classroom assignments in?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasby j...@nasby.net wrote:
 Well, the usual example for exclusion constraints is resource scheduling
 (ie: scheduling what room a class will be held in). In that context is it
 hard to believe that you might want to MERGE a set of new classroom
 assignments in?

So you schedule a class that clashes with 3 other classes, and you
want to update all 3 rows/classes with details from your one row
proposed for insertion? That makes no sense, unless the classes were
in fixed time slots, in which case you could use a unique constraint
to begin with. You can't change the rows to have the same time range
for all 3. So you have to delete two first, and update the range of
one. Which two? And you can't really rely on having locked existing
rows operating as a kind of persistent value lock, as I do, because
you've locked a row with a different range to the one you care about -
someone can still insert another row that doesn't block on that one
but blocks on your range. So you really do need a sophisticated, fully
formed value locking infrastructure to make it work, for a feature of
marginal utility at best. I'm having a hard time imagining any user
actually wanting to do any of this, and I'm having a harder time still
imagining anyone putting in the work to make it possible, if indeed it
is possible.
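
For contrast, the fixed-slot variant just mentioned needs nothing more
than a plain unique constraint (a sketch):

CREATE TABLE class_schedule
(
    room  int,
    slot  int,    -- a discrete period number rather than a time range
    UNIQUE (room, slot)
);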

No one has ever implemented fully formed predicate locking in a
commercial database system, because it's an NP-complete problem [1],
[2]. Only very limited special cases are practicable, and I'm pretty
sure this isn't one of them.

[1] http://library.riphah.edu.pk/acm/disk_1/text/1-2/SIGMOD79/P127.PDF

[2] 
http://books.google.com/books?id=wV5Ran71zNoCpg=PA284lpg=PA284dq=predicate+locking+np+completesource=blots=PgNJ5H3L8Vsig=fOZ2Wr4fIxj0eFQD0tCGPLTsfY0hl=ensa=Xei=PpTQUquoBMfFsATtw4CADAved=0CDIQ6AEwAQ#v=onepageq=predicate%20locking%20np%20completef=false
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Jim Nasby

On 1/10/14, 6:51 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasbyj...@nasby.net  wrote:

Well, the usual example for exclusion constraints is resource scheduling
(ie: scheduling what room a class will be held in). In that context is it
hard to believe that you might want to MERGE a set of new classroom
assignments in?

So you schedule a class that clashes with 3 other classes, and you
want to update all 3 rows/classes with details from your one row
proposed for insertion?


Nuts, I was misunderstanding the scenario. I thought this was simply going to 
violate exclusion constraints.

I see what you're saying now, and I'm not coming up with a scenario either. 
Perhaps Jeff Davis could, since he created them... if he can't then I'd say 
we're safe ignoring that aspect.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 7:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Nah, that's nothing. I have a patch in the January commitfest that was
 already posted for the previous commitfest. It received zero review back
 then, and still has no reviewer signed up, let alone anyone actually
 reviewing it. And arguably it's a bug fix!

 http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

 Wink wink, if you're looking for patches to review... ;-)

Yeah, I did intend to take a closer look at that one (I've looked at
it but have nothing to share yet). I've been a little busy with other
things. That patch is more of the kind where it's a matter of
determining if what you've done is exactly correct (no one would
disagree with the substance of what you propose), whereas there is
uncertainty about whether I've gotten the semantics right and so on.
But that's no excuse. :-)

   The alternative exclusion* patch still deadlocks in an unprincipled
 fashion

 Here's an updated patch. Hope it works now... This is based on an older
 version, and doesn't include any fixes from your latest
 btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common parts,
 and copy over any relevant changes.

Okay, attached is a revision with some of my fixes for other parts of
the code merged (in particular, for the grammar, ecpg and some aspects
of row locking and visibility).

Some quick observations on your patch - maybe this is obvious, and you
have work-arounds in mind, but this is just my first impression:

* You're always passing HEAP_INSERT_SPECULATIVE to heap_insert, and
therefore in the event of any sort of insertion always getting an
exclusive lock on the procArray. I guess the fact that this always
happens, and not just when upserting, is an oversight (I know you just
wanted to throw together a POC), but even still that seems kind of
questionable. Everyone knows that contention during GetSnapshotData is
still a big problem for us. Taking an exclusive ProcArrayLock perhaps
as frequently as more than once per slot seems like a really bad idea,
even if it's limited to speculative inserters.

* It seems questionable that you don't at least have a shared
ProcArrayLock when you set the token value in
SetSpeculativeInsertionToken() (as you know, MyPgXact->xmin can be set
with such a shared lock, so doing something similar here might be
okay, but it's far from obvious that no lock is okay). Now, I guess
you can point towards MinimumActiveBackends() as a kind of precedent,
but that seems substantially less scary than what you've done, because
that's just reading if a field is zero or non-zero. Obviously the
implications of actually doing this are that things get even worse for
performance. And even a shared lock might not be good enough - I'd
have to think about it some more to give a firmer opinion.

 The fix for the deadlocking issue consists of a few parts. First, there's a
 new heavy-weight lock type, a speculative insertion lock, which works
 somewhat similarly to XactLockTableWait(), but is only held for the duration
 of a single speculative insertion. When a backend is about to begin a
 speculative insertion, it first acquires the speculative insertion lock.
 When it's done with the insertion, meaning it has either cancelled it by
 killing the already-inserted tuple or decided that it's going to go ahead
 with it, the lock is released.

I'm afraid I must reiterate my earlier objection to the general thrust
of what you're doing, which is that it is evidently unnecessary to
spread knowledge of value locking around the system, as opposed to
localizing knowledge of it to one module, in this case nbtinsert.c.
While it's true that the idea of the AM abstraction is already perhaps
a little strained, this seems like a big expansion on that problem.
Why should this approach make sense for every conceivable AM that
supports some notion of a constraint? Heavyweight exclusive locks on
indexes (at the page level typically), persisting across complex
operations are not a new thing for Postgres.

 HeapTupleSatisfiesDirty() does the proc array check, and returns the token
 in the snapshot, alongside snapshot->xmin. The caller can then use that
 information in place of XactLockTableWait().

That seems like a modularity violation too. The HeapTupleSatisfiesMVCC
changes reflect a genuine need to make every MVCC snapshot care about
the special visibility exception, whereas only one or two
HeapTupleSatisfiesDirty() callers will ever care about speculative
insertion. Even if you're unmoved by the modularity/aesthetic argument
(which is not to suppose that you actually are), the fact that you're
calling SpeculativeInsertionIsInProgress(), which acquires a shared
ProcArrayLock much of the time from within HeapTupleSatisfiesDirty(),
may have seriously regressed foreign key enforcement, for example.
You're going to need something like a new type of snapshot, basically,
and we probably already have too many of those. But then, 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Greg Stark
On Sat, Jan 11, 2014 at 12:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Jan 10, 2014 at 4:09 PM, Jim Nasby j...@nasby.net wrote:
 Well, the usual example for exclusion constraints is resource scheduling
 (ie: scheduling what room a class will be held in). In that context is it
 hard to believe that you might want to MERGE a set of new classroom
 assignments in?

 So you schedule a class that clashes with 3 other classes, and you
 want to update all 3 rows/classes with details from your one row
 proposed for insertion?


Well, perhaps you want to mark the events as conflicting with your new event?

-- 
greg




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 10:01 PM, Greg Stark st...@mit.edu wrote:
 So you schedule a class that clashes with 3 other classes, and you
 want to update all 3 rows/classes with details from your one row
 proposed for insertion?


 Well, perhaps you want to mark the events as conflicting with your new event?

But short of a sophisticated persistent value locking implementation
(which I'm pretty skeptical of the feasibility of), more conflicting
events could be added at any moment. I doubt that you're appreciably
any better off than if you were to simply check with a select query,
even though that approach is obviously broken. In general, making row
locking work for exclusion constraints, so you can treat them in a way
that allows you to merge on arbitrary operators seems to me like a tar
pit.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-09 Thread Peter Geoghegan
On Tue, Jan 7, 2014 at 8:46 PM, Peter Geoghegan p...@heroku.com wrote:
 I've worked on a simple set of tests, written quickly in bash, that I
 think exercise interesting cases:

 https://github.com/petergeoghegan/upsert

 Perhaps most notably, these tests make comparisons between the
 performance of ordinary inserts with a serial primary key table, and
 effectively equivalent upserts that always insert.

While I realize that everyone is busy, I'm concerned about the lack of
discussion here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.

Here is a summary of the situation, at least as I understand it:

* My patch has been shown to perform much better than the alternative
promise tuples proposal. The benchmark previously published,
referred to above makes this evident for workloads with lots of
contention [1].

Now, to cover everything, I've gone on to benchmark inserts into a
table foo(serial, int4, text) that lock the row using the new
infrastructure. The SERIAL column is the primary key. I'm trying to
characterize the overhead of the extended value locking here, by
showing the same case (a worst case) with and without the overhead.
Here are the results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/vs-vanilla-insert/

(asynchronous commits, logged table)

With both extremes covered, the data suggests that my patch performs
very well by *any* standard. But if we consider how things compare to
the alternative proposal, all indications are that performance is far
superior (at least for representative cases without too many unique
indexes, not that I suspect things are much different with many).
Previous concerns about the cost of extended leaf page locking ought
to be almost put to rest by this benchmark, because inserting a
sequence of btree index tuple integers in succession is a particularly
bad case, and yet in practice the implementation does very well. (With
my patch, we're testing the same statement with an ON DUPLICATE KEY
LOCK FOR UPDATE part, but there are naturally no conflicts on the
SERIAL PK - on master we're testing the same INSERT statement without
that, inserting sequence values just as before, only without the
worst-case value locking overhead).
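
Concretely, the comparison is between statements of roughly this shape
(a sketch, not the actual test script):

CREATE TABLE foo (id serial PRIMARY KEY, n int4, t text);

-- master: a plain insert
INSERT INTO foo (n, t) VALUES (1, 'x');

-- patch: the same insert taking the value-locking path; conflicts never
-- actually arise on the SERIAL key, so any difference is pure locking
-- overhead
INSERT INTO foo (n, t) VALUES (1, 'x')
ON DUPLICATE KEY LOCK FOR UPDATE;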

* The alternative exclusion* patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical. More importantly, I have to question
whether we should continue to pursue that alternative approach, giving
what we now know about its performance characteristics. It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and* it's
already looking for would-be duplicates *first*). I'm trying to be as
fair as possible, and yet the difference is huge. It's going to be
really hard to beat something where the decision to try to see if we
should insert or update comes so late: the decision is made as late as
possible, is based on strong indicators of likely outcome, while the
cost of making the wrong choice is very low. With shared buffer locks
held calling _bt_check_unique(), we still lock out concurrent would-be
duplicate insertion, and so don't need to restart from scratch (to
insert instead) in the same way as with the alternative proposal's
largely AM-naive approach.

* I am not aware that anyone considers that there are any open items
yet. I've addressed all of those now. Progress here is entirely
blocked on waiting for review feedback.

With the new priorConflict lock strength optimization, my patch is in
some ways similar to what Heikki proposed (in the exclusion* patch).
It's as if the first phase, the locking operation is an index scan
with an identity crisis. It can decide to continue to be an index
scan (albeit an oddball one with an insertion scankey that using
shared buffer locks prevents concurrent duplicate insertion, for very
efficient uniqueness checks), or it can decide to actually insert, at
the last possible moment. The second phase is picked up with much of
the work already complete from the first, so the amount of wasted work
is very close to zero in all cases. How can anything beat that?

If the main argument for the exclusion approach is that it works with
exclusion constraints, then I can still go and make what I've done
work there too (for the IGNORE case, which I maintain is the only
exclusion constraint variant of this that is useful to users). In any
case I think making anything work for exclusion constraints should be
a relatively low priority.

I'd like to hear more opinions on what I've done here, if anyone has
bandwidth to spare. 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Eisentraut
This patch doesn't apply anymore.




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Geoghegan
On Fri, Jan 3, 2014 at 7:39 AM, Peter Eisentraut pete...@gmx.net wrote:
 This patch doesn't apply anymore.

Yes, there was some bit-rot. I previous deferred dealing with a
shift/reduce conflict implied by commit
1b4f7f93b4693858cb983af3cd557f6097dab67b. I've fixed that problem now
using non operator precedence, and performed a clean rebase on master.
I've also fixed the basis of your much earlier complaint about
breakage of ecpg's regression tests (without adding support for the
feature to ecpg). All make check-world tests pass. Patch is attached.
I have yet to figure out how to make REJECTS a non-reserved keyword,
or even just a type_func_name_keyword, though intuitively I have a
sense that the latter ought to be possible.

This is the same basic patch as benchmarked above, with various tricks
to avoid stronger lock acquisition when that's likely profitable (we
can even do _bt_check_unique() with only a shared lock and no hwlock
much of the time, on the well-informed suspicion that it won't be
necessary to insert, but only to return a TID). There has also been
some clean-up to aspects of serializable behavior, but that needs
further attention and scrutiny from a subject matter expert, hopefully
Heikki. Though it's probably also true that I should find time to
think about transaction isolation some more.

I've since had another idea relating to performance optimization,
which was to hint that the last attempt to insert a key was
unsuccessful, so the next one (after the conflicting transaction's
commit/abort) of that same value will very likely conflict too, making
lock avoidance profitable on average. This appears to be much more
effective than the previous woolly heuristic (never published, just
benchmarked), which I've left in as an additional reason to avoid
heavyweight locking, if only for discussion. This benchmark now shows
my approach winning convincingly with this additional priorConflict
optimization:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/

If someone had time to independently recreate the benchmark I have
here, or perhaps to benchmark the patch in some other way, that would
be useful (for full details see my recent e-mail about the prior
benchmark, where the exact details are described - this is the same,
but with one more run for the priorConflict optimization).

Subtleties of visibility also obviously deserve closer inspection, but
perhaps I shouldn't be so hasty: No consensus on the way forward looks
even close to emerging. How do people feel about my approach now?

-- 
Peter Geoghegan


btreelock_insert_on_dup.v6.2014_01_03.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-03 Thread Peter Geoghegan
On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, so far as the syntax goes, I'm quite distressed by having to make
 REJECTS into a fully-reserved word.  It's not reserved according to the
 standard, and it seems pretty likely to be something that apps might be
 using as a table or column name.

I've been looking at this, but I'm having a hard time figuring out a
way to eliminate shift/reduce conflicts while not maintaining REJECTS
as a fully reserved keyword - I'm pretty sure it's impossible with an
LALR parser. I'm not totally enamored with the exact syntax proposed
-- I appreciate the flexibility on the one hand, but on the other hand
I suppose that REJECTS could just as easily be any number of other
words.

One possible compromise would be to use a synonym that is not imagined
to be in use very widely, although I looked up reject in a thesaurus
and didn't feel too great about that idea afterwards. Another idea
would be to have a REJECTING keyword, as the sort of complement of
RETURNING (currently you can still ask for RETURNING, without REJECTS
but with ON DUPLICATE KEY LOCK FOR UPDATE if that happens to make
sense). I think that would work fine, and might actually be more
elegant. Now, REJECTING will probably have to be a reserved keyword,
but that seems less problematic, particularly as RETURNING is itself a
reserved keyword not described by the standard. In my opinion
REJECTING would reinforce the notion of projecting the complement of
what RETURNING would project in the same context.
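
Purely for illustration, the hypothetical REJECTING spelling might read:

INSERT INTO tab (k, v) VALUES (1, 'x')
ON DUPLICATE KEY LOCK FOR UPDATE
REJECTING *;
-- projects the rows that were *not* inserted, i.e. the complement of
-- what RETURNING * would project in the same position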

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Andres Freund
On 2013-12-27 14:11:44 -0800, Peter Geoghegan wrote:
 On Fri, Dec 27, 2013 at 12:57 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I don't think the current syntax the feature implements can be used as
  the sole argument what the feature should be able to support.
 
  If you think from the angle of an async MM replication solution
  replicating a table with multiple unique keys, not having to specify a
  single index we expect conflicts from, is surely helpful.
 
 Well, you're not totally on your own for something like that with this
 feature. You can project the conflicter's tid, and possibly do a more
 sophisticated recovery, like inspecting the locked row and iterating.

Yea, but in that case I *do* conflict with more than one index and old
values need to stay locked. Otherwise anything resembling
forward-progress guarantee is lost.

 That's probably not at all ideal, but then I can't even imagine what
 the best interface for what you describe here looks like. If there are
 multiple conflicts, do you delete or update some or all of them? How
 do you express that concept from a DML statement?

For my usecases just getting the tid back is fine - it's in C
anyway. But I'd rather be in a position to do it from SQL as well...

If there are multiple conflicts the conflicting row should be
updated. If we didn't release the value locks on the individual indexes,
we can know beforehand whether only one row is going to be affected. If
there really are more than one, error out.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Peter Geoghegan
On Thu, Jan 2, 2014 at 1:49 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, you're not totally on your own for something like that with this
 feature. You can project the conflicter's tid, and possibly do a more
 sophisticated recovery, like inspecting the locked row and iterating.

 Yea, but in that case I *do* conflict with more than one index and old
 values need to stay locked. Otherwise anything resembling
 forward-progress guarantee is lost.

I'm not sure I understand. In a very real sense they do stay locked.
What is insufficient about locking the definitively visible row with
the value, rather than the value itself? What distinction are you
making? On the first conflict you can delete the row you locked, and
then re-try, possibly further merging some stuff from the just-deleted
row when you next upsert.

It's possible that an earlier unique index value that is unlocked
before row locking proceeds will get a new would-be duplicate after
you're returned a locked row, but it's not obvious that that's a
problem for your use-case (a problem that can't be worked around), or
that promise tuples get you anything better.

 That's probably not at all ideal, but then I can't even imagine what
 the best interface for what you describe here looks like. If there are
 multiple conflicts, do you delete or update some or all of them? How
 do you express that concept from a DML statement?

 For my usecases just getting the tid back is fine - it's in C
 anyway. But I'd rather be in a position to do it from SQL as well...

I believe you can.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Andres Freund
On 2014-01-02 02:20:02 -0800, Peter Geoghegan wrote:
 On Thu, Jan 2, 2014 at 1:49 AM, Andres Freund and...@2ndquadrant.com wrote:
  Well, you're not totally on your own for something like that with this
  feature. You can project the conflicter's tid, and possibly do a more
  sophisticated recovery, like inspecting the locked row and iterating.
 
  Yea, but in that case I *do* conflict with more than one index and old
  values need to stay locked. Otherwise anything resembling
  forward-progress guarantee is lost.
 
 I'm not sure I understand. In a very real sense they do stay locked.
 What is insufficient about locking the definitively visible row with
 the value, rather than the value itself?

Locking the definitely visible row only works if there's a row matching
the index's columns. If the values of the new row don't have
corresponding values in all the indexes, you have the same old race
conditions again.
I think to be useful for many cases you really need to be able to ask
for a potentially conflicting row and be sure that if there's none you
are able to insert the row separately.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Robert Haas
On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 1. PromiseTupleInsertionLockAcquire(my xid)
 2. Insert heap tuple
 3. Insert index tuples
 4. Check if conflict happened. Kill the already-inserted tuple on conflict.
 5. PromiseTupleInsertionLockRelease(my xid)

 IOW, the only change to the current patch is that you acquire the new kind
 of lock before starting the insertion, and you release it after you've
 killed the tuple, or you know you're not going to kill it.

 Where does row locking fit in there? - you may need to retry when that
 part is incorporated, of course. What if you have multiple promise
 tuples from a contended attempt to insert a single slot, or multiple
 broken promise tuples across multiple slots or even multiple commands
 in the same xact?

Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
the tuple, rather than the XID.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Heikki Linnakangas

On 01/02/2014 02:53 PM, Robert Haas wrote:

On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:

On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

1. PromiseTupleInsertionLockAcquire(my xid)
2. Insert heap tuple
3. Insert index tuples
4. Check if conflict happened. Kill the already-inserted tuple on conflict.
5. PromiseTupleInsertionLockRelease(my xid)

IOW, the only change to the current patch is that you acquire the new kind
of lock before starting the insertion, and you release it after you've
killed the tuple, or you know you're not going to kill it.


Where does row locking fit in there? - you may need to retry when that
part is incorporated, of course. What if you have multiple promise
tuples from a contended attempt to insert a single slot, or multiple
broken promise tuples across multiple slots or even multiple commands
in the same xact?


You can only have one speculative insertion in progress at a time. After 
you've done all the index insertions and checked that you really didn't 
conflict with anyone, you're not going to go back and kill the tuple 
anymore. After that point, the insertion is not speculation anymore.



Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
the tuple, rather than the XID.


Well, that would be ideal, because we already have tuple locks. It would 
be nice to use the same concept for this. It's a bit tricky, however. I 
guess the most straightforward way to do it would be to grab a 
heavy-weight lock after you've inserted the tuple, but before releasing 
the buffer lock. I don't immediately see a problem with that, although 
it's a bit scary to acquire a heavy-weight lock while holding a buffer lock.
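
As a rough sketch of that idea (the function name and its placement are
assumptions about how a patch might look, not code from any patch;
LockTuple() is the existing lmgr call):

#include "postgres.h"
#include "access/htup.h"
#include "storage/lmgr.h"

/*
 * Hypothetical: called from within heap_insert(), after
 * RelationPutHeapTuple() has set t_self, but while the heap buffer is
 * still exclusively locked, so that no one can observe the tuple before
 * the heavyweight lock exists.
 */
static void
PromiseTupleLockAcquire(Relation rel, HeapTuple heaptup)
{
    LockTuple(rel, &heaptup->t_self, ExclusiveLock);
}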


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 11:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/02/2014 02:53 PM, Robert Haas wrote:
 On Tue, Dec 31, 2013 at 4:12 AM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 1. PromiseTupleInsertionLockAcquire(my xid)
 2. Insert heap tuple
 3. Insert index tuples
 4. Check if conflict happened. Kill the already-inserted tuple on
 conflict.
 5. PromiseTupleInsertionLockRelease(my xid)

 IOW, the only change to the current patch is that you acquire the new
 kind
 of lock before starting the insertion, and you release it after you've
 killed the tuple, or you know you're not going to kill it.


 Where does row locking fit in there? - you may need to retry when that
 part is incorporated, of course. What if you have multiple promise
 tuples from a contended attempt to insert a single slot, or multiple
 broken promise tuples across multiple slots or even multiple commands
 in the same xact?

 You can only have one speculative insertion in progress at a time. After
 you've done all the index insertions and checked that you really didn't
 conflict with anyone, you're not going to go back and kill the tuple
 anymore. After that point, the insertion is not speculation anymore.

Yeah... but how does someone examining the tuple know that?  We need
to avoid having them block on the promise-tuple insertion lock if
we've reacquired it meanwhile for a new speculative insertion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Peter Geoghegan
I decided to make at least a cursory attempt to measure or
characterize the performance of each of our approaches to value
locking. Being fair here is a non-trivial matter, because of the fact
that upserts can behave quite differently based on the need to insert
or update, lock contention and so on. Also, I knew that anything I
came up with would not be comparing like with like: as things stand,
the btree locking code is more or less correct, and the alternative
exclusion constraint supporting implementation is more or less
incorrect (of course, you may yet describe a way to fix the
unprincipled deadlocking previously revealed by my testcase [1], but
it is far from clear what impact this fix will have on performance).
Still, something is better than nothing.

This was run on a Linux server (Linux 3.8.0-31-generic
#46~precise1-Ubuntu) with these specifications:
https://www.hetzner.de/en/hosting/produkte_rootserver/ex40 .
Everything fits in shared_buffers, but the I/O system is probably the
weakest link here.

To be 100% clear, I am comparing
btreelock_insert_on_dup.v5.2013_12_28.patch.gz [2] with
exclusion_insert_on_dup.2013_12_19.patch.gz [3]. I'm also testing a
third approach, involving avoidance of exclusive buffer locks and
heavyweight locks for upserts in the first phase of speculative
insertion. That patch is unposted, but shows a modest improvement over
[2].

I ran this against the table foo:

pgbench=# \d+ foo
                         Table "public.foo"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 a      | integer | not null  | plain    |              |
 b      | integer |           | plain    |              |
 c      | text    |           | extended |              |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (a)
Has OIDs: no

My custom script was:

\setrandom rec 1 :scale
with rej as(insert into foo(a, b, c) values(:rec, :rec, 'insert') on
duplicate key lock for update returning rejects *) update foo set c =
'update' from rej where foo.a = rej.a;

I specified that each pgbench client in each run should last for 200k
upserts (with 100k possible distinct key values), not that it should
last some number of seconds. The idea is that there is a reasonable
mix of inserts and updates initially, for lower client counts, but
exactly the same number of queries are run for each patch, so as to
normalize the effects of contention across both runs (this sure is
hand-wavy, but likely better than nothing). I'm just looking for
approximate numbers here, and I'm sure that you could find more than
one way to benchmark this feature, with varying degrees of sympathy
towards each of our two approaches to value locking. This benchmark
isn't sympathetic to btree locking at all, because there is a huge
amount of contention for the higher client counts, with 100% of
possible rows updated by the time we're done at 16 clients, for
example.

To compensate somewhat for the relatively low duration of each run, I
take an average-of-5, rather than an average-of-3 as representative
for each client count + run/patch combination.

A full report of the results is here:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp/

My executive summary is that the exclusion patch performs about the
same on lower client counts, presumably due to not having the
additional window of btree lock contention. By 8 clients, the
exclusion patch does noticeably better, but it's a fairly modest
improvement.

Forgive me if I'm belaboring the point, but even though I'm
benchmarking the simplest possible upsert statements, had I chosen
small pgbench scale factors (e.g. scales that would see 100 - 1000
possible distinct key values in total) the btree locking
implementation would surely win very convincingly, just because the
alternative implementation would spend almost all of its time
deadlocked, waiting for the deadlock detector to free clients in one
second deadlock_timeout cycles. My goal here was just to put a rough
number on how these two approaches compare, while trying to be as fair
as possible.

I have to wonder about the extent to which the exclusion approach
benefits from holding old value locks. So even if the unprincipled
deadlocking issue can be fixed without much additional cost, it might
be that the simple fact that that approach holds those pseudo value
locks (i.e. old dead rows from previous iterations on the same tuple
slot) indefinitely helps performance, and that losing that property alone
will hurt performance, even though losing it is necessary.

For those who wonder what the effect on multiple unique indexes would
be, that isn't really all that relevant, since contention on multiple
unique indexes isn't expected with idiomatic usage (though I suppose
an upsert's non-HOT update would have to compete).

[1] 
http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

[2] 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Peter Geoghegan
On Thu, Jan 2, 2014 at 8:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Yeah, it seems like PromiseTupleInsertionLockAcquire should be locking
 the tuple, rather than the XID.

 Well, that would be ideal, because we already have tuple locks. It would be
 nice to use the same concept for this. It's a bit tricky, however. I guess
 the most straightforward way to do it would be to grab a heavy-weight lock
 after you've inserted the tuple, but before releasing the buffer lock. I
 don't immediately see a problem with that, although it's a bit scary to
 acquire a heavy-weight lock while holding a buffer lock.

That's a really big modularity violation. Everything after
RelationPutHeapTuple() but before the buffer unlock in heap_insert()
is currently a critical section. I'm not saying that it can't be done,
but it certainly is scary.

We also have heavyweight page locks, currently used by hash indexes.
That approach does not require us to contort the row locking code, and
certainly does not require us to acquire heavyweight locks with buffer
locks already held. I could understand your initial disinclination to
doing things this way, particularly when the unprincipled deadlocking
problem was not well understood, but I think that this must tip the
balance in favor of the approach I advocate. What I've done with
heavyweight locks is a modest, localized, logical expansion on the
existing mechanism, that is easy to reason about, with room for
further optimization in the future, that still has reasonable
performance characteristics today, including I believe better
worst-case latency. Heavyweight locks on btree pages are very well
precedented, if you look beyond Postgres.
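
For illustration, the primitives involved already exist; a minimal
sketch of the kind of page lock I mean (function names and call sites
are assumptions, not the patch itself):

#include "postgres.h"
#include "storage/lmgr.h"

/*
 * Sketch: heavyweight "value lock" on the first leaf page that a value
 * proposed for insertion could be on.  LockPage()/UnlockPage() are the
 * existing lmgr calls that hash indexes use today.  Only insertion
 * paths would ever take this lock, so index scans are unaffected.
 */
static void
btree_value_lock_page(Relation rel, BlockNumber blkno)
{
    LockPage(rel, blkno, ExclusiveLock);
}

static void
btree_value_unlock_page(Relation rel, BlockNumber blkno)
{
    UnlockPage(rel, blkno, ExclusiveLock);
}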

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Peter Geoghegan
On Thu, Jan 2, 2014 at 2:37 AM, Andres Freund and...@2ndquadrant.com wrote:
 Locking the definitely visible row only works if there's a row matching
 the index's columns. If the values of the new row don't have
 corresponding values in all the indexes you have the same old race
 conditions again.

I still don't get it - perhaps you should break down exactly what you
mean with an example. I'm talking about potentially doing multiple
upserts per row proposed for insertion to handle multiple conflicts,
perhaps with some deletes between upserts, not just one upsert with a
single update part.

 I think to be useful for many cases you really need to be able to ask
 for a potentially conflicting row and be sure that if there's none you
 are able to insert the row separately.

Why? What work do you need to perform after reserving the right to
insert but before inserting? Can't you just upsert resulting in
insert, and then perform that work, potentially deleting the row
inserted if and when you change your mind? Is there any real
difference between what that does for you, and what any particular
variety of promise tuple might do for you?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-02 Thread Peter Geoghegan
On Thu, Jan 2, 2014 at 11:58 AM, Peter Geoghegan p...@heroku.com wrote:
 My executive summary is that the exclusion patch performs about the
 same on lower client counts, presumably due to not having the
 additional window of btree lock contention. By 8 clients, the
 exclusion patch does noticeably better, but it's a fairly modest
 improvement.

I forgot to mention that synchronous_commit was turned off, so as to
eliminate noise that might have been added by commit latency, while
still obligating btree to WAL log everything with an exclusive buffer
lock held.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-31 Thread Heikki Linnakangas

On 12/31/2013 09:18 AM, Peter Geoghegan wrote:

On Sun, Dec 29, 2013 at 9:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

While mulling this over further, I had an idea about this: suppose we
marked the tuple in some fashion that indicates that it's a promise
tuple.  I imagine an infomask bit, although the concept makes me wince
a bit since we don't exactly have bit space coming out of our ears
there.  Leaving that aside for the moment, whenever somebody looks at
the tuple with a mind to calling XactLockTableWait(), they can see
that it's a promise tuple and decide to wait on some other heavyweight
lock instead.  The simplest thing might be for us to acquire a
heavyweight lock on the promise tuple before making index entries for
it, and then have callers wait on that instead always instead of
transitioning from the tuple lock to the xact lock.


Yeah, that seems like it should work. You might not even need an infomask
bit for that; just take the other heavyweight lock always before calling
XactLockTableWait(), whether it's a promise tuple or not. If it's not,
acquiring the extra lock is a waste of time but if you're going to sleep
anyway, the overhead of one extra lock acquisition hardly matters.


Are you suggesting that I lock the tuple only (say, through a special
LockPromiseTuple() call), or lock the tuple *and* call
XactLockTableWait() afterwards? You and Robert don't seem to be in
agreement about which here.


I meant the latter, ie. grab the new kind of lock first, then check if 
the tuple is still there, and then call XactLockTableWait() as usual.



The inserter has to acquire the heavyweight lock before releasing the buffer
lock, because otherwise another inserter (or deleter or updater) might see
the tuple, acquire the heavyweight lock, and fall to sleep on
XactLockTableWait(), before the inserter has grabbed the heavyweight lock.
If that race condition happens, you have the original problem again, ie. the
updater unnecessarily waits for the inserting transaction to finish, even
though it already killed the tuple it inserted.


Right. Can you suggest a workaround to the above problems?


Umm, I did, in the next paragraph ;-) :


That seems easy to avoid. If the heavyweight lock uses the
transaction id as the key, just like
XactLockTableInsert/XactLockTableWait, you can acquire it before
doing the insertion.


Let me elaborate that. The idea is to have new heavy-weight lock that's 
just like the transaction lock used by 
XactLockTableInsert/XactLockTableWait, but separate from that. Let's 
call it PromiseTupleInsertionLock. The insertion procedure in INSERT ... 
ON DUPLICATE looks like this:


1. PromiseTupleInsertionLockAcquire(my xid)
2. Insert heap tuple
3. Insert index tuples
4. Check if conflict happened. Kill the already-inserted tuple on conflict.
5. PromiseTupleInsertionLockRelease(my xid)

IOW, the only change to the current patch is that you acquire the new 
kind of lock before starting the insertion, and you release it after 
you've killed the tuple, or you know you're not going to kill it.
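
Concretely, the pair of functions might look something like this - a
minimal sketch, assuming a new locktag type (LOCKTAG_PROMISE_TUPLE
below, which does not exist today) so that this lock stays separate
from the regular transaction lock; LockAcquire()/LockRelease() are the
existing lock manager calls:

#include "postgres.h"
#include "storage/lock.h"

/* Hypothetical locktag setup, keyed by xid like SET_LOCKTAG_TRANSACTION */
#define SET_LOCKTAG_PROMISE_TUPLE(tag, xid) \
    ((tag).locktag_field1 = (xid), \
     (tag).locktag_field2 = 0, \
     (tag).locktag_field3 = 0, \
     (tag).locktag_field4 = 0, \
     (tag).locktag_type = LOCKTAG_PROMISE_TUPLE, \
     (tag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)

static void
PromiseTupleInsertionLockAcquire(TransactionId xid)
{
    LOCKTAG     tag;

    SET_LOCKTAG_PROMISE_TUPLE(tag, xid);
    (void) LockAcquire(&tag, ExclusiveLock, false, false);
}

static void
PromiseTupleInsertionLockRelease(TransactionId xid)
{
    LOCKTAG     tag;

    SET_LOCKTAG_PROMISE_TUPLE(tag, xid);
    LockRelease(&tag, ExclusiveLock, false);
}

/*
 * Waiter side: block until any in-progress speculative insertion by
 * xid finishes, then re-check whether the tuple is still there before
 * falling back to XactLockTableWait() as usual.
 */
static void
PromiseTupleInsertionLockWait(TransactionId xid)
{
    LOCKTAG     tag;

    SET_LOCKTAG_PROMISE_TUPLE(tag, xid);
    (void) LockAcquire(&tag, ShareLock, false, false);
    LockRelease(&tag, ShareLock, false);
}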


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-31 Thread Peter Geoghegan
On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 1. PromiseTupleInsertionLockAcquire(my xid)
 2. Insert heap tuple
 3. Insert index tuples
 4. Check if conflict happened. Kill the already-inserted tuple on conflict.
 5. PromiseTupleInsertionLockRelease(my xid)

 IOW, the only change to the current patch is that you acquire the new kind
 of lock before starting the insertion, and you release it after you've
 killed the tuple, or you know you're not going to kill it.

Where does row locking fit in there? - you may need to retry when that
part is incorporated, of course. What if you have multiple promise
tuples from a contended attempt to insert a single slot, or multiple
broken promise tuples across multiple slots or even multiple commands
in the same xact?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-31 Thread Peter Geoghegan
On Tue, Dec 31, 2013 at 12:52 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Are you suggesting that I lock the tuple only (say, through a special
 LockPromiseTuple() call), or lock the tuple *and* call
 XactLockTableWait() afterwards? You and Robert don't seem to be in
 agreement about which here.

 I meant the latter, ie. grab the new kind of lock first, then check if the
 tuple is still there, and then call XactLockTableWait() as usual.

I don't follow this either. Through what exact mechanism does the
waiter know that there was a wait on the
PromiseTupleInsertionLockAcquire() call, and so it should not wait on
XactLockTableWait()? Does whatever mechanism you have in mind not have
race conditions?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Heikki Linnakangas

On 12/30/2013 05:57 AM, Peter Geoghegan wrote:

Now, when you actually posted the prototype, I realized that it was
the same basic design that I'd cited in my very first e-mail on the
IGNORE patch (the one that didn't have row locking at all) - nobody
else wanted to do heap insertion first for promise tuples. I read that
2007 thread [1] a long time ago, but that recognition only came when
you posted your first prototype, or perhaps shortly before when you
started participating on list.


Ah, I didn't remember that thread. Yeah, apparently I proposed the exact 
same design back then. Simon complained about the dead tuples being left 
behind, but I don't think that's a big issue with the design we've been 
playing around now; you only end up with dead tuples when two backends 
try to insert the same value concurrently, which shouldn't happen very 
often. Other than that, there wasn't any discussion on whether that's a 
good approach or not.



In short, I think that my approach may be better because it doesn't
conflate row locking with value locking (therefore making it easier
to reason about, maintaining modularity),


You keep saying that, but I don't understand what you mean. With your 
approach, an already-inserted heap tuple acts like a value lock, just 
like in my approach. You have the page-level locks on b-tree pages in 
addition to that, but the heap-tuple based mechanism is there too.



I'm not sure that this is essential to your design, and I'm not sure
what your thoughts are on this, but Andres has defended the idea of
promise tuples that lock old values indefinitely pending the outcome
of another xact where we'll probably want to update, and I think this
is a bad idea. Andres recently seemed less convinced of this than in
the past [2], but I'd like to hear your thoughts. It's very pertinent,
because I think releasing locks needs to be cheap, and rendering old
promise tuples invisible is not cheap.


Well, killing an old promise tuple is not cheap, but it shouldn't happen 
often. In both approaches, what probably matters more is the overhead of 
the extra heavy-weight locking. But this is all speculation, until we 
see some benchmarks.



I said that I have limited enthusiasm for
expanding the modularity violation that exists within the btree AM.
Based on what Andres has said in the recent past on this thread about
the current btree code, that in my opinion, bt_check_unique() doing
[locking heap and btree buffers concurrently] is a bug that needs
fixing [3], can you really blame me? What this patch does not need is
another controversy. It seems pretty reasonable and sane that we'd
implement this by generalizing from the existing mechanism.


_bt_check_unique() is a modularity violation, agreed. Beauty is in the 
eye of the beholder, I guess, but I don't see either patch making that 
any better or worse.



Now, enough with the venting. Back to drawing board, to figure out how best
to fix the deadlock issue with the insert_on_dup-kill-on-conflict-2.patch.
Please help me with that.


I will help you. I'll look at it tomorrow.


Thanks!


[1] 
http://www.postgresql.org/message-id/1172858409.3760.1618.ca...@silverbirch.site

[2] 
http://www.postgresql.org/message-id/20131227075453.gb17...@alap2.anarazel.de

[3] 
http://www.postgresql.org/message-id/20130914221524.gf4...@awork2.anarazel.de


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Andres Freund
On 2013-12-29 19:57:31 -0800, Peter Geoghegan wrote:
 On Sun, Dec 29, 2013 at 8:18 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  My position is not based on a gut feeling. It is based on carefully
  considering the interactions of the constituent parts, plus the
  experience of actually building a working prototype.
 
 
  I also carefully considered all that stuff, and reached a different
  conclusion. Plus I also actually built a working prototype (for some
  approximation of working - it's still a prototype).
 
 Well, clearly you're in agreement with me about unprincipled
 deadlocking. That's what I was referring to here.

Maybe you should describe what you mean by unprincipled. Sure, the
current patch deadlocks, but I don't see anything fundamental or
unresolvable there. So I don't understand what the word unprincipled
means in that sentence...

 Andres recently seemed less convinced of this than in
 the past [2], but I'd like to hear your thoughts.

Not really, I just don't have the time/energy to fight for it (aka write
a POC) at the moment.
I still think any form of promise tuple, be it index- or heap-based,
is a much better, more general approach than yours. That doesn't
preclude other approaches from being workable, though.

 I didn't say that locking index pages is obviously better, and I
 certainly didn't say anything about what you've done being
 fundamentally flawed. I said that I have limited enthusiasm for
 expanding the modularity violation that exists within the btree AM.
 Based on what Andres has said in the recent past on this thread about
 the current btree code, that in my opinion, bt_check_unique() doing
 [locking heap and btree buffers concurrently] is a bug that needs
 fixing [3], can you really blame me?

Uh. But that was said in the context of *your* approach being
flawed. Because it - at that time, I hadn't looked at the newest version
yet - extended the concept of holding btree page locks across external
operations to far more code, even including user-defined code! And
you argued that that isn't a problem, using _bt_check_unique() as an
argument.

I don't really see why your patch is less of a modularity violation than
Heikki's POC. It's just a different direction.

  PS. In btreelock_insert_on_dup_v5.2013_12_28.patch, the language used in the
  additional text in README is quite difficult to read. Too many difficult
  sentences and constructs for a non-native English speaker like me. I had to
  look up concomitantly in a dictionary and I'm still not sure I understand
  that sentence :-).

+1 on the simpler language front as a fellow non-native speaker.

Personally, the biggest thing I think you could do in favor of your
position is to try to be a bit more succinct in the mailing list
discussions. I certainly fail at that at times as well, but I really try
to work on it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Mon, Dec 30, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 Maybe you should describe what you mean with unprincipled. Sure, the
 current patch deadlocks, but I don't see anything fundamental,
 unresolvable there. So I don't understand what the word unprincipled
 means in that sentence..

Maybe it is resolvable, and maybe it's worth resolving - I never said
that it wasn't, I just said that I doubt the latter. By unprincipled
deadlocking, I mean deadlocking that cannot be reasonably prevented by
a user. Currently, I think that never deadlocking is a reasonable
aspiration for all applications. It's never really necessary. When it
occurs, we can advise users to do simple analysis and application
refactoring to prevent it (for example, by always locking rows in a
consistent key order across concurrent sessions). With unprincipled
deadlocking, we can give
no such advice. The only advice we can give is to stop doing so much
upserting, which is a big departure from how things are today. AFAICT,
no one disagrees with my view that this is bad, and probably
unacceptable.

 Uh. But that was said in the context of *your* approach being
 flawed. Because it - at that time, I didn't look at the newest version
 yet - extended the concept of holding btree page locks across external
 operation to far much more code, even including user defined code!. And
 you argued that that isn't a problem using _bt_check_unique() as an
 argument.

That's a distortion of my position at the time. I acknowledged from
the start that all buffer locking was problematic (e.g. [1]), and was
exploring alternative locking approaches and the merit of the design.
This is obviously the kind of project that needs to be worked at
through iterative prototyping. While arguing that deadlocking would
not occur, I lost sight of the bigger picture. But even if that wasn't
true, I don't know why you feel the need to go on and on about buffer
locking like this months later. Are you trying to be provocative? Can
you *please* stop?

Everyone knows that the btree heap access is a modularity violation.
Even the AM docs say that the heap access is without a doubt ugly
and non-modular. So my original point remains, which is that
expanding that is obviously going to be controversial, and probably
legitimately so. I thought that your earlier remarks on
_bt_check_unique() were a good example of this sentiment, but I hardly
need such an example.

 I don't really see why your patch is less of a modularity violation than
 Heikki's POC. It's just a different direction.

My approach does not regress modularity because it doesn't do anything
extra with the heap at all, and only btree insertion routines are
affected. Locking is totally localized to the btree insertion routines
- one .c file. At no other point does anything else have to care, and
it's obvious that this situation won't change in the future when we
decide to do something else cute with infomask bits or whatever.
That's a *huge* distinction.

[1] 
http://www.postgresql.org/message-id/cam3swzr2x4hjg7rjn0k4+hfdgucyx2prep0y3a7nccejeow...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Andres Freund
On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote:
 But even if that wasn't
 true, I don't know why you feel the need to go on and on about buffer
 locking like this months later. Are you trying to be provocative? Can
 you *please* stop?

ERR? Peter? *You* quoted a statement of mine that only made sense in
its original context. And I *did* say that the point about buffer
locking applied to the *past* version of the patch.


Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Adrian Klaver

On 12/30/2013 12:45 PM, Andres Freund wrote:

On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote:

But even if that wasn't
true, I don't know why you feel the need to go on and on about buffer
locking like this months later. Are you trying to be provocative? Can
you *please* stop?


ERR? Peter? *You* quoted a statement of mine that only made sense in
it's original context. And I *did* say that the point about buffer
locking applied to the *past* version of the patch.


Alright, this seems to have gone from confusion about the proposal to
confusion about the confusion. Might I suggest a cooling-off period and
a return to the discussion, possibly in a Wiki page where the
points/counterpoints could be laid out more efficiently?





Andres




--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Mon, Dec 30, 2013 at 7:22 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Ah, I didn't remember that thread. Yeah, apparently I proposed the exact
 same design back then. Simon complained about the dead tuples being left
 behind, but I don't think that's a big issue with the design we've been
 playing around now; you only end up with dead tuples when two backends try
 to insert the same value concurrently, which shouldn't happen very often.

Right, because you check first, which also has a cost, paid in CPU
cycles, memory bandwidth, and buffer lock contention. As opposed to
a cost almost entirely localized to inserters into a single leaf page
per unique index for only an instant. You're checking *all* unique
indexes.

You call check_exclusion_or_unique_constraint() once per unique index
(or EC), and specify to wait on the xact, at least until a conflict is
found. So if you're waiting on an xact, your conclusion that earlier
unique indexes had no conflicts could soon become totally obsolete. So
for non-idiomatic usage, say like the usage Andres in particular cares
about for MM conflict resolution, I worry about the implications of
that. I'm not asserting that it's a problem, but it does seem like
something that's quite hard to reason about. Maybe Andres can comment.

 In short, I think that my approach may be better because it doesn't
 conflate row locking with value locking (therefore making it easier
 to reason about, maintaining modularity),

 You keep saying that, but I don't understand what you mean. With your
 approach, an already-inserted heap tuple acts like a value lock, just like
 in my approach. You have the page-level locks on b-tree pages in addition to
 that, but the heap-tuple based mechanism is there too.

Yes, but that historic behavior isn't really value locking at all.
That's very much like row locking, because there is a row, not the
uncertain intent to try to insert a row. Provided your transaction
commits and the client's transaction doesn't delete the row, the row
is definitely there. For upsert, conflicts may well be the norm, not
the exception.

Value locking is the exclusive lock on the buffer held during
_bt_check_unique(). I'm trying to safely extend that mechanism, to
reach consensus among unique indexes, which to me seems the most
logical and conservative approach. For idiomatic usage, it's only
sensible for there to be a conflict on one unique index, known ahead
of time. If you don't know where the conflict will be, then typically
your DML statement is unpredictable, just like the MySQL feature.
Otherwise, for MM conflict resolution, I think it makes sense to pick
those conflicts off, one at a time, dealing with exactly one row per
conflict. I mean, even with your approach, you're still not dealing
with later conflicts in later unique indexes, right? The fact that you
prevent conflicts on previously non-conflicting unique indexes only,
and, I suppose, not later ones too, seems quite arbitrary.

 Well, killing an old promise tuple is not cheap, but it shouldn't happen
 often. In both approaches, what probably matters more is the overhead of the
 extra heavy-weight locking. But this is all speculation, until we see some
 benchmarks.

Fair enough. We'll know more when we have fixed the exclusion
constraint supporting patch, which will allow us to make a fair
comparison. I'm working on it. Although I'd suggest that having dead
duplicates in indexes where that's avoidable is a cost that isn't
necessarily that easily characterized. I especially don't like that
you're currently doing the UNIQUE_CHECK_PARTIAL deferred unique
constraint thing of always inserting, continuing on for *all* unique
indexes regardless of finding a duplicate. Whatever overhead my
approach may imply around lock contention, clearly the cost to index
scans is zero.

The other thing is that if you're holding old value locks (i.e.
previously inserted btree tuples, from earlier unique indexes) pending
resolving a value conflict, you're holding those value locks
indefinitely pending the completion of the other guy's xact, just in
case there ends up being no conflict, which in general is unlikely. So
in extreme cases, that could be the difference between waiting all day
(on someone sitting on a lock that they very probably have no use
for), and not waiting at all.

 _bt_check_unique() is a modularity violation, agreed. Beauty is in the eye
 of the beholder, I guess, but I don't see either patch making that any
 better or worse.

Clearly the way in which you envisage releasing locks to prevent
unprincipled deadlocking implies that btree has to know more about the
heap, and maybe even that the heap has to know something about btree,
or at least about amcanunique AMs (including possible future
amcanunique AMs that may or may not be well suited to implementing
this the same way).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Sun, Dec 29, 2013 at 9:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 While mulling this over further, I had an idea about this: suppose we
 marked the tuple in some fashion that indicates that it's a promise
 tuple.  I imagine an infomask bit, although the concept makes me wince
 a bit since we don't exactly have bit space coming out of our ears
 there.  Leaving that aside for the moment, whenever somebody looks at
 the tuple with a mind to calling XactLockTableWait(), they can see
 that it's a promise tuple and decide to wait on some other heavyweight
 lock instead.  The simplest thing might be for us to acquire a
 heavyweight lock on the promise tuple before making index entries for
 it, and then have callers wait on that instead always instead of
 transitioning from the tuple lock to the xact lock.

 Yeah, that seems like it should work. You might not even need an infomask
 bit for that; just take the other heavyweight lock always before calling
 XactLockTableWait(), whether it's a promise tuple or not. If it's not,
 acquiring the extra lock is a waste of time but if you're going to sleep
 anyway, the overhead of one extra lock acquisition hardly matters.

Are you suggesting that I lock the tuple only (say, through a special
LockPromiseTuple() call), or lock the tuple *and* call
XactLockTableWait() afterwards? You and Robert don't seem to be in
agreement about which here. From here on I assume Robert's idea (only
get the special promise lock where appropriate), because that makes
more sense to me.

I've taken a look at this idea, but got frustrated. You're definitely
going to need an infomask bit for this. Otherwise, how do you
differentiate between a pending promise tuple and a fulfilled
promise tuple (or a tuple that never had anything to do with promises
in the first place)? On the one hand, you'll want to wake up as soon
as it becomes clear that the former is not going to become the latter.
On the other hand, you really will want to wait until xact end
on the pending promise tuple when it becomes a fulfilled promise, or
on an already-fulfilled promise tuple, or a plain old tuple. It's
either locking the promise tuple, or locking the xid; never both,
because the combination makes no sense in any case (unless you're
talking about the case where you lock the promise tuple and then later
*somehow* decide that you need to lock the xid as the upserter
releases promise tuple locks directly within ExecInsert() upon
successful insertion).

The fact that your LockPromiseTuple() call didn't find someone else
with the lock does not mean no one ever promised the tuple (assuming
no infomask bit has the relevant info).

Obviously you can't just have upserters hold on to the promise tuple
locks until xact end if the promiser's insertion succeeds, for the
same reason we don't with regular in-memory tuple locks: they're
totally unbounded. So not only are you going to need an infomask
promise bit, you're going to need to go and unset the bit in the event
of a *successful* insertion, so that waiters know to wait on your xact
now when you finally UnlockPromiseTuple() within ExecInsert() to
finish off successful insertion. *And*, all XactLockTableWait()
promise waiters need to go back and check that, just in case.

This problem illustrates what I mean about conflating row locking with
value locking.

 I think the interlocking with buffer locks and heavyweight locks to
 make that work could be complex.

 Hmm. Can you elaborate?

What I meant is that you should be wary of what you go on to describe below.

 The inserter has to acquire the heavyweight lock before releasing the buffer
 lock, because otherwise another inserter (or deleter or updater) might see
 the tuple, acquire the heavyweight lock, and fall to sleep on
 XactLockTableWait(), before the inserter has grabbed the heavyweight lock.
 If that race condition happens, you have the original problem again, ie. the
 updater unnecessarily waits for the inserting transaction to finish, even
 though it already killed the tuple it inserted.

Right. Can you suggest a workaround to the above problems?
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-29 Thread Heikki Linnakangas

On 12/26/2013 01:27 AM, Peter Geoghegan wrote:

On Wed, Dec 25, 2013 at 6:25 AM, Andres Freund and...@2ndquadrant.com wrote:

And yes, I still think that promise tuples might be a better solution
regardless of the issues you mentioned, but you know what? That doesn't
matter. Me thinking it's the better approach is primarily based on gut
feeling, and I clearly haven't voiced clear enough reasons to convince
you. So you going with your own, possibly more substantiated, gut
feeling is perfectly alright. Unless I go ahead and write a POC of my
own at least ;)


My position is not based on a gut feeling. It is based on carefully
considering the interactions of the constituent parts, plus the
experience of actually building a working prototype.


I also carefully considered all that stuff, and reached a different 
conclusion. Plus I also actually built a working prototype (for some 
approximation of working - it's still a prototype).



Whoa? What? Not convincing everyone is far from it being a useless
discussion. Such an attitude sure is not the way to go to elicit more
feedback.
And it clearly gave you the feedback that most people regard holding
buffer locks across other nontrivial operations, in a potentially
unbounded number, as a fundamental problem.


Uh, I knew that it was a problem all along. While I explored ways of
ameliorating the problem, I specifically stated that we should discuss
the subsystems interactions/design, which you were far too quick to
dismiss. The overall design is far more pertinent than one specific
mechanism. While I certainly welcome your participation, if you want
to be an effective reviewer I suggest examining your own attitude.
Everyone wants this feature.


Frankly I'm pissed off that you dismissed from the start the approach 
that seems much better to me. I gave you a couple of pointers very early 
on: look at the way we do exclusion constraints, and try to do something 
like promise tuples or killing an already-inserted tuple. You dismissed 
that, so I had to write that prototype myself. Even after that, you have 
spent zero effort to resolve the remaining issues with that approach, 
proclaiming that it's somehow fundamentally flawed and that locking 
index pages is obviously better. It's not. Sure, it still needs work, 
but the remaining issue isn't that difficult to resolve. Surely not any 
more complicated than what you did with heavy-weight locks on b-tree 
pages in your latest patch.


Now, enough with the venting. Back to drawing board, to figure out how 
best to fix the deadlock issue with the 
insert_on_dup-kill-on-conflict-2.patch. Please help me with that.


PS. In btreelock_insert_on_dup_v5.2013_12_28.patch, the language used in 
the additional text in README is quite difficult to read. Too many 
difficult sentences and constructs for a non-native English speaker like 
me. I had to look up concomitantly in a dictionary and I'm still not 
sure I understand that sentence :-).


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-29 Thread Heikki Linnakangas

On 12/27/2013 07:11 AM, Peter Geoghegan wrote:

On Thu, Dec 26, 2013 at 5:58 PM, Robert Haas robertmh...@gmail.com wrote:

While mulling this over further, I had an idea about this: suppose we
marked the tuple in some fashion that indicates that it's a promise
tuple.  I imagine an infomask bit, although the concept makes me wince
a bit since we don't exactly have bit space coming out of our ears
there.  Leaving that aside for the moment, whenever somebody looks at
the tuple with a mind to calling XactLockTableWait(), they can see
that it's a promise tuple and decide to wait on some other heavyweight
lock instead.  The simplest thing might be for us to acquire a
heavyweight lock on the promise tuple before making index entries for
it, and then have callers wait on that instead always instead of
transitioning from the tuple lock to the xact lock.


Yeah, that seems like it should work. You might not even need an 
infomask bit for that; just take the other heavyweight lock always 
before calling XactLockTableWait(), whether it's a promise tuple or not. 
If it's not, acquiring the extra lock is a waste of time but if you're 
going to sleep anyway, the overhead of one extra lock acquisition hardly 
matters.



I think the interlocking with buffer locks and heavyweight locks to
make that work could be complex.


Hmm. Can you elaborate?

The inserter has to acquire the heavyweight lock before releasing the 
buffer lock, because otherwise another inserter (or deleter or updater) 
might see the tuple, acquire the heavyweight lock, and fall to sleep on 
XactLockTableWait(), before the inserter has grabbed the heavyweight 
lock. If that race condition happens, you have the original problem 
again, ie. the updater unnecessarily waits for the inserting transaction 
to finish, even though it already killed the tuple it inserted.


That seems easy to avoid. If the heavyweight lock uses the transaction 
id as the key, just like XactLockTableInsert/XactLockTableWait, you can 
acquire it before doing the insertion.


Peter, can you give that a try, please?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-29 Thread Peter Geoghegan
On Sun, Dec 29, 2013 at 8:18 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 My position is not based on a gut feeling. It is based on carefully
 considering the interactions of the constituent parts, plus the
 experience of actually building a working prototype.


 I also carefully considered all that stuff, and reached a different
 conclusion. Plus I also actually built a working prototype (for some
 approximation of working - it's still a prototype).

Well, clearly you're in agreement with me about unprincipled
deadlocking. That's what I was referring to here.

 Frankly I'm pissed off that you dismissed from the start the approach that
 seems much better to me. I gave you a couple of pointers very early on: look
 at the way we do exclusion constraints, and try to do something like promise
 tuples or killing an already-inserted tuple. You dismissed that, so I had to
 write that prototype myself.

Sorry, but that isn't consistent with my recollection at all. The
first e-mail you sent to any of the threads on this was on 2013-11-18.
Your first cut at a prototype was on 2013-11-19, the very next day. If
you think that I ought to have been able to know what you had in mind
based on conversations at pgConf.EU, you're giving me way too much
credit. The only thing vaguely along those lines that I recall you
mentioning to me in Dublin was that you thought I should make this
work with exclusion constraints - I was mostly explaining what I'd
done, and why. I was pleased that you listened courteously, but I
didn't have a clue what you had in mind, not least because to the best
of my recollection you didn't say anything about killing tuples. I'm
not going to swear that you didn't say something like that, since a
lot of things were said in a relatively short period, but it's
certainly true that I was quite curious about how you might go about
incorporating exclusion constraints into this for a while before you
began visibly participating on list.

Now, when you actually posted the prototype, I realized that it was
the same basic design that I'd cited in my very first e-mail on the
IGNORE patch (the one that didn't have row locking at all) - nobody
else wanted to do heap insertion first for promise tuples. I read that
2007 thread [1] a long time ago, but that recognition only came when
you posted your first prototype, or perhaps shortly before when you
started participating on list.

I am unconvinced that making this work for exclusion constraints is
compelling, except for IGNORE, which is sensible but something I would
not weigh heavily at all. In any case, your implementation currently
doesn't lock more than one row per tuple proposed for insertion (even
though exclusion constraints could have a huge number of rows to lock
when you propose to insert a row with a range covering a decade,
whereas with unique indexes you only ever lock either 0 or 1 rows per
slot). I could fairly easily extend my patch to have it work for
exclusion constraints with IGNORE only.

You didn't try and convince me that what you have proposed is better
than what I have. You immediately described your approach. You did say
some things about buffer locking, but you didn't differentiate between
what was essential to my design, and what was incidental, merely
calling it scary (i.e. you did something similar to what you're
accusing me of here - you didn't dismiss it, but you didn't address it
either). If you look back at the discussion throughout late November
and much of December, it is true that I am consistently critical, but
that was clearly a useful exercise, because now we know there is a
problem to fix.

Why is your approach better? You never actually said. In short, I
think that my approach may be better because it doesn't conflate row
locking with value locking (therefore making it easier to reason
about, maintaining modularity), because it never bloats, and because
releasing locks is clearly cheap, which may matter a lot sometimes. I
don't think the intent exclusive locking of my most recent revision
is problematic for performance - as the L&Y paper says, exclusive
locking of leaf pages only is not that problematic. Extending that in a
way that still allows reads, only for an instant, isn't going to be too
problematic.

I'm not sure that this is essential to your design, and I'm not sure
what your thoughts are on this, but Andres has defended the idea of
promise tuples that lock old values indefinitely pending the outcome
of another xact where we'll probably want to update, and I think this
is a bad idea. Andres recently seemed less convinced of this than in
the past [2], but I'd like to hear your thoughts. It's very pertinent,
because I think releasing locks needs to be cheap, and rendering old
promise tuples invisible is not cheap.

I'm not trying to piss anyone off here - I need all the help I can
get. These are important questions, and I'm not posing them to you to
be contrary.

 Even after that, 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-28 Thread Peter Geoghegan
The attached revision uses only heavyweight page locks across complex
operations. I haven't benchmarked it, but it appears to perform
reasonably well. I haven't attempted to measure a regression for
regular insertion, but offhand it seems likely that any regression
would be well within the noise - more or less immeasurably small. I
won't repeat too much of what is already well commented in the patch.
For those that would like a relatively quick summary of what I've
done, I include inline a new section that I've added to the nbtree
README:

Notes about speculative insertion
---------------------------------

As an amcanunique AM, the btree implementation is required to support
speculative insertion.  This means that the value locking method
through which unique index enforcement conventionally occurs is
extended and generalized, such that insertion is staggered:  the core
code attempts to get full consensus on whether values proposed for
insertion will not cause duplicate key violations.  Speculative
insertion is only possible for unique index insertion without deferred
uniqueness checking (since speculative insertion into a deferred
unique constraint's index is a contradiction in terms).

For conventional unique index insertion, the Btree implementation
exclusive locks a buffer holding the first page that the value to be
inserted could possibly be on, though only for an instant, during and
shortly after uniqueness verification.  It would not be acceptable to
hold this lock across complex operations for the duration of the
remainder of the first phase of speculative insertion.  Therefore, we
convert this exclusive buffer lock to an exclusive page lock managed
by the lock manager, thereby greatly ameliorating the consequences of
undiscovered deadlocking implementation bugs (though deadlocks are not
expected), and minimizing the impact on system interruptibility, while
not affecting index scans.

It may be useful to informally think of the page lock type acquired by
speculative insertion as similar to an intention exclusive lock, a
type of lock found in some legacy 2PL database systems that use
multi-granularity locking.  A session establishes the exclusive right
to subsequently establish a full write lock, without actually blocking
reads of the page unless and until a lock conversion actually occurs,
at which point both reads and writes are blocked.  Under this mental
model, buffer shared locks can be thought of as intention shared
locks.

As implemented, these heavyweight locks are only relevant to the
insertion case; at no other point are they actually considered, since
insertion is the only way through which new values are introduced.
The first page a value proposed for insertion into an index could be
on represents a natural choke point for our extended, though still
rather limited, system of value locking.  Naturally, when we perform a
lock escalation and acquire an exclusive buffer lock, all other
buffer locks on the same buffer are blocked, which is how the
implementation localizes knowledge about the heavyweight lock to
insertion-related routines.  Apart from deletion, which is
concomitantly prevented by holding a pin on the buffer throughout, all
exclusive locking of Btree buffers happens as a direct or indirect
result of insertion, so this approach is sufficient. (Actually, an
exclusive lock may still be acquired without insertion to initialize a
root page, but that hardly matters.)

Note that all value locks (including buffer pins) are dropped
immediately as speculative insertion is aborted, as the implementation
waits on the outcome of another xact, or as insertion proper occurs.
These page-level locks are not intended to last more than an instant.
In general, the protocol for heavyweight locking Btree pages is that
heavyweight locks are acquired before any buffer locks are held, while
the locks are only released after all buffer locks are released.
While not a hard and fast rule, presently we avoid heavyweight page
locking more than one page per unique index concurrently.
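
To make the ordering protocol concrete, here is a minimal sketch - not
code from the patch itself; value_lock_page()/value_unlock_page() are
illustrative names, while LockPage(), UnlockPage(), ReadBuffer(),
LockBuffer() and UnlockReleaseBuffer() are the existing primitives:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

/* Sketch only: the heavyweight page lock brackets the buffer lock. */
static Buffer
value_lock_page(Relation rel, BlockNumber blkno)
{
    Buffer      buf;

    /* lmgr page lock first, while no buffer lock is held */
    LockPage(rel, blkno, ExclusiveLock);

    buf = ReadBuffer(rel, blkno);
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    return buf;
}

static void
value_unlock_page(Relation rel, BlockNumber blkno, Buffer buf)
{
    /* buffer lock and pin released first, lmgr page lock last */
    UnlockReleaseBuffer(buf);
    UnlockPage(rel, blkno, ExclusiveLock);
}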


Happy new year
-- 
Peter Geoghegan


btreelock_insert_on_dup.v5.2013_12_28.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-27 Thread Andres Freund
Hi,

On 2013-12-25 15:27:36 -0800, Peter Geoghegan wrote:
 Uh, I knew that it was a problem all along. While I explored ways of
 ameliorating the problem, I specifically stated that we should discuss
 the subsystems interactions/design, which you were far too quick to
 dismiss.

Aha?

 The overall design is far more pertinent than one specific
 mechanism. While I certainly welcome your participation, if you want
 to be an effective reviewer I suggest examining your own attitude.
 Everyone wants this feature.

You know what. I don't particularly feel the need to be a reviewer of
this patch. I comment because there didn't seem enough comments on some
parts and because I see some things as problematic. If you don't want
those comments, ok. No problem.

  Holding value locks for more than an instant doesn't make sense. The
  reason is simple: when upserting, we're tacitly only really looking
  for violations on one particular unique index. We just lock them all
  at once because the implementation doesn't know *which* unique index.
  So in actuality, it's really no different from existing
  potential-violation handling for unique indexes, except we have to do
  some extra work in addition to the usual restart from scratch stuff
  (iff we have multiple unique indexes).
 
  I think the point here really is that you assume that we're always
  only looking for conflicts with one unique index. If that's all we want
  to support - sure, only the keys in that index need to be locked.
  I don't think that's necessarily a given, especially when you just want
  to look at the conflict in detail, without using a subtransaction.
 
 Why would I not assume that? It's perfectly obvious from the syntax
 that you can't do much if you don't know ahead of time where the
 conflict might be.

Because it's a damn useful feature to have. As I said above:
 if that's all we want to support - sure, only the keys in that index
 need to be locked.

I don't think the current syntax the feature implements can be used as
the sole argument what the feature should be able to support.

If you think from the angle of an async MM replication solution
replicating a table with multiple unique keys, not having to specify a
single index we expect conflicts from is surely helpful.

  You never stated a reason why you thought it was
  necessary. If you have one now, please share it. Note that I release
  all value locks before row locking too, which is necessary because to
  do any less will cause unprincipled deadlocking, as we've seen.
 
  I can't sensibly comment upon that right now, I'd need to read more code
  to understand what you're doing there.
 
 You could have looked at it back in September, if only you'd given
 these interactions the up-front consideration that they warranted.
 Nothing has changed there at all.

Holy fuck. Peter. Believe it or not, I don't remember all code, comments
& design that I've read at some point. And that sometimes means that I
need to re-read code to judge some things. That I don't have time to
fully do so on the 24th doesn't strike me as particularly surprising.

  Well, you haven't delivered that part yet, that's pretty much my point,
  no?
  I don't think you can easily do this by just additionally taking a new
  kind of heavyweight locks in the new codepaths - that will still allow
  deadlocks with the old codepaths taking only lwlocks. So this is a
  nontrivial sub-project which very well might influence whether the
  approach is deemed acceptable or not.
 
 I have already written the code, and am in the process of cleaning it
 up and gaining confidence that I haven't missed something. It's not
 trivial, and there are some subtleties, but I think that your level of
 skepticism around the difficulty of doing this is excessive.
 Naturally, non-speculative insertion does have to care about the
 heavyweight locks sometimes, but only when a page-level flag is found
 to be set.

Cool then.

  I've been very consistent even in the face of strong criticism. What I
  have now is essentially the same design as back in early September.
 
  Uh. And why's that necessarily a good thing?
 
 It isn't necessarily, but you've taken my comments out of context.

It's demonstrative of the reaction to a good part of the doubts
expressed.

 Can we focus on the design, and how things fit together,
 please?

I don't understand you here. You want people to discuss the high level
design but then criticize us for discussing the high level design when
it involves *possibly* doing things differently. Evaluating approaches
*is* focusing on the design.
And saying that a basic constituent part doesn't work - like using the
buffer locking for value locking, which you loudly doubted for some time
- *is* design criticism. The pointed-out weakness very well might be
nonexistent because of a misunderstanding, or relatively easily
fixable.

  Minor details I noticed in passing:
  * Your tqual.c bit isn't correct, you're forgetting 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-27 Thread Peter Geoghegan
On Fri, Dec 27, 2013 at 12:57 AM, Andres Freund and...@2ndquadrant.com wrote:
 You know what. I don't particularly feel the need to be a reviewer of
 this patch. I comment because there didn't seem enough comments on some
 parts and because I see some things as problematic. If you don't want
 those comments, ok. No problem.

I was attempting to make a point about the controversy this generated
in September. Which is: we were talking past each other. It was an
unfortunate, unproductive use of our time - there was some useful
discussion, but in general far more heat than light was generated. I
don't want to play the blame game. I want to avoid that situation in
the future, since it's obvious to me that it was totally avoidable.
Okay?

 I don't think the current syntax the feature implements can be used as
 the sole argument what the feature should be able to support.

 If you think from the angle of an async MM replication solution
 replicating a table with multiple unique keys, not having to specify a
 single index we expect conflicts from is surely helpful.

Well, you're not totally on your own for something like that with this
feature. You can project the conflicter's tid, and possibly do a more
sophisticated recovery, like inspecting the locked row and iterating.
That's probably not at all ideal, but then I can't even imagine what
the best interface for what you describe here looks like. If there are
multiple conflicts, do you delete or update some or all of them? How
do you express that concept from a DML statement? Maybe you could
project the conflict rows (with perhaps more than 1 for each row
proposed for insertion) and not the rejected, but it's hard to imagine
making that intuitive or succinct (conflicting rows could be projected
twice or more for separate conflicts, etc). Maybe what I have here is
in practical terms actually a pretty decent approximation of what you
want.

It seems undesirable to give other use-cases baggage around locking
values for an indefinite period, just to make this work for MM
replication, especially since it isn't clear that it actually could be
used effectively by a MM replication solution given the syntax, or any
conceivable alternative syntax or variant.

Could SQL MERGE do this for you? Offhand I don't think that it could.
In fact, I think it would be much less useful than what I've proposed
for this use-case. Its WHEN NOT MATCHED THEN clause doesn't let you
introspect details of what matched and did not match. Furthermore,
though I haven't verified this, off-hand I suspect other systems are
fussy about what you want to merge on. All examples of MERGE use I've
found after a quick Google search show merging on simple equi-join
criteria.

 Can we focus on the design, and how things fit together,
 please?

 I don't understand you here. You want people to discuss the high level
 design but then criticize us for discussing the high level design when
 it involves *possibly* doing things differently. Evaluating approaches
 *is* focusing on the design.

I spent several weeks earnestly thrashing out details of Heikki's
design. I am open to any alternative design that meets the criteria I
outlined to Heikki, with which Heikki was in full agreement. One of
those criteria was that unprincipled deadlocking that would never
occur with equivalent update statements should not occur.
Unfortunately, Heikki's POC patch did not meet that standard. I have
limited enthusiasm for making it or a similar scheme meet that
standard by further burdening the btree AM with additional knowledge
of the heap or row locking. Since in the past you've expressed general
concerns about the modularity violation within the btree AM today, I
assume that you aren't too enthusiastic about that kind of expansion
either.

 Unfortunately I am afraid that it won't be ok to check
 HEAP_XMAX_IS_LOCKED_ONLY xmaxes only - it might have been a no-key
 update + some concurrent key-share lock where the updater aborted. Now,
 I think you only acquire FOR UPDATE locks so far

That's right. Just FOR UPDATE locks.

 but using
 subtransactions you still can get into such a scenario, even involving
 FOR UPDATE locks.

Sigh.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-26 Thread Robert Haas
On Fri, Dec 20, 2013 at 12:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. If I understand the problem correctly, it's that as soon as another
 backend sees the tuple you've inserted and calls XactLockTableWait(), it
 will not stop waiting even if we later decide to kill the already-inserted
 tuple.

 One approach to fix that would be to release and immediately re-acquire the
 transaction-lock, when you kill an already-inserted tuple. Then teach the
 callers of XactLockTableWait() to re-check if the tuple is still alive. I'm
 just waving hands here, but the general idea is to somehow wake up others
 when you kill the tuple.

While mulling this over further, I had an idea about this: suppose we
marked the tuple in some fashion that indicates that it's a promise
tuple.  I imagine an infomask bit, although the concept makes me wince
a bit since we don't exactly have bit space coming out of our ears
there.  Leaving that aside for the moment, whenever somebody looks at
the tuple with a mind to calling XactLockTableWait(), they can see
that it's a promise tuple and decide to wait on some other heavyweight
lock instead.  The simplest thing might be for us to acquire a
heavyweight lock on the promise tuple before making index entries for
it, and then have callers always wait on that instead of transitioning
from the tuple lock to the xact lock.

Then we don't need to do anything screwy like releasing our
transaction lock; if we decide to kill the promise tuple, we have a
lock to release that pertains specifically to that tuple.

This might be a dumb idea; I'm just thinking out loud.
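
To make the hand-waving a little more concrete, the waiter side might
look roughly like this - HEAP_PROMISE_TUPLE is an invented infomask
bit, while LockTuple(), UnlockTuple() and XactLockTableWait() are the
existing calls:

static void
wait_for_inserter(Relation relation, HeapTuple tuple)
{
    if (tuple->t_data->t_infomask & HEAP_PROMISE_TUPLE)
    {
        /* promise tuple: sleep on its heavyweight tuple lock, which
         * the inserter releases if it decides to kill the tuple */
        LockTuple(relation, &tuple->t_self, ShareLock);
        UnlockTuple(relation, &tuple->t_self, ShareLock);
    }
    else
    {
        /* ordinary tuple: wait for the inserting xact to end */
        XactLockTableWait(HeapTupleHeaderGetXmin(tuple->t_data));
    }
}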

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-26 Thread Peter Geoghegan
On Thu, Dec 26, 2013 at 5:58 PM, Robert Haas robertmh...@gmail.com wrote:
 While mulling this over further, I had an idea about this: suppose we
 marked the tuple in some fashion that indicates that it's a promise
 tuple.  I imagine an infomask bit, although the concept makes me wince
 a bit since we don't exactly have bit space coming out of our ears
 there.  Leaving that aside for the moment, whenever somebody looks at
 the tuple with a mind to calling XactLockTableWait(), they can see
 that it's a promise tuple and decide to wait on some other heavyweight
 lock instead.  The simplest thing might be for us to acquire a
 heavyweight lock on the promise tuple before making index entries for
 it, and then have callers always wait on that instead of
 transitioning from the tuple lock to the xact lock.

I think the interlocking with buffer locks and heavyweight locks to
make that work could be complex. I'm working on a scheme where we
always acquire a page heavyweight lock ahead of acquiring an
equivalent buffer lock, and without any other buffer locks held (for
the critical choke point buffer, to implement value locking). With my
scheme, you may have to retry, but only in the event of page splits
and only at the choke point. In any case, what you describe here
strikes me as an expansion on the already less than ideal modularity
violation within the btree AM (i.e. the way it buffer locks the heap
with its own index buffers concurrently for uniqueness checking). It
might be that the best argument for explicit value locks (implemented
as page heavyweight locks or whatever) is that they are completely
distinct to row locks, and are an abstraction managed entirely by the
AM itself, quite similar to the historic, limited value locking that
unique index enforcement has always used.

If we take Heikki's POC patch as representative of promise tuple
schemes in general, this scheme might not be good enough. Index tuple
insertions don't wait on each other there, and immediately report
conflict. We need pre-checking to get an actual conflict TID in that
patch, with no help from btree available.

I'm generally opposed to making value locks of any stripe be held for
more than an instant (so we should not hold them indefinitely pending
another conflicting xact finishing). It's not just that it's
convenient to my implementation; I also happen to think that it makes
no sense. Should you really lock a value in an earlier unique index
for hours, pending conflicter xact finishing, because you just might
happen to want to insert said value, but probably not?

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-26 Thread Andres Freund
On 2013-12-26 21:11:27 -0800, Peter Geoghegan wrote:
 I'm generally opposed to making value locks of any stripe be held for
 more than an instant (so we should not hold them indefinitely pending
 another conflicting xact finishing). It's not just that it's
 convenient to my implementation; I also happen to think that it makes
 no sense. Should you really lock a value in an earlier unique index
 for hours, pending conflicter xact finishing, because you just might
 happen to want to insert said value, but probably not?

There are some advantages: For one, it allows you to guarantee forward
progress if you do it right, which surely isn't a bad property to
have. For another, it's much more in line with the way normal uniqueness
checks works.
Possibly the disadvantages outweigh the advantages, but that's a far cry
from making no sense.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-25 Thread Andres Freund
Hi,

On 2013-12-24 13:18:36 -0800, Peter Geoghegan wrote:
 On Tue, Dec 24, 2013 at 4:09 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't really see the lack of review as being crucial at this point. At
  least I have quite some doubts about the approach you've chosen and I
  have voiced it - so have others.
 
 Apparently you haven't been keeping up with this thread. The approach
 that Heikki outlined with his POC patch was demonstrated to deadlock
 in an unprincipled manner - it took me a relatively long time to
 figure this out because I didn't try a simple enough test-case.

So? I still have the fear that your approach will end up being way too
complicated and full of layering violations. I didn't say it's a no-go
(not that I have veto powers, even if I'd consider it one).

And yes, I still think that promise tuples might be a better solution
regardless of the issues you mentioned, but you know what? That doesn't
matter. Me thinking it's the better approach is primarily based on gut
feeling, and I clearly haven't voiced clear enough reasons to convince
you. So you going with your own, possibly more substantiated, gut
feeling is perfectly alright. Unless I go ahead and write a POC of my
own at least ;)

 In hindsight I should have known better than to think that people
 would be willing to defer discussion of a more acceptable value
 locking implementation to consider the interactions between the
 different subsystems, which I felt were critical and warranted
 up-front discussion, a belief which has now been borne out.
 Lesson learned. It's a pity that that's the way things are, because that
 discussion could have been really useful, and saved us all some time.

Whoa? What? Not convincing everyone is far from it being a useless
discussion. Such an attitude sure is not the way to go to elicit more
feedback.
And it clearly gave you the feedback that most people regard holding
buffer locks across other nontrivial operations, in a potentially
unbounded number, as a fundamental problem.

  I don't think there's too much reviewers can do before you've provided a
  POC implementation of real value locking.
 
 I don't see what is functionally insufficient about the value locking
 that you've already seen.

I still think it's fundamentally unacceptable to hold buffer locks
across any additional complex operations. So yes, I think the current
state is fundamentally insufficient.
Note that the case of the existing uniqueness checking already is bad,
but it at least will never run any user defined code in that context,
just HeapTupleSatisfies* and HOT code. So I don't think arguments of the
"we're already doing it in uniqueness checking" ilk have much merit.

 If you're still of the opinion that it is necessary to hold value locks
 of some form on earlier unique indexes, as you wait maybe for hours on
 some conflicting xid, then I still disagree with you for reasons
 recently re-stated [1].

I guess you're referring to:

On 2013-12-23 14:59:31 -0800, Peter Geoghegan wrote:
 Holding value locks for more than an instant doesn't make sense. The
 reason is simple: when upserting, we're tacitly only really looking
 for violations on one particular unique index. We just lock them all
 at once because the implementation doesn't know *which* unique index.
 So in actuality, it's really no different from existing
 potential-violation handling for unique indexes, except we have to do
 some extra work in addition to the usual restart from scratch stuff
 (iff we have multiple unique indexes).

I think the point here really is that you assume that we're always
only looking for conflicts with one unique index. If that's all we want
to support - sure, only the keys in that index need to be locked.
I don't think that's necessarily a given, especially when you just want
to look at the conflict in detail, without using a subtransaction.

 You never stated a reason why you thought it was
 necessary. If you have one now, please share it. Note that I release
 all value locks before row locking too, which is necessary because to
 do any less will cause unprincipled deadlocking, as we've seen.

I can't sensibly comment upon that right now, I'd need to read more code
to understand what you're doing there.

 Other than that, I have no idea what your continued objection to my
 design would be once the buffer level exclusive locks are replaced
 with page level heavyweight locks across complex (though brief)
 operations

Well, you haven't delivered that part yet, that's pretty much my point,
no?
I don't think you can easily do this by just additionally taking a new
kind of heavyweight locks in the new codepaths - that will still allow
deadlocks with the old codepaths taking only lwlocks. So this is a
nontrivial sub-project which very well might influence whether the
approach is deemed acceptable or not.

 (I guess you might not like the visibility stuff or the
 syntax, but that isn't what you're talking about here).

I don't 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-25 Thread Peter Geoghegan
On Wed, Dec 25, 2013 at 6:25 AM, Andres Freund and...@2ndquadrant.com wrote:
 So? I still have the fear that your approach will end up being way too
 complicated and full of layering violations. I didn't say it's a no-go
 (not that I have veto powers, even if I'd consider it one).

Apart from not resulting in unprincipled deadlocking, it respects the
AM abstraction more than all other approaches outlined. Inserting
tuples as value locks just isn't a great approach, even if you ignore
the fact you must come up with a whole new way to release your value
locks without ending your xact.

 And yes, I still think that promise tuples might be a better solution
 regardless of the issues you mentioned, but you know what? That doesn't
 matter. Me thinking it's the better approach is primarily based on gut
 feeling, and I clearly haven't voiced clear enough reasons to convince
 you. So you going with your own, possibly more substantiated, gut
 feeling is perfectly alright. Unless I go ahead and write a POC of my
 own at least ;)

My position is not based on a gut feeling. It is based on carefully
considering the interactions of the constituent parts, plus the
experience of actually building a working prototype.

 Whoa? What? Not convincing everyone is far from it being a useless
 discussion. Such an attitude sure is not the way to go to elicit more
 feedback.
 And it clearly gave you the feedback that most people regard holding
 buffer locks across other nontrivial operations, in a potentially
 unbounded number, as a fundamental problem.

Uh, I knew that it was a problem all along. While I explored ways of
ameliorating the problem, I specifically stated that we should discuss
the subsystems interactions/design, which you were far too quick to
dismiss. The overall design is far more pertinent than one specific
mechanism. While I certainly welcome your participation, if you want
to be an effective reviewer I suggest examining your own attitude.
Everyone wants this feature.

 I don't see what is functionally insufficient about the value locking
 that you've already seen.

 I still think it's fundamentally unacceptable to hold buffer locks
 across any additional complex operations. So yes, I think the current
 state is fundamentally insufficient.

I said *functionally* insufficient. Buffer locks demonstrably do a
perfectly fine job of value locking. Of course the current state is
insufficient, but I'm talking about design here.

 Holding value locks for more than an instant doesn't make sense. The
 reason is simple: when upserting, we're tacitly only really looking
 for violations on one particular unique index. We just lock them all
 at once because the implementation doesn't know *which* unique index.
 So in actuality, it's really no different from existing
 potential-violation handling for unique indexes, except we have to do
 some extra work in addition to the usual restart from scratch stuff
 (iff we have multiple unique indexes).

 I think the point here really is that you assume that we're always
 only looking for conflicts with one unique index. If that's all we want
 to support - sure, only the keys in that index need to be locked.
 I don't think that's necessarily a given, especially when you just want
 to look at the conflict in detail, without using a subtransaction.

Why would I not assume that? It's perfectly obvious from the syntax
that you can't do much if you don't know ahead of time where the
conflict might be. It's just like the MySQL feature - the user had
better know where it might be. Now, at least with my syntax as a user
you have some capacity to recover if you consider ahead of time that
you might get it wrong. But clearly, rejected rows - not conflicting
rows - are projected, and multiple conflicts per row are not accounted
for.
We lock on the first conflict, which with idiomatic usage will be the
only possible conflict.

That isn't the only reason why value locks don't need to be held for
more than an instant. It's just the most obvious one.

Incidentally, there are many implementation reasons why true value
locking, where value locks are held indefinitely is extremely
difficult. When I referred to an SLRU, I was just exploring the idea
of making value locks (still only held for an instant) more granular.
On closer examination it looks to me like premature optimization,
though.

 You never stated a reason why you thought it was
 necessary. If you have one now, please share it. Note that I release
 all value locks before row locking too, which is necessary because to
 do any less will cause unprincipled deadlocking, as we've seen.

 I can't sensibly comment upon that right now, I'd need to read more code
 to understand what you're doing there.

You could have looked at it back in September, if only you'd given
these interactions the up-front consideration that they warranted.
Nothing has changed there at all.

 Well, you haven't delivered that part yet, that's pretty much my point,
 no?
 I don't 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-24 Thread Andres Freund
On 2013-12-23 14:59:31 -0800, Peter Geoghegan wrote:
 On Mon, Dec 23, 2013 at 7:49 AM, Robert Haas robertmh...@gmail.com wrote:
  I don't think this is a project to rush through.  We've lived without
  MERGE/UPSERT for several years now, and we can live without it for
  another release cycle while we try to reach agreement on the way
  forward.

Agreed, but I really think it's one of the biggest weaknesses of
postgres at this point.

   I can tell that you're convinced you know the right way
  forward here, and you may be right, but I don't think you've convinced
  everyone else - maybe not even anyone else.

 That may be. Attention from reviewers has been in relatively short
 supply. Not that that isn't always true.

I don't really see the lack of review as being crucial at this point. At
least I have quite some doubts about the approach you've chosen and I
have voiced it - so have others. Whether yours is workable seems to
hinge entirely on whether you can build a scalable, maintainable
value-locking scheme. Besides some thoughts about using slru.c for it I
haven't seen much about the design of that part - might just have missed
it though. Personally I can't ad-lib a design for it, but I haven't
thought about it too much.
I don't think there's too much reviewers can do before you've provided a
POC implementation of real value locking.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-24 Thread Peter Geoghegan
On Tue, Dec 24, 2013 at 4:09 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't really see the lack of review as being crucial at this point. At
 least I have quite some doubts about the approach you've chosen and I
 have voiced it - so have others.

Apparently you haven't been keeping up with this thread. The approach
that Heikki outlined with his POC patch was demonstrated to deadlock
in an unprincipled manner - it took me a relatively long time to
figure this out because I didn't try a simple enough test-case. There
is every reason to think that alternative promise tuple approaches
would behave similarly, short of some very invasive, radical changes
to how we wait on XID share locks that I really don't think are going
to fly. That's why I chose this approach: at no point did anyone have
a plausible alternative that didn't have similar problems, and I
personally saw no alternative. It wasn't really a choice at all.

In hindsight I should have known better than to think that people
would be willing to defer discussion of a more acceptable value
locking implementation to consider the interactions between the
different subsystems, which I felt were critical and warranted
up-front discussion, a belief which has now been borne out. Lesson
learned. It's a pity that that's the way things are, because that
discussion could have been really useful, and saved us all some time.

 I don't think there's too much reviewers can do before you've provided a
 POC implementation of real value locking.

I don't see what is functionally insufficient about the value locking
that you've already seen. I'm currently working towards extending the
buffer locking to use a heavyweight lock held only for an instant, but
potentially across multiple operations, although of course only when
upserting occurs so as to not regress regular insertion. If you're
still of the opinion that it is necessary to hold value locks of some
form on earlier unique indexes, as you wait maybe for hours on some
conflicting xid, then I still disagree with you for reasons recently
re-stated [1]. You never stated a reason why you thought it was
necessary. If you have one now, please share it. Note that I release
all value locks before row locking too, which is necessary because to
do any less will cause unprincipled deadlocking, as we've seen.

Other than that, I have no idea what your continued objection to my
design would be once the buffer level exclusive locks are replaced
with page level heavyweight locks across complex (though brief)
operations (I guess you might not like the visibility stuff or the
syntax, but that isn't what you're talking about here). More granular
value locking might help boost performance, but maybe not even by
much, since we're only locking a single leaf page per unique index
against insertion, and only for an instant. I see no reason to make
the coarser-than-necessary granularity of the value locking a blocker.
Predicate locks on btree leaf pages acquired by SSI are also coarser
than strictly necessary.

[1] 
http://www.postgresql.org/message-id/cam3swzsodumg4899tjc09r2uortyhb0vl9aasc1fz7aw4gs...@mail.gmail.com

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-23 Thread Robert Haas
On Sun, Dec 22, 2013 at 6:42 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Dec 20, 2013 at 11:59 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that the way forward is to refine my design in order to
 upgrade locks from exclusive buffer locks to something else, managed
 by the lock manager but perhaps through an additional layer of
 indirection. As previously outlined, I'm thinking of a new SLRU-based
 granular value locking infrastructure built for this purpose, with
 btree inserters marking pages as having an entry in this table.

 I'm working on a revision that holds lmgr page-level exclusive locks
 (and buffer pins) across multiple operations.  This isn't too
 different to what you've already seen, since they are still only held
 for an instant. Notably, hash indexes currently quickly grab and
 release lmgr page-level locks, though they're the only existing
 clients of that infrastructure. I think on reflection that
 fully-fledged value locking may be overkill, given the fact that these
 locks are only held for an instant, and only need to function as a
 choke point for unique index insertion, and only when upserting
 occurs.

 This approach seems promising. It didn't take me very long to get it
 to a place where it passed a few prior test-cases of mine, with fairly
 varied input, though the patch isn't likely to be posted for another
 few days. I think I can get it to a place where it doesn't regress
 regular insertion at all. I think that that will tick all of the many
 boxes, without unwieldy complexity and without compromising conceptual
 integrity.

 I mention this now because obviously time is a factor. If you think
 there's something I need to do, or that there's some way that I can
 more usefully coordinate with you, please let me know. Likewise for
 anyone else following.

I don't think this is a project to rush through.  We've lived without
MERGE/UPSERT for several years now, and we can live without it for
another release cycle while we try to reach agreement on the way
forward.  I can tell that you're convinced you know the right way
forward here, and you may be right, but I don't think you've convinced
everyone else - maybe not even anyone else.

I wouldn't suggest modeling anything you do on the way hash indexes
use heavyweight locks.  That is a performance disaster, not to
mention being basically a workaround for the fact that whoever wrote
the code originally didn't bother figuring out any way that splitting
a bucket could be accomplished in a crash-safe manner, even in theory.
 If it weren't for that, we'd be using buffer locks there.  That
doesn't necessarily mean that page-level heavyweight locks aren't the
right thing here, but the performance aspects of any such approach
will need to be examined carefully.

To be honest, I am still not altogether sold on any part of this
feature.  I don't like the fact that it violates MVCC - although I
admit that some form of violation is inevitable in any feature in this
area unless we're content to live with many serialization failures, I
don't like the particular way it violates MVCC, I don't like the
syntax (returns rejects? blech), and I don't like the fact that
getting the locking right, or even getting the semantics right, seems
to be so darned hard.  I think we're in real danger of building
something that will be too complex, or just too weird, for users to
use, and too complex to maintain as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-23 Thread Peter Geoghegan
On Mon, Dec 23, 2013 at 7:49 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't think this is a project to rush through.  We've lived without
 MERGE/UPSERT for several years now, and we can live without it for
 another release cycle while we try to reach agreement on the way
 forward.  I can tell that you're convinced you know the right way
 forward here, and you may be right, but I don't think you've convinced
 everyone else - maybe not even anyone else.

That may be. Attention from reviewers has been in relatively short
supply. Not that that isn't always true.

 I wouldn't suggest modeling anything you do on the way hash indexes
 use heavyweight locks.  That is a performance disaster, not to
 mention being basically a workaround for the fact that whoever wrote
 the code originally didn't bother figuring out any way that splitting
 a bucket could be accomplished in a crash-safe manner, even in theory.
  If it weren't for that, we'd be using buffer locks there.

Having looked at the code for the first time recently, I'd agree that
hash indexes are a disaster. A major advantage of The Lehman and Yao
Algorithm, as prominently noted in the paper, is that exclusive locks
are only acquired on leaf pages to increase concurrency. Since I only
propose to extend this to a heavyweight page lock, and still only for
an instant, it seems reasonable to assume that the performance will be
acceptable for an initial version of this. It's not as if most places
will have to pay any heed to this heavyweight lock - index scans and
non-upserting inserts are generally unaffected. We can later optimize
performance as we measure a need to do so. Early indications are that
the performance is reasonable.

Holding value locks for more than an instant doesn't make sense. The
reason is simple: when upserting, we're tacitly only really looking
for violations on one particular unique index. We just lock them all
at once because the implementation doesn't know *which* unique index.
So in actuality, it's really no different from existing
potential-violation handling for unique indexes, except we have to do
some extra work in addition to the usual restart from scratch stuff
(iff we have multiple unique indexes).
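
In outline, with invented helper names (only XactLockTableWait() and
TransactionIdIsValid() are existing calls), the flow is:

TransactionId conflict_xid;

for (;;)
{
    /* momentarily value-lock one leaf page per unique index */
    conflict_xid = lock_values_in_unique_indexes(rel, slot);

    if (!TransactionIdIsValid(conflict_xid))
    {
        /* consensus: no duplicate is possible, so actually insert */
        insert_heap_and_index_tuples(rel, slot);
        break;
    }

    /* release *all* value locks before sleeping or row locking */
    release_all_value_locks(rel);
    XactLockTableWait(conflict_xid);
    /* then restart from scratch, as unique enforcement always has */
}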

 To be honest, I am still not altogether sold on any part of this
 feature.  I don't like the fact that it violates MVCC - although I
 admit that some form of violation is inevitable in any feature in this
 area unless we're content to live with many serialization failures, I
 don't like the particular way it violates MVCC

Discussions around visibility issues have not been very useful. As
I've said, I don't like the term MVCC violation, because it's
suggestive of some classical, codified definition of MVCC, a
definition that doesn't actually exist anywhere, even in research
papers, AFAICT. So while I understand your concerns around the
modifications to HeapTupleSatisfiesMVCC(), and while I respect that we
need to be mindful of the performance impact, my position is that if
that really is what we need to do, we might as well be honest about
it, and express intent succinctly and directly. This is a position
that is orthogonal to the proposed syntax, even if that is convenient
to my patch. It's already been demonstrated that yes, the MVCC
violation can be problematic when we call HeapTupleSatisfiesUpdate(),
which is a bug that was fixed by making another modest modification to
HeapTupleSatisfiesUpdate(). It is notable that that bug would have
still occurred had a would-be-HTSMVCC-invisible tuple been passed
through any other means. What problem, specifically, do you envisage
avoiding by doing it some other way? What other way do you have in
mind?

We invested huge effort into more granular FK locking when we had a
few complaints about it. I wouldn't be surprised if that effort
modestly regressed HeapTupleSatisfiesMVCC(). On the other hand, this
feature has been in very strong demand for over a decade, and has a
far smaller code footprint. I don't want to denigrate the FK locking
stuff in any way - it is a fantastic piece of work - but it's
important to have a sense of proportion about these things. In order
to make visibility work in the way we require, we're almost always
just doing additional checking of infomask bits, and the t_infomask
variable is probably already in a CPU register (this is a huge
simplification, but is essentially true). Like you, I have noted that
HeapTupleSatisfiesMVCC() is a fairly hot routine during profiling
before, but it's not *that* hot.

It's understandable that you raise these points, but from my
perspective it's hard to address your concerns without more concrete
objections.

 I don't like the
 syntax (returns rejects? blech)

I suppose it isn't ideal in some ways. On the other hand, it is
extremely flexible, with many of the same advantages of SQL MERGE.
Importantly, it will facilitate merging as part of conflict resolution
on multi-master replication systems, which I think is of considerable

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-23 Thread Robert Haas
On Mon, Dec 23, 2013 at 5:59 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Dec 23, 2013 at 7:49 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't think this is a project to rush through.  We've lived without
 MERGE/UPSERT for several years now, and we can live without it for
 another release cycle while we try to reach agreement on the way
 forward.  I can tell that you're convinced you know the right way
 forward here, and you may be right, but I don't think you've convinced
 everyone else - maybe not even anyone else.

 That may be. Attention from reviewers has been in relatively short
 supply. Not that that isn't always true.

I think concrete concerns about usability have largely been
subordinated to abstruse discussions about locking protocols.  A
discussion strictly about what syntax people would consider
reasonable, perhaps on another thread, might elicit broader
participation (although this week might not be the right time to try
to attract an audience).

 Having looked at the code for the first time recently, I'd agree that
 hash indexes are a disaster. A major advantage of The Lehman and Yao
 Algorithm, as prominently noted in the paper, is that exclusive locks
 are only acquired on leaf pages to increase concurrency. Since I only
 propose to extend this to a heavyweight page lock, and still only for
 an instant, it seems reasonable to assume that the performance will be
 acceptable for an initial version of this. It's not as if most places
 will have to pay any heed to this heavyweight lock - index scans and
 non-upserting inserts are generally unaffected. We can later optimize
 performance as we measure a need to do so. Early indications are that
 the performance is reasonable.

OK.

 To be honest, I am still not altogether sold on any part of this
 feature.  I don't like the fact that it violates MVCC - although I
 admit that some form of violation is inevitable in any feature in this
 area unless we're content to live with many serialization failures, I
 don't like the particular way it violates MVCC

 Discussions around visibility issues have not been very useful. As
 I've said, I don't like the term MVCC violation, because it's
 suggestive of some classical, codified definition of MVCC, a
 definition that doesn't actually exist anywhere, even in research
 papers, AFAICT.

I don't know whether or not that's literally true, but like Potter
Stewart, I don't think there's any real ambiguity about the underlying
concept.  The concepts of read-write, write-read, and write-write
dependencies between transactions are well-described in textbooks such
as Jim Gray's Transaction Processing: Concepts and Techniques and this
paper on MVCC:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.552&rep=rep1&type=pdf

I think the definition of an MVCC violation is that a snapshot sees
the effects of a transaction which committed after that snapshot was
taken.  And maybe it's good and right that this patch is introducing a
new way for that to happen, or maybe it's not, but the point is that
we get to decide.
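
In code terms, for a tuple whose inserting transaction is known to
have committed, that definition reduces to the test
HeapTupleSatisfiesMVCC() applies via tqual.c's XidInMVCCSnapshot()
helper (a sketch only; the real routine checks considerably more):

static bool
committed_xmin_visible(TransactionId xmin, Snapshot snapshot)
{
    /* visible iff the snapshot does not still consider xmin to be
     * running, i.e. it committed before the snapshot was taken */
    return !XidInMVCCSnapshot(xmin, snapshot);
}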

 I've been very consistent even in the face of strong criticism. What I
 have now is essentially the same design as back in early September.
 After the initial ON DUPLICATE KEY IGNORE patch in August, I soon
 realized that value locking and row locking could not be sensibly
 considered in isolation, and over the objections of others pushed
 ahead with integrating the two. I believe now as I believed then that
 value locks need to be cheap to release (or it at least needs to be
 possible), and that it was okay to drop all value locks when we need
 to deal with a possible conflict/getting an xid shared lock - if those
 unlocked pages have separate conflicts on our next attempt, the
 feature is being badly misused (for upserting) or it doesn't matter
 because we only need one conclusive No answer (for IGNOREing, but
 also for upserting).

I'm not saying that you haven't been consistent, or that you've done
anything wrong at all.  I'm just saying that the default outcome is
that we change nothing, and the fact that nobody's been able to
demonstrate an approach is clearly superior to what you've proposed
does not mean we have to accept what you've proposed.  I am not
necessarily advocating for rejecting your proposed approach, although
I do have concerns about it, but I think it is clear that it is not
backed by any meaningful amount of consensus.  Maybe that will change
in the next two months, and maybe it won't.  If it doesn't, whether
through any fault of yours or not, I don't think this is going in.  If
this is all perfectly clear to you already, then I apologize for
belaboring the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-22 Thread Peter Geoghegan
On Fri, Dec 20, 2013 at 11:59 PM, Peter Geoghegan p...@heroku.com wrote:
 I think that the way forward is to refine my design in order to
 upgrade locks from exclusive buffer locks to something else, managed
 by the lock manager but perhaps through an additional layer of
 indirection. As previously outlined, I'm thinking of a new SLRU-based
 granular value locking infrastructure built for this purpose, with
 btree inserters marking pages as having an entry in this table.

I'm working on a revision that holds lmgr page-level exclusive locks
(and buffer pins) across multiple operations.  This isn't too
different to what you've already seen, since they are still only held
for an instant. Notably, hash indexes currently quickly grab and
release lmgr page-level locks, though they're the only existing
clients of that infrastructure. I think on reflection that
fully-fledged value locking may be overkill, given the fact that these
locks are only held for an instant, and only need to function as a
choke point for unique index insertion, and only when upserting
occurs.

This approach seems promising. It didn't take me very long to get it
to a place where it passed a few prior test-cases of mine, with fairly
varied input, though the patch isn't likely to be posted for another
few days. I think I can get it to a place where it doesn't regress
regular insertion at all. I think that that will tick all of the many
boxes, without unwieldy complexity and without compromising conceptual
integrity.

I mention this now because obviously time is a factor. If you think
there's something I need to do, or that there's some way that I can
more usefully coordinate with you, please let me know. Likewise for
anyone else following.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-21 Thread Peter Geoghegan
On Fri, Dec 20, 2013 at 1:12 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 There are probably other ways to make that general idea work though.  I
 didn't follow this thread carefully, but is the idea that there would be
 many promise tuples live at any one time, or only one?  Because if
 there's only one, or a very limited number, it might be workable to
 sleep on that tuple's lock instead of the xact's lock.

 Only one.

 heap_update() and heap_delete() also grab a heavy-weight lock on the tuple,
 before calling XactLockTableWait(). _bt_doinsert() does not, but it could.
 Perhaps we can take advantage of that.

I am skeptical of this approach. It sounds like you're saying that
you'd like to intersperse value and row locking, such that you'd
definitely get a row lock on your first attempt after detecting a
duplicate. With respect, I dismissed this months ago. Why should it be
okay to leave earlier, actually inserted index tuples (from earlier
unique indexes) behind? You still have to delete those (that is, the
heap tuple) on conflict, and what you outline is sufficiently
hand-wavey for me to strongly doubt the feasibility of making earlier
btree tuples not behave as pseudo value locks ***in all relevant
contexts***. How exactly do you determine that row versions were
*deleted*? How do you sensibly differentiate between updates and
deletes, or do you? What of lock starvation hazards? Perhaps I've
misunderstood, but detecting and reasoning about deletedness like this
seems like a major modularity violation, even by the standards of the
btree AM. Do XactLockTableWait() callers have to re-check
tuple-deletedness both before and after their XactLockTableWait()
call? For regular non-upserting inserters too?

I think that the way forward is to refine my design in order to
upgrade locks from exclusive buffer locks to something else, managed
by the lock manager but perhaps through an additional layer of
indirection. As previously outlined, I'm thinking of a new SLRU-based
granular value locking infrastructure built for this purpose, with
btree inserters marking pages as having an entry in this table. That
doesn't sound like much fun to go and implement, but it's reasonably
well precedented, if authoritative transaction processing papers are
anything to go by, as previously noted [1].

I hate to make a plausibility argument, particularly at this late
stage, but: no one, myself included, has managed to find any holes in
the semantics implied by my implementation in the last few months. It
is relatively easy to reason about, and doesn't leave the idea of an
amcanunique abstraction in tatters, nor does it expand the already
byzantine tuple locking infrastructure in a whole new direction. These
are strong advantages. It really isn't hard to imagine a totally sound
implementation of the same idea -- what I do with buffer locks, but
without actual buffer locks and their obvious attendant disadvantages,
and without appreciably regressing the performance of non-upsert
use-cases. AFAICT, there is way less uncertainty around doing this,
unless you think that unprincipled deadlocking is morally defensible,
which I don't believe you or anyone else does.

[1] 
http://www.postgresql.org/message-id/cam3swzq9xmm8bzynx3memy1amqckqxuusy8t1ifqzz999u_...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Heikki Linnakangas

On 12/20/2013 06:06 AM, Peter Geoghegan wrote:

On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote:

Empirically, retrying because ExecInsertIndexTuples() returns some
recheckIndexes occurs infrequently, so maybe that makes all of this
okay. Or maybe it happens infrequently *because* we don't give up on
insertion when it looks like the current iteration is futile. Maybe
just inserting into every unique index, and then blocking on an xid
within ExecCheckIndexConstraints(), works out fairly and performs
reasonably in all common cases. It's pretty damn subtle, though, and I
worry about the worst case performance, and basic correctness issues
for these reasons.


I realized that it's possible to create the problem that I'd
previously predicted with promise tuples [1] some time ago, that are
similar in some regards to what Heikki has here. At the time, Robert
seemed to agree that this was a concern [2].

I have a very simple testcase attached, much simpler that previous
testcases, that reproduces deadlock for the patch
exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and
occasionally when scale=10 (for tiny, single-statement transactions).
With scale=100, I can't get it to deadlock on my laptop (60 clients in
all cases), at least in a reasonable time period. With the patch
btreelock_insert_on_dup.2013_12_12.patch.gz, it will never deadlock,
even with scale=1, simply because value locks are not held on to
across row locking. This is why I characterized the locking as
opportunistic on several occasions in months past.

The test-case is actually much simpler than the one I describe in [1],
and much simpler than all previous test-cases, as there is only one
unique index, though the problem is essentially the same. It is down
to old value locks held across retries - with exclusion_..., we
can't *stop* locking things from previous locking attempts (where a
locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL
flag), because dirty snapshots still see
inserted-then-deleted-in-other-xact tuples. This deadlocking seems
unprincipled and unjustified, which is a concern that I had all along,
and a concern that Heikki seemed to share more recently [3]. This is
why I felt strongly all along that value locks ought to be cheap to
both acquire and _release_, and it's unfortunate that so much time was
wasted on tangential issues, though I do accept some responsibility
for that.


Hmm. If I understand the problem correctly, it's that as soon as another 
backend sees the tuple you've inserted and calls XactLockTableWait(), it 
will not stop waiting even if we later decide to kill the 
already-inserted tuple.


One approach to fix that would be to release and immediately re-acquire 
the transaction-lock, when you kill an already-inserted tuple. Then 
teach the callers of XactLockTableWait() to re-check if the tuple is 
still alive. I'm just waving hands here, but the general idea is to 
somehow wake up others when you kill the tuple.
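
Something like this, say - still hand-waving, and
promise_tuple_still_alive() is a completely invented re-check helper,
though XactLockTableDelete() and XactLockTableInsert() do exist today:

/* inserter, at the moment it kills its promise tuple: */
XactLockTableDelete(xid);   /* wake everyone sleeping on our xact lock */
XactLockTableInsert(xid);   /* and immediately re-acquire it */

/* waiter, replacing a bare XactLockTableWait(xid): */
do
{
    XactLockTableWait(xid);
} while (promise_tuple_still_alive(rel, tid) &&
         TransactionIdIsInProgress(xid));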


We could make use of that facility to also let others proceed, if you 
delete a tuple in the same transaction that you insert it. It's a corner 
case, not worth much on its own, but I think it would fall out of the 
above machinery for free, and be an easier way to test it than inducing 
deadlocks with ON DUPLICATE.


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. If I understand the problem correctly, it's that as soon as another
 backend sees the tuple you've inserted and calls XactLockTableWait(), it
 will not stop waiting even if we later decide to kill the already-inserted
 tuple.

 One approach to fix that would be to release and immediately re-acquire the
 transaction-lock, when you kill an already-inserted tuple. Then teach the
 callers of XactLockTableWait() to re-check if the tuple is still alive.

That particular mechanism sounds like a recipe for unintended consequences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
 On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  Hmm. If I understand the problem correctly, it's that as soon as another
  backend sees the tuple you've inserted and calls XactLockTableWait(), it
  will not stop waiting even if we later decide to kill the already-inserted
  tuple.
 
  One approach to fix that would be to release and immediately re-acquire the
  transaction-lock, when you kill an already-inserted tuple. Then teach the
  callers of XactLockTableWait() to re-check if the tuple is still alive.
 
 That particular mechanism sounds like a recipe for unintended consequences.

Yep, what I thought too.

There are probably other ways to make that general idea work though.  I
didn't follow this thread carefully, but is the idea that there would be
many promise tuples live at any one time, or only one?  Because if
there's only one, or a very limited number, it might be workable to
sleep on that tuple's lock instead of the xact's lock.

Another thought is to have a different LockTagType that signals a
transaction that's doing the INSERT/ON DUPLICATE thingy, and remote
backends sleep on that instead of the regular transaction lock.  That
different lock type could be released and reacquired as proposed by
Heikki above without danger of unintended consequences.
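
A rough sketch of what I mean, purely for illustration -- the
LOCKTAG_SPECULATIVE_INSERT tag type and the macro below don't exist,
and would require a new LockTagType enum member; only the LOCKTAG
machinery and LockAcquire()/LockRelease() are real:

#include "postgres.h"
#include "storage/lock.h"

/* Invented lock tag type -- this does not exist today. */
#define SET_LOCKTAG_SPECULATIVE_INSERT(tag, xid, token) \
    ((tag).locktag_field1 = (xid), \
     (tag).locktag_field2 = (token), \
     (tag).locktag_field3 = 0, \
     (tag).locktag_field4 = 0, \
     (tag).locktag_type = LOCKTAG_SPECULATIVE_INSERT /* invented */, \
     (tag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)

/*
 * Inserter: take this before speculatively inserting; a token lets one
 * xact run many successive speculative insertions under distinct tags.
 */
static void
SpeculativeInsertLockAcquire(TransactionId xid, uint32 token)
{
    LOCKTAG     tag;

    SET_LOCKTAG_SPECULATIVE_INSERT(tag, xid, token);
    (void) LockAcquire(&tag, ExclusiveLock, false, false);
}

/*
 * Inserter: on killing the promise tuple, just release.  Waiters sleep
 * on this tag instead of the regular transaction lock, so releasing it
 * early has none of the unintended consequences of releasing and
 * re-acquiring the xact lock itself.
 */
static void
SpeculativeInsertLockRelease(TransactionId xid, uint32 token)
{
    LOCKTAG     tag;

    SET_LOCKTAG_SPECULATIVE_INSERT(tag, xid, token);
    LockRelease(&tag, ExclusiveLock, false);
}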

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Heikki Linnakangas

On 12/20/2013 10:56 PM, Alvaro Herrera wrote:

Robert Haas wrote:

On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Hmm. If I understand the problem correctly, it's that as soon as another
backend sees the tuple you've inserted and calls XactLockTableWait(), it
will not stop waiting even if we later decide to kill the already-inserted
tuple.

One approach to fix that would be to release and immediately re-acquire the
transaction-lock, when you kill an already-inserted tuple. Then teach the
callers of XactLockTableWait() to re-check if the tuple is still alive.


That particular mechanism sounds like a recipe for unintended consequences.


Yep, what I thought too.

There are probably other ways to make that general idea work though.  I
didn't follow this thread carefully, but is the idea that there would be
many promise tuples live at any one time, or only one?  Because if
there's only one, or a very limited number, it might be workable to
sleep on that tuple's lock instead of the xact's lock.


Only one.

heap_update() and heap_delete() also grab a heavy-weight lock on the 
tuple, before calling XactLockTableWait(). _bt_doinsert() does not, but 
it could. Perhaps we can take advantage of that.
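
Something like this, say (LockTuple(), UnlockTuple() and 
XactLockTableWait() are all existing lmgr calls; where exactly this 
would slot into _bt_doinsert() is hand-waving):

#include "postgres.h"
#include "storage/lmgr.h"
#include "storage/itemptr.h"
#include "utils/rel.h"

/*
 * Sketch: the pattern heap_update()/heap_delete() already use, grafted
 * onto btree insertion.  Taking the heavyweight tuple lock before the
 * xact lock gives us something tuple-granular that the inserter could,
 * in principle, release early after killing its promise tuple.
 */
static void
wait_for_conflicting_inserter(Relation heapRel, ItemPointer conflictTid,
                              TransactionId xwait)
{
    LockTuple(heapRel, conflictTid, ExclusiveLock);

    /* Sleep until the inserting transaction finishes (or aborts). */
    XactLockTableWait(xwait);

    UnlockTuple(heapRel, conflictTid, ExclusiveLock);
}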


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Peter Geoghegan
On Fri, Dec 20, 2013 at 12:39 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. If I understand the problem correctly, it's that as soon as another
 backend sees the tuple you've inserted and calls XactLockTableWait(), it
 will not stop waiting even if we later decide to kill the already-inserted
 tuple.

Forgive me for being pedantic, but I wouldn't describe it that way.
Quite simply, the speculatively inserted (and possibly later deleted)
tuples are functionally value locks that presently cannot be easily
released (so my point is that it doesn't matter whether you're
currently waiting in XactLockTableWait() or are just about to). I have
to wonder about the performance implications of fixing this, even if
we suppose the fix is itself inexpensive. The current approach
probably benefits from not having to re-acquire value locks across
retries, since everyone else still has to respect the value locks from
our previous attempts.

The more I think about it, the more opposed I am to letting this
slide, a notion I had considered last night, if only because MySQL did
so for many years. This is qualitatively different from other cases
where we deadlock. Even back when we exclusive-locked rows as part of
foreign key enforcement, it was more or less always possible to
analyze the dependencies that existed, to ensure that locks were
acquired in a predictable order so that deadlocking could not occur.
Now, maybe that isn't practical for an entire app, but it is practical
to do in a localized way as problems emerge. In contrast, if we
allowed unprincipled deadlocking, the only advice we could give is
"stop doing so much upserting".


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-18 Thread Peter Geoghegan
On Thu, Dec 12, 2013 at 4:18 PM, Peter Geoghegan p...@heroku.com wrote:
 Both of these revisions have identical ad-hoc test cases included as
 new files - see testcase.sh and upsert.sql. My patch doesn't have any
 unique constraint violations, and has pretty consistent performance,
 while yours has many unique constraint violations. I'd like to hear
 your thoughts on the testcase, and the design implications.

I withdraw the test case. Both approaches behave similarly if you look
for long enough, and that's okay.

I also think that changes to HeapTupleSatisfiesUpdate() are made
unnecessary by recent bug fixes to that function. The test case
previously described [1] that broke it is no longer reproducible, at
least so far.

Do you think that we need to throw a serialization failure within
ExecLockHeapTupleForUpdateSpec() iff heap_lock_tuple() returns
HeapTupleInvisible and IsolationUsesXactSnapshot()? Also, I'm having a
hard time finding a good choke point at which to catch MVCC snapshots
availing of our special visibility rule when they should not, due to
IsolationUsesXactSnapshot(). It seems sufficient to continue to assume
that Postgres won't attempt to lock any tid invisible under
conventional MVCC rules in the first place, except within
ExecLockHeapTupleForUpdateSpec() -- but what do we actually do within
ExecLockHeapTupleForUpdateSpec()? I'm thinking of a new tqual.c
routine concerning the tuple being "in the future" that we re-check
when IsolationUsesXactSnapshot(). That's not very modular, though.
Maybe we'd go through heapam.c.

I think that the new, special "reach into the future" rule that MVCC
snapshots now have doesn't matter for the purposes of higher isolation
levels, because we'll throw a serialization failure within
ExecLockHeapTupleForUpdateSpec() before it is allowed to become a
problem. In order for the new rule to be relevant, we'd have to be the
xact doing the locking in the first place, and as an xact in
non-read-committed mode, we'd be sure to call the new tqual.c "in the
future" routine, or whatever it ends up being. Only upserters can lock
a row in the future, so it is the job of upserters to care about this
special case.
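
To sketch what I mean (HeapTupleSatisfiesFuture is an invented name,
and XidInMVCCSnapshot() is currently static within tqual.c, so it
would have to be exposed, or this routine would live there):

#include "postgres.h"
#include "access/htup_details.h"
#include "access/xact.h"
#include "utils/snapshot.h"

/*
 * Invented tqual.c-style helper: true iff the tuple was inserted by a
 * transaction whose effects our MVCC snapshot cannot see -- i.e. the
 * tuple is "in the future" from the snapshot's point of view.
 */
static bool
HeapTupleSatisfiesFuture(HeapTuple htup, Snapshot snapshot)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(htup->t_data);

    return XidInMVCCSnapshot(xmin, snapshot);
}

/*
 * ...and the caller, in ExecLockHeapTupleForUpdateSpec() or
 * thereabouts:
 */
static void
check_not_locking_future_tuple(HeapTuple tuple, Snapshot snapshot)
{
    if (IsolationUsesXactSnapshot() &&
        HeapTupleSatisfiesFuture(tuple, snapshot))
        ereport(ERROR,
                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                 errmsg("could not serialize access due to concurrent update")));
}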

Incidentally, I tried to rebase recently, and saw some shift/reduce
conflicts due to 1b4f7f93b4693858cb983af3cd557f6097dab67b, Allow
empty target list in SELECT. The fix for that is not immediately
obvious.

So I think we should proceed with the non-conclusive-check-first
approach (if only on pragmatic grounds), but even now I'm not really
sure. I think there might be unprincipled deadlocking should
ExecInsertIndexTuples() fail to be completely consistent about its
ordering of insertions -- the use of dirty snapshots (including as part
of conventional !UNIQUE_CHECK_PARTIAL unique index enforcement) plays
a part in this risk. Roughly speaking, heap_delete() doesn't render
the tuple immediately invisible to some-other-xact's dirty snapshot
[2], and I think that could have unpleasant interactions, even if it
is also beneficial in some ways. Our old, dead tuples from previous
attempts stick around, and function as value locks to everyone else,
since for example _bt_check_unique() cares about visibility having
merely been affected, which is grounds for blocking. More
counter-intuitive still, we go ahead with value locking (i.e. btree
UNIQUE_CHECK_PARTIAL tuple insertion originating from the main
speculative ExecInsertIndexTuples() call) even though we already know
that we will delete the corresponding heap row (which, as noted, still
satisfies HeapTupleSatisfiesDirty() and so is value-lock-like).
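
For reference, the behavior I'm relying on above, simplified from
_bt_check_unique() -- the wrapper function is invented, but
heap_fetch() and the dirty snapshot output fields are real:

#include "postgres.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "storage/bufmgr.h"
#include "utils/tqual.h"

/*
 * Given the TID of a would-be duplicate, return the xid we must wait
 * on, or InvalidTransactionId if no transaction is in flight on it.
 * The key point: heap_fetch() with a dirty snapshot reports an
 * in-progress *deleter* through SnapshotDirty.xmax, and we block on
 * that just as we would on an in-progress inserter -- which is why our
 * deleted-but-not-yet-committed-dead tuples keep acting as value locks
 * for everyone else.
 */
static TransactionId
unique_check_conflict_xid(Relation heapRel, ItemPointer tid)
{
    SnapshotData SnapshotDirty;
    HeapTupleData htup;
    Buffer      hbuffer;

    InitDirtySnapshot(SnapshotDirty);
    htup.t_self = *tid;

    if (heap_fetch(heapRel, &SnapshotDirty, &htup, &hbuffer, false, NULL))
    {
        ReleaseBuffer(hbuffer);

        if (TransactionIdIsValid(SnapshotDirty.xmin))
            return SnapshotDirty.xmin;  /* inserter in progress */
        if (TransactionIdIsValid(SnapshotDirty.xmax))
            return SnapshotDirty.xmax;  /* deleter in progress */
        /* else: a committed, visible duplicate -- caller's problem */
    }
    return InvalidTransactionId;
}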

Empirically, retrying because ExecInsertIndexTuples() returns some
recheckIndexes occurs infrequently, so maybe that makes all of this
okay. Or maybe it happens infrequently *because* we don't give up on
insertion when it looks like the current iteration is futile. Maybe
just inserting into every unique index, and then blocking on an xid
within ExecCheckIndexConstraints(), works out fairly and performs
reasonably in all common cases. It's pretty damn subtle, though, and I
worry about the worst-case performance, and about basic correctness
issues, for these reasons. The fact that deferred unique indexes also
use UNIQUE_CHECK_PARTIAL is cold comfort -- that code path only ever
has to throw an error on conflict, and only once. We haven't earned
the right to lock *all* values in all unique indexes, but we kind of
do so anyway in the event of an insertion conflict detected after the
pre-check.

Another concern that bears reiterating is: I think making the
lock-for-update case work for exclusion constraints is a lot of
additional complexity for a very small return.

Do you think it's worth optimizing ExecInsertIndexTuples() to avoid
futile non-unique/exclusion constrained index tuple insertion?
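
Concretely, I'm thinking of something like the following inside the
ExecInsertIndexTuples() loop, where conflictPending is an invented
flag that would presumably be fed from the pre-check:

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/rel.h"
#include "utils/relcache.h"

/*
 * Sketch: skip futile insertions into non-unique indexes once we know
 * the heap tuple is going to be superseded and retried anyway.
 */
static void
insert_index_tuples_sketch(ResultRelInfo *resultRelInfo,
                           bool conflictPending)
{
    int         i;
    int         numIndices = resultRelInfo->ri_NumIndices;
    RelationPtr relationDescs = resultRelInfo->ri_IndexRelationDescs;

    for (i = 0; i < numIndices; i++)
    {
        Relation    indexRelation = relationDescs[i];

        /* A doomed heap tuple only needs index entries in the unique
         * indexes that are acting as value locks. */
        if (conflictPending && !indexRelation->rd_index->indisunique)
            continue;

        /* ... FormIndexDatum() and index_insert() as today ... */
    }
}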

[1] 
http://www.postgresql.org/message-id/CAM3SWZS2--GOvUmYA2ks_aNyfesb0_H6T95_k8+wyx7Pi=c...@mail.gmail.com

[2] 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-14 Thread Peter Geoghegan
On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I spent a little bit of time looking at btreelock_insert_on_dup.  AFAICT
 it executes FormIndexDatum() for later indexes while holding assorted
 buffer locks in earlier indexes.  That really ain't gonna do, because in
 the case of an expression index, FormIndexDatum will execute nearly
 arbitrary user-defined code, which might well result in accesses to those
 indexes or others.  What we'd have to do is refactor so that all the index
 tuple values get computed before we start to insert any of them.  That
 doesn't seem impossible, but it implies a good deal more refactoring than
 has been done here.

We were proceeding on the basis that what I'd done, if deemed
acceptable in principle, could eventually be replaced by an
alternative value locking implementation that more or less similarly
extends the limited way in which value locking already occurs (i.e.
unique index enforcement's buffer locking), but without the downsides.
While I certainly appreciate your input, I still think that there is a
controversy about what implementation gets us the most useful
semantics, and I think we should now focus on resolving it. I am not
sure that Heikki's approach is functionally equivalent to mine. At the
very least, I think the trade-off of doing one or the other should be
well understood.

 Once we do that, I wonder if we couldn't get rid of the LWLockWeaken/
 Strengthen stuff.  That scares the heck out of me; I think it's deadlock
 problems waiting to happen.

There are specific caveats around using those. I think that they could
be useful elsewhere, but are likely to only ever have a few clients.
As previously mentioned, the same semantics appear in other similar
locking primitives in other domains, so fwiw it really doesn't strike
me as all that controversial. I agree that their *usage* is not
acceptable as-is. I've only left the usage in the patch to give us
some basis for reasoning about the performance on mixed workloads for
comparative purposes. Perhaps I shouldn't have even done that, to
better focus reviewer attention on the semantics implied by each
implementation.

 Also, the lack of any doc updates makes it hard to review this.  I can
 see that you don't want to touch the user-facing docs until the syntax
 is agreed on, but at the very least you ought to produce real updates
 for the indexam API spec, since you're changing that significantly.

I'll certainly do that in any future revision.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-13 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 I attached two revisions - one of my patch (btreelock_insert_on_dup)
 and one of your alternative design (exclusion_insert_on_dup).

I spent a little bit of time looking at btreelock_insert_on_dup.  AFAICT
it executes FormIndexDatum() for later indexes while holding assorted
buffer locks in earlier indexes.  That really ain't gonna do, because in
the case of an expression index, FormIndexDatum will execute nearly
arbitrary user-defined code, which might well result in accesses to those
indexes or others.  What we'd have to do is refactor so that all the index
tuple values get computed before we start to insert any of them.  That
doesn't seem impossible, but it implies a good deal more refactoring than
has been done here.
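
Roughly, the two-pass shape I have in mind -- just a sketch, with
names approximating those in ExecInsertIndexTuples():

#include "postgres.h"
#include "access/genam.h"
#include "catalog/index.h"
#include "executor/executor.h"
#include "utils/relcache.h"

/*
 * Sketch: evaluate every index expression up front, with no buffer
 * locks held, then insert from the precomputed datums.  From pass 2
 * onwards no user-defined code can run while value locks are held.
 */
static void
form_all_then_insert(Relation heapRelation, RelationPtr relationDescs,
                     IndexInfo **indexInfoArray, int numIndices,
                     TupleTableSlot *slot, HeapTuple tuple, EState *estate)
{
    Datum     (*values)[INDEX_MAX_KEYS];
    bool      (*isnull)[INDEX_MAX_KEYS];
    int         i;

    values = palloc(numIndices * sizeof(*values));
    isnull = palloc(numIndices * sizeof(*isnull));

    /* Pass 1: run all (possibly user-defined) index expressions. */
    for (i = 0; i < numIndices; i++)
        FormIndexDatum(indexInfoArray[i], slot, estate,
                       values[i], isnull[i]);

    /* Pass 2: the insertions proper (checkUnique simplified here;
     * the real code would choose it per index). */
    for (i = 0; i < numIndices; i++)
        index_insert(relationDescs[i], values[i], isnull[i],
                     &(tuple->t_self), heapRelation,
                     UNIQUE_CHECK_PARTIAL);

    pfree(values);
    pfree(isnull);
}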

Once we do that, I wonder if we couldn't get rid of the LWLockWeaken/
Strengthen stuff.  That scares the heck out of me; I think it's deadlock
problems waiting to happen.

Another issue is that the number of buffer locks being held doesn't seem
to be bounded by anything much.  The current LWLock infrastructure has a
hard limit on how many lwlocks can be held per backend.
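
(For reference, that limit is enforced in lwlock.c roughly as follows;
quoted from memory, so treat the details as approximate:)

#define MAX_SIMUL_LWLOCKS   100

static int  num_held_lwlocks = 0;
static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];

/* in LWLockAcquire(): */
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
    elog(ERROR, "too many LWLocks taken");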

Also, the lack of any doc updates makes it hard to review this.  I can
see that you don't want to touch the user-facing docs until the syntax
is agreed on, but at the very least you ought to produce real updates
for the indexam API spec, since you're changing that significantly.

BTW, so far as the syntax goes, I'm quite distressed by having to make
REJECTS into a fully-reserved word.  It's not reserved according to the
standard, and it seems pretty likely to be something that apps might be
using as a table or column name.

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-12 Thread Heikki Linnakangas
What's the status of this patch? I posted my version using a quite 
different approach than your original patch. You did some testing of 
that, and ran into unrelated bugs. Have they been fixed now?


Where do we go from here? Are you planning to continue based on my 
proof-of-concept patch, fixing the known issues with that? Or do you 
need more convincing?


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-12 Thread Peter Geoghegan
On Thu, Dec 12, 2013 at 1:23 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 What's the status of this patch? I posted my version using a quite different
 approach than your original patch. You did some testing of that, and ran
 into unrelated bugs. Have they been fixed now?

Sorry, I dropped the ball on this. I'm doing a bit more testing of an
approach to fixing the new bugs. I'll let you know how I get on
tomorrow (later today for you).


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-11-27 Thread Peter Geoghegan
On Tue, Nov 26, 2013 at 8:19 PM, Peter Geoghegan p...@heroku.com wrote:
 There are some visibility-related race conditions even still

I also see this, sandwiched between the very many deadlock detected
errors recorded over 6 or so hours (this is in chronological order,
with no ERRORs omitted within the range shown):

ERROR:  deadlock detected
ERROR:  deadlock detected
ERROR:  deadlock detected
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  unable to fetch updated version of tuple
ERROR:  deadlock detected
ERROR:  deadlock detected
ERROR:  deadlock detected
ERROR:  deadlock detected

This, along with the already-discussed "attempted to update invisible
tuple" error, forms a full account of the unexpected ERRORs seen
during the extended run of the test case, so far.

Since it took me a relatively long time to reproduce this, it may not
be trivial to do so. Unless you don't think it's useful, I'm going to
give this test a full 24 hours, just in case it turns up anything else
like this.

-- 
Peter Geoghegan




  1   2   3   >