subject:"Re\: \[HACKERS\] Re\: new patch of MERGE \(merge_204\) \& a question about duplicated ctid"


Heikki Linnakangas wrote:

You can of course LOCK TABLE as a work-around, if that's what you want.


What I was trying to suggest upthread is that while there are other 
possible ways around this problem, the only one that has any hope of 
shipping with 9.1 is to do just that.  So from my perspective, the rest 
of the discussion about the right way to proceed is moot for now.


For some reason it didn't hit me until you said this that I could do the 
locking manually in my test case, without even touching the server-side 
code yet.  Attached are a new pair of scripts where each pgbench UPDATE 
statement executes an explicit LOCK TABLE.  Here's the result of a 
sample run here:


$ pgbench -f update-merge.sql -T 60 -c 16 -j 4 -s 2 pgbench
starting vacuum...end.
transaction type: Custom query
scaling factor: 2
query mode: simple
number of clients: 16
number of threads: 4
duration: 60 s
number of transactions actually processed: 84375
tps = 1405.953672 (including connections establishing)
tps = 1406.137456 (excluding connections establishing)
$ psql -c 'select count(*) as updated FROM pgbench_accounts WHERE NOT 
abalance=0' -d pgbench

updated
-
  68897
(1 row)

$ psql -c 'select count(*) as inserted FROM pgbench_accounts WHERE aid  
10' -d pgbench

inserted
--
   34497
(1 row)

No assertion crashes, no duplicate key failures.  All the weird stuff I 
was running into is gone, so decent evidence the worst of the problems 
were all because the heavy lock I expecting just wasn't integrated into 
the patch.  Congratulations to Boxuan:  for the first time this is 
starting to act like a viable feature addition to me, just one with a 
moderately long list of limitations and performance issues.


1400 TPS worth of UPSERT on my modest 8-core desktop (single drive with 
cheating fsync) isn't uselessly slow.  If I add SET TRANSACTION 
ISOLATION LEVEL SERIALIZABLE; just after the BEGIN;, I don't see any 
serialization errors, and performance is exactly the same.


Run a straight UPDATE over only the existing range of keys, and I get 
7000 TPS instead.  So the locking etc. is reducing performance to 20% of 
its normal rate, on this assertion+debug build.  I can run this tomorrow 
(err, later today I guess looking at the time) on a proper system with 
BBWC and without asseritions to see if the magnitude of the difference 
changes, but I don't think that's the main issue here.


Presuming the code quality issues and other little quirks I've 
documented (and new ones yet to be discovered) can get resolved here, 
and that's a sizeable open question, I could see shipping this with the 
automatic heavy LOCK TABLE in there.  Then simple UPSERT could work out 
of the box via a straightforward MERGE.  We'd need a big warning 
disclaiming that concurrent performance is very limited in this first 
release of the feature, but I don't know that this is at the 
unacceptable level of slow for smaller web apps and such.


Until proper fine-grained concurrency is implemented, I think it would 
be PR suicide to release a version of this without a full table lock 
happening automatically though.  The idea Robert advocated well, that it 
would be possible for advanced users to use even this rough feature in a 
smarter way to avoid conflicts and not suffer the full performance 
penalty, is true.  But if you consider the main purpose here to be 
making it easier to get smaller MySQL apps and the like ported to 
PostgreSQL (which is what I see as goal #1), putting that burden on the 
user is just going to reinforce the old PostgreSQL is so much harder 
than MySQL stereotype.  I'd much prefer to see everyone have a slow but 
simple to use UPSERT via MERGE available initially, rather than to worry 
about optimizing for the advanced user in a way that makes life harder 
for the newbies.  The sort of people who must have optimal performance 
already have trigger functions available to them, that they can write 
and tweak for best performance.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books



test-merge.sh
Description: Bourne shell script
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 10 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000
BEGIN;

-- Optional mode change
-- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

LOCK TABLE pgbench_accounts;
MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 100)::integer,:delta,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);
COMMIT;

-- This syntax worked with MERGE v203 patch, but isn't compatible with v204
--MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 100)::integer,:delta,'')) AS

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


Robert Haas wrote:

And even if it isn't, the MERGE syntax is insane if what you really
want to do is insert or update ONE record.  If all we have is MERGE,
people will keep doing it with a PL/pgsql stored procedure or some
crummy application logic just so that they don't have to spend several
days trying to understand the syntax.  Heck, I understand the syntax
(or I think I do) and I still think it's more trouble than its worth


I hoped that the manual would have a clear example of this is how you 
do UPSERT with MERGE, preferrably cross-linked to the existing Example 
39-2. Exceptions with UPDATE/INSERT trigger implementation that's been 
the reference implementation for this for a long time, so people can see 
both alternatives.  New users will cut and paste that example into their 
code, and in the beginning neither know nor care how MERGE actually 
works, so long as the example does what it claims.  I would wager the 
majority of PL/pgsql implementations of this requirement start the exact 
same way.  I don't think the learning curve there is really smaller, 
it's just that you've just already been through it.


I've been purposefully ignoring the larger applications of MERGE in the 
interest of keeping focus on a managable subset.  But the more general 
feature set is in fact enormously useful for some types of data 
warehouse applications.  Build REPLACE, and you built REPLACE.  Build 
MERGE that is REPLACE now and eventually full high-performance MERGE, 
and you've done something with a much brighter future.  I don't think 
the concurrency hurdles here are unique to this feature either, as shown 
by the regular overlap noted with the other serialization work. 


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-04 Thread David Fetter

On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote:
 Heikki Linnakangas wrote:
 You can of course LOCK TABLE as a work-around, if that's what you want.
 
 Presuming the code quality issues and other little quirks I've
 documented (and new ones yet to be discovered) can get resolved
 here, and that's a sizeable open question, I could see shipping this
 with the automatic heavy LOCK TABLE in there.  Then simple UPSERT
 could work out of the box via a straightforward MERGE.

How about implementing an UPSERT command as take the lock, do the
merge?  That way, we'd have both the simplicity for the simpler cases
and a way to relax consistency guarantees for those who would like to
do so.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-04 Thread Marko Tiikkaja


On 2011-01-04 6:27 PM, David Fetter wrote:

On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote:

Heikki Linnakangas wrote:

You can of course LOCK TABLE as a work-around, if that's what you want.


Presuming the code quality issues and other little quirks I've
documented (and new ones yet to be discovered) can get resolved
here, and that's a sizeable open question, I could see shipping this
with the automatic heavy LOCK TABLE in there.  Then simple UPSERT
could work out of the box via a straightforward MERGE.


How about implementing an UPSERT command as take the lock, do the
merge?  That way, we'd have both the simplicity for the simpler cases
and a way to relax consistency guarantees for those who would like to
do so.


That, unfortunately, won't work so well in REPEATABLE READ :-(  But I, 
too, am starting to think that we should have a separate, optimized 
command to do UPSERT/INSERT .. IGNORE efficiently and correctly while 
making MERGE's correctness the user's responsibility.  Preferably with 
huge warning signs on the documentation page.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-04 Thread David Fetter

On Tue, Jan 04, 2011 at 07:02:54PM +0200, Marko Tiikkaja wrote:
 On 2011-01-04 6:27 PM, David Fetter wrote:
 On Tue, Jan 04, 2011 at 04:44:32AM -0500, Greg Smith wrote:
 Heikki Linnakangas wrote:
 You can of course LOCK TABLE as a work-around, if that's what you want.
 
 Presuming the code quality issues and other little quirks I've
 documented (and new ones yet to be discovered) can get resolved
 here, and that's a sizeable open question, I could see shipping this
 with the automatic heavy LOCK TABLE in there.  Then simple UPSERT
 could work out of the box via a straightforward MERGE.
 
 How about implementing an UPSERT command as take the lock, do the
 merge?  That way, we'd have both the simplicity for the simpler cases
 and a way to relax consistency guarantees for those who would like to
 do so.
 
 That, unfortunately, won't work so well in REPEATABLE READ :-(

There are caveats all over READ COMMITTED/REPEATABLE READ/SNAPSHOT.
The only really intuitively obvious behavior is SERIALIZABLE, which
we'll have available in 9.1. :)

 But I, too, am starting to think that we should have a separate,
 optimized command to do UPSERT/INSERT .. IGNORE efficiently and
 correctly while making MERGE's correctness the user's
 responsibility.  Preferably with huge warning signs on the
 documentation page.

+1 for the HUGE WARNING SIGNS :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


Kevin Grittner wrote:

Greg Smith  wrote:
 
  

I could see shipping this with the automatic heavy LOCK TABLE in
there.

 
How would you handle or document behavior in REPEATABLE READ

isolation?  The lock doesn't do much good unless you acquire it
before you get your snapshot, right?
  


Hand-wave and hope you offer a suggested implementation?  I haven't 
gotten to thinking about this part just yet--am still assimilating 
toward a next move after the pleasant surprise that this is actually 
working to some degree now.  You're right that turning the high-level 
idea of just lock the table actually has to be mapped into exact 
snapshot mechanics and pitfalls before moving in that direction will get 
very far.  I'm probably not the right person to answer just exactly how 
feasibile that is this week though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


David Fetter wrote:

How about implementing an UPSERT command as take the lock, do the
merge?  That way, we'd have both the simplicity for the simpler cases
and a way to relax consistency guarantees for those who would like to
do so.
  


Main argument against is that path leads to a permanent non-standard 
wart to support forever, just to work around what should be a short-term 
problem.  And I'm not sure whether reducing the goals to only this 
actually improves the ability to ship something in the near term too 
much.  Many of the hard problems people are bothered by don't go away, 
it just makes deciding which side of the speed/complexity trade-off 
you're more interested in becomes more obvious.  What I've been 
advocating is making that decision go away altogether by only worrying 
about the simple to use and slow path for now, but that's a highly 
debatable viewpoint I appreciate the resistence to, if it's possible to 
do at all.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-04 Thread David Fetter

On Tue, Jan 04, 2011 at 09:27:10PM -0500, Greg Smith wrote:
 David Fetter wrote:
 How about implementing an UPSERT command as take the lock, do the
 merge?  That way, we'd have both the simplicity for the simpler cases
 and a way to relax consistency guarantees for those who would like to
 do so.
 
 Main argument against is that path leads to a permanent non-standard
 wart to support forever, just to work around what should be a
 short-term problem.  And I'm not sure whether reducing the goals to
 only this actually improves the ability to ship something in the
 near term too much.

I think I haven't communicated clearly what I'm suggesting, which is
that we ship with both an UPSERT and a MERGE, the former being ugly,
crude and simple, and the latter festooned with dire warnings about
isolation levels and locking.

If shipping with a wart, as you term it, isn't acceptable, then I'd
advocate for going with just MERGE and documenting it inside and out,
including one or more clearly written UPSERT and/or REPLACE INTO
recipes.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, 2011-01-03 at 01:53 -0500, Greg Smith wrote:

 In advance of the planned but not available yet ability to 
 lock individual index key values, locking the whole table is the only 
 possible implementation that can work correctly here I'm aware of. 

This was discussed here
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01903.php
with suggested resolutions for this release here
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01907.php

In summary, that means we can either

1. Lock the table for ShareRowExclusiveLock

2. throw a SERIALIZABLE error, if we come up against a row that cannot
be neither MATCHED nor NON MATCHED.

3. Bounce the patch to 9.2, commit early and then work on a full
concurrency solution before commit. The solution strawman is something
like EvalPlanQual with a new snapshot for each re-checked row, emulating
the pattern of snapshots/rechecks that would happen in a PL/pgSQL
version of an UPSERT.

Either way, we're saying that MERGE will not support concurrent
operations safely, in this release.

Given the continued lack of test cases for this patch, and the possible
embarrassment over not doing concurrent actions, do we think (3) is the
right road? 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On 03.01.2011 11:37, Simon Riggs wrote:

On Mon, 2011-01-03 at 01:53 -0500, Greg Smith wrote:

In advance of the planned but not available yet ability to
lock individual index key values, locking the whole table is the only
possible implementation that can work correctly here I'm aware of.

This was discussed here
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01903.php
with suggested resolutions for this release here
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01907.php

In summary, that means we can either

1. Lock the table for ShareRowExclusiveLock

2. throw a SERIALIZABLE error, if we come up against a row that cannot
be neither MATCHED nor NON MATCHED.

3. Bounce the patch to 9.2, commit early and then work on a full
concurrency solution before commit. The solution strawman is something
like EvalPlanQual with a new snapshot for each re-checked row, emulating
the pattern of snapshots/rechecks that would happen in a PL/pgSQL
version of an UPSERT.

Either way, we're saying that MERGE will not support concurrent
operations safely, in this release.

Given the continued lack of test cases for this patch, and the possible
embarrassment over not doing concurrent actions, do we think (3) is the
right road?

This patch has never tried to implement concurrency-safe upsert. It
implements the MERGE command as specified by the SQL standard, nothing
more, nothing less. Let's not move the goalposts. Googling around, at
least MS SQL Server's MERGE command is the same
(http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx).
There is nothing embarrassing about it, we just have to document it clearly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, 2011-01-03 at 15:12 +0200, Heikki Linnakangas wrote:

 This patch has never tried to implement concurrency-safe upsert. It 
 implements the MERGE command as specified by the SQL standard, nothing 
 more, nothing less. Let's not move the goalposts. Googling around, at 
 least MS SQL Server's MERGE command is the same 
 (http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx).
  
 There is nothing embarrassing about it, we just have to document it clearly.

That article says that SQLServer supplies a locking hint that completely
removes the issue. Because they use locking, they are able to update in
place, so there is no need for them to use snapshots.

Our version won't allow a workaround yet, just for the record.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 8:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2011-01-03 at 15:12 +0200, Heikki Linnakangas wrote:
 This patch has never tried to implement concurrency-safe upsert. It
 implements the MERGE command as specified by the SQL standard, nothing
 more, nothing less. Let's not move the goalposts. Googling around, at
 least MS SQL Server's MERGE command is the same
 (http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx).
 There is nothing embarrassing about it, we just have to document it clearly.

 That article says that SQLServer supplies a locking hint that completely
 removes the issue. Because they use locking, they are able to update in
 place, so there is no need for them to use snapshots.

 Our version won't allow a workaround yet, just for the record.

Like Heikki, I'd rather have the feature without a workaround for the
concurrency issues than no feature.  But I have to admit that the
discussion we've had thus far gives me very little confidence that
this code is anywhere close to bug-free.  So I think we're going to
end up punting it to 9.2 not so much because it's not concurrency-safe
as because it doesn't work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-03 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 Like Heikki, I'd rather have the feature without a workaround for the
 concurrency issues than no feature.

I'm still trying to figure out the problem with having the table-level
lock, unless we really think people will be doing concurrent MERGE's
where they won't overlap..?  I'm also a bit nervous about if the result
of concurrent MERGE's would actually be correct if we're not taking a
bigger lock than row-level (I assume we're taking row-level locks as it
goes through..).

In general, I also thought/expected to have some kind of UPSERT type
capability with our initial MERGE support, even if it requires a big
lock and won't operate concurrently, etc.

 But I have to admit that the
 discussion we've had thus far gives me very little confidence that
 this code is anywhere close to bug-free.  So I think we're going to
 end up punting it to 9.2 not so much because it's not concurrency-safe
 as because it doesn't work.

That's certainly a concern. :/

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


On 03.01.2011 17:56, Stephen Frost wrote:

* Robert Haas (robertmh...@gmail.com) wrote:

Like Heikki, I'd rather have the feature without a workaround for the
concurrency issues than no feature.


I'm still trying to figure out the problem with having the table-level
lock, unless we really think people will be doing concurrent MERGE's
where they won't overlap..?  I'm also a bit nervous about if the result
of concurrent MERGE's would actually be correct if we're not taking a
bigger lock than row-level (I assume we're taking row-level locks as it
goes through..).

In general, I also thought/expected to have some kind of UPSERT type
capability with our initial MERGE support, even if it requires a big
lock and won't operate concurrently, etc.


You can of course LOCK TABLE as a work-around, if that's what you want.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 10:58 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 03.01.2011 17:56, Stephen Frost wrote:

 * Robert Haas (robertmh...@gmail.com) wrote:

 Like Heikki, I'd rather have the feature without a workaround for the
 concurrency issues than no feature.

 I'm still trying to figure out the problem with having the table-level
 lock, unless we really think people will be doing concurrent MERGE's
 where they won't overlap..?  I'm also a bit nervous about if the result
 of concurrent MERGE's would actually be correct if we're not taking a
 bigger lock than row-level (I assume we're taking row-level locks as it
 goes through..).

 In general, I also thought/expected to have some kind of UPSERT type
 capability with our initial MERGE support, even if it requires a big
 lock and won't operate concurrently, etc.

 You can of course LOCK TABLE as a work-around, if that's what you want.

That work-around completely fails to solve the concurrency problem.
Just because you have a lock on the table doesn't mean that there
aren't already tuples in the table which are invisible to your
snapshot (for example because the inserting transactions haven't
committed yet).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-03 Thread Andrew Dunstan




On 01/03/2011 10:58 AM, Heikki Linnakangas wrote:


In general, I also thought/expected to have some kind of UPSERT type
capability with our initial MERGE support, even if it requires a big
lock and won't operate concurrently, etc.



You can of course LOCK TABLE as a work-around, if that's what you want.


I think we need to state this in large red letters in the docs, if 
that's the requirement.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


On 03.01.2011 18:02, Robert Haas wrote:

On Mon, Jan 3, 2011 at 10:58 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 03.01.2011 17:56, Stephen Frost wrote:


* Robert Haas (robertmh...@gmail.com) wrote:


Like Heikki, I'd rather have the feature without a workaround for the
concurrency issues than no feature.


I'm still trying to figure out the problem with having the table-level
lock, unless we really think people will be doing concurrent MERGE's
where they won't overlap..?  I'm also a bit nervous about if the result
of concurrent MERGE's would actually be correct if we're not taking a
bigger lock than row-level (I assume we're taking row-level locks as it
goes through..).

In general, I also thought/expected to have some kind of UPSERT type
capability with our initial MERGE support, even if it requires a big
lock and won't operate concurrently, etc.


You can of course LOCK TABLE as a work-around, if that's what you want.


That work-around completely fails to solve the concurrency problem.
Just because you have a lock on the table doesn't mean that there
aren't already tuples in the table which are invisible to your
snapshot (for example because the inserting transactions haven't
committed yet).


It works in read committed mode, because you acquire a new snapshot 
after the LOCK TABLE, and anyone else who modified the table must commit 
before the lock is granted. In serializable mode you get a serialization 
error.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 11:08 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 03.01.2011 18:02, Robert Haas wrote:

 On Mon, Jan 3, 2011 at 10:58 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 03.01.2011 17:56, Stephen Frost wrote:

 * Robert Haas (robertmh...@gmail.com) wrote:

 Like Heikki, I'd rather have the feature without a workaround for the
 concurrency issues than no feature.

 I'm still trying to figure out the problem with having the table-level
 lock, unless we really think people will be doing concurrent MERGE's
 where they won't overlap..?  I'm also a bit nervous about if the result
 of concurrent MERGE's would actually be correct if we're not taking a
 bigger lock than row-level (I assume we're taking row-level locks as it
 goes through..).

 In general, I also thought/expected to have some kind of UPSERT type
 capability with our initial MERGE support, even if it requires a big
 lock and won't operate concurrently, etc.

 You can of course LOCK TABLE as a work-around, if that's what you want.

 That work-around completely fails to solve the concurrency problem.
 Just because you have a lock on the table doesn't mean that there
 aren't already tuples in the table which are invisible to your
 snapshot (for example because the inserting transactions haven't
 committed yet).

 It works in read committed mode, because you acquire a new snapshot after
 the LOCK TABLE, and anyone else who modified the table must commit before
 the lock is granted.

Oh, I forgot we hold the ROW EXCLUSIVE lock until commit.  That might
be OK, then.

 In serializable mode you get a serialization error.

I don't think this part is true.  You can certainly do this:

CREATE TABLE test (a int);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM test;
in another session, insert (1) into test
LOCK TABLE test IN SHARE MODE; -- or just LOCK TABLE test, if you prefer
SELECT * FROM test;  -- still ain't there
INSERT INTO test VALUES (1);

I don't see what would make MERGE immune to this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, 2011-01-03 at 18:08 +0200, Heikki Linnakangas wrote:

 It works in read committed mode, because you acquire a new snapshot 
 after the LOCK TABLE, and anyone else who modified the table must commit 
 before the lock is granted. In serializable mode you get a serialization 
 error.

If its not safe without this

LOCK TABLE ... IN SHARE ROW EXCLUSIVE MODE

then we should do that automatically, and document that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


On 03.01.2011 18:29, Simon Riggs wrote:

On Mon, 2011-01-03 at 18:08 +0200, Heikki Linnakangas wrote:


It works in read committed mode, because you acquire a new snapshot
after the LOCK TABLE, and anyone else who modified the table must commit
before the lock is granted. In serializable mode you get a serialization
error.


If its not safe without this

LOCK TABLE ... IN SHARE ROW EXCLUSIVE MODE

then we should do that automatically, and document that.


No we should not. The SQL standard doesn't require that, and it would 
unnecessarily restrict concurrent updates on unrelated rows in the table.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-03 Thread Florian Pflug

On Jan3, 2011, at 17:21 , Robert Haas wrote:
 On Mon, Jan 3, 2011 at 11:08 AM, Heikki Linnakangas
 In serializable mode you get a serialization error.
 
 I don't think this part is true.  You can certainly do this:
 
 CREATE TABLE test (a int);
 BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 SELECT * FROM test;
 in another session, insert (1) into test
 LOCK TABLE test IN SHARE MODE; -- or just LOCK TABLE test, if you prefer
 SELECT * FROM test;  -- still ain't there
 INSERT INTO test VALUES (1);

In SERIALIZABLE mode, you need to take any table-level locks before obtaining
a snapshot. There's even a warning about this in the docs somewhere IIRC...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, 2011-01-03 at 18:35 +0200, Heikki Linnakangas wrote:
 On 03.01.2011 18:29, Simon Riggs wrote:
  On Mon, 2011-01-03 at 18:08 +0200, Heikki Linnakangas wrote:
 
  It works in read committed mode, because you acquire a new snapshot
  after the LOCK TABLE, and anyone else who modified the table must commit
  before the lock is granted. In serializable mode you get a serialization
  error.
 
  If its not safe without this
 
  LOCK TABLE ... IN SHARE ROW EXCLUSIVE MODE
 
  then we should do that automatically, and document that.
 
 No we should not. The SQL standard doesn't require that, and it would 
 unnecessarily restrict concurrent updates on unrelated rows in the table.

If we do that, then we definitely need a catch-all WHEN statement, so
that we can say

WHEN NOT MATCHED
  INSERT
WHEN MATCHED
  UPDATE
ELSE
  { INSERT into another table so we can try again in a minute
 or RAISE error }

Otherwise we will silently drop rows. Throwing an error every time isn't
useful behaviour.

Of course, that then breaks the standard, just as all existing
implementations do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid


On 03.01.2011 18:49, Simon Riggs wrote:

On Mon, 2011-01-03 at 18:35 +0200, Heikki Linnakangas wrote:

On 03.01.2011 18:29, Simon Riggs wrote:

On Mon, 2011-01-03 at 18:08 +0200, Heikki Linnakangas wrote:


It works in read committed mode, because you acquire a new snapshot
after the LOCK TABLE, and anyone else who modified the table must commit
before the lock is granted. In serializable mode you get a serialization
error.


If its not safe without this

LOCK TABLE ... IN SHARE ROW EXCLUSIVE MODE

then we should do that automatically, and document that.


No we should not. The SQL standard doesn't require that, and it would
unnecessarily restrict concurrent updates on unrelated rows in the table.


If we do that, then we definitely need a catch-all WHEN statement, so
that we can say

WHEN NOT MATCHED
   INSERT
WHEN MATCHED
   UPDATE
ELSE
   { INSERT into another table so we can try again in a minute
  or RAISE error }

Otherwise we will silently drop rows. Throwing an error every time isn't
useful behaviour.


An ELSE clause would be nice, but it's not related to the question at 
hand. Only some serialization anomalities result in a row that matches 
neither WHEN MATCHED nor WHEN NOT MATCHED. Others result in a duplicate 
key exception, for example.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 11:36 AM, Florian Pflug f...@phlo.org wrote:
 On Jan3, 2011, at 17:21 , Robert Haas wrote:
 On Mon, Jan 3, 2011 at 11:08 AM, Heikki Linnakangas
 In serializable mode you get a serialization error.

 I don't think this part is true.  You can certainly do this:

 CREATE TABLE test (a int);
 BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 SELECT * FROM test;
 in another session, insert (1) into test
 LOCK TABLE test IN SHARE MODE; -- or just LOCK TABLE test, if you prefer
 SELECT * FROM test;  -- still ain't there
 INSERT INTO test VALUES (1);

 In SERIALIZABLE mode, you need to take any table-level locks before obtaining
 a snapshot. There's even a warning about this in the docs somewhere IIRC...

That should be safe, if people do it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 12:01 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 If we do that, then we definitely need a catch-all WHEN statement, so
 that we can say

 WHEN NOT MATCHED
   INSERT
 WHEN MATCHED
   UPDATE
 ELSE
   { INSERT into another table so we can try again in a minute
  or RAISE error }

 Otherwise we will silently drop rows. Throwing an error every time isn't
 useful behaviour.

 An ELSE clause would be nice, but it's not related to the question at hand.
 Only some serialization anomalities result in a row that matches neither
 WHEN MATCHED nor WHEN NOT MATCHED. Others result in a duplicate key
 exception, for example.

I must be missing something.  A source row is either matched or not
matched.  ELSE doesn't exist because the writers of the spec thought
there might be some third matched-invisible-row case, but rather
because you might have written WHEN [NOT MATCHED] AND some qual, and
you might fall through a list of all such clauses.

I think we're focusing on the wrong problem here.  MERGE creates some
syntax to let you do with SQL something that people are currently
doing with application-side logic.  I've written the application-side
logic to do this kind of thing more times than I care to remember, and
yeah there are concurrency issues, but:

- sometimes there's only one writer, so it doesn't matter
- sometimes there can technically be more than one writer, but the
usage is so low that nothing actually breaks
- sometimes you know that a given writer will only operate on rows
customer_id = some constant; so you only need to prevent two
concurrent writers *for the same customer*, not any two concurrent
writers
- and sometimes you need a full table lock.

The third case, in particular, is quite common in my experience, and a
very good reason not to impose a full table lock.  Users hate having
to do explicit locking (especially users whose names rhyme with Bevin
Bittner) but they hate *unnecessary* full-table locks even more.  A
problem that you can fix by adding a LOCK TABLE statement is annoying;
a problem that you can fix only be removing an implicit lock table
operation that the system performs under the hood is a lot worse.  In
the fourth case above, which IME is quite common, you could EITHER
take a full-table lock, if that performs OK, OR you could arrange to
take an advisory lock that protects the records for the particular
customer whose data you want to update.  If we always take a
full-table lock, then the user loses the option to do something else.

The point we ought to be focusing on is that the patch doesn't work.
Unless someone is prepared to put in some time to fix THAT, the rest
of this discussion is academic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, 2011-01-03 at 19:01 +0200, Heikki Linnakangas wrote:

  If we do that, then we definitely need a catch-all WHEN statement, so
  that we can say
 
  WHEN NOT MATCHED
 INSERT
  WHEN MATCHED
 UPDATE
  ELSE
 { INSERT into another table so we can try again in a minute
or RAISE error }
 
  Otherwise we will silently drop rows. Throwing an error every time isn't
  useful behaviour.
 
 An ELSE clause would be nice, but it's not related to the question at 
 hand. Only some serialization anomalities result in a row that matches 
 neither WHEN MATCHED nor WHEN NOT MATCHED. 

Concurrent UPDATEs, DELETEs, MERGE

 Others result in a duplicate 
 key exception, for example.

Concurrent INSERTs, MERGE

So an ELSE clause is very relevant to handling anomalies in a useful
way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-03 Thread Kevin Grittner

Robert Haas robertmh...@gmail.com wrote:
 
 Users hate having to do explicit locking (especially users whose
 names rhyme with Bevin Bittner)
 
:-)
 
Before you decide to taunt me again, I guess I should point out a
few things here.
 
Should SSI and MERGE both make it into 9.1, there's every reason to
believe that running just about any DML, including MERGE, at
REPEATABLE READ would generate the same behavior which running at
REPEATABLE READ or SERIALIZABLE does now.  If MERGE is susceptible
to such anomalies as testing for the presence of a row and then
trying to UPDATE it if found, only to update zero rows because it
was concurrently deleted, SERIALIZABLE would prevent that with a
serialization failure.  I'd kind of expect that already with a
write-write conflict, but if that isn't happening, the SSI patch
should help.  Well, help prevent anomalies -- if you want it to work
out how to continue without rolling back it won't help at all.
 
The fact that SSI introduces predicate locking may ultimately allow
MERGE to do something more clever in terms of UPSERT logic, but I
*REALLY* think it's too late in the release cycle to start looking
at that.  Predicate locking for SSI was done exactly as was most
useful for SSI, on the basis (generally popular on this list) that
trying to generalize something with only one use case is doomed to
failure.  Trying to bend it in an additional direction this late in
the release would pretty much ensure that neither MERGE nor SSI
could make it in.
 
On the other hand, if we put SSI in with predicate locking more or
less as it is, and put MERGE in with more primitive concurrency
control, I fully expect that someone could figure out how to tease
apart SSI and its predicate locking during the next release cycle,
so that the predicate locking was more generally useful.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

On Mon, Jan 3, 2011 at 1:18 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:

 Users hate having to do explicit locking (especially users whose
 names rhyme with Bevin Bittner)

 :-)

 Before you decide to taunt me again, I guess I should point out a
 few things here.

Sorry, that was intended as good-natured humor, not taunting.  I think
that the work you are doing on the serializability stuff is *exactly*
the right fix for the concurrency issues associated with MERGE.
Coming up with a fix that is specific to MERGE doesn't impress me
much.  I don't believe that hacking up MERGE will lead to anything
other than an ugly mess; it's just a syntax wrapper around an
operation that's fundamentally not too easy to make concurrent.  SSI
will handle it, though, along with, well, all the other cases that are
worth worrying about.  I don't have quite as much of an allergy to
explicit locking as you do, but I'm quite clear that it isn't nearly
as good as it just works.

 Should SSI and MERGE both make it into 9.1, [...]

So far the thread on large patches has lead to a status report from
most of the people working on large patches, and no volunteers to take
the lead on reviewing/committing any of them.  Although I think both
of those patches are worthwhile, and although I intend to spend a
very, very large amount of time doing CF work in the next 43 days, I
don't foresee committing either of them, and I probably will not have
time for a detailed review of either one, either.  I feel pretty bad
about that, but I just don't have any more bandwidth.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-03 Thread Kevin Grittner

Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 Before you decide to taunt me again, I guess I should point out a
 few things here.
 
 Sorry, that was intended as good-natured humor, not taunting.
 
Oh, I took it that way.  I guess my attempt at humor through an
oblique reference to a line from Monty Python and the Holy Grail
fell flat.  :-/  I guess I should have said before you taunt me a
second time to make it more readily recognizable...
 
 I think that the work you are doing on the serializability stuff
 is *exactly* the right fix for the concurrency issues associated
 with MERGE.
 
It's got a nice consistency with current behavior, with reads never
blocking or being blocked, but I can see why people would want a
MERGE which could dance around the concurrency problems and always
succeed with UPSERT behavior.
 
Various topics have come up which seem like they might benefit from
predicate locking.  I don't know how many would need locks which
introduce blocking.  I think it will actually be very easy to adapt
the predicate locking for such things as transactional cache
invalidation (which is what drew the interest of the MIT folks). 
I'm not sure how much work it would be to adapt it to use for the
type of blocking locks which seem to be needed based on some of the
MERGE discussions I've read.  It think it will be non-trivial but
possible.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid