subject:"\\\\\\\[HACKERS\\\\\\\] MERGE vs REPLACE"

Re: [HACKERS] MERGE vs REPLACE

2005-11-23 Thread Martijn van Oosterhout

On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote:
 Btw about that keys, oracle gives error on many-to-one or many-to-many 
 relationship between the source and target tables.

The standard has something called a cardinality violation if the
to-be-merged table doesn't match 1-1 with the rest of the statement. If
I had access to an Oracle I'd run two tests on MERGE:

1. Does the joining column have to have an index? For example, make a
column that's full of unique values but no unique index. According to
my reading of the the standard, this should still work (just slower).

2. Additionally, only the rows involved in the MERGE need to be
uniquely referenced, so if you add duplicate values but add a WHERE
clause to exclude those, it should also work.

My feeling is that requiring an index will limit it's usefulness as a
general tool.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpXi0Kng0cKD.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-23 Thread Lyubomir Petrov


Martijn,

Here is a quick test (Oracle 10.1.0.3/Linux):


SQL select banner from v$version;
BANNER

Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Prod
PL/SQL Release 10.1.0.3.0 - Production
CORE10.1.0.3.0  Production
TNS for Linux: Version 10.1.0.3.0 - Production
NLSRTL Version 10.1.0.3.0 - Production

SQL select * from merge_test_1;
   ID NAME
-- 
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
1 xxx
6 rows selected.

SQL select * from merge_test_2;
   ID NAME
-- 
1 AAA
2 BBB
6 FFF

SQL select index_name from user_indexes where table_name like 
'merge_test%';

no rows selected

SQL merge into merge_test_1 a1
 2  using merge_test_2 a2
 3  on (a1.id = a2.id)
 4  when matched then
 5  update set a1.name = a2.name
 6  when not matched then
 7  insert (id, name) values (a2.id, a2.name);
4 rows merged.

SQL select * from merge_test_1;
   ID NAME
-- 
1 AAA
2 BBB
3 ccc
4 ddd
5 eee
1 AAA
6 FFF
7 rows selected.



Regards,
Lubomir Petrov



Martijn van Oosterhout wrote:

On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote:
  
Btw about that keys, oracle gives error on many-to-one or many-to-many 
relationship between the source and target tables.



The standard has something called a cardinality violation if the
to-be-merged table doesn't match 1-1 with the rest of the statement. If
I had access to an Oracle I'd run two tests on MERGE:

1. Does the joining column have to have an index? For example, make a
column that's full of unique values but no unique index. According to
my reading of the the standard, this should still work (just slower).

2. Additionally, only the rows involved in the MERGE need to be
uniquely referenced, so if you add duplicate values but add a WHERE
clause to exclude those, it should also work.

My feeling is that requiring an index will limit it's usefulness as a
general tool.

Have a nice day,
  



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-22 Thread Martijn van Oosterhout

On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote:
 It was already said here that oracle and db2 both use MERGE, dunno about 
 mssql.
 
 And yes merge CAN be used to do REPLACE (oracle uses their dummy table 
 for this, we can use the fact that FROM clause isn't required in postgres).

Statements about MERGE on the web:

http://www.dba-oracle.com/oracle_tips_rittman_merge.htm
http://databasejournal.com/features/db2/article.php/3322041
http://certcities.com/editorial/columns/story.asp?EditorialsID=51
http://publib.boulder.ibm.com/infocenter/ids9help/index.jsp?topic=/com.ibm.sqls.doc/sqls578.htm
http://www.jdixon.dotnetdevelopersjournal.com/i_want_my_sql_2005_merge_statement.htm
http://publib.boulder.ibm.com/infocenter/db2help/topic/com.ibm.db2.udb.doc/admin/r0010873.htm
http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci978700,00.html

Not one (*not one!*) of these mentions any special handling of
duplicate keys. They even go to pains to say that any errors cause
everything to rollback. The last one is especially interesting:

: Is there any way to capture errors from a MERGE statement? Also, is
: there any way to know how many records were inserted or updated for the
: MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated. 
:
: You capture errors the same way you would if you were doing regular
: INSERT and UPDATE statementswith exception handlers. Just include a
: WHEN OTHERS exception handler in the block where your MERGE statement
: is and have to display SQLCODE and SQLERRM if an error occurs. Then you
: can figure out which specific errors are occurring and create
: individual exception handlers for those. 

There are even places that tell you how to decompose your MERGE into an
INSERT plus UPDATE statement. The real advantage of MERGE is that the
semantics prevent your updating a row you just inserted, which is
harder in the general case but easy if the executor is handling the
rows one at a time.

Rather than trying to make MERGE do something it wasn't designed for,
we should probably be spending our efforts on triggers for error
conditions. Maybe something like:

CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz();

Where baz would be passed NEW and OLD just like a normal trigger and if
the trigger return NULL, the update is ignored. In the meantime the
function can divert the insert to another table if it likes. This seems
like a much more workable and useful addition.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpqFtq6mXEYi.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-22 Thread Bruce Momjian

Jaime Casanova wrote:
 
  And yes merge CAN be used to do REPLACE (oracle uses their dummy table
  for this, we can use the fact that FROM clause isn't required in postgres).
 
 
 the FROM clause is required by default (starting with 8.1) unless you
 change a postgresql.conf parameter.
 
 and i don't think that idea will have any fan...

No, it is not, try SELECT 1.  Oracle requires SELECT 1 FROM dual.  The
change in 8.1 is that SELECT pg_class.relname no longer works.  You have to
do SELECT relname FROM pg_class.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-22 Thread Jaime Casanova

On 11/22/05, Bruce Momjian pgman@candle.pha.pa.us wrote:
 Jaime Casanova wrote:
  
   And yes merge CAN be used to do REPLACE (oracle uses their dummy table
   for this, we can use the fact that FROM clause isn't required in 
   postgres).
  
 
  the FROM clause is required by default (starting with 8.1) unless you
  change a postgresql.conf parameter.
 
  and i don't think that idea will have any fan...

 No, it is not, try SELECT 1.  Oracle requires SELECT 1 FROM dual.  The
 change in 8.1 is that SELECT pg_class.relname no longer works.  You have to
 do SELECT relname FROM pg_class.

 --
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


touche...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-22 Thread Jim C. Nasby

On Tue, Nov 22, 2005 at 11:57:48AM +0100, Martijn van Oosterhout wrote:
excellent research snipped

 Rather than trying to make MERGE do something it wasn't designed for,
 we should probably be spending our efforts on triggers for error
 conditions. Maybe something like:
 
 CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz();
 
 Where baz would be passed NEW and OLD just like a normal trigger and if
 the trigger return NULL, the update is ignored. In the meantime the
 function can divert the insert to another table if it likes. This seems
 like a much more workable and useful addition.

I agree that we shouldn't try and distort MERGE into something fancy.
The AFTER ERROR trigger is a very interesting idea, since it could
handle many different cases. But I'm worried that people might not want
that behavior on by default for everything done against some table. I
think it'd be better to have some way to specify in a command that
you want to use some kind of error-handling trigger. Though presumably
the underlying framework would be same, so it shouldn't be hard to
support both.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-22 Thread Petr Jelinek


Jaime Casanova wrote:


the FROM clause is required by default (starting with 8.1) unless you
change a postgresql.conf parameter.

and i don't think that idea will have any fan...



Bruce already replied to your first statement so, what idea won't have 
any fan ? It's not that we would change what MERGE does. Postgres just 
does not requeire FROM clause in SELECT and second parameter of MERGE 
can be SELECT which means you can do what REPLACE) does without problems 
and without breaking something or violating standard and like I said you 
can do the same in oracle using dual.


Btw about that keys, oracle gives error on many-to-one or many-to-many 
relationship between the source and target tables.


--
Regards
Petr Jelinek (PJMODOS)


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-21 Thread Jim C. Nasby

On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote:
  Is the requirement for predicate locking, over and above a unique
  constraint on an index that involves the record key, to deal with
  the scenario of two inserts executing at the same time, both before
  commit?
 
 No.  If you have a primary key you can easily prevent duplicates.  You
 need a table lock or predicate locking to prevent duplicates if you do
 not have a primary key.

AFAIK you can also accomplish this without a table lock as long as you
have a unique index on the right set of fields and those fields are also
NOT NULL. ISTM it would be good to support that case as well, since you
might want to MERGE based on something other than the PK.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] MERGE vs REPLACE

2005-11-21 Thread Jim C. Nasby

On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
 I don't think MERGE can really be made to be both though, in which case
 it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
 DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
 you know it's going to be doing that locking.  I really don't like the
 idea of making the SQL2003 version of MERGE be the MERGE special case
 (by requiring someone to take a table lock ahead of time or do something
 else odd).

Anyone know off-hand what the big 3 do? If the industry consensus is
that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
it's probably better to follow that lead.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-21 Thread Petr Jelinek


Jim C. Nasby wrote:

On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:


I don't think MERGE can really be made to be both though, in which case
it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
you know it's going to be doing that locking.  I really don't like the
idea of making the SQL2003 version of MERGE be the MERGE special case
(by requiring someone to take a table lock ahead of time or do something
else odd).



Anyone know off-hand what the big 3 do? If the industry consensus is
that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
it's probably better to follow that lead.


It was already said here that oracle and db2 both use MERGE, dunno about 
mssql.


And yes merge CAN be used to do REPLACE (oracle uses their dummy table 
for this, we can use the fact that FROM clause isn't required in postgres).


--
Regards
Petr Jelinek (PJMODOS)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] MERGE vs REPLACE

2005-11-21 Thread Jaime Casanova


 And yes merge CAN be used to do REPLACE (oracle uses their dummy table
 for this, we can use the fact that FROM clause isn't required in postgres).


the FROM clause is required by default (starting with 8.1) unless you
change a postgresql.conf parameter.

and i don't think that idea will have any fan...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-20 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Oh, good point.  I was thinking just about concurrent MERGEs.  However,
  it is more complicated than that.  By definitaion you can not see
  changes from other transactions while your statement is being run (even
  if you increment CommandCounter), so to be atomic, you would still see
  the row even though some other transaction had deleted it.
 
 We would have to use the same semantics we use now for read-committed
 UPDATE, that is look at the latest version of the row even though this
 would not normally be visible to the transaction's snapshot.
 
 In the case of a serializable transaction, no doubt we should fail if
 any concurrent change actually happens.

I have some psaudocode to explain what we want for this feature,
whatever syntax we choose:

Start
Check unique index
Found
lock row for update
if zero rows, return to start
if more than one row, fail
update row
Notfound
create savepoint
insert row into heap
lock index page
if conflicting index entry, abort savepoint, return to start
add index entry
unlock index page

While the notfound case might look strange, we actually use this exact
method for inserts now, see ExecInsert() and _bt_doinsert(). 
Particularly see this comment in the second function:

/*
 * If we're not allowing duplicates, make sure the key isn't already in
 * the index.
 *
 * NOTE: obviously,_bt_check_unique can only detect keys that are already in
 * the index; so it cannot defend against concurrent insertions of the
 * same key.  We protect against that by means of holding a write lock on
 * the target page.  Any other would-be inserter of the same key must
 * acquire a write lock on the same target page, so only one would-be
 * inserter can be making the check at one time.  Furthermore, once we are
 * past the check we hold write locks continuously until we have performed
 * our insertion, so no later inserter can fail to see our insertion.
 * (This requires some care in _bt_insertonpg.)
 *
 * If we must wait for another xact, we release the lock while waiting, and
 * then must start over completely.
 */

Here is the unique check error from _bt_check_unique():

ereport(ERROR,
(errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg(duplicate key violates unique constraint \%s\,
   RelationGetRelationName(rel;

I think the problem here is that it is going to longjump() back to
postgres.c (and out of your code loop).  While we have savepoints, I
think they only work coming from client applications, rather than inside
our code.  Ideally you would like to be able to say:

savepoint();
func();
rollback_to_savepoint();

but you can't, so I think you are going to have to factor out that
unique error callback and return a failure code to the caller.  I
suppose some boolean flag need to be added to _bt_doinsert(), but that
is called via a function pointer for the index type, so you are going to
have to update the insert function signatures for all access methods. 
The good news is that only btree supports unique indexes, according to
the documentation (Only B-tree currently supports unique indexes) so
for the other access methods the extra parameter is just ignored.

Another issue is multiple unique indexes.  What if the first unique
index matches one row, but a different row matches the second unique
indexed column?  Fail because unique checks do not identify exactly one
row?

Or the _new_ value for the second indexed column conflicts with the
second unique index. The MERGE/REPLACE should fail.  The UPDATE block
will handle this on its own, but the INSERT block will need to check for
that an really error out, rather than return to the caller, so the loop
in ExecInsertIndexTuples() has to restart on unique failure _only_ on
the first index check, not the subsequent ones.

One simplification would be to allow MERGE/REPLACE only on a table that
has a single unique index.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-18 Thread Zeugswetter Andreas DCP SD


 Unless you have a table lock, INSERT has to be before UPDATE, think
UPDATE, UPDATE (both fail), INSERT, INSERT.

  update
  if no rows updated
insert
if duplicate key
  update
  if no rows updated goto insert

That is why you have the loop. This is not a problem with above code,
because only one insert succeeds
while the others then do the update.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-18 Thread Bruce Momjian


Oh, good point.  I was thinking just about concurrent MERGEs.  However,
it is more complicated than that.  By definitaion you can not see
changes from other transactions while your statement is being run (even
if you increment CommandCounter), so to be atomic, you would still see
the row even though some other transaction had deleted it.

I think we avoid that now because UPDATE, (which is a DELETE then
INSERT) chains the tuples together so others see the activity happening.

Seems like we are going to have to peek at rows like we do now for
INSERT and peek at index rows, if I remember correctly.  I can't think
of any other place in the code where we loop around other backend's
activity like this.

This could be tricky.

---

Dennis Bjorklund wrote:
 On Thu, 17 Nov 2005, Bruce Momjian wrote:
 
  Unless you have a table lock, INSERT has to be before UPDATE, think
  UPDATE, UPDATE (both fail), INSERT, INSERT.
 
 No matter what operation you start with you need a loop that try 
 insert/update until one of them succeed like in this example:
 
 http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
 
 Without a loop you might not get to execute neither the insert nor the 
 update. Why? Think about this example:
 
 BEGIN
 
 INSERT  - fail because there is a row already
 
 - before we manage to do the update someone
delete the row (which we can see in the
default transaction isolation level)
 
 UPDATE  - fail because there is no row so we will loop
and try the insert again
 
 - before we manage to do the insert someone else does
an insert
 
 INSERT  - fail because there is a row already
 
 - before we manage to do the update someone
delete the row 
 
 
 
 You might need to loop any number of times before you manage to perform
 one of the two operations. Which operation you should start with depends
 on which of the two cases is the common one.
 
 -- 
 /Dennis Bj?rklund
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-18 Thread Tom Lane

Bruce Momjian pgman@candle.pha.pa.us writes:
 Oh, good point.  I was thinking just about concurrent MERGEs.  However,
 it is more complicated than that.  By definitaion you can not see
 changes from other transactions while your statement is being run (even
 if you increment CommandCounter), so to be atomic, you would still see
 the row even though some other transaction had deleted it.

We would have to use the same semantics we use now for read-committed
UPDATE, that is look at the latest version of the row even though this
would not normally be visible to the transaction's snapshot.

In the case of a serializable transaction, no doubt we should fail if
any concurrent change actually happens.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy

Well, from my point of view it is more than delete and insert. That I
can do right now with existing infrastructure. The problem I try to
solve is something along: a bunch of clients try to update a count, and
ONE of them must initialize the count if it does not exist... this can't
be done with current infrastructure without race conditions.
Our current solution is to initialize all the possible counts
beforehand, but that suboptimal as only a few of them will actually have
data coming in later...
And of course the import problem... we can have multiple concurrent
imports, which must insert just once per some unique criteria, and
update if the record is already there, and all this in a batch. This is
also not possible without race conditions or aggressive locking.
So for me the atomic, consistent and without performance penalties
update_or_insert_based_on_unique_criteria does have a value, and
that's coming exactly from the locking of the unique index which
eliminates the race condition from this operation. I don't care about
syntax sugar, just about things I could do more efficiently if this
mechanism were in place...

Cheers,
Csaba.

On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote:
 On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote:
   -Original Message-
   From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
   [EMAIL PROTECTED] On Behalf Of Simon Riggs
Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.
  
  Merge could also be considered as a long string of deletes and inserts.
  I guess that deleting those records that already exist and then
  inserting all of the records is faster because it could be done like a
  single join to perform the delete and then a single batch insert.
 
 And for us it makes no difference because in MVCC, UPDATE == DELETE +
 INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs
 in the same statement.
 
 Have a nice day,


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Zeugswetter Andreas DCP SD

 The problem I try to solve is something along: a bunch of clients try
to update a count, and ONE of them must initialize  the count if it
does not exist... this can't be done with current infrastructure without
race conditions.

The solution without merge but a unique key in other db's is:

update
if no rows updated
  insert
  if duplicate key
update
if no rows updated goto insert

note, that the counter updates need to be of the form set x = x + ?
where key=y 
do you see a potential race condition with this ?
In pg you also need a savepoint before the insert for this to work.

Depending on the ratio of insert vs update we also start with insert
when 
the insert succeeds more that 50% (I would use a higher percentage with
pg though):

insert
  if duplicate key
update
if no rows updated goto insert

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy

Yes, these algorithms are clear to me, but they don't work for batch
updates in postgres without savepoints before each row insert/update,
which is not good for performance (not to mention on older postgres
versions without savepoint support it won't work at all). If there is a
way of no race condition, no performance penalty, that would be
something new and useful. I just guess the MERGE would provide that.

Cheers,
Csaba.

On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote:
  The problem I try to solve is something along: a bunch of clients try
 to update a count, and ONE of them must initialize  the count if it
 does not exist... this can't be done with current infrastructure without
 race conditions.
 
 The solution without merge but a unique key in other db's is:
 
 update
 if no rows updated
   insert
   if duplicate key
 update
 if no rows updated goto insert
 
 note, that the counter updates need to be of the form set x = x + ?
 where key=y 
 do you see a potential race condition with this ?
 In pg you also need a savepoint before the insert for this to work.
 
 Depending on the ratio of insert vs update we also start with insert
 when 
 the insert succeeds more that 50% (I would use a higher percentage with
 pg though):
 
 insert
   if duplicate key
 update
 if no rows updated goto insert
 
 Andreas


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Martijn van Oosterhout

On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote:
 Yes, these algorithms are clear to me, but they don't work for batch
 updates in postgres without savepoints before each row insert/update,
 which is not good for performance (not to mention on older postgres
 versions without savepoint support it won't work at all). If there is a
 way of no race condition, no performance penalty, that would be
 something new and useful. I just guess the MERGE would provide that.

Well, then you guess wrong. This isn't what MERGE is for. MERGE is just
a neat way of specifying the UPDATE and INSERT cases in the same
statement. It doesn't remove the possibility duplicate inserts and thus
primary key violations.

If someone wants to make extensions to MERGE so that it can avoid the
race condition and avoid the duplicate key violations, that's fine. But
be aware that this is outside of the spec. It may be a useful addition,
but perhaps we should consider MERGE and REPLACE as completely seperate
targets.

MERGE has a whole join construction with subqueries that would be a
pain to make work in a way that is truly serialisable. REPLACE deals
with only one row and tries to solve the race for that case only. Much
easier to consider them seperately, no?

I guess what's really irritating is that this clearly exposes the case
listed in the docs as Why SERIALIZABLE isn't in all cases. If we
could solve that for MERGE, we could probably solve it in the general
case too.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpwqXKPvo94y.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy

OK, in this case I don't care about either MERGE or REPLACE, but for an
UPSERT which does the locking :-)

Cheers,
Csaba.

On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote:
 On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote:
  Yes, these algorithms are clear to me, but they don't work for batch
  updates in postgres without savepoints before each row insert/update,
  which is not good for performance (not to mention on older postgres
  versions without savepoint support it won't work at all). If there is a
  way of no race condition, no performance penalty, that would be
  something new and useful. I just guess the MERGE would provide that.
 
 Well, then you guess wrong. This isn't what MERGE is for. MERGE is just
 a neat way of specifying the UPDATE and INSERT cases in the same
 statement. It doesn't remove the possibility duplicate inserts and thus
 primary key violations.
 
 If someone wants to make extensions to MERGE so that it can avoid the
 race condition and avoid the duplicate key violations, that's fine. But
 be aware that this is outside of the spec. It may be a useful addition,
 but perhaps we should consider MERGE and REPLACE as completely seperate
 targets.
 
 MERGE has a whole join construction with subqueries that would be a
 pain to make work in a way that is truly serialisable. REPLACE deals
 with only one row and tries to solve the race for that case only. Much
 easier to consider them seperately, no?
 
 I guess what's really irritating is that this clearly exposes the case
 listed in the docs as Why SERIALIZABLE isn't in all cases. If we
 could solve that for MERGE, we could probably solve it in the general
 case too.
 
 Have a nice day,


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Tom Lane

Csaba Nagy [EMAIL PROTECTED] writes:
 OK, in this case I don't care about either MERGE or REPLACE, but for an
 UPSERT which does the locking :-)

This is exactly the point --- pretty much nobody has come to us and
asked for a feature that does what Peter and Martijn say MERGE does.
(I haven't bothered to look at the 2003 spec, I'm assuming they read it
correctly.)  What we *have* been asked for, over and over, is an
insert-or-update feature that's not so tedious and inefficient as the
savepoint-insert-rollback-update kluge.  That's what we ought to be
concentrating on providing.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Stephen Frost

* Tom Lane ([EMAIL PROTECTED]) wrote:
 This is exactly the point --- pretty much nobody has come to us and
 asked for a feature that does what Peter and Martijn say MERGE does.
 (I haven't bothered to look at the 2003 spec, I'm assuming they read it
 correctly.)  What we *have* been asked for, over and over, is an
 insert-or-update feature that's not so tedious and inefficient as the
 savepoint-insert-rollback-update kluge.  That's what we ought to be
 concentrating on providing.

I guess to be clear on what this distinction actually is, specifically:
MERGE under SQL2003 doesn't appear to be intended to be used
concurrently.  For data warehousing situations this can be just fine
such as in my case where I get a monthly update of some information and
need to merge that update in with the prior information.  In this case
there's only one MERGE running and I'd hope it'd be faster than doing
check for existance, insert/update on each row in plpgsql or something
(since there'd be multiple index lookups, etc, I think).  Concurrent
MERGEs running *can* fail, just like whole transactions which do the
check/insert/update can fail.

REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
transaction which is supposed to not fail but instead do locking to
ensure that it doesn't fail.  This requires predicate locking to be
efficient because you want to tell the concurrent transaction if you
have the same key as me, just wait a second and you can do an update
'cause I'm going to create the key if it doesn't exist before I'm done.

I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do
than MERGE because of the idea that it isn't supposed to fail generally.
I think SQL2003 MERGE would be reasonably easy to do and to get the
efficiency benefits out of it (assuming there are some to be had in the
end).  

I don't think MERGE can really be made to be both though, in which case
it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
DUPLICATE UPDATE something else.  Perhaps a special form of MERGE where
you know it's going to be doing that locking.  I really don't like the
idea of making the SQL2003 version of MERGE be the MERGE special case
(by requiring someone to take a table lock ahead of time or do something
else odd).

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread mark

On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
 REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
 transaction which is supposed to not fail but instead do locking to
 ensure that it doesn't fail.  This requires predicate locking to be
 efficient because you want to tell the concurrent transaction if you
 have the same key as me, just wait a second and you can do an update
 'cause I'm going to create the key if it doesn't exist before I'm done.

Is the requirement for predicate locking, over and above a unique
constraint on an index that involves the record key, to deal with
the scenario of two inserts executing at the same time, both before
commit?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Bruce Momjian


Unless you have a table lock, INSERT has to be before UPDATE, think
UPDATE, UPDATE (both fail), INSERT, INSERT.

---

Zeugswetter Andreas DCP SD wrote:
  The problem I try to solve is something along: a bunch of clients try
 to update a count, and ONE of them must initialize  the count if it
 does not exist... this can't be done with current infrastructure without
 race conditions.
 
 The solution without merge but a unique key in other db's is:
 
 update
 if no rows updated
   insert
   if duplicate key
 update
 if no rows updated goto insert
 
 note, that the counter updates need to be of the form set x = x + ?
 where key=y 
 do you see a potential race condition with this ?
 In pg you also need a savepoint before the insert for this to work.
 
 Depending on the ratio of insert vs update we also start with insert
 when 
 the insert succeeds more that 50% (I would use a higher percentage with
 pg though):
 
 insert
   if duplicate key
 update
 if no rows updated goto insert
 
 Andreas
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Bruce Momjian

[EMAIL PROTECTED] wrote:
 On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
  REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a
  transaction which is supposed to not fail but instead do locking to
  ensure that it doesn't fail.  This requires predicate locking to be
  efficient because you want to tell the concurrent transaction if you
  have the same key as me, just wait a second and you can do an update
  'cause I'm going to create the key if it doesn't exist before I'm done.
 
 Is the requirement for predicate locking, over and above a unique
 constraint on an index that involves the record key, to deal with
 the scenario of two inserts executing at the same time, both before
 commit?

No.  If you have a primary key you can easily prevent duplicates.  You
need a table lock or predicate locking to prevent duplicates if you do
not have a primary key.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Bruce Momjian

Tom Lane wrote:
 Csaba Nagy [EMAIL PROTECTED] writes:
  OK, in this case I don't care about either MERGE or REPLACE, but for an
  UPSERT which does the locking :-)
 
 This is exactly the point --- pretty much nobody has come to us and
 asked for a feature that does what Peter and Martijn say MERGE does.
 (I haven't bothered to look at the 2003 spec, I'm assuming they read it
 correctly.)  What we *have* been asked for, over and over, is an
 insert-or-update feature that's not so tedious and inefficient as the
 savepoint-insert-rollback-update kluge.  That's what we ought to be
 concentrating on providing.

I am confused over the various options.  I have heard these syntaxes:

SQL2003 MERGE
MySQL REPLACE
http://dev.mysql.com/doc/refman/5.1/en/replace.html
MySQL INSERT VIOLATION ...
UPSERT

So it seems MERGE does not have the use-case we most need, though it can
be bent to do it.  (Given their MATCH syntax, it doesn't seem there is
any logic that it tries INSERT first).

Looking at the MySQL URL above, REPLACE has three possible syntaxes with
normal (DELETE), SET (UPDATE), and SELECT.  Is this the direction we
need to go?  I don't like INSERT ... VIOLATION because I would like a
new keyword for this.  Is UPSERT the same as REPLACE?  Should we use
UPSERT instead?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Dennis Bjorklund

On Thu, 17 Nov 2005, Bruce Momjian wrote:

 Unless you have a table lock, INSERT has to be before UPDATE, think
 UPDATE, UPDATE (both fail), INSERT, INSERT.

No matter what operation you start with you need a loop that try 
insert/update until one of them succeed like in this example:

http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Without a loop you might not get to execute neither the insert nor the 
update. Why? Think about this example:

BEGIN

INSERT  - fail because there is a row already

- before we manage to do the update someone
   delete the row (which we can see in the
   default transaction isolation level)

UPDATE  - fail because there is no row so we will loop
   and try the insert again

- before we manage to do the insert someone else does
   an insert

INSERT  - fail because there is a row already

- before we manage to do the update someone
   delete the row 



You might need to loop any number of times before you manage to perform
one of the two operations. Which operation you should start with depends
on which of the two cases is the common one.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 We should probably throw a notice or warning if we go to a table lock,
 too.

 That's not very useful, because you can only do somethign about it AFTER 
 the 1 hour exclusive lock merge has already run :)

We shouldn't do anything remotely like that.  A statement whose locking
effects can't be predicted on sight is horrid both from the user's
viewpoint and from the implementation viewpoint.  In particular, if we
have to do planning before we can determine whether the table needs just
a SELECT lock or something stronger, then we have to take a weak lock to
do the planning and then we are faced with upgrading to the stronger
lock at runtime.  Can you say deadlock risk?

I think we should do REPLACE-like functionality that simply fails if the
match condition isn't equality on a primary key.  If we can use SQL-spec
MERGE syntax for this, that's fine, but let's not think in terms of
silently changing to a stronger table lock and a much slower
implementation when the condition isn't a primary key.  That's a whole
lot of work that isn't solving any real-world problems, and *is*
creating a foot-gun for people to cause themselves performance and
deadlock problems anytime they spell the WHERE condition slightly wrong.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Rick Gigger

I agree.  I would never ever ever want it to silently start doing  
table locks.  I would simply avoid using merge at all if that was a  
possibility.


However it seems like the idea is to eventually flesh out full  
fledged merge.  And to do that it sounds like you would need to do  
one of the following:


1) implement predicate locking beyond the simple match on unique  
index case that we have here

2) do full table locks.

It sounds like #1 isn't going to happen for a while.  So in order to  
do more complicated merges you will need to do #2.  If you are going  
to implement more complicated merge functionality I certainly  
wouldn't want it throwing a warning telling me about a table lock if  
I had already knew it would get the table lock and decided I wanted  
to go ahead with using merge anyway.


Could you let the user create the lock himself to handle this  
situation?  For instance:


analyze the merge
if merge condition matches unique index
merge without  table locking
elseif needed table lock already exists
merge
else
throw an error

You could also just add something to the merge syntax like ALLOW  
TABLE LOCK or something.  The idea is just that the user can  
explicitly allow the table lock and thus the more complicated merge.


I don't really know anything about the implementation details but  
that is the behavior that I would prefer.  That way I could always do  
a complicated merge if I wanted to but there is no way it would ever  
do an implicit table lock on me.  And it would never throw an error/ 
warning unless I actually did something questionable.


Does that make sense.

Rick Gigger

On Nov 16, 2005, at 7:49 AM, Tom Lane wrote:


Christopher Kings-Lynne [EMAIL PROTECTED] writes:
We should probably throw a notice or warning if we go to a table  
lock,

too.


That's not very useful, because you can only do somethign about it  
AFTER

the 1 hour exclusive lock merge has already run :)


We shouldn't do anything remotely like that.  A statement whose  
locking

effects can't be predicted on sight is horrid both from the user's
viewpoint and from the implementation viewpoint.  In particular, if we
have to do planning before we can determine whether the table needs  
just
a SELECT lock or something stronger, then we have to take a weak  
lock to

do the planning and then we are faced with upgrading to the stronger
lock at runtime.  Can you say deadlock risk?

I think we should do REPLACE-like functionality that simply fails  
if the
match condition isn't equality on a primary key.  If we can use SQL- 
spec

MERGE syntax for this, that's fine, but let's not think in terms of
silently changing to a stronger table lock and a much slower
implementation when the condition isn't a primary key.  That's a whole
lot of work that isn't solving any real-world problems, and *is*
creating a foot-gun for people to cause themselves performance and
deadlock problems anytime they spell the WHERE condition slightly  
wrong.


regards, tom lane

---(end of  
broadcast)---

TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that  
your

   message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Jaime Casanova

 You could also just add something to the merge syntax like ALLOW
 TABLE LOCK or something.  The idea is just that the user can
 explicitly allow the table lock and thus the more complicated merge.


The problem here is that many people will see that option and think
it's safe to do it... i mean, many people will shoot themselves in the
foot and the culprit will be PostgreSQL because he let a ready to
shoot gun in a visible place when are kids around

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Bruce Momjian


Interesting approach.  Actually, we could tell the user they have to use
BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
already have a table lock.

---

Rick Gigger wrote:
 I agree.  I would never ever ever want it to silently start doing  
 table locks.  I would simply avoid using merge at all if that was a  
 possibility.
 
 However it seems like the idea is to eventually flesh out full  
 fledged merge.  And to do that it sounds like you would need to do  
 one of the following:
 
 1) implement predicate locking beyond the simple match on unique  
 index case that we have here
 2) do full table locks.
 
 It sounds like #1 isn't going to happen for a while.  So in order to  
 do more complicated merges you will need to do #2.  If you are going  
 to implement more complicated merge functionality I certainly  
 wouldn't want it throwing a warning telling me about a table lock if  
 I had already knew it would get the table lock and decided I wanted  
 to go ahead with using merge anyway.
 
 Could you let the user create the lock himself to handle this  
 situation?  For instance:
 
 analyze the merge
 if merge condition matches unique index
   merge without  table locking
 elseif needed table lock already exists
   merge
 else
   throw an error
 
 You could also just add something to the merge syntax like ALLOW  
 TABLE LOCK or something.  The idea is just that the user can  
 explicitly allow the table lock and thus the more complicated merge.
 
 I don't really know anything about the implementation details but  
 that is the behavior that I would prefer.  That way I could always do  
 a complicated merge if I wanted to but there is no way it would ever  
 do an implicit table lock on me.  And it would never throw an error/ 
 warning unless I actually did something questionable.
 
 Does that make sense.
 
 Rick Gigger
 
 On Nov 16, 2005, at 7:49 AM, Tom Lane wrote:
 
  Christopher Kings-Lynne [EMAIL PROTECTED] writes:
  We should probably throw a notice or warning if we go to a table  
  lock,
  too.
 
  That's not very useful, because you can only do somethign about it  
  AFTER
  the 1 hour exclusive lock merge has already run :)
 
  We shouldn't do anything remotely like that.  A statement whose  
  locking
  effects can't be predicted on sight is horrid both from the user's
  viewpoint and from the implementation viewpoint.  In particular, if we
  have to do planning before we can determine whether the table needs  
  just
  a SELECT lock or something stronger, then we have to take a weak  
  lock to
  do the planning and then we are faced with upgrading to the stronger
  lock at runtime.  Can you say deadlock risk?
 
  I think we should do REPLACE-like functionality that simply fails  
  if the
  match condition isn't equality on a primary key.  If we can use SQL- 
  spec
  MERGE syntax for this, that's fine, but let's not think in terms of
  silently changing to a stronger table lock and a much slower
  implementation when the condition isn't a primary key.  That's a whole
  lot of work that isn't solving any real-world problems, and *is*
  creating a foot-gun for people to cause themselves performance and
  deadlock problems anytime they spell the WHERE condition slightly  
  wrong.
 
  regards, tom lane
 
  ---(end of  
  broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that  
  your
 message can get through to the mailing list cleanly
 
 
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Jaime Casanova

On 11/16/05, Bruce Momjian pgman@candle.pha.pa.us wrote:

 Interesting approach.  Actually, we could tell the user they have to use
 BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
 already have a table lock.


If the lock will be required, what's the problem in doing it
internally? without user interaction?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Tom Lane

Jaime Casanova [EMAIL PROTECTED] writes:
 If the lock will be required, what's the problem in doing it
 internally?

I already explained that: lock upgrading is deadlock-prone.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Bruce Momjian

Jaime Casanova wrote:
 On 11/16/05, Bruce Momjian pgman@candle.pha.pa.us wrote:
 
  Interesting approach.  Actually, we could tell the user they have to use
  BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
  already have a table lock.
 
 
 If the lock will be required, what's the problem in doing it
 internally? without user interaction?

Because first, we are making it visible to the user, and second the lock
is taken out _before_ we actually execute the statement, meaning we
don't have to escalate our locks.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Martijn van Oosterhout

On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
 
 Interesting approach.  Actually, we could tell the user they have to use
 BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
 already have a table lock.

The bit I'm still missing is why there needs to be a lock at all. The
SQL standard doesn't say anywhere that concurrent MERGE operations
can't conflict. It seems to me that standard visibility rules apply. If
neither MERGE statement can see the results of the other, then they
will both INSERT. If you don't have a UNIQUE constraint to prevent this
then what's the problem?

It seems to me people would like, in the case of an existing UNIQUE
constraint, to be able to use it to prevent duplicate key errors.
This is nice, but the standard doesn't require that either. 

In other words, if we can use an index to avoid duplicate key errors,
fine. But if there is no index available, it is not an error to do an
INSERT because another INSERT was hidden from you.

Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpLQAwAS0yUe.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Bruce Momjian

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
 On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
  
  Interesting approach.  Actually, we could tell the user they have to use
  BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
  already have a table lock.
 
 The bit I'm still missing is why there needs to be a lock at all. The
 SQL standard doesn't say anywhere that concurrent MERGE operations
 can't conflict. It seems to me that standard visibility rules apply. If
 neither MERGE statement can see the results of the other, then they
 will both INSERT. If you don't have a UNIQUE constraint to prevent this
 then what's the problem?

I assume they want MERGE because they don't want duplicates.  If they
don't care, they would have used INSERT.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Simon Riggs

On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote:
 On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:
  
  Interesting approach.  Actually, we could tell the user they have to use
  BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
  already have a table lock.
 
 The bit I'm still missing is why there needs to be a lock at all. The
 SQL standard doesn't say anywhere that concurrent MERGE operations
 can't conflict. It seems to me that standard visibility rules apply. If
 neither MERGE statement can see the results of the other, then they
 will both INSERT. If you don't have a UNIQUE constraint to prevent this
 then what's the problem?
 
 It seems to me people would like, in the case of an existing UNIQUE
 constraint, to be able to use it to prevent duplicate key errors.
 This is nice, but the standard doesn't require that either. 
 
 In other words, if we can use an index to avoid duplicate key errors,
 fine. But if there is no index available, it is not an error to do an
 INSERT because another INSERT was hidden from you.
 
 Conceptually, a MERGE statement is just a long string of INSERTs and
 UPDATEs in the same transaction and I think we should treat it as
 such.

Agreed.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Martijn van Oosterhout

On Wed, Nov 16, 2005 at 12:59:36PM -0500, Bruce Momjian wrote:
 Martijn van Oosterhout wrote:
  The bit I'm still missing is why there needs to be a lock at all. The
  SQL standard doesn't say anywhere that concurrent MERGE operations
  can't conflict. It seems to me that standard visibility rules apply. If
  neither MERGE statement can see the results of the other, then they
  will both INSERT. If you don't have a UNIQUE constraint to prevent this
  then what's the problem?
 
 I assume they want MERGE because they don't want duplicates.  If they
 don't care, they would have used INSERT.

The whole point of MERGE was because you can do different things on
INSERT or UPDATE. If you didn't care about the UPDATE case you could
indeed do just inserts...

I just don't think we should tie ourselves in knots over details that
the spec doesn't require. If the spec doesn't require predicate
locking then why on earth should we provide it if it's so difficult?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpN5btUoK1IV.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Dann Corbit

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Simon Riggs
 Sent: Wednesday, November 16, 2005 10:35 AM
 To: Martijn van Oosterhout
 Cc: Bruce Momjian; Rick Gigger; Tom Lane; Christopher Kings-Lynne; Jim
C.
 Nasby; josh@agliodbs.com; pgsql-hackers@postgresql.org; Jaime
Casanova;
 Peter Eisentraut
 Subject: Re: [HACKERS] MERGE vs REPLACE

 On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote:
  On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote:

   Interesting approach.  Actually, we could tell the user they have
to
 use
   BEGIN;LOCK tab before doing MERGE, and throw an error if we don't
   already have a table lock.

  The bit I'm still missing is why there needs to be a lock at all.
The
  SQL standard doesn't say anywhere that concurrent MERGE operations
  can't conflict. It seems to me that standard visibility rules apply.
If
  neither MERGE statement can see the results of the other, then they
  will both INSERT. If you don't have a UNIQUE constraint to prevent
this
  then what's the problem?

  It seems to me people would like, in the case of an existing UNIQUE
  constraint, to be able to use it to prevent duplicate key errors.
  This is nice, but the standard doesn't require that either.

  In other words, if we can use an index to avoid duplicate key
errors,
  fine. But if there is no index available, it is not an error to do
an
  INSERT because another INSERT was hidden from you.

  Conceptually, a MERGE statement is just a long string of INSERTs and
  UPDATEs in the same transaction and I think we should treat it as
  such.

Merge could also be considered as a long string of deletes and inserts.
I guess that deleting those records that already exist and then
inserting all of the records is faster because it could be done like a
single join to perform the delete and then a single batch insert.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Martijn van Oosterhout

On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote:
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
  [EMAIL PROTECTED] On Behalf Of Simon Riggs
   Conceptually, a MERGE statement is just a long string of INSERTs and
   UPDATEs in the same transaction and I think we should treat it as
   such.
 
 Merge could also be considered as a long string of deletes and inserts.
 I guess that deleting those records that already exist and then
 inserting all of the records is faster because it could be done like a
 single join to perform the delete and then a single batch insert.

And for us it makes no difference because in MVCC, UPDATE == DELETE +
INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs
in the same statement.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpsVlES6C7e7.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Rick Gigger




Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.


I've just got one question about this.  Everyone seems to be saying  
that try to insert and if that fails update is the same as try to  
insert and if that fails delete and then insert.


What about the following scenario:

mytable

id  serial primary key,
a  int4,
b  int4,
data text

I've got an id field on every table because it simplifies a lot of  
things (such as slony configuration for example)
But I've also got a unique key on (a, b) and if I was to do a merge I  
would most likely do it in (a, b) not id.


If merge does a delete insert then it creates new values for the id  
columns which could cause me problems.  Basically any default fields  
are going to change or for that matter any fields not specified would  
be reinitialized whereas an update would leave them in place.


It seems to me that try to update and if that fails insert seems to  
be the best approach for not messing with existing data.  I guess   
try to insert and if that fails update gets you the same effect.


- Rick Gigger

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread daveg

On Wed, Nov 16, 2005 at 09:49:28AM -0500, Tom Lane wrote:
 I think we should do REPLACE-like functionality that simply fails if the
 match condition isn't equality on a primary key.  If we can use SQL-spec
 MERGE syntax for this, that's fine, but let's not think in terms of
 silently changing to a stronger table lock and a much slower
 implementation when the condition isn't a primary key.  That's a whole

I agree, but would like to relax the primary key requirement to simply
a unique index. I can see use cases for unique so long as not null keys,
so it would be nice if the MERGE operation would work for these. As nulls
are not equal anyway this doesn't seem to do too much violence to the
semantics.

-dg

-- 
David Gould  [EMAIL PROTECTED]
If simplicity worked, the world would be overrun with insects.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Tom Lane

daveg [EMAIL PROTECTED] writes:
 I agree, but would like to relax the primary key requirement to simply
 a unique index. I can see use cases for unique so long as not null keys,
 so it would be nice if the MERGE operation would work for these. As nulls
 are not equal anyway this doesn't seem to do too much violence to the
 semantics.

But a unique key doesn't guarantee that there's only one matching row,
so ISTM you're right back to needing a predicate lock if you do that.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-16 Thread Martijn van Oosterhout

On Wed, Nov 16, 2005 at 04:51:07PM -0500, Tom Lane wrote:
 daveg [EMAIL PROTECTED] writes:
  I agree, but would like to relax the primary key requirement to simply
  a unique index. I can see use cases for unique so long as not null keys,
  so it would be nice if the MERGE operation would work for these. As nulls
  are not equal anyway this doesn't seem to do too much violence to the
  semantics.
 
 But a unique key doesn't guarantee that there's only one matching row,
 so ISTM you're right back to needing a predicate lock if you do that.

But there is no need to guarentee anything. As the spec says, if the
join of the table with the other clauses matches a row in the table
more than once, raise a cardinality exception. If someone creates a
join that matches more than once the whole statement fails. But you can
work that out at runtime. If the user specifies NOT NULL in the join
condition then it can work and there no reason to forbid that.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpoZSWYuKply.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread mark

On Tue, Nov 15, 2005 at 10:27:10AM -0800, Josh Berkus wrote:
 Yes, I guess my hesitation on the full-table-lock strategy is that it 
 doesn't really fulfill the mandate for why people want REPLACE-like 
 statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
 and concurrency than doing two statements.  That being said, I've 
 personally designed more than a dozen web applications and have not yet 
 been faced with a single circumstance of not knowing whether I wanted to 
 INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
 re-code them to do if $id = 0, then insert ... without even needing to 
 use a pl/pgsql hack.

 So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
 what MERGE fulfills rather neatly and for that full table locking is 
 perfectly OK, even desirable.  You really don't want to MERGE-load the 
 same table on two threads at once.  

Yes.

 The second case is for applications coded for MySQL; this is the REPLACE 
 case.  However, the most common MySQL applications doing this use full 
 table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
 them any performance over using two statements, it shouldn't lose them 
 anything they're used to having.

The last two times I wished to use it:

Objects with dynamic fields. The table rows are identified by the
object key, a field key, that refer to a text value. I believe I
still have a race in this case, as I allow INSERT, UPDATE, and DELETE.
The DELETE means that an UPDATE, followed by an INSERT, may fail,
if the DELETE occurs between the two. DELETE is fine as is, as it
is a single operation, that won't fail. Combining UPDATE and INSERT
into one, as per the MySQL REPLACE, would eliminate my race caused
by the DELETE. If REPLACE locked the whole table, it would not be a
candidate, as this table is frequently modified by many users at
once. REPLACE using the primary key index to do locking would be
great.

Timestamping the last signin and access time (~30 seconds error
allowed in the access time to eliminate a majority of the updates
using memcached as we all know how much PostgreSQL hates this) for a
table that doesn't have a 1:1 mapping between the fields associated
with the user, and the row that records their last signin/access time.
The signin/access time can be associated with a context, that is a
part of the primary key. Currently, I rely on the code never deleting
rows, and perform update first, and if the update does not change any
rows, I fall back to insert.

As it is, though, I ended up combining the signin and access time
into the same row, to use one table instead of two, and I'm not sure
that MySQL replace really gives me what I want in this situation. I
have never used MERGE, so can't say whether this would do what I want.
I would not want the solution to lock the entire table, as these updates,
although throttled by the the ~30 seconds error allowed, and memcached
queries, would be potentially performed by hundreds of users every
30 seconds. I can extend the error factor to 60 seconds, but that only
cuts the queries in half. In general, however, the current model does
work fine, and doesn't require replacement. UPDATE will be used most
of the time, and be the correct operation.

Unfortunately, there is a race that I see here. If both UPDATE
operations fail at the same time, then one of the INSERTS will fail.
I don't handle this currently, but perhaps I could argue myself into
considering this an ignorable failure.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Stephen Frost

* Tom Lane ([EMAIL PROTECTED]) wrote:
 Normally I'd plump for following the standard ... but AFAIR, we have had
 bucketloads of requests for REPLACE functionality, and not one request
 for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
 whole lot harder and slower than REPLACE, it seems that we could do
 worse than to concentrate on doing REPLACE for now.  (We can always come
 back to MERGE some other day.)

Not to be too much of a pain, but I asked for full-spec MERGE a while
back... :)  I don't think I was the only one asking for full-spec MERGE
in the What features would you like to see in Postgres? thread a while
ago either, though I could be wrong.

I'd like to see MySQL-like 'replace' too, of course. :)

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Jaime Casanova

On 11/15/05, Josh Berkus josh@agliodbs.com wrote:
 Simon,

  The UPSERT concept is also supported by Teradata, who simply append an
  ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
  to me to be a fairly small subset of MERGE functionality and we ought to
  be able to offer that functionality as a side branch of the main work.

 Yes, I guess my hesitation on the full-table-lock strategy is that it
 doesn't really fulfill the mandate for why people want REPLACE-like
 statements ... to give them an INSERT-or-UPDATE with *higher* efficiency
 and concurrency than doing two statements.  That being said, I've
 personally designed more than a dozen web applications and have not yet
 been faced with a single circumstance of not knowing whether I wanted to
 INSERT or UPDATE.  I've even ported MySQL apps and found it easy to
 re-code them to do if $id = 0, then insert ... without even needing to
 use a pl/pgsql hack.


Actually REPLACE is not INSERT or UPDATE...
REPLACE means INSERT if already exists DELETE then INSERT

can be used as an UPDATE if you use the SET clause but, it is optional


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Simon Riggs

On Tue, 2005-11-15 at 10:27 -0800, Josh Berkus wrote:

  The UPSERT concept is also supported by Teradata, who simply append an
  ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
  to me to be a fairly small subset of MERGE functionality and we ought to
  be able to offer that functionality as a side branch of the main work.
 
 Yes, I guess my hesitation on the full-table-lock strategy is that it 
 doesn't really fulfill the mandate for why people want REPLACE-like 
 statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
 and concurrency than doing two statements.  

Agreed...

OK, what I said was ...Where there is doubt, we should fall back to
table locking just like the rest of the world, IMHO. I didn't mean we
should use full table locking all of the time. Sorry if I wasn't clear.

What I meant, in context was, IMHO
- we do no need predicate locking
- we should use row level locks when these can be used
- we should use table level locks other times

On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: 
 Scenario:
 
 session1: REPLACE  1   
   session2:  REPLACE . 1
 session1: check to see that 1 exists  no
   session2: check to see that 1 exists  no
 session1: INSERT 1
   session2: INSERT 1  ERROR
 
 Get the picture?  The only way to avoid a race condition is to be able to 
 do predicate locking, that is to lock the table against any data write 
 matching that predicate.

The above situation happens now if you have two people doing SELECT then
INSERT on the same row. Or even if you do UPDATE then INSERT.

Currently if two people INSERT a row with a duplicate PK, we may find
that both INSERTs made it into a data block, but one has its txn aborted
because of a unique index violation.

Nobody complains about that, nor says we should have predicate locking
for that case, so why worry about it for MERGE? I don't see we need a
special effort to avoid: if you try to do the same thing more than once,
only the first one will work. At least you get a nice error message to
let you know its happened. 

Seems like we
- run the query in the USING clause
- join it to the target table using pseudo outer join logic
- if it matches we UPDATE
- if it doesn't we INSERT
...but we do the last two in a single step, to minimise the window of
opportunity for wierd situations.

 That being said, I've 
 personally designed more than a dozen web applications and have not yet 
 been faced with a single circumstance of not knowing whether I wanted to 
 INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
 re-code them to do if $id = 0, then insert ... without even needing to 
 use a pl/pgsql hack.

OK, but not everybody knows what they're doing as well as you do. :-)

 So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
 what MERGE fulfills rather neatly and for that full table locking is 
 perfectly OK, even desirable.  You really don't want to MERGE-load the 
 same table on two threads at once.  

Probably true, but I do want to avoid full table locking for MERGE
whenever possible. Concurrency is important even in data warehousing.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Bruce Momjian

Josh Berkus wrote:
 Jaime,
 
  why? seems that REPLACE only work if there are at least one row
  matching...
 
 Scenario:
 
 session1: REPLACE  1   
   session2:  REPLACE . 1
 session1: check to see that 1 exists  no
   session2: check to see that 1 exists  no
 session1: INSERT 1
   session2: INSERT 1  ERROR
 
 Get the picture?  The only way to avoid a race condition is to be able to 
 do predicate locking, that is to lock the table against any data write 
 matching that predicate.

Isn't the standard practice to do the INSERT, and if that fails, do
the UPDATE?  In fact, if someone is already doing a INSERT inside a
transaction, it will wait for the transaction to complete so you can
then do the UPDATE, or complete the INSERT if the transaction aborted.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Bruce Momjian

Tom Lane wrote:
 Josh Berkus josh@agliodbs.com writes:
  But even REPLACE requires predicate locking.  There's no real way to get 
  around it.
 
 The point though is that REPLACE is restricted to a type of predicate
 narrow enough to be enforced through a unique-index mechanism, and so
 it's implementable without solving the general case of predicate
 locking.
 
 Predicate locking for narrow cases isn't very hard; it's the general
 case of arbitrary predicates that's hard.

My feeling is we should implement MERGE for the limited cases we can,
and throw an error for cases we can not (or require table locking), and
then see what reports we get from users.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Bruce Momjian

Simon Riggs wrote:
 I've been looking at ways of doing INSERT then UPDATE, but it doesn't
 seem very easy to avoid unique index violations in that case. So doing
 the UPDATE first then INSERTs later seems like the way to go.

INSERT has to be first to avoid a race condition (see my previous
emails).  We just need a way to get a unique index violation and
continue with the UPDATE.  We have savepoints now so it certainly seems
possible.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Bruce Momjian

Josh Berkus wrote:
 So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
 what MERGE fulfills rather neatly and for that full table locking is 
 perfectly OK, even desirable.  You really don't want to MERGE-load the 
 same table on two threads at once.  
 
 The second case is for applications coded for MySQL; this is the REPLACE 
 case.  However, the most common MySQL applications doing this use full 
 table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
 them any performance over using two statements, it shouldn't lose them 
 anything they're used to having.

For any kind of efficiency, I assume MySQL REPLACE wants a unique index
in place, so practially everyone doing merge probably already has the
setup we need to avoid new non-index predicate locking code.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Bruce Momjian


Just to summarize, MySQL REPLACE is INSERT or DELETE/INSERT, while they
have a SET clauses that allows UPDATE, and INSERT has a ON DUPLICATE KEY
UPDATE clause too.

I think the INSERT ...  ON DUPLICATE KEY is undesirable because this
functionality should have a new keyword in the first position, e.g
MERGE, and I think the REPLACE is out because most people feel that the
DELETE/INSERT functionality is near-useless if we can give users the
INSERT/UPDATE functionality of MERGE.

I think even if we have to restrict MERGE to requiring a unique index,
it is better to go that way than to drag REPLACE into our syntax.  MERGE
can be extended over time, while REPLACE has a non-optimal initial
behavior.  The idea that MERGE can use a constant list (not requiring a
second table) makes it a valid replacement for REPLACE, and other
database support for MERGE reinforces this.

---

[EMAIL PROTECTED] wrote:
 I think you translated it correctly, MySQL has another way of specifying this
 which is INSERT ... ON DUPLICATE KEY UPDATE ...
 (http://dev.mysql.com/doc/refman/5.0/en/insert.html)
 Regards
 
 Paolo
 
 Jochem van Dieten [EMAIL PROTECTED] ha scritto
 
  On 11/13/05, Petr Jelinek wrote:
  
   I am really not db expert and I don't have copy of sql standard but you
   don't need to use 2 tables I think - USING part can also be subquery
   (some SELECT) and if I am right then you could simulate what REPLACE
   does because in PostgreSQL you are not forced to specify FROM clause in
   SELECT. So you could in theory do
   MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid)
 ...
 
  I think the MySQL statement:
  REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL,
 3)
 
  would translate into the following MERGE statement:
  MERGE INTO table target
  USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
  ON target.pknew = source.pk
  WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 =
 col3new
  WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
  col1new, col2new, col3new)
 
  It might not be the most elegant solution, but I don't see why it won't
 work.
 
  Jochem
 
  ---(end of broadcast)---
  TIP 4: Have you searched our list archives?
 
 http://archives.postgresql.org
  
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Jim C. Nasby

On Tue, Nov 15, 2005 at 07:16:21PM -0500, Bruce Momjian wrote:
 Tom Lane wrote:
  Josh Berkus josh@agliodbs.com writes:
   But even REPLACE requires predicate locking.  There's no real way to get 
   around it.
  
  The point though is that REPLACE is restricted to a type of predicate
  narrow enough to be enforced through a unique-index mechanism, and so
  it's implementable without solving the general case of predicate
  locking.
  
  Predicate locking for narrow cases isn't very hard; it's the general
  case of arbitrary predicates that's hard.
 
 My feeling is we should implement MERGE for the limited cases we can,
 and throw an error for cases we can not (or require table locking), and
 then see what reports we get from users.

We should probably throw a notice or warning if we go to a table lock,
too.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Christopher Kings-Lynne


We should probably throw a notice or warning if we go to a table lock,
too.


That's not very useful, because you can only do somethign about it AFTER 
the 1 hour exclusive lock merge has already run :)


Chris


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-15 Thread Jim C. Nasby

On Wed, Nov 16, 2005 at 01:27:29PM +0800, Christopher Kings-Lynne wrote:
 We should probably throw a notice or warning if we go to a table lock,
 too.
 
 That's not very useful, because you can only do somethign about it AFTER 
 the 1 hour exclusive lock merge has already run :)

Not true; it would be useful for development when you'd like to know
that some statement is grabbing a table lock. This is something that you
wouldn't normally notice in a dev environment, and it sounds like it'd
be easy to do a merge that has the unintended effect of grabbing a table
lock.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-14 Thread Jochem van Dieten

On 11/13/05, Petr Jelinek wrote:

 I am really not db expert and I don't have copy of sql standard but you
 don't need to use 2 tables I think - USING part can also be subquery
 (some SELECT) and if I am right then you could simulate what REPLACE
 does because in PostgreSQL you are not forced to specify FROM clause in
 SELECT. So you could in theory do
 MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...

I think the MySQL statement:
REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL, 3)

would translate into the following MERGE statement:
MERGE INTO table target
USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
ON target.pknew = source.pk
WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new
WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
col1new, col2new, col3new)

It might not be the most elegant solution, but I don't see why it won't work.

Jochem

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-14 Thread pmagnoli

I think you translated it correctly, MySQL has another way of specifying this
which is INSERT ... ON DUPLICATE KEY UPDATE ...
(http://dev.mysql.com/doc/refman/5.0/en/insert.html)
Regards

Paolo

Jochem van Dieten [EMAIL PROTECTED] ha scritto

 On 11/13/05, Petr Jelinek wrote:
 
  I am really not db expert and I don't have copy of sql standard but you
  don't need to use 2 tables I think - USING part can also be subquery
  (some SELECT) and if I am right then you could simulate what REPLACE
  does because in PostgreSQL you are not forced to specify FROM clause in
  SELECT. So you could in theory do
  MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid)
...

 I think the MySQL statement:
 REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL,
3)

 would translate into the following MERGE statement:
 MERGE INTO table target
 USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source
 ON target.pknew = source.pk
 WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 =
col3new
 WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew,
 col1new, col2new, col3new)

 It might not be the most elegant solution, but I don't see why it won't
work.

 Jochem

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org
 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-14 Thread Jim C. Nasby

See 'merge_db' in http://lnk.nu/postgresql.org/5sl.html

On Fri, Nov 11, 2005 at 10:07:07PM -0500, Rod Taylor wrote:
 On Fri, 2005-11-11 at 18:36 -0500, [EMAIL PROTECTED] wrote:
  On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
   So? That is what save points are for.  You can even skip the select for
   update if you don't mind dead tuples from the attempted insert.
   SELECT ... FOR UPDATE;
   IF not exists THEN
 SAVEPOINT;
 INSERT ;
 IF UNIQUE VIOLATION THEN
 /* Someone else inserted between the SELECT and our INSERT */
 ROLLBACK TO SAVEPOINT;
 UPDATE;
 ELSE
 RELEASE SAVEPOINT;
 FI
   ELSE
 UPDATE;
   FI
  
  Isn't there still a race between INSERT and UPDATE?
 
 I suppose there is although I hadn't noticed before. I've never run into
 it and always check to ensure the expected number of tuples were touched
 by the update or delete.
 
 Within the PostgreSQL backend you might get away with having your insert
 hold a lock on the index page and follow it up with a FOR UPDATE lock on
 the offending tuple thus ensuring that your update will succeed. If you
 hack index mechanisms for the support you don't need the SAVEPOINT
 either -- just don't throw an error when you run across the existing
 entry.
 
 For client side code one possibility is to repeat until successful.
 
 WHILE
   SELECT FOR UPDATE;
   IF NOT EXISTS THEN
   SAVEPOINT
   INSERT;
   IF UNIQUE VIOLATION THEN
   ROLLBACK TO SAVEPOINT;
   ELSE
   RELEASE SAVEPOINT
   EXIT;
   FI
   ELSE
   UPDATE;
   EXIT;
   END
 
   -- Check for infinite loop
 END
 
 -- 
 
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-14 Thread Jim C. Nasby

On Fri, Nov 11, 2005 at 03:42:38PM -0500, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  It seems to me that it has always been implicitly assumed around here 
  that the MERGE command would be a substitute for a MySQL-like REPLACE 
  functionality.  After rereading the spec it seems that this is not the 
  case.  MERGE always operates on two different tables, which REPLACE 
  doesn't do.
 
 Normally I'd plump for following the standard ... but AFAIR, we have had
 bucketloads of requests for REPLACE functionality, and not one request
 for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
 whole lot harder and slower than REPLACE, it seems that we could do
 worse than to concentrate on doing REPLACE for now.  (We can always come
 back to MERGE some other day.)

I suspect a lot of those requests are from people who actually want
merge and don't realize that mysql has a replace.

On another note, is there any reason we can't put an equivalent to
example 36-1 (http://lnk.nu/postgresql.org/617.html) into the backend?
Presumably it wouldn't be as fast as a more elegant solution, but OTOH
it'd probably be faster than plpgsql...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-14 Thread Simon Riggs

On Sun, 2005-11-13 at 23:56 +0100, Martijn van Oosterhout wrote:
 On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote:
  I am really not db expert and I don't have copy of sql standard but you 
  don't need to use 2 tables I think - USING part can also be subquery 
  (some SELECT) and if I am right then you could simulate what REPLACE 
  does because in PostgreSQL you are not forced to specify FROM clause in 
  SELECT. So you could in theory do
  MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
  But I am not sure if this is what you want.
 
 Well, the obvious extension to this is that the extire USING clause is
 in fact optional:
 
 MERGE INTO tablename ON id = 1 ...
 
 Which starts looking a lot simpler.
 

...choosing a place to jump in is a little hard in this thread...so I'll
jump in with some general info and thoughts on topics so far:

MERGE is useful both for OLTP systems and for Data Warehousing, where it
is sometimes known as the UPSERT. The MERGE statement in SQL:2003
requires a target table and a table statement. I don't see anything in
that to always require two separate tables - this is just the same as a
self-referencing INSERT SELECT statement. The USING clause is also a
compulsory part of SQL:2003.

One of the more interesting ways to use MERGE is with Oracle external
tables. The same idea for us would be to have MERGE become a variant of
the PostgreSQL COPY FROM command. That would be very cool.

The above is the reason why MERGE doesn't seem to provide for external
data being passed, as does INSERT or MySQL REPLACE.

Neither DB2 or Oracle perform predicate locking. DB2 is more cautious,
and some would say more efficient, thats all. PostgreSQL's locking
features are just fine for pragmatic implementation of MERGE, AFAICS.
Where there is doubt, we should fall back to table locking just like the
rest of the world, IMHO. Making this work with partitioning will be hard
enough without overusing the predicate solving logic.

The UPSERT concept is also supported by Teradata, who simply append an
ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems
to me to be a fairly small subset of MERGE functionality and we ought to
be able to offer that functionality as a side branch of the main work.

I've been looking at ways of doing INSERT then UPDATE, but it doesn't
seem very easy to avoid unique index violations in that case. So doing
the UPDATE first then INSERTs later seems like the way to go.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Robert Treat

On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
 Tom Lane wrote:
  Peter Eisentraut [EMAIL PROTECTED] writes:
  It seems to me that it has always been implicitly assumed around here
  that the MERGE command would be a substitute for a MySQL-like REPLACE
  functionality.  After rereading the spec it seems that this is not the
  case.  MERGE always operates on two different tables, which REPLACE
  doesn't do.
 
  Normally I'd plump for following the standard ... but AFAIR, we have had
  bucketloads of requests for REPLACE functionality, and not one request
  for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
  whole lot harder and slower than REPLACE, it seems that we could do
  worse than to concentrate on doing REPLACE for now.  (We can always come
  back to MERGE some other day.)

 I would also like to add that MySQL's REPLACE is not exactly an INSERT
 OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the
 fields not specified in the query are set to their defaults:

 i.e.

 CREATE TABLE t (a int PRIMARY KEY, b int, c int);

 INSERT INTO t (a, b, c) VALUES (1, 1, 2);

 SELECT * FROM t;
 +---+--+--+

 | a | b| c|

 +---+--+--+

 | 1 |1 |2 |

 +---+--+--+

 REPLACE INTO t (a, b) VALUES (1, 1);

 SELECT * FROM t;
 +---+--+--+

 | a | b| c|

 +---+--+--+

 | 1 |1 | NULL |

 +---+--+--+


 I wanted to point it out this because people are commonly mistaking this.



Wow, that seems ugly maybe there's a reason for it, but I'm not sure we 
could deviate from my$ql's behavior on this even if we wanted... they are the 
standard here.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Gregory Maxwell

On 11/13/05, Robert Treat [EMAIL PROTECTED] wrote:
 On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
  | 1 |1 | NULL |
 Wow, that seems ugly maybe there's a reason for it, but I'm not sure we
 could deviate from my$ql's behavior on this even if we wanted... they are the
 standard here.

I don't think that's ugly, I think that's exactly working as
advertised. Replace behaves exactly like deleting the record with the
matching primary key and inserting the provided input. ... not merging
together old data with new.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Robert Treat

On Sunday 13 November 2005 10:01, Gregory Maxwell wrote:
 On 11/13/05, Robert Treat [EMAIL PROTECTED] wrote:
  On Saturday 12 November 2005 04:06, Matteo Beccati wrote:
   | 1 |1 | NULL |
 
  Wow, that seems ugly maybe there's a reason for it, but I'm not sure
  we could deviate from my$ql's behavior on this even if we wanted... they
  are the standard here.

 I don't think that's ugly, I think that's exactly working as
 advertised. Replace behaves exactly like deleting the record with the
 matching primary key and inserting the provided input. ... not merging
 together old data with new.

I disagree in that REPLACE is advertised as a solution for the INSERT else 
UPDATE problem, but has a different behavior than a true INSERT else UPDATE 
would produce.   Maybe that's a problem with the implementation, or maybe 
it's a problem in the advertisment, but there is certainly a discrepency 
there. 

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Tom Lane

Robert Treat [EMAIL PROTECTED] writes:
 I disagree in that REPLACE is advertised as a solution for the INSERT else 
 UPDATE problem, but has a different behavior than a true INSERT else UPDATE 
 would produce.   Maybe that's a problem with the implementation, or maybe 
 it's a problem in the advertisment, but there is certainly a discrepency 
 there. 

Yeah.  REPLACE fails to solve common examples like a web hit counter
(if key doesn't exist, insert row with count 1; if it does exist,
add 1 to the current count).

IIRC, SQL's MERGE deals with this by offering two quite separate
specifications of what to do when there is or isn't already a matching
row.

I don't necessarily feel that we have to slavishly duplicate what MySQL
offers.  I do think that it's reasonable to restrict the functionality
to updating/replacing a row with matching primary key --- that gets us
out of the problem of needing a full predicate-locking mechanism, while
still covering most all of the practical use-cases that I can see.

It'd be useful to look at what comparable functionality is offered by
other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
area?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Petr Jelinek


Tom Lane wrote:


It'd be useful to look at what comparable functionality is offered by
other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
area?



IIRC they both have MERGE.

--
Regards
Petr Jelinek (PJMODOS)


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Joshua D. Drake


Petr Jelinek wrote:


Tom Lane wrote:



It'd be useful to look at what comparable functionality is offered by
other DBs besides MySQL.  Anyone know what DB2 or Oracle have in this
area?



Oracle:
http://www.psoug.org/reference/merge.html
http://www.psoug.org/reference/translate_replace.html

But the oracle replace seems completely different to the topic at hand.

DB2:
  Merge:
 
http://www.databasejournal.com/features/db2/article.php/10896_3322041_2


  I was not able to easily find information on REPLACE.

Joshua D. Drake




--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Peter Eisentraut

Tom Lane wrote:
 IIRC, SQL's MERGE deals with this by offering two quite separate
 specifications of what to do when there is or isn't already a
 matching row.

In that regard, MERGE is quite flexible, but MERGE doesn't address the 
point of REPLACE, because MERGE requires *two* tables as input, whereas 
REPLACE only takes *one*.  Unless someone can show that you can trick 
MERGE into doing the REPLACE job anyway, we're not discussing the same 
thing.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Petr Jelinek


Peter Eisentraut wrote:


In that regard, MERGE is quite flexible, but MERGE doesn't address the 
point of REPLACE, because MERGE requires *two* tables as input, whereas 
REPLACE only takes *one*.  Unless someone can show that you can trick 
MERGE into doing the REPLACE job anyway, we're not discussing the same 
thing.




I am really not db expert and I don't have copy of sql standard but you 
don't need to use 2 tables I think - USING part can also be subquery 
(some SELECT) and if I am right then you could simulate what REPLACE 
does because in PostgreSQL you are not forced to specify FROM clause in 
SELECT. So you could in theory do

MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
But I am not sure if this is what you want.

--
Regards
Petr Jelinek (PJMODOS)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-13 Thread Martijn van Oosterhout

On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote:
 I am really not db expert and I don't have copy of sql standard but you 
 don't need to use 2 tables I think - USING part can also be subquery 
 (some SELECT) and if I am right then you could simulate what REPLACE 
 does because in PostgreSQL you are not forced to specify FROM clause in 
 SELECT. So you could in theory do
 MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ...
 But I am not sure if this is what you want.

Well, the obvious extension to this is that the extire USING clause is
in fact optional:

MERGE INTO tablename ON id = 1 ...

Which starts looking a lot simpler.

BTW, my reading of the MERGE examples given earlier is that there no
notes in there at all about guarenteeing concurrency. None of the
documentation says that using MERGE will avoid duplicate key errors if
someone else does the same thing concurrently. It seems more like a
performence hack to avoid scanning the table twice.

Basically, you could implement this by taking the USING clause, do a
left outer join with the merge table and for the blank rows fill in a
CTID for insert and instead of NULLs the values of the INSERT portion.

Which is kind of a bummer for the people who want to do the insert
zero if not there else add 1 thing a lot and expecting this to solve
the concurrency for them.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpFeF7xQDptG.pgp
Description: PGP signature

Re: [HACKERS] MERGE vs REPLACE

2005-11-12 Thread Matteo Beccati


Tom Lane wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:
It seems to me that it has always been implicitly assumed around here 
that the MERGE command would be a substitute for a MySQL-like REPLACE 
functionality.  After rereading the spec it seems that this is not the 
case.  MERGE always operates on two different tables, which REPLACE 
doesn't do.


Normally I'd plump for following the standard ... but AFAIR, we have had
bucketloads of requests for REPLACE functionality, and not one request
for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
whole lot harder and slower than REPLACE, it seems that we could do
worse than to concentrate on doing REPLACE for now.  (We can always come
back to MERGE some other day.)


I would also like to add that MySQL's REPLACE is not exactly an INSERT 
OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the 
fields not specified in the query are set to their defaults:


i.e.

CREATE TABLE t (a int PRIMARY KEY, b int, c int);

INSERT INTO t (a, b, c) VALUES (1, 1, 2);

SELECT * FROM t;
+---+--+--+
| a | b| c|
+---+--+--+
| 1 |1 |2 |
+---+--+--+

REPLACE INTO t (a, b) VALUES (1, 1);

SELECT * FROM t;
+---+--+--+
| a | b| c|
+---+--+--+
| 1 |1 | NULL |
+---+--+--+


I wanted to point it out this because people are commonly mistaking this.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

[HACKERS] MERGE vs REPLACE

2005-11-11 Thread Peter Eisentraut

It seems to me that it has always been implicitly assumed around here 
that the MERGE command would be a substitute for a MySQL-like REPLACE 
functionality.  After rereading the spec it seems that this is not the 
case.  MERGE always operates on two different tables, which REPLACE 
doesn't do.

That said, what kind of support for insert-or-update-this-row do we want 
to provide, if any?  Should it be a REPLACE command, an extension of 
the INSERT command, a modication of the MERGE syntax, or something 
else?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Petr Jelinek


Peter Eisentraut wrote:
It seems to me that it has always been implicitly assumed around here 
that the MERGE command would be a substitute for a MySQL-like REPLACE 
functionality.  After rereading the spec it seems that this is not the 
case.  MERGE always operates on two different tables, which REPLACE 
doesn't do.


That said, what kind of support for insert-or-update-this-row do we want 
to provide, if any?  Should it be a REPLACE command, an extension of 
the INSERT command, a modication of the MERGE syntax, or something 
else?




MERGE of course, it's standard, REPLACE is mysql extension


--
Regards
Petr Jelinek (PJMODOS)
www.parba.cz

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Jaime Casanova

On 11/11/05, Peter Eisentraut [EMAIL PROTECTED] wrote:
 It seems to me that it has always been implicitly assumed around here
 that the MERGE command would be a substitute for a MySQL-like REPLACE
 functionality.  After rereading the spec it seems that this is not the
 case.  MERGE always operates on two different tables, which REPLACE
 doesn't do.

 That said, what kind of support for insert-or-update-this-row do we want
 to provide, if any?  Should it be a REPLACE command, an extension of
 the INSERT command, a modication of the MERGE syntax, or something
 else?

 --
 Peter Eisentraut
 http://developer.postgresql.org/~petere/


MERGE seems to me the better option... not just because is standard
but at least i can see some use cases for it...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Peter Eisentraut

Jaime Casanova wrote:
 MERGE seems to me the better option... not just because is standard
 but at least i can see some use cases for it...

I don't think you understand my message: MERGE does not do what REPLACE 
does.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 It seems to me that it has always been implicitly assumed around here 
 that the MERGE command would be a substitute for a MySQL-like REPLACE 
 functionality.  After rereading the spec it seems that this is not the 
 case.  MERGE always operates on two different tables, which REPLACE 
 doesn't do.

Normally I'd plump for following the standard ... but AFAIR, we have had
bucketloads of requests for REPLACE functionality, and not one request
for spec-compatible MERGE.  If, as it appears, full-spec MERGE is also a
whole lot harder and slower than REPLACE, it seems that we could do
worse than to concentrate on doing REPLACE for now.  (We can always come
back to MERGE some other day.)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Jaime Casanova

On 11/11/05, Peter Eisentraut [EMAIL PROTECTED] wrote:
 Jaime Casanova wrote:
  MERGE seems to me the better option... not just because is standard
  but at least i can see some use cases for it...

 I don't think you understand my message: MERGE does not do what REPLACE
 does.

 --
 Peter Eisentraut
 http://developer.postgresql.org/~petere/


I understand you well... what i was trying to say is that i prefer
MERGE (standard SQL command) to be done because the functionally it
has (basically a merge of two tables) seems to me to be more usefull
than REPLACE (MySql Command)...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Josh Berkus

Guys,

 I understand you well... what i was trying to say is that i prefer
 MERGE (standard SQL command) to be done because the functionally it
 has (basically a merge of two tables) seems to me to be more usefull
 than REPLACE (MySql Command)...

But even REPLACE requires predicate locking.  There's no real way to get 
around it.

--Josh

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Jaime Casanova

On 11/11/05, Josh Berkus josh@agliodbs.com wrote:
 Guys,

  I understand you well... what i was trying to say is that i prefer
  MERGE (standard SQL command) to be done because the functionally it
  has (basically a merge of two tables) seems to me to be more usefull
  than REPLACE (MySql Command)...

 But even REPLACE requires predicate locking.  There's no real way to get
 around it.

 --Josh


why? seems that REPLACE only work if there are at least one row matching...

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Josh Berkus

Jaime,

 why? seems that REPLACE only work if there are at least one row
 matching...

Scenario:

session1: REPLACE  1   
  session2:  REPLACE . 1
session1: check to see that 1 exists  no
  session2: check to see that 1 exists  no
session1: INSERT 1
  session2: INSERT 1  ERROR

Get the picture?  The only way to avoid a race condition is to be able to 
do predicate locking, that is to lock the table against any data write 
matching that predicate.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Rod Taylor

On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote:
 Jaime,
 
  why? seems that REPLACE only work if there are at least one row
  matching...

 Get the picture?  The only way to avoid a race condition is to be able to 
 do predicate locking, that is to lock the table against any data write 
 matching that predicate.

So? That is what save points are for.  You can even skip the select for
update if you don't mind dead tuples from the attempted insert.

SELECT ... FOR UPDATE;
IF not exists THEN
SAVEPOINT;
INSERT ;
IF UNIQUE VIOLATION THEN
/* Someone else inserted between the SELECT and our INSERT */
ROLLBACK TO SAVEPOINT;
UPDATE;
ELSE
RELEASE SAVEPOINT;
FI
ELSE
UPDATE;
FI
-- 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Tom Lane

Josh Berkus josh@agliodbs.com writes:
 But even REPLACE requires predicate locking.  There's no real way to get 
 around it.

The point though is that REPLACE is restricted to a type of predicate
narrow enough to be enforced through a unique-index mechanism, and so
it's implementable without solving the general case of predicate
locking.

Predicate locking for narrow cases isn't very hard; it's the general
case of arbitrary predicates that's hard.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Gavin Sherry

On Fri, 11 Nov 2005, Josh Berkus wrote:

 Jaime,

  why? seems that REPLACE only work if there are at least one row
  matching...

 Scenario:

 session1: REPLACE  1
   session2:  REPLACE . 1
 session1: check to see that 1 exists  no
   session2: check to see that 1 exists  no
 session1: INSERT 1
   session2: INSERT 1  ERROR

 Get the picture?  The only way to avoid a race condition is to be able to
 do predicate locking, that is to lock the table against any data write
 matching that predicate.

When it comes to predicate locking, I think we should defer to Peter's
comment at Open DB Con:

http://www.treehou.se/~swm/peter_merge.jpg

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread mark

On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
 So? That is what save points are for.  You can even skip the select for
 update if you don't mind dead tuples from the attempted insert.
 SELECT ... FOR UPDATE;
 IF not exists THEN
   SAVEPOINT;
   INSERT ;
   IF UNIQUE VIOLATION THEN
   /* Someone else inserted between the SELECT and our INSERT */
   ROLLBACK TO SAVEPOINT;
   UPDATE;
   ELSE
   RELEASE SAVEPOINT;
   FI
 ELSE
   UPDATE;
 FI

Isn't there still a race between INSERT and UPDATE?

Low probability, for sure, as it would have had to not exist, then
exist, then not exist, but still possible.

I'd like a REPLACE that could be safe, or at least cause a COMMIT to
fail, for this reason.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] MERGE vs REPLACE

2005-11-11 Thread Rod Taylor

On Fri, 2005-11-11 at 18:36 -0500, [EMAIL PROTECTED] wrote:
 On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote:
  So? That is what save points are for.  You can even skip the select for
  update if you don't mind dead tuples from the attempted insert.
  SELECT ... FOR UPDATE;
  IF not exists THEN
  SAVEPOINT;
  INSERT ;
  IF UNIQUE VIOLATION THEN
  /* Someone else inserted between the SELECT and our INSERT */
  ROLLBACK TO SAVEPOINT;
  UPDATE;
  ELSE
  RELEASE SAVEPOINT;
  FI
  ELSE
  UPDATE;
  FI
 
 Isn't there still a race between INSERT and UPDATE?

I suppose there is although I hadn't noticed before. I've never run into
it and always check to ensure the expected number of tuples were touched
by the update or delete.

Within the PostgreSQL backend you might get away with having your insert
hold a lock on the index page and follow it up with a FOR UPDATE lock on
the offending tuple thus ensuring that your update will succeed. If you
hack index mechanisms for the support you don't need the SAVEPOINT
either -- just don't throw an error when you run across the existing
entry.

For client side code one possibility is to repeat until successful.

WHILE
SELECT FOR UPDATE;
IF NOT EXISTS THEN
SAVEPOINT
INSERT;
IF UNIQUE VIOLATION THEN
ROLLBACK TO SAVEPOINT;
ELSE
RELEASE SAVEPOINT
EXIT;
FI
ELSE
UPDATE;
EXIT;
END

-- Check for infinite loop
END

-- 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

85 matches

Mail list logo