Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote: Btw about that keys, oracle gives error on many-to-one or many-to-many relationship between the source and target tables. The standard has something called a cardinality violation if the to-be-merged table doesn't match 1-1 with the rest of the statement. If I had access to an Oracle I'd run two tests on MERGE: 1. Does the joining column have to have an index? For example, make a column that's full of unique values but no unique index. According to my reading of the the standard, this should still work (just slower). 2. Additionally, only the rows involved in the MERGE need to be uniquely referenced, so if you add duplicate values but add a WHERE clause to exclude those, it should also work. My feeling is that requiring an index will limit it's usefulness as a general tool. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpXi0Kng0cKD.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
Martijn, Here is a quick test (Oracle 10.1.0.3/Linux): SQL select banner from v$version; BANNER Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Prod PL/SQL Release 10.1.0.3.0 - Production CORE10.1.0.3.0 Production TNS for Linux: Version 10.1.0.3.0 - Production NLSRTL Version 10.1.0.3.0 - Production SQL select * from merge_test_1; ID NAME -- 1 aaa 2 bbb 3 ccc 4 ddd 5 eee 1 xxx 6 rows selected. SQL select * from merge_test_2; ID NAME -- 1 AAA 2 BBB 6 FFF SQL select index_name from user_indexes where table_name like 'merge_test%'; no rows selected SQL merge into merge_test_1 a1 2 using merge_test_2 a2 3 on (a1.id = a2.id) 4 when matched then 5 update set a1.name = a2.name 6 when not matched then 7 insert (id, name) values (a2.id, a2.name); 4 rows merged. SQL select * from merge_test_1; ID NAME -- 1 AAA 2 BBB 3 ccc 4 ddd 5 eee 1 AAA 6 FFF 7 rows selected. Regards, Lubomir Petrov Martijn van Oosterhout wrote: On Wed, Nov 23, 2005 at 12:24:24AM +0100, Petr Jelinek wrote: Btw about that keys, oracle gives error on many-to-one or many-to-many relationship between the source and target tables. The standard has something called a cardinality violation if the to-be-merged table doesn't match 1-1 with the rest of the statement. If I had access to an Oracle I'd run two tests on MERGE: 1. Does the joining column have to have an index? For example, make a column that's full of unique values but no unique index. According to my reading of the the standard, this should still work (just slower). 2. Additionally, only the rows involved in the MERGE need to be uniquely referenced, so if you add duplicate values but add a WHERE clause to exclude those, it should also work. My feeling is that requiring an index will limit it's usefulness as a general tool. Have a nice day, ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote: It was already said here that oracle and db2 both use MERGE, dunno about mssql. And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). Statements about MERGE on the web: http://www.dba-oracle.com/oracle_tips_rittman_merge.htm http://databasejournal.com/features/db2/article.php/3322041 http://certcities.com/editorial/columns/story.asp?EditorialsID=51 http://publib.boulder.ibm.com/infocenter/ids9help/index.jsp?topic=/com.ibm.sqls.doc/sqls578.htm http://www.jdixon.dotnetdevelopersjournal.com/i_want_my_sql_2005_merge_statement.htm http://publib.boulder.ibm.com/infocenter/db2help/topic/com.ibm.db2.udb.doc/admin/r0010873.htm http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci978700,00.html Not one (*not one!*) of these mentions any special handling of duplicate keys. They even go to pains to say that any errors cause everything to rollback. The last one is especially interesting: : Is there any way to capture errors from a MERGE statement? Also, is : there any way to know how many records were inserted or updated for the : MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated. : : You capture errors the same way you would if you were doing regular : INSERT and UPDATE statementswith exception handlers. Just include a : WHEN OTHERS exception handler in the block where your MERGE statement : is and have to display SQLCODE and SQLERRM if an error occurs. Then you : can figure out which specific errors are occurring and create : individual exception handlers for those. There are even places that tell you how to decompose your MERGE into an INSERT plus UPDATE statement. The real advantage of MERGE is that the semantics prevent your updating a row you just inserted, which is harder in the general case but easy if the executor is handling the rows one at a time. Rather than trying to make MERGE do something it wasn't designed for, we should probably be spending our efforts on triggers for error conditions. Maybe something like: CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz(); Where baz would be passed NEW and OLD just like a normal trigger and if the trigger return NULL, the update is ignored. In the meantime the function can divert the insert to another table if it likes. This seems like a much more workable and useful addition. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpqFtq6mXEYi.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
Jaime Casanova wrote: And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The change in 8.1 is that SELECT pg_class.relname no longer works. You have to do SELECT relname FROM pg_class. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
On 11/22/05, Bruce Momjian pgman@candle.pha.pa.us wrote: Jaime Casanova wrote: And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The change in 8.1 is that SELECT pg_class.relname no longer works. You have to do SELECT relname FROM pg_class. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 touche... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Tue, Nov 22, 2005 at 11:57:48AM +0100, Martijn van Oosterhout wrote: excellent research snipped Rather than trying to make MERGE do something it wasn't designed for, we should probably be spending our efforts on triggers for error conditions. Maybe something like: CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz(); Where baz would be passed NEW and OLD just like a normal trigger and if the trigger return NULL, the update is ignored. In the meantime the function can divert the insert to another table if it likes. This seems like a much more workable and useful addition. I agree that we shouldn't try and distort MERGE into something fancy. The AFTER ERROR trigger is a very interesting idea, since it could handle many different cases. But I'm worried that people might not want that behavior on by default for everything done against some table. I think it'd be better to have some way to specify in a command that you want to use some kind of error-handling trigger. Though presumably the underlying framework would be same, so it shouldn't be hard to support both. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
Jaime Casanova wrote: the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... Bruce already replied to your first statement so, what idea won't have any fan ? It's not that we would change what MERGE does. Postgres just does not requeire FROM clause in SELECT and second parameter of MERGE can be SELECT which means you can do what REPLACE) does without problems and without breaking something or violating standard and like I said you can do the same in oracle using dual. Btw about that keys, oracle gives error on many-to-one or many-to-many relationship between the source and target tables. -- Regards Petr Jelinek (PJMODOS) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote: Is the requirement for predicate locking, over and above a unique constraint on an index that involves the record key, to deal with the scenario of two inserts executing at the same time, both before commit? No. If you have a primary key you can easily prevent duplicates. You need a table lock or predicate locking to prevent duplicates if you do not have a primary key. AFAIK you can also accomplish this without a table lock as long as you have a unique index on the right set of fields and those fields are also NOT NULL. ISTM it would be good to support that case as well, since you might want to MERGE based on something other than the PK. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MERGE vs REPLACE
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: I don't think MERGE can really be made to be both though, in which case it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON DUPLICATE UPDATE something else. Perhaps a special form of MERGE where you know it's going to be doing that locking. I really don't like the idea of making the SQL2003 version of MERGE be the MERGE special case (by requiring someone to take a table lock ahead of time or do something else odd). Anyone know off-hand what the big 3 do? If the industry consensus is that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then it's probably better to follow that lead. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Jim C. Nasby wrote: On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: I don't think MERGE can really be made to be both though, in which case it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON DUPLICATE UPDATE something else. Perhaps a special form of MERGE where you know it's going to be doing that locking. I really don't like the idea of making the SQL2003 version of MERGE be the MERGE special case (by requiring someone to take a table lock ahead of time or do something else odd). Anyone know off-hand what the big 3 do? If the industry consensus is that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then it's probably better to follow that lead. It was already said here that oracle and db2 both use MERGE, dunno about mssql. And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). -- Regards Petr Jelinek (PJMODOS) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MERGE vs REPLACE
And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Oh, good point. I was thinking just about concurrent MERGEs. However, it is more complicated than that. By definitaion you can not see changes from other transactions while your statement is being run (even if you increment CommandCounter), so to be atomic, you would still see the row even though some other transaction had deleted it. We would have to use the same semantics we use now for read-committed UPDATE, that is look at the latest version of the row even though this would not normally be visible to the transaction's snapshot. In the case of a serializable transaction, no doubt we should fail if any concurrent change actually happens. I have some psaudocode to explain what we want for this feature, whatever syntax we choose: Start Check unique index Found lock row for update if zero rows, return to start if more than one row, fail update row Notfound create savepoint insert row into heap lock index page if conflicting index entry, abort savepoint, return to start add index entry unlock index page While the notfound case might look strange, we actually use this exact method for inserts now, see ExecInsert() and _bt_doinsert(). Particularly see this comment in the second function: /* * If we're not allowing duplicates, make sure the key isn't already in * the index. * * NOTE: obviously,_bt_check_unique can only detect keys that are already in * the index; so it cannot defend against concurrent insertions of the * same key. We protect against that by means of holding a write lock on * the target page. Any other would-be inserter of the same key must * acquire a write lock on the same target page, so only one would-be * inserter can be making the check at one time. Furthermore, once we are * past the check we hold write locks continuously until we have performed * our insertion, so no later inserter can fail to see our insertion. * (This requires some care in _bt_insertonpg.) * * If we must wait for another xact, we release the lock while waiting, and * then must start over completely. */ Here is the unique check error from _bt_check_unique(): ereport(ERROR, (errcode(ERRCODE_UNIQUE_VIOLATION), errmsg(duplicate key violates unique constraint \%s\, RelationGetRelationName(rel; I think the problem here is that it is going to longjump() back to postgres.c (and out of your code loop). While we have savepoints, I think they only work coming from client applications, rather than inside our code. Ideally you would like to be able to say: savepoint(); func(); rollback_to_savepoint(); but you can't, so I think you are going to have to factor out that unique error callback and return a failure code to the caller. I suppose some boolean flag need to be added to _bt_doinsert(), but that is called via a function pointer for the index type, so you are going to have to update the insert function signatures for all access methods. The good news is that only btree supports unique indexes, according to the documentation (Only B-tree currently supports unique indexes) so for the other access methods the extra parameter is just ignored. Another issue is multiple unique indexes. What if the first unique index matches one row, but a different row matches the second unique indexed column? Fail because unique checks do not identify exactly one row? Or the _new_ value for the second indexed column conflicts with the second unique index. The MERGE/REPLACE should fail. The UPDATE block will handle this on its own, but the INSERT block will need to check for that an really error out, rather than return to the caller, so the loop in ExecInsertIndexTuples() has to restart on unique failure _only_ on the first index check, not the subsequent ones. One simplification would be to allow MERGE/REPLACE only on a table that has a single unique index. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. update if no rows updated insert if duplicate key update if no rows updated goto insert That is why you have the loop. This is not a problem with above code, because only one insert succeeds while the others then do the update. Andreas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Oh, good point. I was thinking just about concurrent MERGEs. However, it is more complicated than that. By definitaion you can not see changes from other transactions while your statement is being run (even if you increment CommandCounter), so to be atomic, you would still see the row even though some other transaction had deleted it. I think we avoid that now because UPDATE, (which is a DELETE then INSERT) chains the tuples together so others see the activity happening. Seems like we are going to have to peek at rows like we do now for INSERT and peek at index rows, if I remember correctly. I can't think of any other place in the code where we loop around other backend's activity like this. This could be tricky. --- Dennis Bjorklund wrote: On Thu, 17 Nov 2005, Bruce Momjian wrote: Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. No matter what operation you start with you need a loop that try insert/update until one of them succeed like in this example: http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE Without a loop you might not get to execute neither the insert nor the update. Why? Think about this example: BEGIN INSERT - fail because there is a row already - before we manage to do the update someone delete the row (which we can see in the default transaction isolation level) UPDATE - fail because there is no row so we will loop and try the insert again - before we manage to do the insert someone else does an insert INSERT - fail because there is a row already - before we manage to do the update someone delete the row You might need to loop any number of times before you manage to perform one of the two operations. Which operation you should start with depends on which of the two cases is the common one. -- /Dennis Bj?rklund ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Bruce Momjian pgman@candle.pha.pa.us writes: Oh, good point. I was thinking just about concurrent MERGEs. However, it is more complicated than that. By definitaion you can not see changes from other transactions while your statement is being run (even if you increment CommandCounter), so to be atomic, you would still see the row even though some other transaction had deleted it. We would have to use the same semantics we use now for read-committed UPDATE, that is look at the latest version of the row even though this would not normally be visible to the transaction's snapshot. In the case of a serializable transaction, no doubt we should fail if any concurrent change actually happens. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
Well, from my point of view it is more than delete and insert. That I can do right now with existing infrastructure. The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with current infrastructure without race conditions. Our current solution is to initialize all the possible counts beforehand, but that suboptimal as only a few of them will actually have data coming in later... And of course the import problem... we can have multiple concurrent imports, which must insert just once per some unique criteria, and update if the record is already there, and all this in a batch. This is also not possible without race conditions or aggressive locking. So for me the atomic, consistent and without performance penalties update_or_insert_based_on_unique_criteria does have a value, and that's coming exactly from the locking of the unique index which eliminates the race condition from this operation. I don't care about syntax sugar, just about things I could do more efficiently if this mechanism were in place... Cheers, Csaba. On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote: On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Simon Riggs Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Merge could also be considered as a long string of deletes and inserts. I guess that deleting those records that already exist and then inserting all of the records is faster because it could be done like a single join to perform the delete and then a single batch insert. And for us it makes no difference because in MVCC, UPDATE == DELETE + INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs in the same statement. Have a nice day, ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MERGE vs REPLACE
The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with current infrastructure without race conditions. The solution without merge but a unique key in other db's is: update if no rows updated insert if duplicate key update if no rows updated goto insert note, that the counter updates need to be of the form set x = x + ? where key=y do you see a potential race condition with this ? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of insert vs update we also start with insert when the insert succeeds more that 50% (I would use a higher percentage with pg though): insert if duplicate key update if no rows updated goto insert Andreas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention on older postgres versions without savepoint support it won't work at all). If there is a way of no race condition, no performance penalty, that would be something new and useful. I just guess the MERGE would provide that. Cheers, Csaba. On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote: The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with current infrastructure without race conditions. The solution without merge but a unique key in other db's is: update if no rows updated insert if duplicate key update if no rows updated goto insert note, that the counter updates need to be of the form set x = x + ? where key=y do you see a potential race condition with this ? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of insert vs update we also start with insert when the insert succeeds more that 50% (I would use a higher percentage with pg though): insert if duplicate key update if no rows updated goto insert Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention on older postgres versions without savepoint support it won't work at all). If there is a way of no race condition, no performance penalty, that would be something new and useful. I just guess the MERGE would provide that. Well, then you guess wrong. This isn't what MERGE is for. MERGE is just a neat way of specifying the UPDATE and INSERT cases in the same statement. It doesn't remove the possibility duplicate inserts and thus primary key violations. If someone wants to make extensions to MERGE so that it can avoid the race condition and avoid the duplicate key violations, that's fine. But be aware that this is outside of the spec. It may be a useful addition, but perhaps we should consider MERGE and REPLACE as completely seperate targets. MERGE has a whole join construction with subqueries that would be a pain to make work in a way that is truly serialisable. REPLACE deals with only one row and tries to solve the race for that case only. Much easier to consider them seperately, no? I guess what's really irritating is that this clearly exposes the case listed in the docs as Why SERIALIZABLE isn't in all cases. If we could solve that for MERGE, we could probably solve it in the general case too. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpwqXKPvo94y.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
OK, in this case I don't care about either MERGE or REPLACE, but for an UPSERT which does the locking :-) Cheers, Csaba. On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote: On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention on older postgres versions without savepoint support it won't work at all). If there is a way of no race condition, no performance penalty, that would be something new and useful. I just guess the MERGE would provide that. Well, then you guess wrong. This isn't what MERGE is for. MERGE is just a neat way of specifying the UPDATE and INSERT cases in the same statement. It doesn't remove the possibility duplicate inserts and thus primary key violations. If someone wants to make extensions to MERGE so that it can avoid the race condition and avoid the duplicate key violations, that's fine. But be aware that this is outside of the spec. It may be a useful addition, but perhaps we should consider MERGE and REPLACE as completely seperate targets. MERGE has a whole join construction with subqueries that would be a pain to make work in a way that is truly serialisable. REPLACE deals with only one row and tries to solve the race for that case only. Much easier to consider them seperately, no? I guess what's really irritating is that this clearly exposes the case listed in the docs as Why SERIALIZABLE isn't in all cases. If we could solve that for MERGE, we could probably solve it in the general case too. Have a nice day, ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
Csaba Nagy [EMAIL PROTECTED] writes: OK, in this case I don't care about either MERGE or REPLACE, but for an UPSERT which does the locking :-) This is exactly the point --- pretty much nobody has come to us and asked for a feature that does what Peter and Martijn say MERGE does. (I haven't bothered to look at the 2003 spec, I'm assuming they read it correctly.) What we *have* been asked for, over and over, is an insert-or-update feature that's not so tedious and inefficient as the savepoint-insert-rollback-update kluge. That's what we ought to be concentrating on providing. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
* Tom Lane ([EMAIL PROTECTED]) wrote: This is exactly the point --- pretty much nobody has come to us and asked for a feature that does what Peter and Martijn say MERGE does. (I haven't bothered to look at the 2003 spec, I'm assuming they read it correctly.) What we *have* been asked for, over and over, is an insert-or-update feature that's not so tedious and inefficient as the savepoint-insert-rollback-update kluge. That's what we ought to be concentrating on providing. I guess to be clear on what this distinction actually is, specifically: MERGE under SQL2003 doesn't appear to be intended to be used concurrently. For data warehousing situations this can be just fine such as in my case where I get a monthly update of some information and need to merge that update in with the prior information. In this case there's only one MERGE running and I'd hope it'd be faster than doing check for existance, insert/update on each row in plpgsql or something (since there'd be multiple index lookups, etc, I think). Concurrent MERGEs running *can* fail, just like whole transactions which do the check/insert/update can fail. REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a transaction which is supposed to not fail but instead do locking to ensure that it doesn't fail. This requires predicate locking to be efficient because you want to tell the concurrent transaction if you have the same key as me, just wait a second and you can do an update 'cause I'm going to create the key if it doesn't exist before I'm done. I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do than MERGE because of the idea that it isn't supposed to fail generally. I think SQL2003 MERGE would be reasonably easy to do and to get the efficiency benefits out of it (assuming there are some to be had in the end). I don't think MERGE can really be made to be both though, in which case it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON DUPLICATE UPDATE something else. Perhaps a special form of MERGE where you know it's going to be doing that locking. I really don't like the idea of making the SQL2003 version of MERGE be the MERGE special case (by requiring someone to take a table lock ahead of time or do something else odd). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] MERGE vs REPLACE
On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a transaction which is supposed to not fail but instead do locking to ensure that it doesn't fail. This requires predicate locking to be efficient because you want to tell the concurrent transaction if you have the same key as me, just wait a second and you can do an update 'cause I'm going to create the key if it doesn't exist before I'm done. Is the requirement for predicate locking, over and above a unique constraint on an index that involves the record key, to deal with the scenario of two inserts executing at the same time, both before commit? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. --- Zeugswetter Andreas DCP SD wrote: The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with current infrastructure without race conditions. The solution without merge but a unique key in other db's is: update if no rows updated insert if duplicate key update if no rows updated goto insert note, that the counter updates need to be of the form set x = x + ? where key=y do you see a potential race condition with this ? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of insert vs update we also start with insert when the insert succeeds more that 50% (I would use a higher percentage with pg though): insert if duplicate key update if no rows updated goto insert Andreas -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
[EMAIL PROTECTED] wrote: On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a transaction which is supposed to not fail but instead do locking to ensure that it doesn't fail. This requires predicate locking to be efficient because you want to tell the concurrent transaction if you have the same key as me, just wait a second and you can do an update 'cause I'm going to create the key if it doesn't exist before I'm done. Is the requirement for predicate locking, over and above a unique constraint on an index that involves the record key, to deal with the scenario of two inserts executing at the same time, both before commit? No. If you have a primary key you can easily prevent duplicates. You need a table lock or predicate locking to prevent duplicates if you do not have a primary key. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: OK, in this case I don't care about either MERGE or REPLACE, but for an UPSERT which does the locking :-) This is exactly the point --- pretty much nobody has come to us and asked for a feature that does what Peter and Martijn say MERGE does. (I haven't bothered to look at the 2003 spec, I'm assuming they read it correctly.) What we *have* been asked for, over and over, is an insert-or-update feature that's not so tedious and inefficient as the savepoint-insert-rollback-update kluge. That's what we ought to be concentrating on providing. I am confused over the various options. I have heard these syntaxes: SQL2003 MERGE MySQL REPLACE http://dev.mysql.com/doc/refman/5.1/en/replace.html MySQL INSERT VIOLATION ... UPSERT So it seems MERGE does not have the use-case we most need, though it can be bent to do it. (Given their MATCH syntax, it doesn't seem there is any logic that it tries INSERT first). Looking at the MySQL URL above, REPLACE has three possible syntaxes with normal (DELETE), SET (UPDATE), and SELECT. Is this the direction we need to go? I don't like INSERT ... VIOLATION because I would like a new keyword for this. Is UPSERT the same as REPLACE? Should we use UPSERT instead? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
On Thu, 17 Nov 2005, Bruce Momjian wrote: Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. No matter what operation you start with you need a loop that try insert/update until one of them succeed like in this example: http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE Without a loop you might not get to execute neither the insert nor the update. Why? Think about this example: BEGIN INSERT - fail because there is a row already - before we manage to do the update someone delete the row (which we can see in the default transaction isolation level) UPDATE - fail because there is no row so we will loop and try the insert again - before we manage to do the insert someone else does an insert INSERT - fail because there is a row already - before we manage to do the update someone delete the row You might need to loop any number of times before you manage to perform one of the two operations. Which operation you should start with depends on which of the two cases is the common one. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
Christopher Kings-Lynne [EMAIL PROTECTED] writes: We should probably throw a notice or warning if we go to a table lock, too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) We shouldn't do anything remotely like that. A statement whose locking effects can't be predicted on sight is horrid both from the user's viewpoint and from the implementation viewpoint. In particular, if we have to do planning before we can determine whether the table needs just a SELECT lock or something stronger, then we have to take a weak lock to do the planning and then we are faced with upgrading to the stronger lock at runtime. Can you say deadlock risk? I think we should do REPLACE-like functionality that simply fails if the match condition isn't equality on a primary key. If we can use SQL-spec MERGE syntax for this, that's fine, but let's not think in terms of silently changing to a stronger table lock and a much slower implementation when the condition isn't a primary key. That's a whole lot of work that isn't solving any real-world problems, and *is* creating a foot-gun for people to cause themselves performance and deadlock problems anytime they spell the WHERE condition slightly wrong. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
I agree. I would never ever ever want it to silently start doing table locks. I would simply avoid using merge at all if that was a possibility. However it seems like the idea is to eventually flesh out full fledged merge. And to do that it sounds like you would need to do one of the following: 1) implement predicate locking beyond the simple match on unique index case that we have here 2) do full table locks. It sounds like #1 isn't going to happen for a while. So in order to do more complicated merges you will need to do #2. If you are going to implement more complicated merge functionality I certainly wouldn't want it throwing a warning telling me about a table lock if I had already knew it would get the table lock and decided I wanted to go ahead with using merge anyway. Could you let the user create the lock himself to handle this situation? For instance: analyze the merge if merge condition matches unique index merge without table locking elseif needed table lock already exists merge else throw an error You could also just add something to the merge syntax like ALLOW TABLE LOCK or something. The idea is just that the user can explicitly allow the table lock and thus the more complicated merge. I don't really know anything about the implementation details but that is the behavior that I would prefer. That way I could always do a complicated merge if I wanted to but there is no way it would ever do an implicit table lock on me. And it would never throw an error/ warning unless I actually did something questionable. Does that make sense. Rick Gigger On Nov 16, 2005, at 7:49 AM, Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: We should probably throw a notice or warning if we go to a table lock, too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) We shouldn't do anything remotely like that. A statement whose locking effects can't be predicted on sight is horrid both from the user's viewpoint and from the implementation viewpoint. In particular, if we have to do planning before we can determine whether the table needs just a SELECT lock or something stronger, then we have to take a weak lock to do the planning and then we are faced with upgrading to the stronger lock at runtime. Can you say deadlock risk? I think we should do REPLACE-like functionality that simply fails if the match condition isn't equality on a primary key. If we can use SQL- spec MERGE syntax for this, that's fine, but let's not think in terms of silently changing to a stronger table lock and a much slower implementation when the condition isn't a primary key. That's a whole lot of work that isn't solving any real-world problems, and *is* creating a foot-gun for people to cause themselves performance and deadlock problems anytime they spell the WHERE condition slightly wrong. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
You could also just add something to the merge syntax like ALLOW TABLE LOCK or something. The idea is just that the user can explicitly allow the table lock and thus the more complicated merge. The problem here is that many people will see that option and think it's safe to do it... i mean, many people will shoot themselves in the foot and the culprit will be PostgreSQL because he let a ready to shoot gun in a visible place when are kids around -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. --- Rick Gigger wrote: I agree. I would never ever ever want it to silently start doing table locks. I would simply avoid using merge at all if that was a possibility. However it seems like the idea is to eventually flesh out full fledged merge. And to do that it sounds like you would need to do one of the following: 1) implement predicate locking beyond the simple match on unique index case that we have here 2) do full table locks. It sounds like #1 isn't going to happen for a while. So in order to do more complicated merges you will need to do #2. If you are going to implement more complicated merge functionality I certainly wouldn't want it throwing a warning telling me about a table lock if I had already knew it would get the table lock and decided I wanted to go ahead with using merge anyway. Could you let the user create the lock himself to handle this situation? For instance: analyze the merge if merge condition matches unique index merge without table locking elseif needed table lock already exists merge else throw an error You could also just add something to the merge syntax like ALLOW TABLE LOCK or something. The idea is just that the user can explicitly allow the table lock and thus the more complicated merge. I don't really know anything about the implementation details but that is the behavior that I would prefer. That way I could always do a complicated merge if I wanted to but there is no way it would ever do an implicit table lock on me. And it would never throw an error/ warning unless I actually did something questionable. Does that make sense. Rick Gigger On Nov 16, 2005, at 7:49 AM, Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: We should probably throw a notice or warning if we go to a table lock, too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) We shouldn't do anything remotely like that. A statement whose locking effects can't be predicted on sight is horrid both from the user's viewpoint and from the implementation viewpoint. In particular, if we have to do planning before we can determine whether the table needs just a SELECT lock or something stronger, then we have to take a weak lock to do the planning and then we are faced with upgrading to the stronger lock at runtime. Can you say deadlock risk? I think we should do REPLACE-like functionality that simply fails if the match condition isn't equality on a primary key. If we can use SQL- spec MERGE syntax for this, that's fine, but let's not think in terms of silently changing to a stronger table lock and a much slower implementation when the condition isn't a primary key. That's a whole lot of work that isn't solving any real-world problems, and *is* creating a foot-gun for people to cause themselves performance and deadlock problems anytime they spell the WHERE condition slightly wrong. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On 11/16/05, Bruce Momjian pgman@candle.pha.pa.us wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. If the lock will be required, what's the problem in doing it internally? without user interaction? -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Jaime Casanova [EMAIL PROTECTED] writes: If the lock will be required, what's the problem in doing it internally? I already explained that: lock upgrading is deadlock-prone. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Jaime Casanova wrote: On 11/16/05, Bruce Momjian pgman@candle.pha.pa.us wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. If the lock will be required, what's the problem in doing it internally? without user interaction? Because first, we are making it visible to the user, and second the lock is taken out _before_ we actually execute the statement, meaning we don't have to escalate our locks. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? It seems to me people would like, in the case of an existing UNIQUE constraint, to be able to use it to prevent duplicate key errors. This is nice, but the standard doesn't require that either. In other words, if we can use an index to avoid duplicate key errors, fine. But if there is no index available, it is not an error to do an INSERT because another INSERT was hidden from you. Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpLQAwAS0yUe.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
Martijn van Oosterhout wrote: -- Start of PGP signed section. On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? I assume they want MERGE because they don't want duplicates. If they don't care, they would have used INSERT. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote: On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? It seems to me people would like, in the case of an existing UNIQUE constraint, to be able to use it to prevent duplicate key errors. This is nice, but the standard doesn't require that either. In other words, if we can use an index to avoid duplicate key errors, fine. But if there is no index available, it is not an error to do an INSERT because another INSERT was hidden from you. Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Agreed. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 12:59:36PM -0500, Bruce Momjian wrote: Martijn van Oosterhout wrote: The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? I assume they want MERGE because they don't want duplicates. If they don't care, they would have used INSERT. The whole point of MERGE was because you can do different things on INSERT or UPDATE. If you didn't care about the UPDATE case you could indeed do just inserts... I just don't think we should tie ourselves in knots over details that the spec doesn't require. If the spec doesn't require predicate locking then why on earth should we provide it if it's so difficult? Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpN5btUoK1IV.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Simon Riggs Sent: Wednesday, November 16, 2005 10:35 AM To: Martijn van Oosterhout Cc: Bruce Momjian; Rick Gigger; Tom Lane; Christopher Kings-Lynne; Jim C. Nasby; josh@agliodbs.com; pgsql-hackers@postgresql.org; Jaime Casanova; Peter Eisentraut Subject: Re: [HACKERS] MERGE vs REPLACE On Wed, 2005-11-16 at 18:34 +0100, Martijn van Oosterhout wrote: On Wed, Nov 16, 2005 at 11:37:46AM -0500, Bruce Momjian wrote: Interesting approach. Actually, we could tell the user they have to use BEGIN;LOCK tab before doing MERGE, and throw an error if we don't already have a table lock. The bit I'm still missing is why there needs to be a lock at all. The SQL standard doesn't say anywhere that concurrent MERGE operations can't conflict. It seems to me that standard visibility rules apply. If neither MERGE statement can see the results of the other, then they will both INSERT. If you don't have a UNIQUE constraint to prevent this then what's the problem? It seems to me people would like, in the case of an existing UNIQUE constraint, to be able to use it to prevent duplicate key errors. This is nice, but the standard doesn't require that either. In other words, if we can use an index to avoid duplicate key errors, fine. But if there is no index available, it is not an error to do an INSERT because another INSERT was hidden from you. Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Merge could also be considered as a long string of deletes and inserts. I guess that deleting those records that already exist and then inserting all of the records is faster because it could be done like a single join to perform the delete and then a single batch insert. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Simon Riggs Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Merge could also be considered as a long string of deletes and inserts. I guess that deleting those records that already exist and then inserting all of the records is faster because it could be done like a single join to perform the delete and then a single batch insert. And for us it makes no difference because in MVCC, UPDATE == DELETE + INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs in the same statement. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpsVlES6C7e7.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. I've just got one question about this. Everyone seems to be saying that try to insert and if that fails update is the same as try to insert and if that fails delete and then insert. What about the following scenario: mytable id serial primary key, a int4, b int4, data text I've got an id field on every table because it simplifies a lot of things (such as slony configuration for example) But I've also got a unique key on (a, b) and if I was to do a merge I would most likely do it in (a, b) not id. If merge does a delete insert then it creates new values for the id columns which could cause me problems. Basically any default fields are going to change or for that matter any fields not specified would be reinitialized whereas an update would leave them in place. It seems to me that try to update and if that fails insert seems to be the best approach for not messing with existing data. I guess try to insert and if that fails update gets you the same effect. - Rick Gigger ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 09:49:28AM -0500, Tom Lane wrote: I think we should do REPLACE-like functionality that simply fails if the match condition isn't equality on a primary key. If we can use SQL-spec MERGE syntax for this, that's fine, but let's not think in terms of silently changing to a stronger table lock and a much slower implementation when the condition isn't a primary key. That's a whole I agree, but would like to relax the primary key requirement to simply a unique index. I can see use cases for unique so long as not null keys, so it would be nice if the MERGE operation would work for these. As nulls are not equal anyway this doesn't seem to do too much violence to the semantics. -dg -- David Gould [EMAIL PROTECTED] If simplicity worked, the world would be overrun with insects. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
daveg [EMAIL PROTECTED] writes: I agree, but would like to relax the primary key requirement to simply a unique index. I can see use cases for unique so long as not null keys, so it would be nice if the MERGE operation would work for these. As nulls are not equal anyway this doesn't seem to do too much violence to the semantics. But a unique key doesn't guarantee that there's only one matching row, so ISTM you're right back to needing a predicate lock if you do that. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 04:51:07PM -0500, Tom Lane wrote: daveg [EMAIL PROTECTED] writes: I agree, but would like to relax the primary key requirement to simply a unique index. I can see use cases for unique so long as not null keys, so it would be nice if the MERGE operation would work for these. As nulls are not equal anyway this doesn't seem to do too much violence to the semantics. But a unique key doesn't guarantee that there's only one matching row, so ISTM you're right back to needing a predicate lock if you do that. But there is no need to guarentee anything. As the spec says, if the join of the table with the other clauses matches a row in the table more than once, raise a cardinality exception. If someone creates a join that matches more than once the whole statement fails. But you can work that out at runtime. If the user specifies NOT NULL in the join condition then it can work and there no reason to forbid that. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpoZSWYuKply.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
On Tue, Nov 15, 2005 at 10:27:10AM -0800, Josh Berkus wrote: Yes, I guess my hesitation on the full-table-lock strategy is that it doesn't really fulfill the mandate for why people want REPLACE-like statements ... to give them an INSERT-or-UPDATE with *higher* efficiency and concurrency than doing two statements. That being said, I've personally designed more than a dozen web applications and have not yet been faced with a single circumstance of not knowing whether I wanted to INSERT or UPDATE. I've even ported MySQL apps and found it easy to re-code them to do if $id = 0, then insert ... without even needing to use a pl/pgsql hack. So we thus have two seperate use cases. The first, for bulk loading/ETL is what MERGE fulfills rather neatly and for that full table locking is perfectly OK, even desirable. You really don't want to MERGE-load the same table on two threads at once. Yes. The second case is for applications coded for MySQL; this is the REPLACE case. However, the most common MySQL applications doing this use full table locking (MyISAM) anyway! So, while full table locking wouldn't gain them any performance over using two statements, it shouldn't lose them anything they're used to having. The last two times I wished to use it: Objects with dynamic fields. The table rows are identified by the object key, a field key, that refer to a text value. I believe I still have a race in this case, as I allow INSERT, UPDATE, and DELETE. The DELETE means that an UPDATE, followed by an INSERT, may fail, if the DELETE occurs between the two. DELETE is fine as is, as it is a single operation, that won't fail. Combining UPDATE and INSERT into one, as per the MySQL REPLACE, would eliminate my race caused by the DELETE. If REPLACE locked the whole table, it would not be a candidate, as this table is frequently modified by many users at once. REPLACE using the primary key index to do locking would be great. Timestamping the last signin and access time (~30 seconds error allowed in the access time to eliminate a majority of the updates using memcached as we all know how much PostgreSQL hates this) for a table that doesn't have a 1:1 mapping between the fields associated with the user, and the row that records their last signin/access time. The signin/access time can be associated with a context, that is a part of the primary key. Currently, I rely on the code never deleting rows, and perform update first, and if the update does not change any rows, I fall back to insert. As it is, though, I ended up combining the signin and access time into the same row, to use one table instead of two, and I'm not sure that MySQL replace really gives me what I want in this situation. I have never used MERGE, so can't say whether this would do what I want. I would not want the solution to lock the entire table, as these updates, although throttled by the the ~30 seconds error allowed, and memcached queries, would be potentially performed by hundreds of users every 30 seconds. I can extend the error factor to 60 seconds, but that only cuts the queries in half. In general, however, the current model does work fine, and doesn't require replacement. UPDATE will be used most of the time, and be the correct operation. Unfortunately, there is a race that I see here. If both UPDATE operations fail at the same time, then one of the INSERTS will fail. I don't handle this currently, but perhaps I could argue myself into considering this an ignorable failure. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
* Tom Lane ([EMAIL PROTECTED]) wrote: Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) Not to be too much of a pain, but I asked for full-spec MERGE a while back... :) I don't think I was the only one asking for full-spec MERGE in the What features would you like to see in Postgres? thread a while ago either, though I could be wrong. I'd like to see MySQL-like 'replace' too, of course. :) Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] MERGE vs REPLACE
On 11/15/05, Josh Berkus josh@agliodbs.com wrote: Simon, The UPSERT concept is also supported by Teradata, who simply append an ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems to me to be a fairly small subset of MERGE functionality and we ought to be able to offer that functionality as a side branch of the main work. Yes, I guess my hesitation on the full-table-lock strategy is that it doesn't really fulfill the mandate for why people want REPLACE-like statements ... to give them an INSERT-or-UPDATE with *higher* efficiency and concurrency than doing two statements. That being said, I've personally designed more than a dozen web applications and have not yet been faced with a single circumstance of not knowing whether I wanted to INSERT or UPDATE. I've even ported MySQL apps and found it easy to re-code them to do if $id = 0, then insert ... without even needing to use a pl/pgsql hack. Actually REPLACE is not INSERT or UPDATE... REPLACE means INSERT if already exists DELETE then INSERT can be used as an UPDATE if you use the SET clause but, it is optional -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On Tue, 2005-11-15 at 10:27 -0800, Josh Berkus wrote: The UPSERT concept is also supported by Teradata, who simply append an ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems to me to be a fairly small subset of MERGE functionality and we ought to be able to offer that functionality as a side branch of the main work. Yes, I guess my hesitation on the full-table-lock strategy is that it doesn't really fulfill the mandate for why people want REPLACE-like statements ... to give them an INSERT-or-UPDATE with *higher* efficiency and concurrency than doing two statements. Agreed... OK, what I said was ...Where there is doubt, we should fall back to table locking just like the rest of the world, IMHO. I didn't mean we should use full table locking all of the time. Sorry if I wasn't clear. What I meant, in context was, IMHO - we do no need predicate locking - we should use row level locks when these can be used - we should use table level locks other times On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: Scenario: session1: REPLACE 1 session2: REPLACE . 1 session1: check to see that 1 exists no session2: check to see that 1 exists no session1: INSERT 1 session2: INSERT 1 ERROR Get the picture? The only way to avoid a race condition is to be able to do predicate locking, that is to lock the table against any data write matching that predicate. The above situation happens now if you have two people doing SELECT then INSERT on the same row. Or even if you do UPDATE then INSERT. Currently if two people INSERT a row with a duplicate PK, we may find that both INSERTs made it into a data block, but one has its txn aborted because of a unique index violation. Nobody complains about that, nor says we should have predicate locking for that case, so why worry about it for MERGE? I don't see we need a special effort to avoid: if you try to do the same thing more than once, only the first one will work. At least you get a nice error message to let you know its happened. Seems like we - run the query in the USING clause - join it to the target table using pseudo outer join logic - if it matches we UPDATE - if it doesn't we INSERT ...but we do the last two in a single step, to minimise the window of opportunity for wierd situations. That being said, I've personally designed more than a dozen web applications and have not yet been faced with a single circumstance of not knowing whether I wanted to INSERT or UPDATE. I've even ported MySQL apps and found it easy to re-code them to do if $id = 0, then insert ... without even needing to use a pl/pgsql hack. OK, but not everybody knows what they're doing as well as you do. :-) So we thus have two seperate use cases. The first, for bulk loading/ETL is what MERGE fulfills rather neatly and for that full table locking is perfectly OK, even desirable. You really don't want to MERGE-load the same table on two threads at once. Probably true, but I do want to avoid full table locking for MERGE whenever possible. Concurrency is important even in data warehousing. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
Josh Berkus wrote: Jaime, why? seems that REPLACE only work if there are at least one row matching... Scenario: session1: REPLACE 1 session2: REPLACE . 1 session1: check to see that 1 exists no session2: check to see that 1 exists no session1: INSERT 1 session2: INSERT 1 ERROR Get the picture? The only way to avoid a race condition is to be able to do predicate locking, that is to lock the table against any data write matching that predicate. Isn't the standard practice to do the INSERT, and if that fails, do the UPDATE? In fact, if someone is already doing a INSERT inside a transaction, it will wait for the transaction to complete so you can then do the UPDATE, or complete the INSERT if the transaction aborted. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: But even REPLACE requires predicate locking. There's no real way to get around it. The point though is that REPLACE is restricted to a type of predicate narrow enough to be enforced through a unique-index mechanism, and so it's implementable without solving the general case of predicate locking. Predicate locking for narrow cases isn't very hard; it's the general case of arbitrary predicates that's hard. My feeling is we should implement MERGE for the limited cases we can, and throw an error for cases we can not (or require table locking), and then see what reports we get from users. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MERGE vs REPLACE
Simon Riggs wrote: I've been looking at ways of doing INSERT then UPDATE, but it doesn't seem very easy to avoid unique index violations in that case. So doing the UPDATE first then INSERTs later seems like the way to go. INSERT has to be first to avoid a race condition (see my previous emails). We just need a way to get a unique index violation and continue with the UPDATE. We have savepoints now so it certainly seems possible. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Josh Berkus wrote: So we thus have two seperate use cases. The first, for bulk loading/ETL is what MERGE fulfills rather neatly and for that full table locking is perfectly OK, even desirable. You really don't want to MERGE-load the same table on two threads at once. The second case is for applications coded for MySQL; this is the REPLACE case. However, the most common MySQL applications doing this use full table locking (MyISAM) anyway! So, while full table locking wouldn't gain them any performance over using two statements, it shouldn't lose them anything they're used to having. For any kind of efficiency, I assume MySQL REPLACE wants a unique index in place, so practially everyone doing merge probably already has the setup we need to avoid new non-index predicate locking code. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
Just to summarize, MySQL REPLACE is INSERT or DELETE/INSERT, while they have a SET clauses that allows UPDATE, and INSERT has a ON DUPLICATE KEY UPDATE clause too. I think the INSERT ... ON DUPLICATE KEY is undesirable because this functionality should have a new keyword in the first position, e.g MERGE, and I think the REPLACE is out because most people feel that the DELETE/INSERT functionality is near-useless if we can give users the INSERT/UPDATE functionality of MERGE. I think even if we have to restrict MERGE to requiring a unique index, it is better to go that way than to drag REPLACE into our syntax. MERGE can be extended over time, while REPLACE has a non-optimal initial behavior. The idea that MERGE can use a constant list (not requiring a second table) makes it a valid replacement for REPLACE, and other database support for MERGE reinforces this. --- [EMAIL PROTECTED] wrote: I think you translated it correctly, MySQL has another way of specifying this which is INSERT ... ON DUPLICATE KEY UPDATE ... (http://dev.mysql.com/doc/refman/5.0/en/insert.html) Regards Paolo Jochem van Dieten [EMAIL PROTECTED] ha scritto On 11/13/05, Petr Jelinek wrote: I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... I think the MySQL statement: REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL, 3) would translate into the following MERGE statement: MERGE INTO table target USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source ON target.pknew = source.pk WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, col1new, col2new, col3new) It might not be the most elegant solution, but I don't see why it won't work. Jochem ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
On Tue, Nov 15, 2005 at 07:16:21PM -0500, Bruce Momjian wrote: Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: But even REPLACE requires predicate locking. There's no real way to get around it. The point though is that REPLACE is restricted to a type of predicate narrow enough to be enforced through a unique-index mechanism, and so it's implementable without solving the general case of predicate locking. Predicate locking for narrow cases isn't very hard; it's the general case of arbitrary predicates that's hard. My feeling is we should implement MERGE for the limited cases we can, and throw an error for cases we can not (or require table locking), and then see what reports we get from users. We should probably throw a notice or warning if we go to a table lock, too. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
We should probably throw a notice or warning if we go to a table lock, too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) Chris ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
On Wed, Nov 16, 2005 at 01:27:29PM +0800, Christopher Kings-Lynne wrote: We should probably throw a notice or warning if we go to a table lock, too. That's not very useful, because you can only do somethign about it AFTER the 1 hour exclusive lock merge has already run :) Not true; it would be useful for development when you'd like to know that some statement is grabbing a table lock. This is something that you wouldn't normally notice in a dev environment, and it sounds like it'd be easy to do a merge that has the unintended effect of grabbing a table lock. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
On 11/13/05, Petr Jelinek wrote: I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... I think the MySQL statement: REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL, 3) would translate into the following MERGE statement: MERGE INTO table target USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source ON target.pknew = source.pk WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, col1new, col2new, col3new) It might not be the most elegant solution, but I don't see why it won't work. Jochem ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
I think you translated it correctly, MySQL has another way of specifying this which is INSERT ... ON DUPLICATE KEY UPDATE ... (http://dev.mysql.com/doc/refman/5.0/en/insert.html) Regards Paolo Jochem van Dieten [EMAIL PROTECTED] ha scritto On 11/13/05, Petr Jelinek wrote: I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... I think the MySQL statement: REPLACE INTO table (pk, col1, col2, col3) VALUES (2, '-00-00', NULL, 3) would translate into the following MERGE statement: MERGE INTO table target USING (2 as pknew , NULL as col1new, NULL as col2new, 3 as col3new) source ON target.pknew = source.pk WHEN MATCHED THEN UPDATE SET col1 = col1new, col2 = col2new, col3 = col3new WHEN NOT MATCHED THEN INSERT (pk, col1, col2, col3) VALUES (pknew, col1new, col2new, col3new) It might not be the most elegant solution, but I don't see why it won't work. Jochem ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
See 'merge_db' in http://lnk.nu/postgresql.org/5sl.html On Fri, Nov 11, 2005 at 10:07:07PM -0500, Rod Taylor wrote: On Fri, 2005-11-11 at 18:36 -0500, [EMAIL PROTECTED] wrote: On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: So? That is what save points are for. You can even skip the select for update if you don't mind dead tuples from the attempted insert. SELECT ... FOR UPDATE; IF not exists THEN SAVEPOINT; INSERT ; IF UNIQUE VIOLATION THEN /* Someone else inserted between the SELECT and our INSERT */ ROLLBACK TO SAVEPOINT; UPDATE; ELSE RELEASE SAVEPOINT; FI ELSE UPDATE; FI Isn't there still a race between INSERT and UPDATE? I suppose there is although I hadn't noticed before. I've never run into it and always check to ensure the expected number of tuples were touched by the update or delete. Within the PostgreSQL backend you might get away with having your insert hold a lock on the index page and follow it up with a FOR UPDATE lock on the offending tuple thus ensuring that your update will succeed. If you hack index mechanisms for the support you don't need the SAVEPOINT either -- just don't throw an error when you run across the existing entry. For client side code one possibility is to repeat until successful. WHILE SELECT FOR UPDATE; IF NOT EXISTS THEN SAVEPOINT INSERT; IF UNIQUE VIOLATION THEN ROLLBACK TO SAVEPOINT; ELSE RELEASE SAVEPOINT EXIT; FI ELSE UPDATE; EXIT; END -- Check for infinite loop END -- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Fri, Nov 11, 2005 at 03:42:38PM -0500, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) I suspect a lot of those requests are from people who actually want merge and don't realize that mysql has a replace. On another note, is there any reason we can't put an equivalent to example 36-1 (http://lnk.nu/postgresql.org/617.html) into the backend? Presumably it wouldn't be as fast as a more elegant solution, but OTOH it'd probably be faster than plpgsql... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Sun, 2005-11-13 at 23:56 +0100, Martijn van Oosterhout wrote: On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote: I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... But I am not sure if this is what you want. Well, the obvious extension to this is that the extire USING clause is in fact optional: MERGE INTO tablename ON id = 1 ... Which starts looking a lot simpler. ...choosing a place to jump in is a little hard in this thread...so I'll jump in with some general info and thoughts on topics so far: MERGE is useful both for OLTP systems and for Data Warehousing, where it is sometimes known as the UPSERT. The MERGE statement in SQL:2003 requires a target table and a table statement. I don't see anything in that to always require two separate tables - this is just the same as a self-referencing INSERT SELECT statement. The USING clause is also a compulsory part of SQL:2003. One of the more interesting ways to use MERGE is with Oracle external tables. The same idea for us would be to have MERGE become a variant of the PostgreSQL COPY FROM command. That would be very cool. The above is the reason why MERGE doesn't seem to provide for external data being passed, as does INSERT or MySQL REPLACE. Neither DB2 or Oracle perform predicate locking. DB2 is more cautious, and some would say more efficient, thats all. PostgreSQL's locking features are just fine for pragmatic implementation of MERGE, AFAICS. Where there is doubt, we should fall back to table locking just like the rest of the world, IMHO. Making this work with partitioning will be hard enough without overusing the predicate solving logic. The UPSERT concept is also supported by Teradata, who simply append an ELSE INSERT clause onto the standard UPDATE syntax. MySQL REPLACE seems to me to be a fairly small subset of MERGE functionality and we ought to be able to offer that functionality as a side branch of the main work. I've been looking at ways of doing INSERT then UPDATE, but it doesn't seem very easy to avoid unique index violations in that case. So doing the UPDATE first then INSERTs later seems like the way to go. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
On Saturday 12 November 2005 04:06, Matteo Beccati wrote: Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) I would also like to add that MySQL's REPLACE is not exactly an INSERT OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the fields not specified in the query are set to their defaults: i.e. CREATE TABLE t (a int PRIMARY KEY, b int, c int); INSERT INTO t (a, b, c) VALUES (1, 1, 2); SELECT * FROM t; +---+--+--+ | a | b| c| +---+--+--+ | 1 |1 |2 | +---+--+--+ REPLACE INTO t (a, b) VALUES (1, 1); SELECT * FROM t; +---+--+--+ | a | b| c| +---+--+--+ | 1 |1 | NULL | +---+--+--+ I wanted to point it out this because people are commonly mistaking this. Wow, that seems ugly maybe there's a reason for it, but I'm not sure we could deviate from my$ql's behavior on this even if we wanted... they are the standard here. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On 11/13/05, Robert Treat [EMAIL PROTECTED] wrote: On Saturday 12 November 2005 04:06, Matteo Beccati wrote: | 1 |1 | NULL | Wow, that seems ugly maybe there's a reason for it, but I'm not sure we could deviate from my$ql's behavior on this even if we wanted... they are the standard here. I don't think that's ugly, I think that's exactly working as advertised. Replace behaves exactly like deleting the record with the matching primary key and inserting the provided input. ... not merging together old data with new. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On Sunday 13 November 2005 10:01, Gregory Maxwell wrote: On 11/13/05, Robert Treat [EMAIL PROTECTED] wrote: On Saturday 12 November 2005 04:06, Matteo Beccati wrote: | 1 |1 | NULL | Wow, that seems ugly maybe there's a reason for it, but I'm not sure we could deviate from my$ql's behavior on this even if we wanted... they are the standard here. I don't think that's ugly, I think that's exactly working as advertised. Replace behaves exactly like deleting the record with the matching primary key and inserting the provided input. ... not merging together old data with new. I disagree in that REPLACE is advertised as a solution for the INSERT else UPDATE problem, but has a different behavior than a true INSERT else UPDATE would produce. Maybe that's a problem with the implementation, or maybe it's a problem in the advertisment, but there is certainly a discrepency there. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
Robert Treat [EMAIL PROTECTED] writes: I disagree in that REPLACE is advertised as a solution for the INSERT else UPDATE problem, but has a different behavior than a true INSERT else UPDATE would produce. Maybe that's a problem with the implementation, or maybe it's a problem in the advertisment, but there is certainly a discrepency there. Yeah. REPLACE fails to solve common examples like a web hit counter (if key doesn't exist, insert row with count 1; if it does exist, add 1 to the current count). IIRC, SQL's MERGE deals with this by offering two quite separate specifications of what to do when there is or isn't already a matching row. I don't necessarily feel that we have to slavishly duplicate what MySQL offers. I do think that it's reasonable to restrict the functionality to updating/replacing a row with matching primary key --- that gets us out of the problem of needing a full predicate-locking mechanism, while still covering most all of the practical use-cases that I can see. It'd be useful to look at what comparable functionality is offered by other DBs besides MySQL. Anyone know what DB2 or Oracle have in this area? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: It'd be useful to look at what comparable functionality is offered by other DBs besides MySQL. Anyone know what DB2 or Oracle have in this area? IIRC they both have MERGE. -- Regards Petr Jelinek (PJMODOS) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Petr Jelinek wrote: Tom Lane wrote: It'd be useful to look at what comparable functionality is offered by other DBs besides MySQL. Anyone know what DB2 or Oracle have in this area? Oracle: http://www.psoug.org/reference/merge.html http://www.psoug.org/reference/translate_replace.html But the oracle replace seems completely different to the topic at hand. DB2: Merge: http://www.databasejournal.com/features/db2/article.php/10896_3322041_2 I was not able to easily find information on REPLACE. Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: IIRC, SQL's MERGE deals with this by offering two quite separate specifications of what to do when there is or isn't already a matching row. In that regard, MERGE is quite flexible, but MERGE doesn't address the point of REPLACE, because MERGE requires *two* tables as input, whereas REPLACE only takes *one*. Unless someone can show that you can trick MERGE into doing the REPLACE job anyway, we're not discussing the same thing. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MERGE vs REPLACE
Peter Eisentraut wrote: In that regard, MERGE is quite flexible, but MERGE doesn't address the point of REPLACE, because MERGE requires *two* tables as input, whereas REPLACE only takes *one*. Unless someone can show that you can trick MERGE into doing the REPLACE job anyway, we're not discussing the same thing. I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... But I am not sure if this is what you want. -- Regards Petr Jelinek (PJMODOS) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
On Sun, Nov 13, 2005 at 11:32:47PM +0100, Petr Jelinek wrote: I am really not db expert and I don't have copy of sql standard but you don't need to use 2 tables I think - USING part can also be subquery (some SELECT) and if I am right then you could simulate what REPLACE does because in PostgreSQL you are not forced to specify FROM clause in SELECT. So you could in theory do MERGE INTO tablename USING (SELECT 1 AS myid) ON (tablename.id = myid) ... But I am not sure if this is what you want. Well, the obvious extension to this is that the extire USING clause is in fact optional: MERGE INTO tablename ON id = 1 ... Which starts looking a lot simpler. BTW, my reading of the MERGE examples given earlier is that there no notes in there at all about guarenteeing concurrency. None of the documentation says that using MERGE will avoid duplicate key errors if someone else does the same thing concurrently. It seems more like a performence hack to avoid scanning the table twice. Basically, you could implement this by taking the USING clause, do a left outer join with the merge table and for the blank rows fill in a CTID for insert and instead of NULLs the values of the INSERT portion. Which is kind of a bummer for the people who want to do the insert zero if not there else add 1 thing a lot and expecting this to solve the concurrency for them. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpFeF7xQDptG.pgp Description: PGP signature
Re: [HACKERS] MERGE vs REPLACE
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) I would also like to add that MySQL's REPLACE is not exactly an INSERT OR UPDATE, rather and INSERT OR (DELETE then INSERT): I mean that the fields not specified in the query are set to their defaults: i.e. CREATE TABLE t (a int PRIMARY KEY, b int, c int); INSERT INTO t (a, b, c) VALUES (1, 1, 2); SELECT * FROM t; +---+--+--+ | a | b| c| +---+--+--+ | 1 |1 |2 | +---+--+--+ REPLACE INTO t (a, b) VALUES (1, 1); SELECT * FROM t; +---+--+--+ | a | b| c| +---+--+--+ | 1 |1 | NULL | +---+--+--+ I wanted to point it out this because people are commonly mistaking this. Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] MERGE vs REPLACE
It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. That said, what kind of support for insert-or-update-this-row do we want to provide, if any? Should it be a REPLACE command, an extension of the INSERT command, a modication of the MERGE syntax, or something else? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MERGE vs REPLACE
Peter Eisentraut wrote: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. That said, what kind of support for insert-or-update-this-row do we want to provide, if any? Should it be a REPLACE command, an extension of the INSERT command, a modication of the MERGE syntax, or something else? MERGE of course, it's standard, REPLACE is mysql extension -- Regards Petr Jelinek (PJMODOS) www.parba.cz ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] MERGE vs REPLACE
On 11/11/05, Peter Eisentraut [EMAIL PROTECTED] wrote: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. That said, what kind of support for insert-or-update-this-row do we want to provide, if any? Should it be a REPLACE command, an extension of the INSERT command, a modication of the MERGE syntax, or something else? -- Peter Eisentraut http://developer.postgresql.org/~petere/ MERGE seems to me the better option... not just because is standard but at least i can see some use cases for it... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
Jaime Casanova wrote: MERGE seems to me the better option... not just because is standard but at least i can see some use cases for it... I don't think you understand my message: MERGE does not do what REPLACE does. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
Peter Eisentraut [EMAIL PROTECTED] writes: It seems to me that it has always been implicitly assumed around here that the MERGE command would be a substitute for a MySQL-like REPLACE functionality. After rereading the spec it seems that this is not the case. MERGE always operates on two different tables, which REPLACE doesn't do. Normally I'd plump for following the standard ... but AFAIR, we have had bucketloads of requests for REPLACE functionality, and not one request for spec-compatible MERGE. If, as it appears, full-spec MERGE is also a whole lot harder and slower than REPLACE, it seems that we could do worse than to concentrate on doing REPLACE for now. (We can always come back to MERGE some other day.) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On 11/11/05, Peter Eisentraut [EMAIL PROTECTED] wrote: Jaime Casanova wrote: MERGE seems to me the better option... not just because is standard but at least i can see some use cases for it... I don't think you understand my message: MERGE does not do what REPLACE does. -- Peter Eisentraut http://developer.postgresql.org/~petere/ I understand you well... what i was trying to say is that i prefer MERGE (standard SQL command) to be done because the functionally it has (basically a merge of two tables) seems to me to be more usefull than REPLACE (MySql Command)... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
Guys, I understand you well... what i was trying to say is that i prefer MERGE (standard SQL command) to be done because the functionally it has (basically a merge of two tables) seems to me to be more usefull than REPLACE (MySql Command)... But even REPLACE requires predicate locking. There's no real way to get around it. --Josh -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] MERGE vs REPLACE
On 11/11/05, Josh Berkus josh@agliodbs.com wrote: Guys, I understand you well... what i was trying to say is that i prefer MERGE (standard SQL command) to be done because the functionally it has (basically a merge of two tables) seems to me to be more usefull than REPLACE (MySql Command)... But even REPLACE requires predicate locking. There's no real way to get around it. --Josh why? seems that REPLACE only work if there are at least one row matching... -- Atentamente, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Jaime, why? seems that REPLACE only work if there are at least one row matching... Scenario: session1: REPLACE 1 session2: REPLACE . 1 session1: check to see that 1 exists no session2: check to see that 1 exists no session1: INSERT 1 session2: INSERT 1 ERROR Get the picture? The only way to avoid a race condition is to be able to do predicate locking, that is to lock the table against any data write matching that predicate. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MERGE vs REPLACE
On Fri, 2005-11-11 at 14:40 -0800, Josh Berkus wrote: Jaime, why? seems that REPLACE only work if there are at least one row matching... Get the picture? The only way to avoid a race condition is to be able to do predicate locking, that is to lock the table against any data write matching that predicate. So? That is what save points are for. You can even skip the select for update if you don't mind dead tuples from the attempted insert. SELECT ... FOR UPDATE; IF not exists THEN SAVEPOINT; INSERT ; IF UNIQUE VIOLATION THEN /* Someone else inserted between the SELECT and our INSERT */ ROLLBACK TO SAVEPOINT; UPDATE; ELSE RELEASE SAVEPOINT; FI ELSE UPDATE; FI -- ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
Josh Berkus josh@agliodbs.com writes: But even REPLACE requires predicate locking. There's no real way to get around it. The point though is that REPLACE is restricted to a type of predicate narrow enough to be enforced through a unique-index mechanism, and so it's implementable without solving the general case of predicate locking. Predicate locking for narrow cases isn't very hard; it's the general case of arbitrary predicates that's hard. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
On Fri, 11 Nov 2005, Josh Berkus wrote: Jaime, why? seems that REPLACE only work if there are at least one row matching... Scenario: session1: REPLACE 1 session2: REPLACE . 1 session1: check to see that 1 exists no session2: check to see that 1 exists no session1: INSERT 1 session2: INSERT 1 ERROR Get the picture? The only way to avoid a race condition is to be able to do predicate locking, that is to lock the table against any data write matching that predicate. When it comes to predicate locking, I think we should defer to Peter's comment at Open DB Con: http://www.treehou.se/~swm/peter_merge.jpg Gavin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] MERGE vs REPLACE
On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: So? That is what save points are for. You can even skip the select for update if you don't mind dead tuples from the attempted insert. SELECT ... FOR UPDATE; IF not exists THEN SAVEPOINT; INSERT ; IF UNIQUE VIOLATION THEN /* Someone else inserted between the SELECT and our INSERT */ ROLLBACK TO SAVEPOINT; UPDATE; ELSE RELEASE SAVEPOINT; FI ELSE UPDATE; FI Isn't there still a race between INSERT and UPDATE? Low probability, for sure, as it would have had to not exist, then exist, then not exist, but still possible. I'd like a REPLACE that could be safe, or at least cause a COMMIT to fail, for this reason. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] MERGE vs REPLACE
On Fri, 2005-11-11 at 18:36 -0500, [EMAIL PROTECTED] wrote: On Fri, Nov 11, 2005 at 06:00:32PM -0500, Rod Taylor wrote: So? That is what save points are for. You can even skip the select for update if you don't mind dead tuples from the attempted insert. SELECT ... FOR UPDATE; IF not exists THEN SAVEPOINT; INSERT ; IF UNIQUE VIOLATION THEN /* Someone else inserted between the SELECT and our INSERT */ ROLLBACK TO SAVEPOINT; UPDATE; ELSE RELEASE SAVEPOINT; FI ELSE UPDATE; FI Isn't there still a race between INSERT and UPDATE? I suppose there is although I hadn't noticed before. I've never run into it and always check to ensure the expected number of tuples were touched by the update or delete. Within the PostgreSQL backend you might get away with having your insert hold a lock on the index page and follow it up with a FOR UPDATE lock on the offending tuple thus ensuring that your update will succeed. If you hack index mechanisms for the support you don't need the SAVEPOINT either -- just don't throw an error when you run across the existing entry. For client side code one possibility is to repeat until successful. WHILE SELECT FOR UPDATE; IF NOT EXISTS THEN SAVEPOINT INSERT; IF UNIQUE VIOLATION THEN ROLLBACK TO SAVEPOINT; ELSE RELEASE SAVEPOINT EXIT; FI ELSE UPDATE; EXIT; END -- Check for infinite loop END -- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly