[HACKERS] [ANNOUNCE] PGDay.IT 2015 - Call for Papers
The 9th edition of the Italian PostgreSQL Day (PGDay.IT 2015) will be held on Friday, 23rd October 2015 in Prato, Italy. The international Call for Papers is open and will close on 8th August 2015.

For more information about the conference and the international Call for Papers, please visit: http://2015.pgday.it/

Information in English on paper submission is available at: http://2015.pgday.it/?page_id=43

For any further information, do not hesitate to contact the committee via email at pg...@itpug.org.

Kind Regards

--
Matteo Durighetto

Italian PostgreSQL User Group
http://www.itpug.org/index.it.html
Italian Community for Geographic Free/Open-Source Software
http://www.gfoss.it
Re: [HACKERS] Proposal: Incremental Backup
2014-08-01 18:20 GMT+02:00 Claudio Freire klaussfre...@gmail.com:

> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> c) the map is not crash safe by design, because it is needed only by incremental backup to track which blocks need to be backed up, not for consistency or recovery of the whole cluster, so it is not a heavy cost for the whole cluster to maintain it. We could think about an option (but it is heavy) to write it to a file at every flush to have a crash-safe map, but I do not think it is so useful. I think it is acceptable, and probably it is better to force that, to say: if your db crashes, you need a full backup.
>>
>> I am not sure if this assumption of yours is right/acceptable; how can we say that in such a case users will be okay with a full backup? In general, taking a full backup is a very heavy operation and we should try to avoid such a situation.
>
> Besides, the one taking the backup (ie: script) may not be aware of the need to take a full one. It's a bad design to allow broken backups at all, IMNSHO.

Hi Claudio, thanks for your observations.

First: this case arises after a crash of the database, which is not something that happens every day or every week. It happens under rare conditions, or at least that is my experience; if it happens very often, there are probably other problems.

Second: to avoid the problem of knowing whether the database needs a full backup to rebuild the map, we could write the backup reference into the map header (with an id and an LSN reference, for example). So if someone or something tries to take an incremental backup after a crash, the map header will list no full backup [because it will be empty] and will automatically switch to a full one.

I think that after a crash it is good practice to take a full backup anyway, to check whether there are problems in the files or on the filesystems, but if I am wrong I am happy to hear it :).

Remember that I propose a map in RAM to reduce the impact on performance, but we could add an option to leave the choice to the user: if you want a crash-safe map, a map file is also updated at every flush; if not, the map stays in RAM.

Kind Regards

Mat
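P.S. A minimal sketch of the bookkeeping this implies, assuming the map header stores the id/LSN of the last full backup (the label below is hypothetical):

-- A full backup records its start LSN; a later incremental run compares it
-- with the LSN reference in the map header and falls back to a full backup
-- when they do not match (or when the header lists no full backup at all).
SELECT pg_start_backup('full-backup-reference');  -- returns the start LSN
-- ... copy the data directory ...
SELECT pg_stop_backup();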
Re: [HACKERS] Proposal: Incremental Backup
2014-07-31 8:26 GMT+02:00 Amit Kapila amit.kapil...@gmail.com:

> On Wed, Jul 30, 2014 at 7:00 PM, desmodemone desmodem...@gmail.com wrote:
>> Hello, I think an incremental/differential backup method is very useful; however, the proposed method has two drawbacks:
>>
>> 1) In a database, even if the percentage of modified rows is small compared to the total, the probability that only some files/tables change is small, because rows are normally not ordered inside a table and updates are spread randomly. If some tables are static, they are probably lookup tables or something like a registry, and such tables are normally small.
>>
>> 2) Every changed file has to be read in full every time. So if point 1 holds, you are probably reading a large part of the database and then sending that part, instead of sending a small one.
>>
>> In my opinion, solving these problems needs a different implementation of incremental backup. I will try to show my idea about it.
>>
>> I think we need a bitmap in memory to track the changed chunks of each file/table [by chunk I mean a group of X tracked pages, dividing every tracked file into chunks], so we could send only the blocks changed since the last incremental backup (which could be a full backup, for incremental purposes). The map could have one submap for every tracked file, to keep it simple. So, if we track a chunk of 8 page blocks (64 kB) with one bit [a chunk of 8 blocks is only an example], a map of 1 Mbit (1 Mbit is about 125 kB of memory) could track a table with a total size of 64 GB; we could probably also use a compression algorithm, because the map consists only of 1s and 0s. This is a very simple idea, but it shows that the map does not need too much memory if we track groups of blocks, i.e. chunks; obviously the problem is more complex, and there are probably better and more robust solutions. We probably need more space for the header of the map, to track information about the file, the last backup, and so on.
>>
>> I think the map must be updated by the bgwriter, i.e. when it flushes the dirty buffers;
>
> Not only the bgwriter, but the checkpointer and backends as well, as those also flush buffers. Also, there are some writes which are done outside shared buffers; you need to track those separately. Another point is that to track the changes due to hint bit modification, you need to enable checksums or wal_log_hints, which will lead to either more CPU or more I/O.
>
>> fortunately we do not need this map for the consistency of the database, so we could create and manage it in memory to limit the impact on performance. The drawback is that if the db crashes or someone shuts it down, the next incremental backup will be full; we could flush the map to disk if PostgreSQL receives a shutdown signal or something similar. In this way we obtain:
>>
>> 1) we read only a small part of the database (the probability that a chunk changed is lower than the probability that the whole file changed)
>> 2) we do not need to calculate checksums, saving CPU
>> 3) we save I/O in reading and writing (we send only the blocks changed since the last incremental backup)
>> 4) we save network bandwidth
>> 5) we save time during the backup: if we read and write less data, we reduce the time needed for an incremental backup
>> 6) I think the in-memory bitmap will not impact the performance of the bgwriter too much.
>>
>> What do you think about it?
>
> I think this method has 3 drawbacks compared to the proposed method:
> a. you have to either enable checksums or wal_log_hints, and it will incur extra I/O if you enable wal_log_hints
> b. backends also need to update the map, which, though a small cost, is still a cost ...
> c. the map is not crash safe, due to which sometimes a full backup is needed.
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Hi Amit, thank you for your comments. However, about the drawbacks:

a) It is not clear to me why the method needs checksums enabled. I mean, if the bgwriter or another process flushes a dirty buffer, it only has to record in the map that the blocks have changed, with an update of the bit from 0 to 1. It does not need to verify the checksum of the block: we can assume that when a dirty buffer is flushed, the block has changed [or better, in my idea, the chunk of N blocks has]. We could think about an advanced setting that verifies the checksum, but I think it would be heavier.

b) Yes, the backends need to update the map, but it is in memory and, as I showed, it can be very small if we use chunks of blocks. If we do not compress the map, I do not think it could become a bottleneck.

c) The map is not crash safe by design, because it is needed only by incremental backup to track which blocks need to be backed up, not for consistency or recovery of the whole cluster, so it is not a heavy cost for the whole cluster to maintain it. We could think about an option (but it is heavy) to write it to a file at every flush, to have a crash-safe map.
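As a rough sanity check of the sizing claim in point b), assuming 8 kB pages and one bit per chunk of 8 pages (64 kB), the arithmetic works out as follows:

-- A 64 GB table divided into 64 kB chunks needs 2^20 bits of map,
-- i.e. 128 kB, in the same ballpark as the ~125 kB quoted above.
SELECT pg_size_pretty(
         (64::bigint * 1024 * 1024 * 1024)  -- table size in bytes
         / (64 * 1024)                      -- one bit per 64 kB chunk
         / 8                                -- bits to bytes
       ) AS map_size;
--  map_size
-- ----------
--  128 kB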
Re: [HACKERS] Proposal: Incremental Backup
2014-07-29 18:35 GMT+02:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it:

> On 25/07/14 20:44, Robert Haas wrote:
>> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire klaussfre...@gmail.com wrote:
>>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote:
>>>> 1. Proposal
>>>> ===========
>>>> Our proposal is to introduce the concept of a backup profile. The backup profile consists of a file with one line per file detailing tablespace, path, modification time, size and checksum. Using that file the BASE_BACKUP command can decide which file needs to be sent again and which is not changed. The algorithm should be very similar to rsync, but since our files are never bigger than 1 GB per file that is probably granular enough not to worry about copying parts of files, just whole files.
>>>
>>> That wouldn't be nearly as useful as the LSN-based approach mentioned before. I've had my share of rsyncing live databases (when resizing filesystems, not for backup, but the anecdotal evidence applies anyhow) and with moderately write-heavy databases, even if you only modify a tiny portion of the records, you end up modifying a huge portion of the segments, because the free space choice is random. There have been patches going around to change the random nature of that choice, but none are very likely to make a huge difference for this application. In essence, file-level comparisons get you only a mild speed-up, and are not worth the effort. I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids the I/O of inspecting the LSN of entire segments (a necessary optimization for huge multi-TB databases) and backs up only the modified portions when segments do contain changes, so it's the best of both worlds. Any partial implementation would either require lots of I/O (LSN only) or save very little (file only) unless it's an almost read-only database.
>>
>> I agree with much of that. However, I'd question whether we can really seriously expect to rely on file modification times for critical data-integrity operations. I wouldn't like it if somebody ran ntpdate to fix the time while the base backup was running, and it set the time backward, and the next differential backup consequently omitted some blocks that had been modified during the base backup.
>
> Our proposal doesn't rely on file modification times for data integrity. We are using the file mtime only as a fast indication that the file has changed, and we transfer it again without performing the checksum. If timestamp and size match, we rely on *checksums* to decide if it has to be sent. In SMART MODE we would use the file mtime to skip the checksum check in some cases, but it wouldn't be the default operation mode and it will have all the necessary warnings attached. However, the SMART MODE isn't a core part of our proposal, and can be delayed until we agree on the safest way to bring it to the end user.
>
> Regards,
> Marco
>
> --
> Marco Nenciarini - 2ndQuadrant Italy
> PostgreSQL Training, Services and Support
> marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

Hello, I think an incremental/differential backup method is very useful; however, the method has two drawbacks:

1) In a database, even if the percentage of modified rows is small compared to the total, the probability that only some files/tables change is small, because rows are normally not ordered inside a table and updates are spread randomly. If some tables are static, they are probably lookup tables or something like a registry, and such tables are normally small.

2) Every changed file has to be read in full every time. So if point 1 holds, you are probably reading a large part of the database and then sending that part, instead of sending a small one.

In my opinion, solving these problems needs a different implementation of incremental backup. I will try to show my idea about it.

I think we need a bitmap in memory to track the changed chunks of each file/table [by chunk I mean a group of X tracked pages, dividing every tracked file into chunks], so we could send only the blocks changed since the last incremental backup (which could be a full backup, for incremental purposes). The map could have one submap for every tracked file, to keep it simple. So, if we track a chunk of 8 page blocks (64 kB) with one bit [a chunk of 8 blocks is only an example], a map of 1 Mbit (1 Mbit is about 125 kB of memory) could track a table with a total size of 64 GB; we could probably also use a compression algorithm, because the map consists only of 1s and 0s. This is a very simple idea, but it shows that the map does not need too much memory if we track groups of blocks, i.e. chunks; obviously the problem is more complex, and there are probably better and more robust solutions. We probably need more space for the header of the map, to track information about the file, the last backup, and so on.
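As an aside on the mtime check Marco describes above: the mtime/size pair a backup profile would compare is also visible server-side, for example (the relfilenode path here is hypothetical):

-- pg_stat_file() returns, among other fields, the size and last
-- modification time of a file in the data directory (superuser only).
SELECT size, modification
FROM pg_stat_file('base/16384/16385');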
Re: [HACKERS] In-Memory Columnar Store
2013/12/9 knizhnik knizh...@garret.ru:

> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports and suggestions are welcome.
>
> The vertical representation of the data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Nowadays servers with 1 TB or more of RAM are not something exotic, especially in the financial world. But there is a limitation in Linux, with standard 4 kB pages, on the maximal size of a mapped memory segment: 256 GB. It is possible to overcome this limitation either by creating multiple segments (but it requires too many changes in the PostgreSQL memory manager) or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system). I found several messages related to the MAP_HUGETLB flag, the most recent one from 21 November: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org
>
> I wonder what the current status of this patch is?

Hello, excellent work! I started testing it, and it is very fast. However, I found a strange case of an endless query at 100% CPU when the value used as a filter does not exist. I am testing with PostgreSQL 9.3.1 on Debian, and I used the default values for the extension, except memory (512 MB).

How to recreate the test case:

## create a table:
create table endless ( col1 int, col2 char(30), col3 int );

## insert some values:
insert into endless values ( 1, 'ahahahaha', 3);
insert into endless values ( 2, 'ghghghghg', 4);

## create the column store objects:
select cs_create('endless','col1','col2');
 cs_create
-----------

(1 row)

## try and test the column store:
select cs_avg(col3) from endless_get('ahahahaha');
 cs_avg
--------
      3
(1 row)

select cs_avg(col3) from endless_get('ghghghghg');
 cs_avg
--------
      4
(1 row)

## now select with a value that does not exist:
select cs_avg(col3) from endless_get('testing');

## it now starts to loop at 100% CPU and seems to never end; I had to terminate the backend.

Bye

Mat
[HACKERS] Fixed Cardinality estimation with equality predicates between columns of the same table
Hi all, I see a strange behavior (strange for me) on 9.2 (but it seems the same on 9.1 and 9.3) of the optimizer on queries like these:

/* create a table with random data and 20000 rows */
create table test1 (
  id int not null primary key,
  state1 int not null default 0,
  state2 int not null default 0,
  state3 int not null default 0
);

insert into test1 (id, state1, state2, state3)
select i, (random()*3)::int, (random())::int, (random()*100)::int
from generate_series(1, 20000) as gs(i);

analyze test1;

/* between the same column */
explain select * from test1 where state1=state1 ;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state1 = state1)
(2 rows)

test3=# explain select * from test1 where state2=state2 ;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state2 = state2)
(2 rows)

/* between different columns of the same table */
test3=# explain select * from test1 where state1=state2 ;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..359.00 rows=100 width=16)
   Filter: (state1 = state2)
(2 rows)

===========================================================

/* create a table with random data and 100000 rows to verify */
create table test2 (
  id int not null primary key,
  state1 int not null default 0,
  state2 int not null default 0,
  state3 int not null default 0
);

insert into test2 (id, state1, state2, state3)
select i, (random()*3)::int, (random())::int, (random()*100)::int
from generate_series(1, 100000) as gs(i);

test3=# analyze test2 ;
ANALYZE

test3=# explain select * from test2 where state1=state3;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state3)
(2 rows)

test3=# explain select * from test2 where state1=state2;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state2)
(2 rows)

test3=# explain select * from test2 where state1=state1;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1791.00 rows=500 width=16)
   Filter: (state1 = state1)
(2 rows)

It always seems to be 0.5% of the rows, and it seems independent of the type of data you have in the rows:

/* add a column with a constant value, named c3 */
alter table test1 add c3 int default 1 ;
ALTER TABLE
analyze test1 ;
ANALYZE
explain select * from test1 where state1=c3;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..378.00 rows=100 width=20)
   Filter: (state1 = c3)
(2 rows)

/* add a column with a constant value, named c3 */
alter table test2 add c3 int default 1 ;
ALTER TABLE
analyze test2 ;
ANALYZE
explain select * from test2 where state1=c3;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1887.00 rows=500 width=20)
   Filter: (state1 = c3)
(2 rows)

/* add another constant column */
test3=# alter table test2 add c4 int default 1 ;
ALTER TABLE
test3=# analyze test2 ;
ANALYZE
test3=# explain select * from test2 where c3=c4 ;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1887.00 rows=500 width=24)
   Filter: (c3 = c4)
(2 rows)

Obviously the statistics are fine. It is always 0.5%.

Greetings
Matteo
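P.S. If I read the planner code correctly, this flat estimate matches the default selectivity the planner falls back to for an equality predicate whose selectivity it cannot derive from statistics, DEFAULT_EQ_SEL = 0.005 in src/include/utils/selfuncs.h:

-- The observed row estimates are exactly 0.005 of the table cardinality:
SELECT (20000 * 0.005)::int  AS test1_estimate,   -- 100, as in the plans
       (100000 * 0.005)::int AS test2_estimate;   -- 500, as in the plans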
Re: [HACKERS] Postgresql 9.1 replication failing
Hello Jim, I think you have no other possibility: if the archives are corrupted and there is no way to restore them, you need to recreate the standby starting from a base backup.

Kind Regards

2011/12/1 Jim Buttafuoco j...@contacttelecom.com:

> Simon,
>
> What do you mean, start over with a base backup?
>
> Jim
>
> On Dec 1, 2011, at 4:08 PM, Simon Riggs wrote:
>
>> On Thu, Dec 1, 2011 at 7:09 PM, Jim Buttafuoco j...@contacttelecom.com wrote:
>>> the WAL file on the master is long gone, how would one inspect the WAL segment? Any way to have PG move on?
>>
>> Regenerate the master.
>>
>> --
>> Simon Riggs  http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>
> ___
> Jim Buttafuoco
> j...@contacttelecom.com
> 603-647-7170 ext. - Office
> 603-490-3409 - Cell
> jimbuttafuoco - Skype
Re: [HACKERS] So, is COUNT(*) fast now?
2011/10/22 Andres Freund and...@anarazel.de:

> On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:
>> On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Robert Haas robertmh...@gmail.com writes:
>>>> On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>>>> I don't know why you'd imagine that touching an index is free, or even cheap, CPU-wise. The whole point of the index-only optimization is to avoid I/O. When you try it on a case where there's no I/O to be saved, and no shared-buffers contention to be avoided, there's no way it's going to be a win.
>>>>
>>>> Well, call me naive, but I would have thought touching six times less data would make the operation run faster, not slower.
>>>
>>> It's not touching six times less data. It's touching the exact same number of tuples either way, just index tuples in one case and heap tuples in the other.
>>
>> Yeah, but it works out to fewer pages.
>
> But access to those is not sequential. I guess if you measure cache hit ratios the index scan will come out significantly worse.
>
> Andres

"But access to those is not sequential": yes, I agree, and in my opinion that is the problem. If the query needs to scan the whole b-tree index without accessing the table rows, the better way to read the index is sequentially. In fact, queries like count(*) do not need the data in order, so I think we could read all the blocks (better: only the leaf blocks) without touching the branch blocks too much.

For example, for a query like this:

select column_a from table ;

it is better to read the data from the index in a sequential fashion. For a query like this:

select column_a from table order by column_a ;

it is better to read the data from the index with a range scan, from the root block to the first branch blocks and their leaf blocks, so we can save the sort.

Mat
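P.S. To make the access pattern concrete, here is an illustrative plan (table and index names are hypothetical); an Index Only Scan walks the b-tree in key order, so the leaf pages are not necessarily read in their on-disk order:

-- Counting via an index-only scan (PostgreSQL 9.2+) touches fewer pages
-- than a heap scan, but visits leaf pages in key order, not physical order.
EXPLAIN (COSTS OFF) SELECT count(*) FROM accounts;
--  Aggregate
--    ->  Index Only Scan using accounts_pkey on accounts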
Re: [HACKERS] BUG or strange behaviour of update on primary key
Hi there, I can work around the behavior with a deferred constraint, and that is fine; but, as I showed, I get different behavior for constraints with the same definition in two RDBMSs, and in PostgreSQL it depends on the physical order of the rows (with the same constraint definition, NOT DEFERRABLE INITIALLY IMMEDIATE). Or better: PostgreSQL seems to check after every row, even if the command is a single one (I am doing one update on all the rows), right?

Moreover, the documentation says that a not-deferrable constraint is checked after every command, not after every row of the command:

http://www.postgresql.org/docs/9.1/static/sql-createtable.html

DEFERRABLE / NOT DEFERRABLE
This controls whether the constraint can be deferred. *A constraint that is not deferrable will be checked immediately after every command.* Checking of constraints that are deferrable can be postponed until the end of the transaction (using the SET CONSTRAINTS command, http://www.postgresql.org/docs/9.0/static/sql-set-constraints.html). NOT DEFERRABLE is the default. Currently, only UNIQUE, PRIMARY KEY, EXCLUDE, and REFERENCES (foreign key) constraints accept this clause. NOT NULL and CHECK constraints are not deferrable.

If this is historical buggy behavior kept for performance, I think we have to change the definition of NOT DEFERRABLE in the documentation, because PostgreSQL is not checking at the end of a DML command but after every row modified by the command; otherwise, something needs a patch.

Regards,
Mat

2011/10/18 Robert Haas robertmh...@gmail.com:

> On Mon, Oct 17, 2011 at 7:30 PM, desmodemone desmodem...@gmail.com wrote:
>> Seems an Oracle bug, not a PostgreSQL one!
>
> I don't think it's a bug for it to work. It'd probably work in PostgreSQL too, if you inserted (2) first and then (1). It's just that, as Tom says, if you want it to be certain to work (rather than depending on the order in which the rows are inserted), you need the checks to be deferred.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
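P.S. For reference, a minimal sketch of the deferred-check workaround on the testup table from my original report:

-- Recreate the primary key as deferrable, then postpone the uniqueness
-- check to commit time inside the transaction.
ALTER TABLE testup DROP CONSTRAINT testup_pkey;
ALTER TABLE testup ADD CONSTRAINT testup_pkey
      PRIMARY KEY (a) DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS testup_pkey DEFERRED;
UPDATE testup SET a = a + 1;   -- now checked at COMMIT, so it succeeds
COMMIT;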
Re: [HACKERS] BUG or strange behaviour of update on primary key
2011/10/18 Robert Haas robertmh...@gmail.com:

> On Mon, Oct 17, 2011 at 7:30 PM, desmodemone desmodem...@gmail.com wrote:
>> Seems an Oracle bug, not a PostgreSQL one!
>
> I don't think it's a bug for it to work. It'd probably work in PostgreSQL too, if you inserted (2) first and then (1). It's just that, as Tom says, if you want it to be certain to work (rather than depending on the order in which the rows are inserted), you need the checks to be deferred.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

2011/10/18 desmodemone desmodem...@gmail.com:

> Hi there, I can work around the behavior with a deferred constraint [...] If this is historical buggy behavior kept for performance, I think we have to change the definition of NOT DEFERRABLE in the documentation [...]
>
> Regards,
> Mat

Hello there, I think I have found a limit of this workaround. Imagine I have two tables, in Oracle or another RDBMS, with a foreign key between them:

testup3 ( a int ), primary key on a, NOT DEFERRABLE INITIALLY IMMEDIATE
testup4 ( a int ), foreign key on a references testup3(a)

For the first table I could create this (to get the standard SQL behavior on updates of multiple rows):

testup3 ( a int ), primary key on a, DEFERRABLE INITIALLY IMMEDIATE

However, I cannot create a foreign key against a DEFERRABLE constraint; in fact, I get an error like this:

ERROR: cannot use a deferrable unique constraint for referenced table

So if I have a normal ERD schema with foreign keys, I cannot use the workaround of DEFERRABLE constraints. I found an old discussion on this:

http://archives.postgresql.org/pgsql-hackers/2010-06/msg00151.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00168.php

In my opinion it could be a big limitation for anyone who wants to migrate applications to, or is developing applications on, different databases.

Any suggestion or idea?

Regards,
Mat
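P.S. A minimal repro of the limitation, for completeness (the exact error wording may differ by version and constraint type; for a deferrable primary key it can read "cannot use a deferrable primary key for referenced table"):

-- A foreign key may not reference a deferrable unique or primary key
-- constraint, so the workaround breaks down as soon as FKs are involved.
CREATE TABLE testup3 (a int PRIMARY KEY DEFERRABLE INITIALLY IMMEDIATE);
CREATE TABLE testup4 (a int REFERENCES testup3 (a));
-- ERROR: cannot use a deferrable unique constraint for referenced table "testup3"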
[HACKERS] BUG or strange behaviour of update on primary key
Hello there, two guys of our developer team (Lorenzo and Federico) have seen a strange behaviour (in 8.4 and 9.1.1) on update, and I think it is a bug, or something really strange, or I do not understand this behavior correctly. Let me explain (inside a transaction or with autocommit, it is the same):

create table testup ( a int ) ;
alter table testup add primary key (a) ;
insert into testup values (1);
insert into testup values (2);

update testup set a=a+1 ;
ERROR: duplicate key value violates unique constraint "testup_pkey"
DETAIL: Key (a)=(2) already exists.

By the way:

test=# update testup set a=a-1 ;
UPDATE 2

SUCCESSFUL

-- REVERSE ORDER --

Now create the same table with the rows in reverse physical order:

create table testup2 ( a int ) ;
alter table testup2 add primary key (a) ;
insert into testup2 values (2);
insert into testup2 values (1);

update testup2 set a=a+1 ;
UPDATE 2

SUCCESSFUL

By the way:

test=# update testup2 set a=a-1 ;
ERROR: duplicate key value violates unique constraint "testup2_pkey"
DETAIL: Key (a)=(1) already exists.

I have tested in Oracle 11gR1 and 11gR2, which do not show the same behaviour:

SQL> create table a ( b number ) ;
Table created.
SQL> alter table a add primary key (b) ;
Table altered.
SQL> insert into a values (1) ;
1 row created.
SQL> insert into a values (2) ;
1 row created.
SQL> commit ;
Commit complete.
SQL> update a set b=b+1 ;
2 rows updated.
SQL> commit ;
Commit complete.
SQL> update a set b=b-1;
2 rows updated.
SQL> commit;
Commit complete.

In MySQL 5.1.58 with InnoDB the behaviour is even stranger (it always fails for +1, independent of the reverse order O_o):

mysql> create table testup ( a int ) engine innodb ;
Query OK, 0 rows affected (0.21 sec)
mysql> alter table testup add primary key (a) ;
Query OK, 0 rows affected (0.34 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into testup values (1) ;
Query OK, 1 row affected (0.12 sec)
mysql> insert into testup values (2) ;
Query OK, 1 row affected (0.15 sec)
mysql> commit ;
Query OK, 0 rows affected (0.00 sec)
mysql> update testup set a=a+1 ;
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'   (-- like postgres!)
mysql> update testup set a=a-1 ;
Query OK, 2 rows affected (0.16 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> commit ;
Query OK, 0 rows affected (0.00 sec)
mysql> update testup set a=a+1 ;
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
mysql> select * from testup ;
+---+
| a |
+---+
| 0 |
| 1 |
+---+
2 rows in set (0.00 sec)

-- REVERSE ORDER --

mysql> truncate table testup ;
Query OK, 0 rows affected (0.11 sec)
mysql> insert into testup values (2) ;
Query OK, 1 row affected (0.12 sec)
mysql> insert into testup values (1) ;
Query OK, 1 row affected (0.17 sec)
mysql> update testup set a=a+1 ;   (-- O_O totally different from postgres!)
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'
mysql> update testup set a=a-1 ;
Query OK, 2 rows affected (0.16 sec)
Rows matched: 2 Changed: 2 Warnings: 0

In MySQL with MyISAM it is totally different, and similar to Oracle:

mysql> create table testup_myisam ( a int ) engine myisam ;
Query OK, 0 rows affected (0.17 sec)
mysql> insert into testup_myisam values (2) ;
Query OK, 1 row affected (0.00 sec)
mysql> insert into testup_myisam values (1) ;
Query OK, 1 row affected (0.00 sec)
mysql> update testup_myisam set a=a+1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update testup_myisam set a=a-1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0

-- REVERSE ORDER --

mysql> truncate table testup_myisam ;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into testup_myisam values (1) ;
Query OK, 1 row affected (0.00 sec)
mysql> insert into testup_myisam values (2) ;
Query OK, 1 row affected (0.00 sec)
mysql> update testup_myisam set a=a+1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update testup_myisam set a=a-1 ;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0

The problem for us comes when we develop or migrate applications between different databases. In any case, I think it is not right that an update on the same set of rows succeeds or fails depending on how the rows are physically ordered, no? I think it is something related to the visibility of rows in MVCC (update = insert + delete tuple).

What do you think about it?

See you soon

Regards,
Mat
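P.S. A small illustration of the physical-order dependence (the update visits the rows in heap order, which the system column ctid exposes):

-- On testup the heap order is (1, 2), so a=a+1 rewrites 1 to 2 while the
-- old row with key 2 is still there and the immediate check fails;
-- a=a-1 visits the same rows but never collides.
SELECT ctid, a FROM testup ORDER BY ctid;
--  ctid  | a
-- -------+---
--  (0,1) | 1
--  (0,2) | 2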
Re: [HACKERS] BUG or strange behaviour of update on primary key
Hello there, thanks Tom! By the way, I found something very funny:

Oracle 11gR2:

SQL> create table testup ( a number ) ;
Table created.
SQL> alter table testup add primary key (a) NOT DEFERRABLE INITIALLY IMMEDIATE ;
Table altered.
SQL> insert into testup values (1) ;
1 row created.
SQL> insert into testup values (2) ;
1 row created.
SQL> commit ;
Commit complete.
SQL> update testup set a=a+1 ;
2 rows updated.    (-- Oracle bug??)
SQL> commit ;
Commit complete.

PostgreSQL:

create table testup ( a int ) ;
alter table testup add primary key (a) NOT DEFERRABLE INITIALLY IMMEDIATE ;
insert into testup values (1);
insert into testup values (2);
update testup set a=a+1 ;
ERROR: duplicate key value violates unique constraint "testup_pkey"

As Tom correctly says:

alter table testup DROP CONSTRAINT testup_pkey ;
alter table testup add primary key (a) DEFERRABLE INITIALLY IMMEDIATE ;
update testup set a=a+1 ;
UPDATE 2
commit;

Seems an Oracle bug, not a PostgreSQL one!

Regards,
Mat

2011/10/18 Tom Lane t...@sss.pgh.pa.us:

> desmodemone desmodem...@gmail.com writes:
>> create table testup ( a int ) ;
>> alter table testup add primary key (a) ;
>> insert into testup values (1);
>> insert into testup values (2);
>> update testup set a=a+1 ;
>> ERROR: duplicate key value violates unique constraint "testup_pkey"
>> DETAIL: Key (a)=(2) already exists.
>
> If you want that to work reliably, you need to mark the primary key constraint as deferred. By default, uniqueness is checked immediately when a row is inserted or updated --- and here, when you update 1 to 2, it's not unique because the second row hasn't been visited yet.
>
> regards, tom lane