Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Aug 27, 2014 at 3:53 PM, Michael Paquier wrote:
> On Wed, Aug 27, 2014 at 3:41 PM, Andres Freund wrote:
>> Can you add it to the next CF? I'll try to look earlier, but can't
>> promise anything.
>>
>> I very much would like this to get committed in some form or another.
>
> Added it here to keep track of it:
> https://commitfest.postgresql.org/action/patch_view?id=1563

Attached is a fairly-refreshed patch that should be used as a base for the next commit fest. The following changes should be noticed:
- Use of AccessExclusiveLock when swapping relfilenodes of an index and its concurrent entry instead of ShareUpdateExclusiveLock for safety. At the limit of my understanding, that's the consensus reached until now.
- Cleanup of many comments and refresh of the documentation that was somewhat wrongly formulated or shaped in some places
- Addition of support for autocommit off in psql for REINDEX [ TABLE | INDEX ] CONCURRENTLY
- Some more code cleanup.

I haven't been through the tab completion support for psql but looking at tab-completion.c this seems a bit tricky with the stuff related to CREATE INDEX CONCURRENTLY already present. Nothing huge though.

Regards,
-- 
Michael

From 5e9125da5cbeb2f6265b68ff14cc70e4cb10d502 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 27 Aug 2014 10:56:02 +0900
Subject: [PATCH] Support for REINDEX CONCURRENTLY

Fully supported patch, with regression and isolation tests, including documentation and support for autocommit 'off' in psql. This version uses an exclusive lock when swapping relations for safety. psql tab completion has not been added yet in this version as this is rather independent from the core version and actually psql makes things a bit tricky with CREATE INDEX CONCURRENTLY using the same keywords.
--- doc/src/sgml/mvcc.sgml | 5 +- doc/src/sgml/ref/reindex.sgml | 160 +++- src/backend/catalog/index.c| 476 ++-- src/backend/catalog/toasting.c | 2 +- src/backend/commands/indexcmds.c | 821 ++--- src/backend/commands/tablecmds.c | 33 +- src/backend/executor/execUtils.c | 14 + src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c | 1 + src/backend/parser/gram.y | 17 +- src/backend/tcop/utility.c | 12 +- src/bin/psql/common.c | 17 + src/include/catalog/index.h| 19 +- src/include/commands/defrem.h | 7 +- src/include/nodes/parsenodes.h | 1 + .../isolation/expected/reindex-concurrently.out| 78 ++ src/test/isolation/isolation_schedule | 1 + src/test/isolation/specs/reindex-concurrently.spec | 40 + src/test/regress/expected/create_index.out | 57 ++ src/test/regress/sql/create_index.sql | 42 ++ 20 files changed, 1615 insertions(+), 189 deletions(-) create mode 100644 src/test/isolation/expected/reindex-concurrently.out create mode 100644 src/test/isolation/specs/reindex-concurrently.spec diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index cd55be8..653b120 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -864,7 +864,8 @@ ERROR: could not serialize access due to read/write dependencies among transact Acquired by VACUUM (without FULL), - ANALYZE, CREATE INDEX CONCURRENTLY, and + ANALYZE, CREATE INDEX CONCURRENTLY, + REINDEX CONCURRENTLY, ALTER TABLE VALIDATE and other ALTER TABLE variants (for full details see ). 
@@ -1143,7 +1144,7 @@ ERROR: could not serialize access due to read/write dependencies among transact Page-level Locks - + In addition to table and row locks, page-level share/exclusive locks are used to control read/write access to table pages in the shared buffer diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index cabae19..0b7a93c 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name [ FORCE ] +REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ FORCE ] @@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } nam An index build with the CONCURRENTLY option failed, leaving an invalid index. Such indexes are useless but it can be convenient to use REINDEX to rebuild them. Note that - REINDEX will not perform a concurrent build. To build the - index without interfering with production you should drop the index and - reissue the CREATE INDEX CONCURRENTLY command. + REINDEX will perform a concurrent build if + CONCURRENTLY is speci
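For readers skimming the patch, the lock discussion above can be summarized as a sequence of phases. The sketch below is plain Python for illustration only — it is not PostgreSQL code, and the phase names are invented — but it captures the point made in the mail: every phase can run under ShareUpdateExclusiveLock except the relfilenode swap, which (per the consensus mentioned) takes an AccessExclusiveLock.

```python
# Invented outline of the concurrent-reindex phases implied by the mail
# above. Only the swap phase needs the strongest table lock.
PHASES = [
    ("create new index, invalid and not ready", "ShareUpdateExclusiveLock"),
    ("build new index concurrently",            "ShareUpdateExclusiveLock"),
    ("validate new index",                      "ShareUpdateExclusiveLock"),
    ("swap relfilenodes of old and new index",  "AccessExclusiveLock"),
    ("drop old index concurrently",             "ShareUpdateExclusiveLock"),
]

def strongest_lock(phases):
    # AccessExclusiveLock is the strongest table-level lock in PostgreSQL;
    # if any phase needs it, that is the peak lock level of the operation.
    if any(lock == "AccessExclusiveLock" for _, lock in phases):
        return "AccessExclusiveLock"
    return "ShareUpdateExclusiveLock"

print(strongest_lock(PHASES))  # AccessExclusiveLock
```

The takeaway is that, unlike the earlier revisions of the patch, concurrent reads and writes are briefly blocked during the swap phase only.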
Re: [HACKERS] pgcrypto: PGP armor headers
On 09/30/2014 06:39 PM, Marko Tiikkaja wrote: On 9/30/14 5:17 PM, Heikki Linnakangas wrote: I'm actually now leaning towards providing just a single function, pgp_armor_headers(text, key OUT text, value OUT text), which returns all the keys and values. That gives maximum flexibility, and leaves it up to the user to decide what to do with duplicate keys. It's pretty easy to use that to extract just a single header, too: What do you think? Attached patch implements that, but the docs and regression tests now need adjustment. (You forgot the patch, but I can imagine what it would have been.) Oops. I'm not exactly sure to be honest. I would personally like to see a simple function for extracting a single header value in a scalar context without having to deal with all the pain of SRFs, multiple matches and no matches. Sure, that got a lot better in 9.3 with LATERAL but it's still way inferior to pgp_armor_header(). I can also see why someone would argue that I should just create the function myself and that it doesn't have to be shipped with postgres. But on the other hand, this is already an extension one has to explicitly go and CREATE, and that considered one more function probably wouldn't hurt too much. Yeah, building the function to extract a single value is pretty simple once you have the set-returning function: create function pgp_armor_header(armor text, key text) returns text language sql as $$ select string_agg(value, ' ') from pgp_armor_headers($1) where key = $2 $$; I spent a little time cleaning up the regression tests and docs, and ended up with the attached. But then I realized that there's a problem with UTF-8 conversion in the armor() function. It returns the armored blob as text, but forcibly converts the keys and values to UTF-8. That's not cool, because you will get invalidly encoded strings into the database, if you use the function while connected to a database that uses some other encoding than UTF-8. 
RFC4880 says that the headers are in UTF-8, but armor() cannot safely return UTF-8 encoded text unless the database's encoding is also UTF-8. It also rightly says that using anything else than plain ASCII, even though nominally it's UTF-8, is a bad idea, because the whole point of ASCII-armoring is to make the format safe from encoding conversions. We have two options: 1. Throw an error if there are any non-ASCII characters in the key/value arrays. 2. Don't convert them to UTF-8, but use the current database encoding. Both seem sane to me. If we use the current database encoding, then we have to also decide what to do with the input, in pgp_armor_headers(). If armor() uses the database encoding, but pgp_armor_headers() treats the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) won't work. - Heikki diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile index b6c9484..d1d75ca 100644 --- a/contrib/pgcrypto/Makefile +++ b/contrib/pgcrypto/Makefile @@ -26,7 +26,7 @@ MODULE_big = pgcrypto OBJS = $(SRCS:.c=.o) $(WIN32RES) EXTENSION = pgcrypto -DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql +DATA = pgcrypto--1.2.sql pgcrypto--1.1--1.2.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql PGFILEDESC = "pgcrypto - cryptographic functions" REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \ diff --git a/contrib/pgcrypto/expected/pgp-armor.out b/contrib/pgcrypto/expected/pgp-armor.out index c955494..bcf9590 100644 --- a/contrib/pgcrypto/expected/pgp-armor.out +++ b/contrib/pgcrypto/expected/pgp-armor.out @@ -102,3 +102,245 @@ em9va2E= -END PGP MESSAGE- '); ERROR: Corrupt ascii-armor +-- corrupt (no space after the colon) +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +foo: + +em9va2E= += +-END PGP MESSAGE- +'); +ERROR: Corrupt ascii-armor +-- header with empty value +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +foo: + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+--- + foo | 
+(1 row) + +-- simple +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +fookey: foovalue +barkey: barvalue + +em9va2E= += +-END PGP MESSAGE- +'); + key | value ++-- + fookey | foovalue + barkey | barvalue +(2 rows) + +-- insane keys, part 1 +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +insane:key : + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+--- + insane:key | +(1 row) + +-- insane keys, part 2 +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +insane:key : text value here + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+- + insane:key | text value here +(1 row) + +-- long value +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +long: this value is more than 76 characters long, but it should still
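As an aside, the header-parsing rules being exercised above are simple enough to model outside the C code. The following Python sketch is illustrative only — the real pgcrypto implementation raises "Corrupt ascii-armor" on malformed input, which this sketch merely skips. It parses the `key: value` lines between the BEGIN marker and the first blank line (RFC 4880, section 6.2), and mirrors the `string_agg(value, ' ')` wrapper function from the mail above:

```python
def pgp_armor_headers(armored: str):
    """Yield (key, value) pairs from the armor header block: the
    'Key: Value' lines between the BEGIN line and the first blank line."""
    in_headers = False
    for line in armored.splitlines():
        if line.startswith("-----BEGIN PGP"):
            in_headers = True
            continue
        if in_headers:
            if line.strip() == "":
                break  # a blank line ends the header block
            key, sep, value = line.partition(": ")
            if sep:  # the real code errors out here instead of skipping
                yield key, value

def pgp_armor_header(armored: str, key: str) -> str:
    # Mirrors: select string_agg(value, ' ')
    #          from pgp_armor_headers($1) where key = $2
    return " ".join(v for k, v in pgp_armor_headers(armored) if k == key)

msg = """-----BEGIN PGP MESSAGE-----
Comment: first part
Comment: second part

em9va2E=
=
-----END PGP MESSAGE-----
"""
print(pgp_armor_header(msg, "Comment"))  # first part second part
```

This also illustrates the duplicate-key question discussed in the thread: the set-returning function leaves it to the caller to decide how repeated keys are combined.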
[HACKERS] "Value locking" Wiki page
The current approach to "value locking" remains a controversial aspect of my INSERT ... ON CONFLICT UPDATE patch. I must admit that this is a very complicated area, and it's difficult to keep things straight, particularly with the relevant discussion scattered all over the place.

In the hope of unblocking things, I have created this Wiki page, which details the advantages and disadvantages of all 3 approaches that have been discussed, as suggested by myself and others:

https://wiki.postgresql.org/wiki/Value_locking

I actually think that we're only in disagreement on the extent to which the stated advantages and disadvantages are useful or harmful. So I think that at the moment this description is a reasonably balanced handling of the issues. Please add any points that I missed.

This is by no means complete yet. There is pretty limited handling of what I call "#3. "Promise" index tuples", because there was no prototype version of that design, and there are many open questions about how it would be implemented relative to the other 2 approaches. Perhaps Andres or Simon would care to improve it. I think I've done a better job of describing #2, Heikki's design - hardly surprising, since there was a prototype that we discussed at length - but I'd appreciate it if other people would make sure that I have everything right there. This is hopefully the beginning of working the issues out. I have of course also described my own proposed design alongside the others. Please edit and add to my words as you see fit.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "Value locking" Wiki page
On 2014-10-01 01:44:04 -0700, Peter Geoghegan wrote:
> In the hope of unblocking things, I have created this Wiki page, which
> details the advantages and disadvantages of all 3 approaches that have
> been discussed, as suggested by myself and others:
>
> https://wiki.postgresql.org/wiki/Value_locking

Good.

> I actually think that we're only in disagreement on the extent to
> which the stated advantages and disadvantages are useful or harmful.
> So I think that at the moment this description is a reasonably
> balanced handling of the issues. Please add any points that I missed.
>
> This is by no means complete yet. There is pretty limited handling of
> what I call "#3. "Promise" index tuples", because there was no
> prototype version of that design, and there are many open questions
> about how it would be implemented relative to the other 2 approaches.
> Perhaps Andres or Simon would care to improve it. I think I've done a
> better job of describing #2, Heikki's design - hardly surprising,
> since there was a prototype that we discussed at length - but I'd
> appreciate it if other people would make sure that I have everything
> right there. This is hopefully the beginning of working the issues
> out. I have of course also described my own proposed design alongside
> the others.

FWIW, Heikki's approach isn't what I'd have chosen, but it's something I can live with.

Greetings,

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgcrypto: PGP armor headers
On 10/1/14, 9:11 AM, Heikki Linnakangas wrote:
> I spent a little time cleaning up the regression tests and docs, and
> ended up with the attached. But then I realized that there's a problem
> with UTF-8 conversion in the armor() function. It returns the armored
> blob as text, but forcibly converts the keys and values to UTF-8.
> That's not cool, because you will get invalidly encoded strings into
> the database, if you use the function while connected to a database
> that uses some other encoding than UTF-8.
>
> RFC4880 says that the headers are in UTF-8, but armor() cannot safely
> return UTF-8 encoded text unless the database's encoding is also
> UTF-8. It also rightly says that using anything else than plain ASCII,
> even though nominally it's UTF-8, is a bad idea, because the whole
> point of ASCII-armoring is to make the format safe from encoding
> conversions.

Ugh. Right.

> We have two options:
>
> 1. Throw an error if there are any non-ASCII characters in the
> key/value arrays.
>
> 2. Don't convert them to UTF-8, but use the current database encoding.
>
> Both seem sane to me. If we use the current database encoding, then we
> have to also decide what to do with the input, in pgp_armor_headers().
> If armor() uses the database encoding, but pgp_armor_headers() treats
> the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?))
> won't work.

Yeah. Both options seem fine to me. Throwing an error perhaps slightly more so.

.marko
Re: [HACKERS] "Value locking" Wiki page
On 10/01/2014 11:49 AM, Andres Freund wrote:
> On 2014-10-01 01:44:04 -0700, Peter Geoghegan wrote:
>> In the hope of unblocking things, I have created this Wiki page, which
>> details the advantages and disadvantages of all 3 approaches that have
>> been discussed, as suggested by myself and others:
>>
>> https://wiki.postgresql.org/wiki/Value_locking

Thanks! That's a good summary.

>> This is by no means complete yet. There is pretty limited handling of
>> what I call "#3. "Promise" index tuples", because there was no
>> prototype version of that design, and there are many open questions
>> about how it would be implemented relative to the other 2 approaches.
>> Perhaps Andres or Simon would care to improve it. I think I've done a
>> better job of describing #2, Heikki's design - hardly surprising,
>> since there was a prototype that we discussed at length - but I'd
>> appreciate it if other people would make sure that I have everything
>> right there. This is hopefully the beginning of working the issues
>> out. I have of course also described my own proposed design alongside
>> the others.
>
> FWIW, Heikki's approach isn't what I'd have chosen, but it's something
> I can live with.

I didn't realize that "promise index tuples" were even seriously discussed. I guess that can be made to work, too, although I don't see the point. It wouldn't work with GiST indexes, for the same reasons as page-level locking won't work (a tuple can legally be inserted anywhere in a GiST index - it just degrades the index making searching more expensive). And lossy GiST opclasses are a problem too.

It's actually more similar to approach #1 in that it puts the responsibility of the locking in the indexam. You would probably need the same kind of API changes to the indexam, and you could well imagine one indexam to implement index promise tuples, while another one might use heavy-weight locks. I don't see the advantage of #3.
- Heikki
Re: [HACKERS] "Value locking" Wiki page
On 2014-10-01 12:44:25 +0300, Heikki Linnakangas wrote:
> I didn't realize that "promise index tuples" were even seriously discussed.
> I guess that can be made to work, too, although I don't see the point. It
> wouldn't work with GiST indexes, for the same reasons as page-level locking
> won't work (a tuple can legally be inserted anywhere in a GiST index - it
> just degrades the index making searching more expensive). And lossy GiST
> opclasses are a problem too.
>
> It's actually more similar to approach #1 in that it puts the responsibility
> of the locking in the indexam. You would probably need the same kind of API
> changes to the indexam, and you could well imagine one indexam to implement
> index promise tuples, while another one might use heavy-weight locks. I
> don't see the advantage of #3.

I don't think all of that is a necessary consequence. What I was thinking of would actually work similarly to #2: just that, instead of a heap tuple, it inserts an index tuple that a) has a PROMISE_TUPLE flag set and b) whose itemid points to an xid instead of a heap tuple. That's sufficient for detecting uniqueness violations using an approach similar to the one in your proposal - which does *not* have to happen inside the AM.

Yes, it'd require some indexam API changes, but I don't think they'd be too severe. Most importantly the ability to insert, to return promise tuples to the outside, and an API to repoint a promise tuple to a real entry.

I haven't thought about lossy indexes, but I actually do think they could be made to work with that strategy.

The difference to #2 primarily is that it avoids speculatively inserting into both the heap and index and just inserts into the index, thereby detecting conflicts a bit earlier, after less work.
But, as I said, I'm OK with pursuing #2 on the basis that it's quite likely to be simpler.

Greetings,

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] libpq-dev: pg_config_manual.h redefines CACHE_LINE_SIZE
On 2014-09-30 21:57:56 +0200, Christoph Berg wrote:
> Hi,
>
> there's a #define clash between pg_config_manual.h and FreeBSD's
> /usr/include/machine-amd64/param.h which also defines CACHE_LINE_SIZE.
>
> It's probably not really a PostgreSQL bug, but it seems easy enough to
> rename that definition now as it's only present in 9.4+. It's only
> used in one file, xlog.c:

375d8526f2900d0c377f44532f6d09ee06531f67
Changed in 9.4/master. Thanks.

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "Value locking" Wiki page
On 1 October 2014 10:44, Heikki Linnakangas wrote:
> I didn't realize that "promise index tuples" were even seriously discussed.
> I guess that can be made to work, too, although I don't see the point. It
> wouldn't work with GiST indexes, for the same reasons as page-level locking
> won't work (a tuple can legally be inserted anywhere in a GiST index - it
> just degrades the index making searching more expensive). And lossy GiST
> opclasses are a problem too.

GiST doesn't support unique indexes, so it is not in any way a problem.

-- 
Simon Riggs                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "Value locking" Wiki page
On 10/01/2014 01:50 PM, Simon Riggs wrote:
> On 1 October 2014 10:44, Heikki Linnakangas wrote:
>> I didn't realize that "promise index tuples" were even seriously discussed.
>> I guess that can be made to work, too, although I don't see the point. It
>> wouldn't work with GiST indexes, for the same reasons as page-level locking
>> won't work (a tuple can legally be inserted anywhere in a GiST index - it
>> just degrades the index making searching more expensive). And lossy GiST
>> opclasses are a problem too.
>
> GiST doesn't support unique indexes, so it is not in any way a problem.

GiST supports exclusion constraints. That is one of the main reasons I want to do promise tuples, instead of locking within the indexam: to support this feature with exclusion constraints.

- Heikki
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On 2014-09-29 14:46:20 -0400, Robert Haas wrote:
> This happened again, and I investigated further.

Uh. Interestingly anole just succeeded twice:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE

I plan to commit the mask/unmask patch regardless, but it's curious. The first of the two builds could have been you 'unsticking' it by manually mucking around. Did you also do that for the second build?

Greetings,

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgcrypto: PGP armor headers
On 10/01/2014 11:58 AM, Marko Tiikkaja wrote: On 10/1/14, 9:11 AM, Heikki Linnakangas wrote: We have two options: 1. Throw an error if there are any non-ASCII characters in the key/value arrays. 2. Don't convert them to UTF-8, but use the current database encoding. Both seem sane to me. If we use the current database encoding, then we have to also decide what to do with the input, in pgp_armor_headers(). If armor() uses the database encoding, but pgp_armor_headers() treats the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) won't work. Yeah. Both options seem fine to me. Throwing an error perhaps slightly more so. I went with 1, throw an error. I also added checks that the key or value doesn't contain any embedded newlines, and that the key doesn't contain an embedded ": ". Those would cause the armor to be invalid. I think this is now ready for commit, but since I've changed it quite significantly from what you originally submitted, please take a moment to review this. 
- Heikki diff --git a/.gitattributes b/.gitattributes index ff96567..9466800 100644 --- a/.gitattributes +++ b/.gitattributes @@ -14,6 +14,7 @@ README.* conflict-marker-size=32 # Certain data files that contain special whitespace, and other special cases *.data -whitespace contrib/tsearch2/sql/tsearch2.sql whitespace=space-before-tab,blank-at-eof,-blank-at-eol +contrib/pgcrypto/sql/pgp-armor.sql whitespace=-blank-at-eol doc/bug.templatewhitespace=space-before-tab,-blank-at-eof,blank-at-eol src/backend/catalog/sql_features.txt whitespace=space-before-tab,blank-at-eof,-blank-at-eol src/backend/tsearch/hunspell_sample.affix whitespace=-blank-at-eof diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile index b6c9484..d1d75ca 100644 --- a/contrib/pgcrypto/Makefile +++ b/contrib/pgcrypto/Makefile @@ -26,7 +26,7 @@ MODULE_big = pgcrypto OBJS = $(SRCS:.c=.o) $(WIN32RES) EXTENSION = pgcrypto -DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql +DATA = pgcrypto--1.2.sql pgcrypto--1.1--1.2.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql PGFILEDESC = "pgcrypto - cryptographic functions" REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \ diff --git a/contrib/pgcrypto/expected/pgp-armor.out b/contrib/pgcrypto/expected/pgp-armor.out index c955494..a8f73ac 100644 --- a/contrib/pgcrypto/expected/pgp-armor.out +++ b/contrib/pgcrypto/expected/pgp-armor.out @@ -102,3 +102,251 @@ em9va2E= -END PGP MESSAGE- '); ERROR: Corrupt ascii-armor +-- corrupt (no space after the colon) +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +foo: + +em9va2E= += +-END PGP MESSAGE- +'); +ERROR: Corrupt ascii-armor +-- header with empty value +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +foo: + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+--- + foo | +(1 row) + +-- simple +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +fookey: foovalue +barkey: barvalue + +em9va2E= += +-END PGP MESSAGE- +'); + key | value ++-- + 
fookey | foovalue + barkey | barvalue +(2 rows) + +-- insane keys, part 1 +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +insane:key : + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+--- + insane:key | +(1 row) + +-- insane keys, part 2 +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +insane:key : text value here + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +-+- + insane:key | text value here +(1 row) + +-- long value +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +long: this value is more than 76 characters long, but it should still parse correctly as that''s permitted by RFC 4880 + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +--+- + long | this value is more than 76 characters long, but it should still parse correctly as that's permitted by RFC 4880 +(1 row) + +-- long value, split up +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +long: this value is more than 76 characters long, but it should still +long: parse correctly as that''s permitted by RFC 4880 + +em9va2E= += +-END PGP MESSAGE- +'); + key | value +--+-- + long | this value is more than 76 characters long, but it should still + long | parse correctly as that's permitted by RFC 4880 +(2 rows) + +-- long value, split up, part 2 +select * from pgp_armor_headers(' +-BEGIN PGP MESSAGE- +
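The validation rules described in the mail above — reject non-ASCII characters, embedded newlines, and an embedded ": " in the key, all of which would make the generated armor invalid or ambiguous — can be sketched as follows. This is an illustrative Python version, not the committed C code:

```python
def check_armor_header(key: str, value: str) -> None:
    """Illustrative sketch of the armor() header validation described
    above: only ASCII is accepted (option 1 from the thread), and
    characters that would corrupt the armor format are rejected."""
    for s in (key, value):
        if not all(ord(ch) < 128 for ch in s):
            raise ValueError("header key/value may contain only ASCII characters")
        if "\n" in s or "\r" in s:
            raise ValueError("header key/value must not contain newlines")
    # A ': ' inside the key would be indistinguishable from the
    # key/value separator when the armor is parsed back.
    if ": " in key:
        raise ValueError("header key must not contain ': '")
```

Rejecting non-ASCII outright (rather than emitting database-encoded bytes) also keeps the `pgp_armor_headers(armor(...))` round-trip well defined regardless of the database encoding.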
Re: [HACKERS] "Value locking" Wiki page
On 1 October 2014 09:44, Peter Geoghegan wrote:
> In the hope of unblocking things, I have created this Wiki page, which
> details the advantages and disadvantages of all 3 approaches that have
> been discussed, as suggested by myself and others:
>
> https://wiki.postgresql.org/wiki/Value_locking

A better job is needed at genuine balance on those items. You keep saying you don't care which approach you take and then every word you write goes against that, making it difficult to discuss. Quoting general research and other points about value locking is reasonable in the general section, but not in the description for 1.

I'm glad you've called the first option "Heavyweight Page Locking". That name sums up what I see as the major objection to it, which is that performance and scalability of the whole server will be damaged significantly by using heavyweight locks for each row inserted. Please add that concern as a negative; I'm sure testing has been done, but not comparative testing against other techniques.

I am motivated to explain the promise index tuple approach further because of this, which is an optimistic approach to locking. The stated disadvantages for 3 are not accurate, to put it mildly. But that's been useful because what was a small thought experiment is beginning to look like a useful approach. I will post a summary of my understanding.

-- 
Simon Riggs                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 2014-09-26 16:19:33 -0700, Peter Geoghegan wrote:
> On Fri, Sep 26, 2014 at 3:25 PM, Peter Geoghegan wrote:
>> On Fri, Sep 26, 2014 at 3:11 PM, Alvaro Herrera wrote:
>>> FWIW there are 28 callers of HeapTupleHeaderGetXmin.
>>
>> Don't forget about direct callers to HeapTupleHeaderGetRawXmin(),
>> though. There are plenty of those in tqual.c.
>
> Which reminds me: commit 37484ad2 added the opportunistic freezing
> stuff. To quote the commit message:
>
> """
> Instead of changing the tuple xmin to FrozenTransactionId, the combination
> of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
> set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous
> proposals to freeze tuples opportunistically before vacuum_freeze_min_age
> is reached have foundered on the objection that replacing xmin by
> FrozenTransactionId might hinder debugging efforts when things in this
> area go awry; this patch is intended to solve that problem by keeping
> the XID around (but largely ignoring the value to which it is set).
> """
>
> Why wouldn't the same objection (the objection that the earlier
> opportunistic freezing ideas stalled on) apply to directly setting
> tuple xmin to InvalidTransactionId?

Because it's pretty much unrelated? The FrozenTransactionId bit you reference is about tuples that actually survive, which isn't the case here.

Greetings,

Andres Freund

-- 
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
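For reference, the bit combination the quoted commit message describes corresponds to these infomask flags (values as defined in src/include/access/htup_details.h; the small check function here is illustrative, not PostgreSQL code). The scheme works because freezing is signalled entirely in the infomask, leaving the xmin field itself untouched for debugging:

```python
# Tuple-header infomask bits, as defined in src/include/access/htup_details.h.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
# "Committed" and "invalid" were never set together before 9.4, so the
# combination was free to mean "frozen".
HEAP_XMIN_FROZEN    = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID

def xmin_frozen(infomask: int) -> bool:
    # A tuple is frozen iff *both* bits are set; either bit alone keeps
    # its pre-9.4 meaning (xmin committed / xmin invalid).
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN
```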
[HACKERS] Promise index tuples for UPSERT
Summary of algorithm to use "promise tuples" for concurrency control during UPSERT

1. Perform btree search to location of key, if it exists.

a) If an unkilled index tuple exists, we decide this is an UPDATE and drop straight thru to step 2

b) If it does not exist, insert a "promise" tuple into unique index(s) - marked with the xid of the inserting transaction, but using the key. This happens while the page is locked, so it is not possible to insert a second promise tuple concurrently. Record the btree blockid on the index scan and move to step 3

When later insert scans see the promise tuple they perform XactLockTableWait() and when they get control they look again for the key. If they find a promise tuple with an aborted xid they replace that value with their own xid and continue as a). Otherwise b).

2. Find existing heap tuple

Find heap tuple. Check it is actually valid. If not, go back to (1), kill the prior tuple and follow 1b) path. If it is valid, perform heap_update as normal.

3. Insert new heap tuple

Perform heap_insert. Re-find index tuple using the btree blockid recorded at step 1; this may require moving right until we find the actual value we are looking for, so block splits don't negatively affect this approach. Once re-found we change the index tuple from a promise tuple to a normal index tuple, by setting tid and removing the promise flag. The tuple remains the same length because the value was known when the promise tuple was inserted, so this is an inplace update. Insert other index values normally.

If a transaction that inserted a promise tuple dies, how is that cleaned up? Any user that sees a dead promise tuple with a value they want will clean it up. Dead promise tuples can be removed as needed, just as killed tuples currently are. VACUUM can remove dead transactions from the index as it scans, no problems. Index bloat is less of a problem than with normal inserts since there are additional ways of removing promise tuples.
Only one index tuple at a time can have a specific value, so we would actually reduce heap bloat caused by concurrent inserts.

It's possible we find existing rows that we can't see according to our snapshot. That is handled in exactly the same way as we treat UPDATEs.

There is a potential loop here in that we might find an index tuple, follow it, find out the tuple is actually dead then return to insert a promise tuple, only to find that someone else just did that and we have to wait/re-follow the link to update the new tuple. That is an extremely unlikely race condition, though possible; there is no heavy locking to prevent that in this approach. In principle deadlocks are possible from that, but that is not any different from the current case where people that insert the same key at the same time might cause deadlocks. It's not a common application pattern and not something we should be protecting against.

All of this is only needed for unique indexes. It can be handled by a new API call for acquire_value_lock() and release_value_lock(), or similar.

Advantages

* We don't do anything weird in the heap - if this breaks, data is not corrupt
* There is no heap bloat or index bloat above existing levels
* The approach leverages the existing mechanism for transaction waiting
* The optimistic approach to value locking will improve performance over heavyweight locking

Disadvantages

* Not written yet - <1 month to code, still possible for 9.5, doesn't look hard

Other stated possible disadvantages

* Violates the existing invariant that every index tuple has a corresponding heap tuple, which goes back to the Berkeley days. Currently, we always create heap tuples first, and physically delete them last. [Why is that a problem? Could be, but why?] ("Deleting them last" implies something is being touched in that regard by this suggestion. I'm not sure what you are referring to)

* Relatedly, concern about breaking VACUUM.
VACUUMing of btrees occurs with a list of TIDs to kill in the index,
built from the heap. The current bulk delete callback cannot be used for
this. A super-exclusive lock is needed to delete tuples in a btree page
(i.e. VACUUM). Normally, skipping LockBufferForCleanup() (as added by
bbb6e559) works okay in the heap because it tends to imply that the list
of tids won't be bulk deleted in the index, where we currently cannot
skip on failure to quickly acquire the super-exclusive lock. So this
could make the problem addressed by bbb6e559 return to some extent.
[Don't see any problems; just test the xid as we scan a promise tuple
and remove it if needed]

* "Index-only" bloat becomes a possibility. Implications are not well
  understood. [FUD, no reason to believe there is a problem.]

* We have to re-find any items using an ordinary index scan, not a tid.
  Must match our xid to that. [Explained above, easy and efficient.]

* Doesn't have a strategy for dealing with unprincipled deadlocks, at
  least at the moment. Presumably some aspects of #2 could be adopted
  here. [FUD. Unprincipled deadlock still not properly explained as yet.
  No
Re: [HACKERS] pg_receivexlog and replication slots
Hi, I've split and edited patch 0001: 0001) Old WIP patch for pg_recvlogical tests. Useful for easier development. 0002) Copy editing that should be in 9.4 0003) Rebased patches of yours 0004) My changes to 0003 besides the rebase. This'll be squashed, but I thought it might be interesting for you. I haven't tested my edits besides running 0001... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 45582de61f670967f72835cd14d1c520c3d04fce Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 1 Oct 2014 12:57:15 +0200 Subject: [PATCH 1/4] WIP: pg_recvlogical tests --- src/bin/pg_basebackup/Makefile| 3 ++ src/bin/pg_basebackup/t/030_pg_recvlogical.pl | 45 +++ 2 files changed, 48 insertions(+) create mode 100644 src/bin/pg_basebackup/t/030_pg_recvlogical.pl diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index 90d19e7..3c0a14e 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -51,7 +51,10 @@ clean distclean maintainer-clean: rm -rf tmp_check check: all +# Needs test_decoding to run the pg_recvlogical tests + $(MAKE) -C $(top_builddir)/contrib/test_decoding DESTDIR='$(CURDIR)'/tmp_check/install install >>'$(CURDIR)'/tmp_check/log/install.log 2>&1 $(prove_check) installcheck: +# XXX: We rely on test_decoding already being installed here $(prove_installcheck) diff --git a/src/bin/pg_basebackup/t/030_pg_recvlogical.pl b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl new file mode 100644 index 000..1741319 --- /dev/null +++ b/src/bin/pg_basebackup/t/030_pg_recvlogical.pl @@ -0,0 +1,45 @@ +use strict; +use warnings; +use TestLib; +use Test::More tests => 9; + +program_help_ok('pg_recvlogical'); +program_version_ok('pg_recvlogical'); +program_options_handling_ok('pg_recvlogical'); + + +my $tempdir = tempdir; +start_test_server $tempdir; + +open HBA, ">>$tempdir/pgdata/pg_hba.conf"; +print HBA "local replication all trust\n"; +close 
HBA; +system_or_bail 'pg_ctl', '-s', '-D', "$tempdir/pgdata", 'reload'; + +open HBA, ">>$tempdir/pgdata/pg_hba.conf"; +print HBA "local replication all trust\n"; +close HBA; + +open CONF, ">>$tempdir/pgdata/postgresql.conf"; +print CONF "max_wal_senders = 1\n"; +print CONF "max_replication_slots = 2\n"; +print CONF "wal_level = logical\n"; +close CONF; + +restart_test_server; + +command_fails([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--plugin', 'barf', '--create'], + 'pg recvlogical cannot create slot with nonexistant plugin'); +command_ok([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--create'], + 'pg recvlogical can create a slot'); +command_fails([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--create'], + 'pg recvlogical cannot create slot with a already used name'); +command_fails([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--create'], + 'pg recvlogical cannot create slot when all slots are in use'); +command_ok([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--drop'], + 'pg recvlogical can drop slot'); +command_fails([ 'pg_recvlogical', '-d', 'postgres', '--slot', 'regress', '--start'], + 'pg recvlogical cannot stream from nonexistant slot'); + +# cannot easily test streaming actual changes because that goes on +# forever. -- 1.8.3.251.g1462b67 >From b9f73cef1a9dc1aa96ad1807893e78c924f690f6 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 29 Sep 2014 15:35:40 +0200 Subject: [PATCH 2/4] Message and comment policing of pg_recvlogical.c. Several comments still referred to 'initiating', 'freeing', 'stopping' replication slots. These were terms used during different phases of the development of logical decoding, but are no long accurate. Author: Michael Paquier Backpatch to 9.4 where pg_recvlogical was introduced. 
--- src/bin/pg_basebackup/pg_recvlogical.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index f3b0f34..4144688 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -868,7 +868,7 @@ main(int argc, char **argv) /* - * stop a replication slot + * drop a replication slot */ if (do_drop_slot) { @@ -876,7 +876,7 @@ main(int argc, char **argv) if (verbose) fprintf(stderr, - _("%s: freeing replication slot \"%s\"\n"), + _("%s: dropping replication slot \"%s\"\n"), progname, replication_slot); snprintf(query, sizeof(query), "DROP_REPLICATION_SLOT \"%s\"", @@ -892,8 +892,8 @@ main(int argc, char **argv) if (PQntuples(res) != 0 || PQnfields(res) != 0) { fprintf(stderr, - _("%s: could not stop logical replication: got %d rows and %d fields, expected %d rows and %d fields\n"), - progname, PQntuples(res), PQnfields(res), 0, 0
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Wed, Oct 1, 2014 at 7:57 AM, José Luis Tallón wrote:
>
> [snip]
>
> Please excuse my jumping in, but the EXPECTED syntax is:
>
> CREATE INDEX IF NOT EXISTS .
>
> whereas your current patch implements:
>
> CREATE [IF NOT EXISTS] INDEX
>
> if I'm reading the grammar correctly.
>

I don't think it's wrong. Look at the other CINE commands already
implemented [1] [2]. But CINE for CREATE TABLE is like your proposal
[3]:

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ...

So, what's the correct/best grammar?

CREATE [ IF NOT EXISTS ] [ UNIQUE ] INDEX index_name

or

CREATE [ UNIQUE ] INDEX [ IF NOT EXISTS ] index_name

> I guess it would be most interesting to implement this minor change
> for the next version of the patch. Please do remember to update the
> documentation accordingly.
>

I will...

> By the way, you also forgot to remove a previous patch implementing
> "namespace_name.relation_name" for RLS messages. Maybe a rebase is
> needed?
>

Sorry... my mistake. Fix attached.
Regards, [1] http://www.postgresql.org/docs/devel/static/sql-createschema.html [2] http://www.postgresql.org/docs/devel/static/sql-createsequence.html [3] http://www.postgresql.org/docs/devel/static/sql-createtable.html -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index e469b17..7886729 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] +CREATE [ IF NOT EXISTS ] [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] @@ -99,6 +99,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name + IF NOT EXISTS + + +Do nothing (except issuing a notice) if a index with the same name +already exists. 
+ + + + + UNIQUE diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index ee10594..8905e30 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -697,7 +697,8 @@ index_create(Relation heapRelation, bool allow_system_table_mods, bool skip_build, bool concurrent, - bool is_internal) + bool is_internal, + bool if_not_exists) { Oid heapRelationId = RelationGetRelid(heapRelation); Relation pg_class; @@ -773,10 +774,22 @@ index_create(Relation heapRelation, elog(ERROR, "shared relations must be placed in pg_global tablespace"); if (get_relname_relid(indexRelationName, namespaceId)) + { + if (if_not_exists) + { + ereport(NOTICE, + (errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists, skipping", + indexRelationName))); + heap_close(pg_class, RowExclusiveLock); + return InvalidOid; + } + ereport(ERROR, (errcode(ERRCODE_DUPLICATE_TABLE), errmsg("relation \"%s\" already exists", indexRelationName))); + } /* * construct tuple descriptor for index tuples diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c index 160f006..5ef6dcc 100644 --- a/src/backend/catalog/toasting.c +++ b/src/backend/catalog/toasting.c @@ -342,7 +342,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, rel->rd_rel->reltablespace, collationObjectId, classObjectId, coloptions, (Datum) 0, true, false, false, false, - true, false, false, true); + true, false, false, true, false); heap_close(toast_rel, NoLock); diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 8a1cb4b..a03773b 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -610,7 +610,14 @@ DefineIndex(Oid relationId, stmt->isconstraint, stmt->deferrable, stmt->initdeferred, allowSystemTableMods, skip_build || stmt->concurrent, - stmt->concurrent, !check_rights); + stmt->concurrent, !check_rights, + stmt->if_not_exists); + + if (!OidIsValid(indexRelationId)) + { + 
heap_close(rel, NoLock); + return indexRelationId; + } /* Add any requested comment */ if (stmt->idxcomment != NULL) diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 225756c..39b55db 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2907,6 +2907,7 @@ _copyIndexStmt(const IndexStmt *from) COPY_SCALAR_FIELD(deferrable); COPY_SCAL
Re: [HACKERS] "Value locking" Wiki page
On 1 October 2014 11:58, Heikki Linnakangas wrote:
> On 10/01/2014 01:50 PM, Simon Riggs wrote:
>> On 1 October 2014 10:44, Heikki Linnakangas wrote:
>>
>>> I didn't realize that "promise index tuples" were even seriously
>>> discussed. I guess that can be made to work, too, although I don't
>>> see the point. It wouldn't work with GiST indexes, for the same
>>> reasons as page-level locking won't work (a tuple can legally be
>>> inserted anywhere in a GiST index - it just degrades the index,
>>> making searching more expensive). And lossy GiST opclasses are a
>>> problem too.
>>
>> GiST doesn't support unique indexes, so it is not in any way a
>> problem.
>
> GiST supports exclusion constraints. That is one of the main reasons I
> want to do promise tuples, instead of locking within the indexam: to
> support this feature with exclusion constraints.

That does sound interesting, but I am concerned the semantics may cause
issues.

If I go to insert a row for 'UK' and find an existing row for 'Europe',
do we really want to update the population of Europe to be the
population of the UK, simply because the UK and Europe have an exclusion
conflict?

Please give some concrete examples of a business request that might be
satisfied by such a feature.

-- 
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP armor headers
On 10/1/14 1:01 PM, Heikki Linnakangas wrote:
> On 10/01/2014 11:58 AM, Marko Tiikkaja wrote:
>> On 10/1/14, 9:11 AM, Heikki Linnakangas wrote:
>>> We have two options:
>>>
>>> 1. Throw an error if there are any non-ASCII characters in the
>>> key/value arrays.
>>>
>>> 2. Don't convert them to UTF-8, but use the current database
>>> encoding.
>>>
>>> Both seem sane to me. If we use the current database encoding, then
>>> we have to also decide what to do with the input, in
>>> pgp_armor_headers(). If armor() uses the database encoding, but
>>> pgp_armor_headers() treats the input as UTF-8, then a round-trip
>>> with pgp_armor_headers(armor(?)) won't work.
>>
>> Yeah. Both options seem fine to me. Throwing an error perhaps
>> slightly more so.
>
> I went with 1, throw an error. I also added checks that the key or
> value doesn't contain any embedded newlines, and that the key doesn't
> contain an embedded ": ". Those would cause the armor to be invalid.

Great.

> I think this is now ready for commit, but since I've changed it quite
> significantly from what you originally submitted, please take a moment
> to review this.

1) I see this compiler warning:

pgp-pgsql.c: In function ‘pg_armor’:
pgp-pgsql.c:960:18: warning: ‘values’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  pgp_armor_encode((uint8 *) VARDATA(data), data_len, &buf,

It's bogus, but worth silencing anyway.

2) There's what looks like an extra whitespace change in
pgp_armor_encode(), but maybe that's intentional?

3) Also, I think the attached two corner cases deserve their own tests.

Other than the above, the patch looks good to me. Huge thanks for your
work on this one!
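[Editor's illustration: the three validity checks described above - reject non-ASCII, reject embedded newlines, reject ": " inside a key - amount to something like the following. This is an invented Python re-statement for clarity, not the committed C code in pgcrypto.]

```python
def check_armor_header(key, value):
    """Reject (key, value) pairs that would produce invalid ASCII armor."""
    for part in (key, value):
        # Option 1 from the thread: error out on any non-ASCII character
        if any(ord(ch) > 127 for ch in part):
            raise ValueError("armor header must be ASCII")
        # An embedded newline would terminate the header line early
        if "\n" in part or "\r" in part:
            raise ValueError("armor header must not contain newlines")
    # ": " is the key/value separator, so it cannot appear in the key
    if ": " in key:
        raise ValueError('armor header key must not contain ": "')
```

A parser reading the armor back splits each header line on the first ": ", which is why only the key needs the separator check.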
.marko *** a/contrib/pgcrypto/expected/pgp-armor.out --- b/contrib/pgcrypto/expected/pgp-armor.out *** *** 112,117 em9va2E= --- 112,137 -END PGP MESSAGE- '); ERROR: Corrupt ascii-armor + -- corrupt (no empty line) + select * from pgp_armor_headers(' + -BEGIN PGP MESSAGE- + em9va2E= + = + -END PGP MESSAGE- + '); + ERROR: Corrupt ascii-armor + -- no headers + select * from pgp_armor_headers(' + -BEGIN PGP MESSAGE- + + em9va2E= + = + -END PGP MESSAGE- + '); + key | value + -+--- + (0 rows) + -- header with empty value select * from pgp_armor_headers(' -BEGIN PGP MESSAGE- *** a/contrib/pgcrypto/sql/pgp-armor.sql --- b/contrib/pgcrypto/sql/pgp-armor.sql *** *** 67,72 em9va2E= --- 67,89 -END PGP MESSAGE- '); + -- corrupt (no empty line) + select * from pgp_armor_headers(' + -BEGIN PGP MESSAGE- + em9va2E= + = + -END PGP MESSAGE- + '); + + -- no headers + select * from pgp_armor_headers(' + -BEGIN PGP MESSAGE- + + em9va2E= + = + -END PGP MESSAGE- + '); + -- header with empty value select * from pgp_armor_headers(' -BEGIN PGP MESSAGE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...
Hi all,

We already have IF NOT EXISTS for CREATE TABLE. Is there some reason not
to have it for CREATE TABLE AS and CREATE MATERIALIZED VIEW?

Regards,

-- 
Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] "Value locking" Wiki page
On 10/01/2014 02:42 PM, Simon Riggs wrote:
> On 1 October 2014 11:58, Heikki Linnakangas wrote:
>> On 10/01/2014 01:50 PM, Simon Riggs wrote:
>>> On 1 October 2014 10:44, Heikki Linnakangas wrote:
>>>> I didn't realize that "promise index tuples" were even seriously
>>>> discussed. I guess that can be made to work, too, although I don't
>>>> see the point. It wouldn't work with GiST indexes, for the same
>>>> reasons as page-level locking won't work (a tuple can legally be
>>>> inserted anywhere in a GiST index - it just degrades the index,
>>>> making searching more expensive). And lossy GiST opclasses are a
>>>> problem too.
>>>
>>> GiST doesn't support unique indexes, so it is not in any way a
>>> problem.
>>
>> GiST supports exclusion constraints. That is one of the main reasons
>> I want to do promise tuples, instead of locking within the indexam:
>> to support this feature with exclusion constraints.
>
> That does sound interesting, but I am concerned the semantics may
> cause issues.
>
> If I go to insert a row for 'UK' and find an existing row for
> 'Europe', do we really want to update the population of Europe to be
> the population of the UK, simply because the UK and Europe have an
> exclusion conflict?

Clearly not, but you might want to insert the tuple to another table
instead, or skip it altogether. Or you might want to UPDATE Europe into
Continental Europe, and then insert the row for UK.

- Heikki
Re: [HACKERS] pg_receivexlog and replication slots
Thanks!

On Wed, Oct 1, 2014 at 8:38 PM, Andres Freund wrote:
> 0001) Old WIP patch for pg_recvlogical tests. Useful for easier
> development.

From where is this patch? Is it some rest from the time when
pg_recvlogical was developed?

> 0002) Copy editing that should be in 9.4

Definitely makes sense for a backpatch.

> 0003) Rebased patches of yours
> 0004) My changes to 0003 besides the rebase. This'll be squashed, but
> I thought it might be interesting for you.

(thanks for caring)

-/* Drop a replication slot */
+/* Drop a replication slot. */
I don't recall that comments on a single line have a dot even if this
single line is a complete sentence.

-static void StreamLog();
+static void StreamLogicalLog();
Not related at all to those patches, but for correctness it is better to
add a declaration "(void)".

Except those small things the changes look good to me.

Regards,
-- 
Michael
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-01 21:54:56 +0900, Michael Paquier wrote: > Thanks! > > On Wed, Oct 1, 2014 at 8:38 PM, Andres Freund > wrote: > > > 0001) Old WIP patch for pg_recvlogical tests. Useful for easier > > development. > > > From where is this patch? Is it some rest from the time when pg_recvlogical > was developed? > > > > 0002) Copy editing that should be in 9.4 > > > Definitely makes sense for a backpatch. > > > > 0003) Rebased patches of yours > > > 0004) My changes to 0003 besides the rebase. This'll be squashed, but I > > thought it might be interesting for you. > > > (thanks for caring) > -/* Drop a replication slot */ > +/* Drop a replication slot. */ > I don't recall that comments on a single line have a dot even if this > single line is a complete sentence. Then it shouldn't start with a uppercase - which you changed... > -static void StreamLog(); > +static void StreamLogicalLog(); > Not related at all to those patches, but for correctness it is better to > add a declaration "(void)". Agreed. > Except those small things the changes look good to me. Cool. Will push and then, sometime this week, review the next patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgcrypto: PGP armor headers
On 10/01/2014 02:47 PM, Marko Tiikkaja wrote:
> On 10/1/14 1:01 PM, Heikki Linnakangas wrote:
>> I think this is now ready for commit, but since I've changed it quite
>> significantly from what you originally submitted, please take a
>> moment to review this.
>
> 1) I see this compiler warning:
>
> pgp-pgsql.c: In function ‘pg_armor’:
> pgp-pgsql.c:960:18: warning: ‘values’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   pgp_armor_encode((uint8 *) VARDATA(data), data_len, &buf,
>
> It's bogus, but worth silencing anyway.
>
> 2) There's what looks like an extra whitespace change in
> pgp_armor_encode(), but maybe that's intentional?
>
> 3) Also, I think the attached two corner cases deserve their own
> tests.
>
> Other than the above, the patch looks good to me. Huge thanks for your
> work on this one!

Ok, committed with those fixes.

- Heikki
Re: [HACKERS] "Value locking" Wiki page
On 1 October 2014 13:43, Heikki Linnakangas wrote: >> That does sound interesting, but I am concerned the semantics may cause >> issues. >> >> If I go to insert a row for 'UK' and find an existing row for >> 'Europe', do we really want to update the population of Europe to be >> the population of the UK, simply because the UK and Europe have an >> exclusion conflict? > > Clearly not, but you might want to insert the tuple to another table > instead, or skip it altogether. Or you might want to UPDATE Europe into > Continental Europe, and then insert the row for UK. Not trying to catch you out, just trying to make sure we don't make technical decisions based upon unachievable ideas. I can't see value in having upsert work against exclusion constraint indexes; thus this only needs to work for btrees, or similar exact indexes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "Value locking" Wiki page
On 1 October 2014 14:31, Simon Riggs wrote:
> On 1 October 2014 13:43, Heikki Linnakangas wrote:
>
>>> That does sound interesting, but I am concerned the semantics may
>>> cause issues.
>>>
>>> If I go to insert a row for 'UK' and find an existing row for
>>> 'Europe', do we really want to update the population of Europe to be
>>> the population of the UK, simply because the UK and Europe have an
>>> exclusion conflict?
>>
>> Clearly not, but you might want to insert the tuple to another table
>> instead, or skip it altogether. Or you might want to UPDATE Europe
>> into Continental Europe, and then insert the row for UK.

Sorry, let me explain more clearly - neither of those two use cases
matches the upsert case.

If you wish to skip an insert that fails on a uniqueness constraint,
that is already possible. Just wrap in a subtransaction and trap the
error.

The only reason we need UPSERT is if we intend to update. If we just
intend to ignore, then we could add such a check to any index AM that
supports unique/exclusion constraints, but without pursuing the full
locking needed for upsert path.

I wasn't aware that you could do both an INSERT and an UPDATE at same
time using the proposed feature. There is no feature option to refer to
the conflicting row that is already in the table. The only thing we have
at present is the ability to refer to the incoming data row using
CONFLICTING(). Update triggers allow you to refer to OLD and NEW, but
with an exclusion constraint the row values don't match, so using OLD
and NEW would not be appropriate - that isn't how an update trigger
works now.

So AFAICS neither of those use cases matches.

> Not trying to catch you out, just trying to make sure we don't make
> technical decisions based upon unachievable ideas.
>
> I can't see value in having upsert work against exclusion constraint
> indexes; thus this only needs to work for btrees, or similar exact
> indexes.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
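[Editor's illustration of the "wrap in a subtransaction and trap the error" recipe mentioned above. In SQL this is the classic PL/pgSQL BEGIN ... EXCEPTION WHEN unique_violation THEN ... END pattern; the Python toy below (all names invented) models only the semantics - a trapped duplicate-key error leaves the existing row untouched and the transaction alive.]

```python
class UniqueViolation(Exception):
    """Stand-in for SQLSTATE 23505 (unique_violation)."""
    pass

def insert_unique(table, key, row):
    """Insert into a toy unique index; error on a duplicate key."""
    if key in table:
        raise UniqueViolation(key)
    table[key] = row

def insert_or_skip(table, key, row):
    # Models: SAVEPOINT s; INSERT ...; on unique_violation
    # ROLLBACK TO s and carry on -- the error is trapped, not fatal.
    try:
        insert_unique(table, key, row)
        return True           # inserted
    except UniqueViolation:
        return False          # skipped; existing row untouched
```

Note this gives insert-or-ignore, not upsert: the existing row is never updated, which is Simon's point that only the update case actually requires the new locking machinery. (The real subtransaction also has a cost per row, e.g. an XID, which the toy cannot show.)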
Re: [HACKERS] "Value locking" Wiki page
On 10/01/2014 04:31 PM, Simon Riggs wrote:
> On 1 October 2014 13:43, Heikki Linnakangas wrote:
>>> That does sound interesting, but I am concerned the semantics may
>>> cause issues.
>>>
>>> If I go to insert a row for 'UK' and find an existing row for
>>> 'Europe', do we really want to update the population of Europe to be
>>> the population of the UK, simply because the UK and Europe have an
>>> exclusion conflict?
>>
>> Clearly not, but you might want to insert the tuple to another table
>> instead, or skip it altogether. Or you might want to UPDATE Europe
>> into Continental Europe, and then insert the row for UK.
>
> Not trying to catch you out, just trying to make sure we don't make
> technical decisions based upon unachievable ideas.
>
> I can't see value in having upsert work against exclusion constraint
> indexes; thus this only needs to work for btrees, or similar exact
> indexes.

Well, if nothing else, it would be nice to fix the concurrency issue we
have with exclusion constraints today, which is that if two backends
insert the same value at the same time, they might both get an error,
even though you'd only need to abort one of them. That's a special case
of UPSERT if you squint a little.

- Heikki
Re: [HACKERS] Time measurement format - more human readable
On 9/29/14, 1:08 AM, Andres Freund wrote:
> On 2014-09-28 20:32:30 -0400, Gregory Smith wrote:
>> There are already a wide range of human readable time interval output
>> formats available in the database; see the list at
>> http://www.postgresql.org/docs/current/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE
>
> He's talking about psql's \timing...

I got that. My point was that even though psql's timing report is kind
of a quick thing hacked in there, if it were revised I'd expect two
things to happen eventually:

- Asking if any of the interval conversion code can be re-used for this
  purpose, rather than adding yet another custom-to-one-code-path
  "standard".

- Asking if this should really just be treated like a full interval
  instead, and then overlapping with a significant amount of that
  baggage so that you have all the existing format choices.
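[Editor's illustration of the kind of ad-hoc formatter being debated above: a unit-scaling renderer for a millisecond count. This is an invented sketch, not psql's actual \timing behavior and not the interval code - it exists only to show what "yet another custom standard" would look like.]

```python
def format_elapsed(ms):
    """Render an elapsed time given in milliseconds, scaling the unit."""
    if ms < 1000:
        return f"{ms:.3f} ms"
    seconds = ms / 1000.0
    if seconds < 60:
        return f"{seconds:.3f} s"
    minutes, seconds = divmod(seconds, 60)
    return f"{int(minutes)} min {seconds:.3f} s"
```

The point of the thread is that rules like these (unit thresholds, precision, separators) already exist in the server's interval output styles, so reinventing them for one code path creates a parallel, slightly different "standard".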
Re: [HACKERS] "Value locking" Wiki page
On 10/01/2014 04:46 PM, Simon Riggs wrote:
> On 1 October 2014 14:31, Simon Riggs wrote:
>> On 1 October 2014 13:43, Heikki Linnakangas wrote:
>>>> That does sound interesting, but I am concerned the semantics may
>>>> cause issues.
>>>>
>>>> If I go to insert a row for 'UK' and find an existing row for
>>>> 'Europe', do we really want to update the population of Europe to
>>>> be the population of the UK, simply because the UK and Europe have
>>>> an exclusion conflict?
>>>
>>> Clearly not, but you might want to insert the tuple to another table
>>> instead, or skip it altogether. Or you might want to UPDATE Europe
>>> into Continental Europe, and then insert the row for UK.
>
> Sorry, let me explain more clearly - neither of those two use cases
> matches the upsert case.
>
> If you wish to skip an insert that fails on a uniqueness constraint,
> that is already possible. Just wrap in a subtransaction and trap the
> error.

Uh, if that's an option, couldn't you just use a subtransaction always,
and forget about this patch? However, a subtransaction is a lot more
expensive; you'll consume an XID for every inserted or updated row, for
starters.

> The only reason we need UPSERT is if we intend to update. If we just
> intend to ignore, then we could add such a check to any index AM that
> supports unique/exclusion constraints, but without pursuing the full
> locking needed for upsert path.
>
> I wasn't aware that you could do both an INSERT and an UPDATE at same
> time using the proposed feature.

I'm not actually sure if the proposed syntax would allow that, I haven't
been paying much attention to that part.

- Heikki
Re: [HACKERS] pg_receivexlog and replication slots
Andres Freund wrote: > From d667f7a63cd62733d88ec5b7228dfd5d7136b9d7 Mon Sep 17 00:00:00 2001 > From: Michael Paquier > Date: Mon, 1 Sep 2014 20:48:43 +0900 > Subject: [PATCH 3/4] Refactoring of pg_basebackup utilities > > Code duplication is reduced with the introduction of new APIs for each > individual replication command: > - IDENTIFY_SYSTEM > - CREATE_REPLICATION_SLOT > - DROP_REPLICATION_SLOT > A couple of variables used to identify a timeline ID are changed as well > to be more consistent with core code. > --- > src/bin/pg_basebackup/pg_basebackup.c | 21 + > src/bin/pg_basebackup/pg_receivexlog.c | 38 ++-- > src/bin/pg_basebackup/pg_recvlogical.c | 116 ++-- > src/bin/pg_basebackup/streamutil.c | 159 > + > src/bin/pg_basebackup/streamutil.h | 10 +++ > 5 files changed, 207 insertions(+), 137 deletions(-) Not objecting to any of this, but isn't it a bit funny that a patch that aims to reduce duplication ends up with more lines than there were originally? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open items for 9.4
On 09/30/2014 09:10 PM, Gregory Smith wrote:
> On 9/29/14, 2:30 PM, Andres Freund wrote:
>>> Can we explain those reasons in the form of documentation?
>>
>> Yes. Try and benchmark it. It'll be hardware and workload dependant.
>
> I missed this whole thing, and eventually I have to circle back to it.
> I could do it this week.

Ah, that would be great!

> Could you (or someone else familiar with the useful benchmarks) give
> me more detail on how to replicate one case where things should
> improve? I can do that, and I have a lab full of hardware if it's
> easier to spot on one type of server. That exercise will probably lead
> to a useful opinion on the feature in its final form, any associated
> GUC, tunables, and necessary level of associated documentation in a
> day or two.

To see the most improvement from the patch, try a workload that's
otherwise bottlenecked on XLogInsert. For example, with pgbench:

psql postgres -c "create table foo (id int4)"
pgbench postgres -n -f fooinsert.sql -c 4 -T10

and in the test script:

insert into foo select g from generate_series(1, 1) g;

- Heikki
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-01 11:21:12 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > > From d667f7a63cd62733d88ec5b7228dfd5d7136b9d7 Mon Sep 17 00:00:00 2001 > > From: Michael Paquier > > Date: Mon, 1 Sep 2014 20:48:43 +0900 > > Subject: [PATCH 3/4] Refactoring of pg_basebackup utilities > > > > Code duplication is reduced with the introduction of new APIs for each > > individual replication command: > > - IDENTIFY_SYSTEM > > - CREATE_REPLICATION_SLOT > > - DROP_REPLICATION_SLOT > > A couple of variables used to identify a timeline ID are changed as well > > to be more consistent with core code. > > --- > > src/bin/pg_basebackup/pg_basebackup.c | 21 + > > src/bin/pg_basebackup/pg_receivexlog.c | 38 ++-- > > src/bin/pg_basebackup/pg_recvlogical.c | 116 ++-- > > src/bin/pg_basebackup/streamutil.c | 159 > > + > > src/bin/pg_basebackup/streamutil.h | 10 +++ > > 5 files changed, 207 insertions(+), 137 deletions(-) > > Not objecting to any of this, but isn't it a bit funny that a patch that > aims to reduce duplication ends up with more lines than there were > originally? The reduction is there in combination with a later patch - which needs most of the code moved to streamutil.c. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Full_page_write is off in backup mode
On Mon, Sep 29, 2014 at 12:06 PM, searcher s wrote: > Hi, > I am using postgres 9.2.2. > The postgresql documentation says, full_page_writes is forcibly on after > executing the function pg_start_backup. But in my server full_page_writes is > still off (not changed). The value of the GUC doesn't actually change. But the system still performs full page writes during the backup. Search the source code for forcePageWrites to see how it works internally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On Wed, Oct 1, 2014 at 7:00 AM, Andres Freund wrote: > On 2014-09-29 14:46:20 -0400, Robert Haas wrote: >> This happened again, and I investigated further. > > Uh. Interestingly anole just succeeded twice: > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE > > I plan to commit the mask/unmask patch regardless, but it's curious. The > first of the two builds could have been you 'unsticking' it by manually > mucking around. Did you also do that for the second build? No, but I think the failures have always been intermittent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On 2014-10-01 10:45:13 -0400, Robert Haas wrote: > On Wed, Oct 1, 2014 at 7:00 AM, Andres Freund wrote: > > On 2014-09-29 14:46:20 -0400, Robert Haas wrote: > >> This happened again, and I investigated further. > > > > Uh. Interestingly anole just succeeded twice: > > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE > > > > I plan to commit the mask/unmask patch regardless, but it's curious. The > > first of the two builds could have been you 'unsticking' it by manually > > mucking around. Did you also do that for the second build? > > No, but I think the failures have always been intermittent. There's no record of any relevantly failing builds on 9.4: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE and none from master either: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=HEAD Is it setup for master now? Because it has reported back for 9.4 twice, but never for master. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On Wed, Oct 1, 2014 at 10:50 AM, Andres Freund wrote: > On 2014-10-01 10:45:13 -0400, Robert Haas wrote: >> On Wed, Oct 1, 2014 at 7:00 AM, Andres Freund wrote: >> > On 2014-09-29 14:46:20 -0400, Robert Haas wrote: >> >> This happened again, and I investigated further. >> > >> > Uh. Interestingly anole just succeeded twice: >> > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE >> > >> > I plan to commit the mask/unmask patch regardless, but it's curious. The >> > first of the two builds could have been you 'unsticking' it by manually >> > mucking around. Did you also do that for the second build? >> >> No, but I think the failures have always been intermittent. > > There's no record of any relevantly failing builds on 9.4: > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=REL9_4_STABLE > and none from master either: > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=HEAD > > Is it setup for master now? Because it has reported back for 9.4 twice, > but never for master. As far as I can tell, it's configured to run everything. I just checked, though, and found it wedged again. I'm not sure whether it was the same problem, though; I ended up just killing all of the postgres processes to fix it. We may be just at the beginning of an exciting debugging journey. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Tue, Sep 30, 2014 at 6:16 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> I favor option (a). There's something to be said for your proposal >> in terms of logical consistency with what we have now, but to be >> honest I'm not sure it's the behavior anyone wants (I would welcome >> more feedback on what people actually want). I think we should view >> an attempt to set a limit for a particular table as a way to control >> the rate at which that table is vacuumed - period. > > After re-reading this whole thread one more time, I think I have come to > agree with you and Amit here, because not only it is simpler to > implement, but it is also simpler to document. Per Greg Smith's opinion > elsewhere in the thread, it seems that for end users it doesn't make > sense to make the already complicated mechanism even more complicated. > > So in essence what we're going to do is that the balance mechanism > considers only tables that don't have per-table configuration options; > for those that do, we will use the values configured there without any > changes. > > I'll see about implementing this and making sure it finds its way to > 9.4beta3. Cool! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum scheduling starvation and frenzy
On Tue, Sep 30, 2014 at 5:59 PM, Alvaro Herrera wrote: > Jeff Janes wrote: >> > I think that instead of >> > trying to get a single target database in that foreach loop, we could >> > try to build a prioritized list (in-wraparound-danger first, then >> > in-multixid-wraparound danger, then the one with the oldest autovac time >> > of all the ones that remain); then recheck the wrap-around condition by >> > seeing whether there are other workers in that database that started >> > after the wraparound condition appeared. >> >> I think we would want to check for one worker that is still running, and at >> least one other worker that started and completed since the wraparound >> threshold was exceeded. If there are multiple tables in the database that >> need full scanning, it would make sense to have multiple workers. But if a >> worker already started and finished without increasing the frozenxid and, >> another attempt probably won't accomplish much either. But I have no idea >> how to do that bookkeeping, or how much of an improvement it would be over >> something simpler. > > How about something like this: > > * if autovacuum is disabled, then don't check these conditions; the only > reason we're in do_start_worker() in that case is that somebody > signalled postmaster that some database needs a for-wraparound emergency > vacuum. > > * if autovacuum is on, and the database was processed less than > autovac_naptime/2 ago, and there are no workers running in that database > now, then ignore the database. > > Otherwise, consider it for xid-wraparound vacuuming. So if we launched > a worker recently, but it already finished, we would start another one. > (If the worker finished, the database should not be in need of a > for-wraparound vacuum again, so this seems sensible). 
Also, we give > priority to a database in danger sooner than the full autovac_naptime > period; not immediately after the previous worker started, which should > give room for other databases to be processed. > > The attached patch implements that. I only tested it on HEAD, but > AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to > 9.2. Given the lack of complaints, I'm unsure about backpatching > further back than 9.3 anyway. This kind of seems like throwing darts at the wall. It could be better if we are right to skip the database already being vacuumed for wraparound, or worse if we're not. I'm not sure that we should do this at all, or at least not without testing it extensively first. We could easily shoot ourselves in the foot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.5] Custom Plan API
On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai wrote: > * Syntax also reflects what the command does more. New syntax to > define custom plan provider is: > CREATE CUSTOM PLAN PROVIDER > FOR HANDLER ; -1 on 'cpp' prefix. I don't see acronyms used in the syntax documentation and cpp will make people reflexively think 'c++'. How about and ? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum scheduling starvation and frenzy
Robert Haas wrote:
> This kind of seems like throwing darts at the wall. It could be
> better if we are right to skip the database already being vacuumed for
> wraparound, or worse if we're not.

Well, it only skips the DB for half the naptime interval, so that other databases have a chance to be chosen before that. If you set up a nonsensical interval such as one day, this might be problematic. (I'm not sure I understand the darts analogy.)

Maybe instead of some interval we could have a flag that alternates between on and off: let one other database be chosen, then the one in danger, then some other database again. But if you have large numbers of databases, this isn't much of a solution; you only waste half the workers rather than all of them .. meh.

Here's another idea: have a counter of the number of tables that are in danger of xid/multixact wraparound; only let that many workers process the database in a row. Of course, the problem is how to determine how many tables are in danger when we haven't even connected to the database in the first place. We could try to store a counter in pgstats, ugh. Or have the first for-wraparound worker store a number in shared memory which the launcher can read. Double ugh.

> I'm not sure that we should do this at all, or at least not without
> testing it extensively first. We could easily shoot ourselves in the
> foot.

Well, we need to do *something*, because having workers directed towards a database on which they can't do any good causes problems too -- other databases accumulate bloat in the meantime.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
Hi, I pushed the first part. On 2014-10-01 21:54:56 +0900, Michael Paquier wrote: > On Wed, Oct 1, 2014 at 8:38 PM, Andres Freund > wrote: > > 0001) Old WIP patch for pg_recvlogical tests. Useful for easier > > development. > From where is this patch? Is it some rest from the time when pg_recvlogical > was developed? IIRC I wrote it when looking at Peter's prove patches. There's a bit of a problem with it because it requires test_decoding, a contrib module, to be installed. That's easily solvable for 'make check', but less clearly so for 'make installcheck'. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
Hi, I have a patch which is not commitfest-ready yet, but it is always better to start discussing a proof of concept with some patch in hand instead of just an idea. Since I am a DBA rather than a C programmer, I will appreciate any suggestions/criticism about the patch and code quality to make things better.

What it is all about.

From an Oracle DBA's point of view, we currently lack performance diagnostics tools. By that, they principally mean an analogue of the Oracle Wait Interface. The basic idea is to have counters or sensors all around the database kernel to measure what a particular backend is currently waiting for and how long/how often it waits. Obviously, implementing such a complex system is not an easy task. However, decomposing the task and implementing some small diagnostics tools has proved to be a good solution: things like pg_stat_bgwriter, pg_stat_statements or pg_stat_archiver make life significantly easier. Implementing such a histogram for LWLock tracing was my goal.

Why LWLock tracing is important.

Suppose we have a PostgreSQL instance under heavy write workload, but we do not know any details. We could poll a pg_stat_lwlock function from time to time, which might say pid n1 is currently in WALWriteLock and pid n2 in WALInsertLock; that means we should think about write-ahead log tuning. Or pid n1 is in some clog-related LWLock, which means we need to move the clog to a ramdisk. This is a simplistic example, but it shows how useful LWLock tracing could be for DBAs. An even better idea is to collect the daily LWLock distribution, find the most frequent ones, etc.

Problems.

As far as I know, there are two major problems with implementing LWLock tracing in Postgres: performance and stability of the server.
The patch https://commitfest.postgresql.org/action/patch_view?id=885 (discussion starts here I hope - http://www.postgresql.org/message-id/4fe8ca2c.3030...@uptime.jp) demonstrates the performance problems; LWLOCK_STAT, LOCK_DEBUG and the DTrace-like approach are slow, unsafe for production use and a bit clumsy for a DBA to use.

An idea.

The idea of this patch is to trace LWLocks with the lowest possible performance impact. We put an integer lwLockID into the procarray: when acquiring an LWLock we put its id into the procarray, and then we can poll the procarray using a function to see whether a particular pid holds an LWLock. Not perfect, but if we see that somebody repeatedly consumes a lot of a particular LWLock, we can investigate the matter more precisely with another tool.

Something like that is implemented in the attached patch. Issuing pgbench -c 50 -t 1000 -j 50 we get something like this:

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:43.848765+02 |       57 | 4257
(1 row)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:45.892428+02 |       67 | 4269
 2014-10-01 15:11:45.892428+02 |       67 | 4258
 2014-10-01 15:11:45.892428+02 |       57 | 4270
 2014-10-01 15:11:45.892428+02 |       67 | 4245
 2014-10-01 15:11:45.892428+02 |       67 | 4271
 2014-10-01 15:11:45.892428+02 |       57 | 4256
 2014-10-01 15:11:45.892428+02 |       54 | 4241
(7 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:47.211024+02 |       58 | 4262
 2014-10-01 15:11:47.211024+02 |       69 | 4243
 2014-10-01 15:11:47.211024+02 |       69 | 4246
(3 rows)

postgres=# select now(),* from pg_stat_lwlock ;
 now | lwlockid | pid
-----+----------+-----
(0 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:49.897357+02 |       55 | 4240
 2014-10-01 15:11:49.897357+02 |       61 | 4264
 2014-10-01 15:11:49.897357+02 |       55 | 4258
 2014-10-01 15:11:49.897357+02 |       61 | 4260
 2014-10-01 15:11:49.897357+02 |       61 | 4283
 2014-10-01 15:11:49.897357+02 |       62 | 4242

We could collect it into some view or table on a periodic basis.

Questions.

1. I've decided to put pg_stat_lwlock into an extension, pg_stat_lwlock (simply for test purposes). Is that OK, or would it be better to implement it somewhere inside pg_catalog or in another extension (for example pg_stat_statements)?

2. Currently lwLockID is in the main procarray (this is for PoC purposes only). I know why the procarray was split into two and I know why the main one should be kept as small as possible. Anyway, which design approach is better: to keep it inside the main one (this is an important feature, lwLockID is small and I use proc->lwLockId = T_ID(l) to get the id) or to put it into another procarray (create a new one or use an existing one)?

3. Which is the best way to retrieve the name of an LWLock instead of only its ID (for usability reasons, WALWriteLock looks m
[HACKERS] NEXT VALUE FOR
Hi SQL:2003 introduced the function NEXT VALUE FOR . Google tells me that at least DB2, SQL Server and a few niche databases understand it so far. As far as I can tell there is no standardised equivalent of currval and setval (but I only have access to second hand information about the standard, like articles and the manuals of other products). Here is a starter patch to add it. To avoid a shift/reduce conflict, I had to reclassify the keyword NEXT. I admit that I don't fully understand the consequences of that change! Please let me know if you think this could fly. Best regards, Thomas Munro diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3a7cfa9..f9ab887 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10778,6 +10778,9 @@ table2-mapping setval + + NEXT VALUE FOR + This section describes functions for operating on sequence @@ -10817,6 +10820,11 @@ table2-mapping Advance sequence and return new value +NEXT VALUE FOR sequence_name +bigint +Advance sequence and return new value, using SQL 2003 syntax + + setval(regclass, bigint) bigint Set sequence's current value @@ -10929,6 +10937,24 @@ nextval('foo'::text) foo is looked up at + NEXT VALUE FOR + + +The SQL standard specifies this syntax for getting the next value from +a sequence object. It is equivalent to nextval, +but takes a sequence name directly rather than a regclass +or a text value. This form is more portable to other +databases. The following statements are equivalent: + + +SELECT nextval('foo_id_seq'); +SELECT NEXT VALUE FOR foo_id_seq; + + + + + + currval diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 77d2f29..acbe13c 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -11936,6 +11936,17 @@ func_expr_common_subexpr: v->location = @1; $$ = (Node *)v; } + | NEXT VALUE_P FOR any_name +{ + /* + * Translate as "nextval(::regclass)". 
+ */ + char *name = NameListToString($4); + $$ = (Node *) makeFuncCall(SystemFuncName("nextval"), + list_make1(makeStringConstCast(name, @4, + SystemTypeName("regclass"))), + @1); +} | XMLCONCAT '(' expr_list ')' { $$ = makeXmlExpr(IS_XMLCONCAT, NULL, NIL, $3, @1); @@ -13157,7 +13168,6 @@ unreserved_keyword: | MOVE | NAME_P | NAMES - | NEXT | NO | NOTHING | NOTIFY @@ -13371,6 +13381,7 @@ type_func_name_keyword: | LEFT | LIKE | NATURAL + | NEXT | NOTNULL | OUTER_P | OVERLAPS diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 3c8c1b9..90a3b09 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -245,7 +245,7 @@ PG_KEYWORD("names", NAMES, UNRESERVED_KEYWORD) PG_KEYWORD("national", NATIONAL, COL_NAME_KEYWORD) PG_KEYWORD("natural", NATURAL, TYPE_FUNC_NAME_KEYWORD) PG_KEYWORD("nchar", NCHAR, COL_NAME_KEYWORD) -PG_KEYWORD("next", NEXT, UNRESERVED_KEYWORD) +PG_KEYWORD("next", NEXT, TYPE_FUNC_NAME_KEYWORD) PG_KEYWORD("no", NO, UNRESERVED_KEYWORD) PG_KEYWORD("none", NONE, COL_NAME_KEYWORD) PG_KEYWORD("not", NOT, RESERVED_KEYWORD) diff --git a/src/test/regress/expected/sequence.out b/src/test/regress/expected/sequence.out index a27b5fd..0e3ade0 100644 --- a/src/test/regress/expected/sequence.out +++ b/src/test/regress/expected/sequence.out @@ -168,6 +168,13 @@ SELECT nextval('sequence_test'::text); DISCARD SEQUENCES; SELECT currval('sequence_test'::regclass); ERROR: currval of sequence "sequence_test" is not yet defined in this session +-- SQL:2003 syntax +SELECT NEXT VALUE FOR sequence_test; + nextval +- + 100 +(1 row) + DROP SEQUENCE sequence_test; -- renaming sequences CREATE SEQUENCE foo_seq; diff --git a/src/test/regress/sql/sequence.sql b/src/test/regress/sql/sequence.sql index 8d3b700..c90641d 100644 --- a/src/test/regress/sql/sequence.sql +++ b/src/test/regress/sql/sequence.sql @@ -76,6 +76,9 @@ SELECT nextval('sequence_test'::text); DISCARD SEQUENCES; SELECT currval('sequence_test'::regclass); +-- SQL:2003 syntax 
+SELECT NEXT VALUE FOR sequence_test; + DROP SEQUENCE sequence_test; -- renaming sequences -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow format 0000-0000-0000 in postgresql MAC parser
On 01-10-14 01:19, Michael Paquier wrote: > Looking at your patch, you should update the documentation as well, > the list of authorized outputs being clearly listed: > http://www.postgresql.org/docs/devel/static/datatype-net-types.html#DATATYPE-MACADDR > This consists in adding simply one line to doc/src/sgml/datatype.sgml. > Regards, It has been registered now (https://commitfest.postgresql.org/action/patch_view?id=1585). I've got an updated version of the patch with the documentation fix. -- Herwin Weststrate Quarantainenet BV diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 3e83dbb..0d277fa 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -3628,6 +3628,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays '08002b:010203' '08002b-010203' '0800.2b01.0203' + '0800-2b01-0203' '08002b010203' @@ -3649,7 +3650,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays - The remaining four input formats are not part of any standard. + The remaining five input formats are not part of any standard. 
diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c index aa9993f..509315a 100644 --- a/src/backend/utils/adt/mac.c +++ b/src/backend/utils/adt/mac.c @@ -57,6 +57,9 @@ macaddr_in(PG_FUNCTION_ARGS) count = sscanf(str, "%2x%2x.%2x%2x.%2x%2x%1s", &a, &b, &c, &d, &e, &f, junk); if (count != 6) + count = sscanf(str, "%2x%2x-%2x%2x-%2x%2x%1s", + &a, &b, &c, &d, &e, &f, junk); + if (count != 6) count = sscanf(str, "%2x%2x%2x%2x%2x%2x%1s", &a, &b, &c, &d, &e, &f, junk); if (count != 6) diff --git a/src/test/regress/expected/macaddr.out b/src/test/regress/expected/macaddr.out index 91edc5a..90e9b34 100644 --- a/src/test/regress/expected/macaddr.out +++ b/src/test/regress/expected/macaddr.out @@ -7,14 +7,15 @@ INSERT INTO macaddr_data VALUES (2, '08-00-2b-01-02-03'); INSERT INTO macaddr_data VALUES (3, '08002b:010203'); INSERT INTO macaddr_data VALUES (4, '08002b-010203'); INSERT INTO macaddr_data VALUES (5, '0800.2b01.0203'); -INSERT INTO macaddr_data VALUES (6, '08002b010203'); -INSERT INTO macaddr_data VALUES (7, '0800:2b01:0203'); -- invalid +INSERT INTO macaddr_data VALUES (6, '0800-2b01-0203'); +INSERT INTO macaddr_data VALUES (7, '08002b010203'); +INSERT INTO macaddr_data VALUES (8, '0800:2b01:0203'); -- invalid ERROR: invalid input syntax for type macaddr: "0800:2b01:0203" -LINE 1: INSERT INTO macaddr_data VALUES (7, '0800:2b01:0203'); +LINE 1: INSERT INTO macaddr_data VALUES (8, '0800:2b01:0203'); ^ -INSERT INTO macaddr_data VALUES (8, 'not even close'); -- invalid +INSERT INTO macaddr_data VALUES (9, 'not even close'); -- invalid ERROR: invalid input syntax for type macaddr: "not even close" -LINE 1: INSERT INTO macaddr_data VALUES (8, 'not even close'); +LINE 1: INSERT INTO macaddr_data VALUES (9, 'not even close'); ^ INSERT INTO macaddr_data VALUES (10, '08:00:2b:01:02:04'); INSERT INTO macaddr_data VALUES (11, '08:00:2b:01:02:02'); @@ -30,12 +31,13 @@ SELECT * FROM macaddr_data; 4 | 08:00:2b:01:02:03 5 | 08:00:2b:01:02:03 6 | 08:00:2b:01:02:03 
+ 7 | 08:00:2b:01:02:03 10 | 08:00:2b:01:02:04 11 | 08:00:2b:01:02:02 12 | 08:00:2a:01:02:03 13 | 08:00:2c:01:02:03 14 | 08:00:2a:01:02:04 -(11 rows) +(12 rows) CREATE INDEX macaddr_data_btree ON macaddr_data USING btree (b); CREATE INDEX macaddr_data_hash ON macaddr_data USING hash (b); @@ -52,9 +54,10 @@ SELECT a, b, trunc(b) FROM macaddr_data ORDER BY 2, 1; 4 | 08:00:2b:01:02:03 | 08:00:2b:00:00:00 5 | 08:00:2b:01:02:03 | 08:00:2b:00:00:00 6 | 08:00:2b:01:02:03 | 08:00:2b:00:00:00 + 7 | 08:00:2b:01:02:03 | 08:00:2b:00:00:00 10 | 08:00:2b:01:02:04 | 08:00:2b:00:00:00 13 | 08:00:2c:01:02:03 | 08:00:2c:00:00:00 -(11 rows) +(12 rows) SELECT b < '08:00:2b:01:02:04' FROM macaddr_data WHERE a = 1; -- true ?column? @@ -113,12 +116,13 @@ SELECT ~b FROM macaddr_data; f7:ff:d4:fe:fd:fc f7:ff:d4:fe:fd:fc f7:ff:d4:fe:fd:fc + f7:ff:d4:fe:fd:fc f7:ff:d4:fe:fd:fb f7:ff:d4:fe:fd:fd f7:ff:d5:fe:fd:fc f7:ff:d3:fe:fd:fc f7:ff:d5:fe:fd:fb -(11 rows) +(12 rows) SELECT b & '00:00:00:ff:ff:ff' FROM macaddr_data; ?column? @@ -129,12 +133,13 @@ SELECT b & '00:00:00:ff:ff:ff' FROM macaddr_data; 00:00:00:01:02:03 00:00:00:01:02:03 00:00:00:01:02:03 + 00:00:00:01:02:03 00:00:00:01:02:04 00:00:00:01:02:02 00:00:00:01:02:03 00:00:00:01:02:03 00:00:00:01:02:04 -(11 rows) +(12 rows) SELECT b | '01:02:03:04:05:06' FROM macaddr_data; ?column? @@ -145,11 +150,12 @@ SELECT b | '01:02:03:04:05:06' FROM macaddr_data; 09:02:2b:05:07:07 09:02:2b:05:07:07 09:02:2b:05:07:07 + 09:02:2b:05:07:07 09:02:2b:05:07:06 09:02:2b:05:07:06 09:02:2b:05:07:07 09:02:2f:05:07:07 09:0
Re: [HACKERS] pg_receivexlog and replication slots
On 10/01/2014 08:59 AM, Andres Freund wrote: On 2014-10-01 21:54:56 +0900, Michael Paquier wrote: Thanks! On Wed, Oct 1, 2014 at 8:38 PM, Andres Freund wrote: > 0001) Old WIP patch for pg_recvlogical tests. Useful for easier > development. > From where is this patch? Is it some rest from the time when pg_recvlogical was developed? > 0002) Copy editing that should be in 9.4 > Definitely makes sense for a backpatch. > 0003) Rebased patches of yours > 0004) My changes to 0003 besides the rebase. This'll be squashed, but I > thought it might be interesting for you. > (thanks for caring) -/* Drop a replication slot */ +/* Drop a replication slot. */ I don't recall that comments on a single line have a dot even if this single line is a complete sentence. Then it shouldn't start with a uppercase - which you changed... -static void StreamLog(); +static void StreamLogicalLog(); Not related at all to those patches, but for correctness it is better to add a declaration "(void)". Agreed. Except those small things the changes look good to me. Cool. Will push and then, sometime this week, review the next patch. Greetings, Andres Freund You might want to do that function renaming in pg_receivexlog.c as well. Jan -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-01 13:17:17 -0400, Jan Wieck wrote: > >>-static void StreamLog(); > >>+static void StreamLogicalLog(); > >>Not related at all to those patches, but for correctness it is better to > >>add a declaration "(void)". > > > >Agreed. > > > >>Except those small things the changes look good to me. > > > >Cool. Will push and then, sometime this week, review the next patch. > You might want to do that function renaming in pg_receivexlog.c as well. Why? You mean to StreamPhysicalLog()? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
On 10/01/2014 01:19 PM, Andres Freund wrote: On 2014-10-01 13:17:17 -0400, Jan Wieck wrote: >>-static void StreamLog(); >>+static void StreamLogicalLog(); >>Not related at all to those patches, but for correctness it is better to >>add a declaration "(void)". > >Agreed. > >>Except those small things the changes look good to me. > >Cool. Will push and then, sometime this week, review the next patch. You might want to do that function renaming in pg_receivexlog.c as well. Why? You mean to StreamPhysicalLog()? Greetings, Andres Freund Like that. Plus, there is one more occurrence of StreamLog() in pg_recvlogical.c. Jan -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-01 13:22:53 -0400, Jan Wieck wrote: > On 10/01/2014 01:19 PM, Andres Freund wrote: > >On 2014-10-01 13:17:17 -0400, Jan Wieck wrote: > -static void StreamLog(); > +static void StreamLogicalLog(); > Not related at all to those patches, but for correctness it is better to > add a declaration "(void)". > >>> > >>>Agreed. > >>> > Except those small things the changes look good to me. > >>> > >>>Cool. Will push and then, sometime this week, review the next patch. > > > >>You might want to do that function renaming in pg_receivexlog.c as well. > > > >Why? You mean to StreamPhysicalLog()? > > Like that. Don't see that much point - it just seemed wrong to have StreamLog do two somewhat different things. And StreamLog() in receivexlog is much older... > Plus, there is one more occurrence of StreamLog() in pg_recvlogical.c. Gah. Wrongly split commit :(. Fixed in 9.4, it's already fixed by the subsequent commit in master. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog and replication slots
On 10/01/2014 01:32 PM, Andres Freund wrote: On 2014-10-01 13:22:53 -0400, Jan Wieck wrote: On 10/01/2014 01:19 PM, Andres Freund wrote: >On 2014-10-01 13:17:17 -0400, Jan Wieck wrote: -static void StreamLog(); +static void StreamLogicalLog(); Not related at all to those patches, but for correctness it is better to add a declaration "(void)". >>> >>>Agreed. >>> Except those small things the changes look good to me. >>> >>>Cool. Will push and then, sometime this week, review the next patch. > >>You might want to do that function renaming in pg_receivexlog.c as well. > >Why? You mean to StreamPhysicalLog()? Like that. Don't see that much point - it just seemed wrong to have StreamLog do two somewhat different things. And StreamLog() in receivexlog is much older... The point is that StreamLog is semantically a superset of StreamWhateverLog. Jan Plus, there is one more occurrence of StreamLog() in pg_recvlogical.c. Gah. Wrongly split commit :(. Fixed in 9.4, it's already fixed by the subsequent commit in master. Greetings, Andres Freund -- Jan Wieck Senior Software Engineer http://slony.info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 2014-09-25 16:50:44 +0200, Andres Freund wrote: > On 2014-09-25 10:44:40 -0400, Robert Haas wrote: > > On Thu, Sep 25, 2014 at 10:42 AM, Robert Haas wrote: > > > On Thu, Sep 25, 2014 at 10:24 AM, Andres Freund > > > wrote: > > >> On 2014-09-25 10:22:47 -0400, Robert Haas wrote: > > >>> On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund > > >>> wrote: > > >>> > That leads me to wonder: Have you measured different, lower, number of > > >>> > buffer mapping locks? 128 locks is, if we'd as we should align them > > >>> > properly, 8KB of memory. Common L1 cache sizes are around 32k... > > >>> > > >>> Amit has some results upthread showing 64 being good, but not as good > > >>> as 128. I haven't verified that myself, but have no reason to doubt > > >>> it. > > >> > > >> How about you push the spinlock change and I crosscheck the partition > > >> number on a multi socket x86 machine? Seems worthwile to make sure that > > >> it doesn't cause problems on x86. I seriously doubt it'll, but ... > > > > > > OK. > > > > Another thought is that we should test what impact your atomics-based > > lwlocks have on this. > > Yes, I'd planned to test that as well. I think that it will noticeably > reduce the need to increase the number of partitions for workloads that > fit into shared_buffers. But it won't do much about exclusive > acquirations of the buffer mapping locks. So I think there's independent > benefit of increasing the number. Here we go. Postgres was configured with. -c shared_buffers=8GB \ -c log_line_prefix="[%m %p] " \ -c log_min_messages=debug1 \ -p 5440 \ -c checkpoint_segments=600 -c max_connections=200 Each individual measurement (#TPS) is the result of a pgbench -h /tmp/ -p 5440 postgres -n -M prepared -c $clients -j $clients -S -T 10 run. Master is as of ef8863844bb0b0dab7b92c5f278302a42b4bf05a. First, a scale 200 run. 
That fits entirely into shared_buffers:

#scale  #client  #partitions  #TPS
200     1        16           8353.547724 8145.296655 8263.295459
200     16       16           171014.763118 193971.091518 133992.128348
200     32       16           259119.988034 234619.421322 201879.618322
200     64       16           178909.038670 179425.091562 181391.354613
200     96       16           141402.895201 138392.705402 137216.416951
200     128      16           125643.089677 124465.288860 122527.209125

(other runs here stricken, they were contorted due to some concurrent activity. But nothing interesting.)

So, there's quite some variation in here. Not very surprising given the short runtimes, but still. Looking at a profile nearly all the contention is around GetSnapshotData(). That might hide the interesting scalability effects of the partition number. So I next tried my rwlock-contention branch.

#scale  #client  #partitions  #TPS
200     1        1            8540.390223 8285.628397 8497.022656
200     16       1            136875.484896 164302.769380 172053.413980
200     32       1            308624.650724 240502.019046 260825.231470
200     64       1            453004.188676 406226.943046 406973.325822
200     96       1            442608.459701 450185.431848 445549.710907
200     128      1            487138.077973 496233.594356 457877.992783
200     1        16           9477.217454 8181.098317 8457.276961
200     16       16           154224.573476 170238.637315 182941.035416
200     32       16           302230.215403 285124.708236 265917.729628
200     64       16           405151.647136 443473.797835 456072.782722
200     96       16           443360.377281 457164.981119 474049.685940
200     128      16           490616.257063 458273.380238 466429.948417
200     1        64           8410.981874 11554.708966 8359.294710
200     16       64           139378.312883 168398.919590 166184.744944
200     32       64           288657.701012 283588.901083 302241.706222
200     64       64           424838.919754 416926.779367 436848.292520
200     96       64           462352.017671 446384.114441 483332.592663
200     128      64           471578.594596 488862.395621 466692.726385
200     1        128          8350.274549 8140.699687 8305
Re: [HACKERS] "Value locking" Wiki page
On Wed, Oct 1, 2014 at 4:23 AM, Simon Riggs wrote:
> Quoting general research and other points about value locking are
> reasonable in the general section, but not in the description for 1.

I also made a similar comparison of #3. I don't think that reflects a bias.

> I'm glad you've called the first option "Heavyweight Page Locking".
> That name sums up what I see as the major objection to it, which is
> that performance and scalability of the whole server will be damaged
> significantly by using heavyweight locks for each row inserted. Please
> add that concern as a negative; I'm sure testing has been done, but
> not comparative testing against other techniques.

Comparative testing against both other techniques (#1, at a time when #1 and #2 were otherwise comparable), and plain inserts and updates has shown the performance to be very good. The evidence suggests that using a heavyweight lock like this is fine. Don't promise tuples use heavyweight locks? The coarser granularity did not appear to be significant once we optimize retries to avoid hwlocking.

#1 came out a bit ahead of #2. So maybe the bloat of #2 is almost, but not quite, cancelled out by the hwlocking coarseness. But then, I specifically stated that it seemed that not having bloating was not much of an advantage for #1. We're going to have a benchmark between #1 and #2 when #2 is properly integrated with the rest of the updated patch (just because we can). #1 is the fastest of #1 and #2 last I checked, but there wasn't all that much in it.

> I am motivated to
> explain the promise index tuple approach further because of this,
> which is an optimistic approach to locking.
> The stated disadvantages for 3 are not accurate, to put it mildly.

Honestly, how could I know what they are? The explanation I heard from you and Andres has been very short on details. The points for #3 are *my* concerns. I had to start somewhere. I'll consider your rebuttal to those points that you make on a new thread separately.
> But that's been useful because what was a small thought experiment is
> beginning to look like a useful approach. I will post a summary of my
> understanding.

Thanks. Actually, this wiki page will probably pay for itself just by allowing us to refer to #1, #2, and #3.

--
Peter Geoghegan
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-01 20:54:39 +0200, Andres Freund wrote:
> Here we go.
>
> Postgres was configured with:
> -c shared_buffers=8GB \
> -c log_line_prefix="[%m %p] " \
> -c log_min_messages=debug1 \
> -p 5440 \
> -c checkpoint_segments=600
> -c max_connections=200

Robert reminded me that I forgot to report the hardware aspect of this...

4x E5-4620 @ 2.20GHz: 32 cores, 64 threads
256GB RAM

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "Value locking" Wiki page
On Wed, Oct 1, 2014 at 6:49 AM, Heikki Linnakangas wrote:
> Well, if nothing else, it would be nice to fix the concurrency issue we have
> with exclusion constraints today, which is that if two backends insert the
> same value at the same time, they might both get an error, even though you'd
> only need to abort one of them. That's a special case of UPSERT if you
> squint a little.

In my view, it makes sense to fix that, and to make INSERT ... ON CONFLICT IGNORE work with exclusion constraints. However, it does not make sense to have INSERT ... ON CONFLICT UPDATE work with exclusion constraints. The user-visible semantics are very tricky, and I don't think it's useful.

--
Peter Geoghegan
Re: [HACKERS] autovacuum scheduling starvation and frenzy
On Wed, Oct 1, 2014 at 11:44 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> This kind of seems like throwing darts at the wall. It could be
>> better if we are right to skip the database already being vacuumed for
>> wraparound, or worse if we're not.
>
> Well, it only skips the DB for half the naptime interval, so that other
> databases have a chance to be chosen before that. If you set up a
> nonsensical interval such as one day, this might be problematic.
>
> (I'm not sure I understand the darts analogy.)

I guess I meant: this seems pretty hit-or-miss. I don't see why we should expect it to be better than what we have now. Sure, maybe there's a table in some other database that needs to be vacuumed for bloat more urgently than a table in the wraparound database needs to be vacuumed to prevent XID wraparound. But the reverse could be true also - in which case your patch could cause a cluster that would merely have bloated to instead shut down.

The way to really know would be for the AV launcher to have knowledge of how many tables there are in each database that are beyond the wraparound threshold and have not already been vacuumed. Then we could skip wraparound databases where that number is 0, and give priority to those where it isn't. I guess this is more or less what you said in the portion of your email I'm not quoting here, but like you I'm not quite sure how to implement that. Still, I'm reluctant to just go change the behavior; I think it's optimistic to think that any algorithm for making decisions without real knowledge will be better than any other.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Promise index tuples for UPSERT
On 10/01/2014 02:34 PM, Simon Riggs wrote:
> Summary of algorithm to use "promise tuples" for concurrency control
> during UPSERT
>
> 1. Perform btree search to location of key, if it exists.
>
> a) If an unkilled index tuple exists, we decide this is an UPDATE and
> drop straight thru to step 2
>
> b) If it does not exist, insert a "promise" tuple into unique index(s)
> - marked with the xid of the inserting transaction, but using the key.
> This happens while the page is locked, so it is not possible to insert
> a second promise tuple concurrently. Record the btree blockid on the
> index scan and move to step 3
>
> When later insert scans see the promise tuple they perform
> XactLockTableWait() and when they get control they look again for the
> key. If they find a promise tuple with an aborted xid they replace
> that value with their own xid and continue as a). Otherwise b).

XactLockTableWait() waits until the end of transaction; that's not what you want here. If the backend that inserted the promise tuple decides to not proceed with the insertion, and removes the promise tuple, the backend waiting on it needs to be woken up more or less immediately, not when the transaction completes.

I had this exact same issue in my earlier patch versions, fixed it with a new kind of heavy-weight lock, and a new field in PGPROC (http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com). That wasn't very pretty, but it got the job done. Some other design might be better.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 1, 2014 at 4:34 AM, Simon Riggs wrote:
> Summary of algorithm to use "promise tuples" for concurrency control
> during UPSERT
>
> Index bloat is less of a problem than with normal inserts since there
> are additional ways of removing promise tuples. Only one index tuple
> at a time can have a specific value, so we would actually reduce heap
> bloat caused by concurrent inserts.

I am not all that concerned about bloat. I didn't think it was a major advantage of #1 relative to #2 when I investigated the differences, and looked at the numbers. And I'm speaking as the advocate of the design with zero bloat.

> It's possible we find existing rows that we can't see according to our
> snapshot. That is handled in exactly the same as the way we treat
> UPDATEs.

This isn't a special case in the patch. It's more like REPEATABLE READ and SERIALIZABLE have a special responsibility to make sure they're not updating an invisible-to-MVCC-snapshot tuple (not "instantaneously invisible", by which I mean invisible in the HeapTupleSatisfiesUpdate() sense). Otherwise, we don't care if it's MVCC-visible or not.

I imagine that by "That is handled in exactly the same as the way we treat UPDATEs", you mean that we do the EvalPlanQual() stuff. We only do that in the event of a concurrent UPDATE/DELETE, though. Otherwise, we trust the underlying relation scan to accurately represent that tuples pulled up are visible. So I don't understand the comparison (but tell me if I've misunderstood).

> There is a potential loop here in that we might find an index tuple,
> follow it, find out the tuple is actually dead then return to insert a
> promise tuple, only to find that someone else just did that and we
> have to wait/re-follow the link to update the new tuple. That is an
> extremely unlikely race condition, though possible; there is no heavy
> locking to prevent that in this approach.
> In principle deadlocks are possible from that, but that is not any
> different from the current case where people that insert same key at
> same time might cause deadlocks. It's not a common application pattern
> and not something we should be protecting against.

People are going to use this feature in cases where they could almost use an UPDATE. It will happen if you let it happen. But even if you don't, a guarantee is infinitely more useful than a non-guarantee to app developers. I'll be up-front about this: you have very little chance of getting me to budge on this point. I *really* hate the idea of allowing this kind of thing. I don't think I'm alone here.

> All of this is only needed for unique indexes.
>
> It can be handled by a new API call for acquire_value_lock() and
> release_value_lock(), or similar.
>
> Advantages
> * We don't do anything weird in the heap - if this breaks, data is not corrupt

Index corruption leads to wrong answers. I don't think this is a very good argument, fwiw.

> * There is no heap bloat or index bloat above existing levels

Fair enough.

> * The approach leverages existing mechanism for transaction waiting

That's not really an advantage. That more or less applies to all designs.

> * Optimistic approach to value locking will improve performance over
> heavy weight locking

There is no evidence that promise index tuples will perform better. #2 didn't perform better than #1.

> Disadvantages
> * Not written yet - <1 month to code, still possible for 9.5, doesn't look
> hard

Maybe, but that is beside the point, which is: If there were any fundamental problems with either #1 or #2, I think I'd have figured them out by now. They are less risky today - it might be that #3 turns out to have the same properties, but I cannot be sure. That counts for something. I feel perfectly entitled to hold that kind of uncertainty against any design, tbh.
> Other stated possible disadvantages > * Violates existing invariant that every index tuple has a > corresponding heap tuple, which goes back to the Berkeley days. > Currently, we always create heap tuples first, and physically delete > them last. [Why is that a problem? Could be, but why?] Unknown unknowns. Huge amounts of code must be audited to figure out that it isn't an issue. So I don't know, but then I'm not the one making the proposal. > ("Deleting them last" implies something is being touched in that > regard by this suggestion. I'm not sure what you are referring to) The uncertain scope of the problem is a big part of the problem. > * Relatedly, concern about breaking VACUUM. VACUUMing of btrees occurs > with a list of TIDs to kill in index, built from the heap. Current > bulk delete callback cannot be used for this. Super-exclusive lock is > needed to delete tuples in btree page (i.e. VACUUM). Normally skipping > of LockBufferForCleanup() (as added by bbb6e559) works okay in heap > because it tends to imply that list of tids won't be bulk deleted in > index, where we currently cannot skip for failure to quickly acquire > super exclusive lock. So this could make
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 1, 2014 at 12:54 PM, Heikki Linnakangas wrote:
> XactLockTableWait() waits until the end of transaction; that's not what
> you want here. If the backend that inserted the promise tuple decides
> to not proceed with the insertion, and removes the promise tuple, the
> backend waiting on it needs to be woken up more or less immediately,
> not when the transaction completes.

Simon has not been inconsistent here: he has said that deadlocks may be possible. I happen to think that allowing them would be a major mistake on our part, but that's another story.

--
Peter Geoghegan
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
* Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost wrote: > > Perhaps I'm just being a bit over the top, but all this per-character > > work feels a bit ridiculous.. When we're using MAXIMUM_ALIGNOF, I > > suppose it's not so bad, but is there no hope to increase that and make > > this whole process more efficient? Just a thought. > > I'm not sure I understand what you're getting at here. Was just thinking that we might be able to work out what needs to be done without having to actually do it on a per-character basis. That said, I'm not sure it's really worth the effort given that we're talking about at most 8 bytes currently. I had originally been thinking that we might increase the minimum size as it might make things more efficient, but it's not clear that'd really end up being the case either and, regardless, it's probably not worth worrying about at this point. > > After reading through the code for 0001, I decided to actually take it > > out for a spin- see attached. I then passed a few megabytes of data > > through it and it seemed to work just fine. > > That's good. Just to note this- the testing which I did used a random number of segments and random amounts of data with each and hit specific edge cases and all looked good. > > Lastly, I will say that I feel it'd be good to support bi-directional > > communication as I think it'll be needed eventually, but I'm not sure > > that has to happen now. > > I agree we need bidirectional communication; I just don't agree that > the other direction should use the libpq format. The data going from > the worker to the process that launched it is stuff like errors and > tuples, for which we already have a wire format. The data going in > the other direction is going to be things like plan trees to be > executed, for which we don't. But if we can defer the issue, so much > the better. Things will become clearer as we get closer to being > done. Sounds good to me. 
Thanks,

Stephen
Re: [HACKERS] [v9.5] Custom Plan API
2014-10-02 0:41 GMT+09:00 Merlin Moncure :
> On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai wrote:
>> * Syntax also reflects what the command does more. New syntax to
>> define custom plan provider is:
>> CREATE CUSTOM PLAN PROVIDER
>> FOR HANDLER ;
>
> -1 on 'cpp' prefix. I don't see acronyms used in the syntax
> documentation and cpp will make people reflexively think 'c++'. How
> about and ?

That is no longer live code. I have already eliminated the SQL syntax portion in favor of an internal interface (register_custom_path_provider) that registers the callbacks at extension load time.

Thanks,
--
KaiGai Kohei
Re: [HACKERS] superuser() shortcuts
Stephen, We, as a community, have gotten flak from time-to-time about the > superuser role. We also tend to wish to avoid unnecessary > optimization as it complicates the code base and makes folks reviewing > the code wonder at the exceptions. > I have attached a patch for consideration/discussion that addresses these items. I have based it off of current master (0c013e0). I have done some cursory testing including check-world with success. > My 2c about this function is that it should be completely removed > and the place where it's checked replaced with just the > 'has_rolreplication' call and error. It's only called in one > place and it'd be a simple one-liner anyway. As for > has_rolreplication, I don't understand why it's in miscinit.c when > the rest of the has_* set is in acl.c. > With all the other pg_has_* functions being found in acl.c I agree that it would seem odd that this (or any other related function) would be found elsewhere. Since aclchk.c also has a couple of functions that follow the pattern of "has__privilege", I moved this function there, for now, as well as modified it to include superuser checks as they were not previously there. The only related function I found in acl.c was "has_rolinherit" which is a static function. :-/ There is also a static function "has_rolcatupdate" in aclchk.c. I would agree that acl.c (or aclchk.c?) would be a more appropriate location for these functions, though in some of the above cases, it might require exposing them (I'm not sure that I see any harm in doing so). Perhaps refactoring to consolidate all of these in either acl.c or aclchk.c with the following pattern might be a step in the right direction? 
* has_createrole_privilege
* has_bypassrls_privilege
* has_inherit_privilege
* has_catupdate_privilege
* has_replication_privilege
* has_???_privilege

In either case, I think following a "convention" on the naming of these functions (unless there is semantic reason not to) would also help to reduce any confusion or head scratching.

> Removing these design patterns may also help to avoid ending up with
> more of them in the future as folks copy and/or crib off of what we've
> already done to implement their features...

I agree.

-Adam

--
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 133143d..695a13c
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "miscadmin.h"
  #include "replication/walreceiver.h"
  #include "storage/smgr.h"
+ #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/numeric.h"
  #include "utils/guc.h"
*************** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,60 ****
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 errmsg("must be superuser or replication role to run a backup")));
--- 55,61 ----
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!has_replication_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 errmsg("must be superuser or replication role to run a backup")));
*************** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,88 ****
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!has_replication_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("must be superuser or replication role to run a backup"))));
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
new file mode 100644
index d30612c..a386fe2
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
*************** has_createrole_privilege(Oid roleid)
*** 5080,5085 ****
--- 5080,5107 ----
  	return result;
  }
  
+ /*
+  * Check whether specified role has REPLICATION privilege (or is a superuser)
+  */
+ bool
+ has_replication_privilege(Oid roleid)
+ {
+ 	bool		result = false;
+ 	HeapTuple	utup;
+ 
+ 	/* Superusers bypass all permission checking. */
+ 	if (superuser_arg(roleid))
+ 		return true;
+ 
+ 	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
+ 	if (HeapTupleIsValid(utup))
+ 	{
+ 		result = ((Form_pg_authid) GETSTRUCT(utup))->rolreplication;
+ 		ReleaseSysCache(utup);
+ 	}
+ 	return result;
+ }
+ 
  bool
  has_bypassrls_privilege(Oid roleid)
  {
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
new file mode 100644
index c9a9baf..ed89b23
*** a/src/backend/commands/alter.c
-
Re: [HACKERS] "Value locking" Wiki page
On Wed, Oct 1, 2014 at 2:42 PM, Simon Riggs wrote:
>> GiST supports exclusion constraints. That is one of the main reasons I want
>> to do promise tuples, instead of locking within the indexam: to support this
>> feature with exclusion constraints.
>
> That does sound interesting, but I am concerned the semantics may cause
> issues.
>
> If I go to insert a row for 'UK' and find an existing row for
> 'Europe', do we really want to update the population of Europe to be
> the population of the UK, simply because the UK and Europe have an
> exclusion conflict?
>
> Please give some concrete examples of a business request that might be
> satisfied by such a feature.

The ON CONFLICT UPDATE semantics don't seem particularly useful for exclusion constraints. However, a reasonable business request for exclusion constraints would be to have a "boss mode" for the canonical room reservation example - an INSERT that is guaranteed not to fail by either deleting conflicting rows or updating them so the exclusion constraints don't overlap (e.g. truncate the time intervals) or the rows fail the index predicate (e.g. soft delete). AFAICS this is currently not possible to implement correctly without a retry loop.

The hypothetical ON CONFLICT REPLACE and ON CONFLICT UPDATE-AND-THEN-INSERT modes would also make sense in the unique index case. Not saying that I view this as necessary for the first cut of the feature, just providing an example where it could be useful.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
[HACKERS] UPSERT wiki page, and SQL MERGE syntax
Having been surprisingly successful at advancing our understanding of arguments for and against various approaches to "value locking", I decided to try the same thing out elsewhere. I have created a general-purpose UPSERT wiki page. The page is:

https://wiki.postgresql.org/wiki/UPSERT

Right now, I would like to address the less contentious but still unresolved question of whether or not we should use the SQL-standard MERGE syntax instead of the non-standard INSERT ... ON CONFLICT UPDATE syntax that my patch proposes. Note that I'm trying to direct the conversation along certain lines: Is MERGE the right syntax for this particular effort ("UPSERT")? And, in particular, not: Is SQL MERGE useful? I think that it is useful, but is a largely unrelated feature.

Please correct the Wiki page if I have failed to credit SQL MERGE with some advantage that someone stated at some point. I don't recall any arguments for it, other than that it is in the SQL standard, but maybe I missed one. In general, add to this page, and edit it as you see fit. It'll be useful to centralize the references, discussion and state of the patch in one agreed upon place, as the patch continues to evolve.

--
Peter Geoghegan
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Tomas Vondra wrote:
> On 12.9.2014 23:22, Robert Haas wrote:
>> My first thought is to revert to NTUP_PER_BUCKET=1, but it's
>> certainly arguable. Either method, though, figures to be better than
>> doing nothing, so let's do something.
>
> OK, but can we commit the remaining part first? Because it'll certainly
> interact with this proposed part, and it's easier to tweak when the code
> is already there. Instead of rebasing over and over.

The patch applied and built without problem, and passes `make check-world`. The only thing that caught my eye was the addition of debug code using printf() instead of logging at a DEBUG level. Is there any reason for that?

I still need to work through the (rather voluminous) email threads and the code to be totally comfortable with all the issues, but if Robert and Heikki are comfortable with it, I'm not gonna argue. Preliminary benchmarks look outstanding! I tried to control carefully to ensure consistent results (by dropping, recreating, vacuuming, analyzing, and checkpointing before each test), but there were surprising outliers in the unpatched version. It turned out that it was because slight differences in the random samples caused different numbers of buckets for both unpatched and patched, but the patched version dealt with the difference gracefully while the unpatched version's performance fluctuated randomly.

My thinking is that we should get this much committed and then discuss further optimizations. I tried to throw something at it that would be something of a "worst case" because with the new code it would start with one batch and go to two. Five runs per test, alternating between unpatched and patched.
First I tried with the default work_mem size of 4MB:

Planning time / Execution time (ms)

unpatched, work_mem = 4MB
0.694 / 10392.930
0.327 / 10388.787
0.412 / 10533.036
0.650 / 17186.719
0.338 / 10707.423
Fast: Buckets: 2048 Batches: 1 Memory Usage: 3516kB
Slow: Buckets: 1024 Batches: 1 Memory Usage: 3516kB

patched, work_mem = 4MB
0.764 / 5021.792
0.655 / 4991.887
0.436 / 5068.777
0.410 / 5057.902
0.444 / 5021.543
1st & 5th: Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3073kB
all others: Buckets: 131072 (originally 2048) Batches: 2 (originally 1) Memory Usage: 3073kB

Then, just to see how both dealt with extra memory I did it again with 1GB:

unpatched, work_mem = 1GB
0.328 / 10407.836
0.319 / 10465.348
0.324 / 16606.948
0.569 / 10408.671
0.542 / 16996.209
Memory usage same as before. Guess which runs started with 1024 buckets. ;-)

patched, work_mem = 1GB
0.556 / 3974.741
0.325 / 4012.613
0.334 / 3941.734
0.834 / 3894.391
0.603 / 3959.440
2nd & 4th: Buckets: 131072 (originally 2048) Batches: 1 (originally 1) Memory Usage: 4540kB
all others: Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4540kB

My only concern from the benchmarks is that it seemed like there was a statistically significant increase in planning time:

unpatched plan time average: 0.450 ms
patched plan time average: 0.536 ms

That *might* just be noise, but it seems likely to be real. For the improvement in run time, I'd put up with an extra 86us in planning time per hash join; but if there's any way to shave some of that off, all the better.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] superuser() shortcuts
Adam, * Brightwell, Adam (adam.brightw...@crunchydatasolutions.com) wrote: > > We, as a community, have gotten flak from time-to-time about the > > superuser role. We also tend to wish to avoid unnecessary > > optimization as it complicates the code base and makes folks reviewing > > the code wonder at the exceptions. > > I have attached a patch for consideration/discussion that addresses these > items. I have based it off of current master (0c013e0). I have done some > cursory testing including check-world with success. Thanks! Please add it to the next commitfest. > With all the other pg_has_* functions being found in acl.c I agree that it > would seem odd that this (or any other related function) would be found > elsewhere. Since aclchk.c also has a couple of functions that follow the > pattern of "has__privilege", I moved this function there, for now, as > well as modified it to include superuser checks as they were not previously > there. The only related function I found in acl.c was "has_rolinherit" > which is a static function. :-/ There is also a static function > "has_rolcatupdate" in aclchk.c. I would agree that acl.c (or aclchk.c?) > would be a more appropriate location for these functions, though in some of > the above cases, it might require exposing them (I'm not sure that I see > any harm in doing so). I don't think has_rolinherit or has_rolcatupdate really need to move and it seems unlikely that they'd be needed from elsewhere.. Is there a reason you think they'd need to be exposed? I've not looked at the patch at all though, perhaps that makes it clear. > Perhaps refactoring to consolidate all of these in either acl.c or aclchk.c > with the following pattern might be a step in the right direction? > > * has_createrole_privilege > * has_bypassrls_privilege These are already in the right place, right? 
> * has_inherit_privilege
> * has_catupdate_privilege

These probably don't need to move as they're only used in the .c files that they're defined in (unless there's a reason that needs to change).

> * has_replication_privilege

This is what I was on about, so I agree that it should be created. ;)

> * has_???_privilege

Right, other things might be 'has_backup_privilege', for things like pg_start/stop_backup and friends.

> In either case, I think following a "convention" on the naming of these
> functions (unless there is semantic reason not to) would also help to
> reduce any confusion or head scratching.

Agreed.

Thanks!

Stephen
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On 10/02/2014 12:19 AM, Ilya Kosmodemiansky wrote:
> From an Oracle DBA's point of view, currently we have a lack of
> performance diagnostics tools.

Agreed. Sometimes very frustratingly so.

> Even better idea is to collect daily LWLock distribution, find most
> frequent of them etc.

While we could add this within PostgreSQL, I wonder if it's worth looking
at whether it can be done non-intrusively with operating system facilities
like perf. I've had really good results using perf to trace and graph xlog
generation and other metrics in the past.

> The patch https://commitfest.postgresql.org/action/patch_view?id=885
> (discussion starts here I hope -
> http://www.postgresql.org/message-id/4fe8ca2c.3030...@uptime.jp)
> demonstrates performance problems; LWLOCK_STAT, LOCK_DEBUG and
> DTrace-like approach are slow, unsafe for production use and a bit
> clumsy for using by DBA.

It's not at all clear to me that a DTrace-like (or perf-based, rather)
approach is unsafe, slow, or unsuitable for production use. Resolving lock
IDs to names might be an issue, though.

With appropriate wrapper tools I think we could have quite a useful library
of perf-based diagnostics and tracing tools for PostgreSQL.

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
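As a rough sketch of the non-intrusive approach suggested above, perf's
dynamic probes can be attached to a postgres binary built with debug
symbols. The binary path and probe target below are assumptions for
illustration; on a real system the event name perf assigns may differ.

```shell
# Sketch (assumed paths): place a dynamic userspace probe on LWLockAcquire
perf probe -x /usr/pgsql/bin/postgres LWLockAcquire

# Record hits on that probe system-wide for 30 seconds
perf record -e probe_postgres:LWLockAcquire -aR sleep 30

# Summarize which backends and call sites acquired locks most often
perf report --sort comm,symbol

# Clean up the probe when done
perf probe --del probe_postgres:LWLockAcquire
```

This has the "resolving lock IDs to names" limitation mentioned above,
since the probe sees addresses and arguments, not PostgreSQL's lock names.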
Re: [HACKERS] "Value locking" Wiki page
On 10/02/2014 03:06 AM, Peter Geoghegan wrote:
> In my view, it makes sense to fix that, and to make INSERT ... ON
> CONFLICT IGNORE work with exclusion constraints. However, it does not
> make sense to have INSERT ... ON CONFLICT UPDATE work with exclusion
> constraints. The user-visible semantics are very tricky, and I don't
> think it's useful.

If you were doing a multi-valued INSERT ... ON CONFLICT IGNORE and wanted
to do this with exclusion constraints, what would you do if the insert
itself contains values that can't exist in the table together? Right now
the whole insert will fail. Would that still be the case? Would you insert
one tuple then ignore the other? If so, what guarantee if any would be made
about which tuple would get inserted?

I think insert...ignore for exclusion constraints could be interesting, but
I'm not sure it's really the same as the upsert case. I guess I also just
don't really understand the utility of insert ... on conflict ignore for
GiST exclusion constraints.

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
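For concreteness, here is a minimal sketch of the self-conflicting case
described above (table and values are made up for illustration; the
integer equality operator in the constraint requires the btree_gist
extension):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservation (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

-- The two rows overlap each other, so today this entire statement fails
-- with an exclusion_violation; the open question is what a hypothetical
-- INSERT ... ON CONFLICT IGNORE should do here, and which row would win.
INSERT INTO reservation VALUES
    (1, '[2014-10-02 10:00, 2014-10-02 11:00)'),
    (1, '[2014-10-02 10:30, 2014-10-02 11:30)');
```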
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/02/2014 07:52 AM, Peter Geoghegan wrote:
> Having been surprisingly successful at advancing our understanding of
> arguments for and against various approaches to "value locking", I
> decided to try the same thing out elsewhere. I have created a
> general-purpose UPSERT wiki page.
>
> The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks. That'll help keep things moving forward rather than around in
circles.

> In general, add to this page, and edit it as you see fit. It'll be
> useful to centralize the references, discussion and state of the patch
> in one agreed upon place, as the patch continues to evolve.

I added a summary of the status quo of upsert in Pg as it stands, and a
brief discussion of the state in other RDBMSes. I'd love it if someone who
knows MySQL better could add info on MySQL's ON DUPLICATE KEY feature -
advantages/problems, etc.

I've added a few points to the goals section:

- Any new upsert approach must be a usability improvement on the status
  quo; we don't want to introduce subtle behaviour or unnecessary
  foot-guns.

- If possible, upsert of multiple values is desirable. We currently have
  to loop on a per-value basis.

... and some miscellaneous edits/formatting changes.

I've also added sections for the other options:

* A custom syntax
  https://wiki.postgresql.org/wiki/UPSERT#Custom_syntax

* Adopting an existing non-standard syntax
  https://wiki.postgresql.org/wiki/UPSERT#Adopting_an_existing_non-standard_syntax

which I'd appreciate it if you could fill out based on your existing
research and notes. I don't think it makes sense for me to write those when
you've already done the required study/note taking and just need to
transfer it over.

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
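The "status quo" mentioned above is the loop-until-success idiom, roughly
as sketched in the PostgreSQL documentation's trapping-errors example
(table `db` with columns `key`/`data` is the documentation's placeholder
schema, not anything specific to the wiki page):

```sql
CREATE FUNCTION merge_db(k int, v text) RETURNS void AS
$$
BEGIN
    LOOP
        -- First try to update an existing row.
        UPDATE db SET data = v WHERE key = k;
        IF found THEN
            RETURN;
        END IF;
        -- Not there: try to insert.  If another session inserted the
        -- same key concurrently, trap the error and retry the UPDATE.
        BEGIN
            INSERT INTO db (key, data) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop back and retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Note this handles one key per call, which is exactly the per-value looping
the multi-value goal above wants to avoid.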
Re: [HACKERS] Escaping from blocked send() reprised.
> >> Sorry, I missed this message and only caught up when reading your CF
> >> status mail. I've attached three patches:
> >
> > Could you let me know how to get the CF status mail?
>
> I think he meant this email I sent last weekend:
>
> http://www.postgresql.org/message-id/542672d2.3060...@vmware.com

I see, that's what I also received. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] Typo fixes in src/backend/rewrite/rewriteHandler.c
Here are the comments in process_matched_tle() in rewriteHandler.c:

 883 * such nodes; consider
 884 * UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
 885 * The two expressions produced by the parser will look like
 886 * FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
 887 * FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))

I think the second one is not correct and should be

 FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))

Just like this,

 891 * FieldStore(FieldStore(col, fld1,
 892 *            FieldStore(placeholder, subfld1, x)),
 893 * fld2, FieldStore(placeholder, subfld2, x))

should be

 FieldStore(FieldStore(col, fld1,
            FieldStore(placeholder, subfld1, x)),
 fld2, FieldStore(placeholder, subfld2, y))

Patch attached.

Thanks,

Best regards,
Etsuro Fujita

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index cb65c05..93fda07 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -883,13 +883,13 @@ process_matched_tle(TargetEntry *src_tle,
 	 * such nodes; consider
 	 * UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
 	 * The two expressions produced by the parser will look like
 	 * FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
-	 * FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))
+	 * FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))
 	 * However, we can ignore the substructure and just consider the top
 	 * FieldStore or ArrayRef from each assignment, because it works to
 	 * combine these as
 	 * FieldStore(FieldStore(col, fld1,
 	 *            FieldStore(placeholder, subfld1, x)),
-	 * fld2, FieldStore(placeholder, subfld2, x))
+	 * fld2, FieldStore(placeholder, subfld2, y))
 	 * Note the leftmost expression goes on the inside so that the
 	 * assignments appear to occur left-to-right.
 	 *
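The statement the comment analyzes can be reproduced with a small schema
(the type and table definitions below are made up to match the names used
in the comment):

```sql
CREATE TYPE sub  AS (subfld1 int, subfld2 int);
CREATE TYPE comp AS (fld1 sub, fld2 sub);
CREATE TABLE tab (col comp);

INSERT INTO tab VALUES (ROW(ROW(0, 0), ROW(0, 0)));

-- Both assignments target the same column "col", so the rewriter must
-- merge the two parser-produced FieldStore expressions into one nested
-- FieldStore, as process_matched_tle()'s comment describes.
UPDATE tab SET col.fld1.subfld1 = 1, col.fld2.subfld2 = 2;
```

After the UPDATE, `col` is `((1,0),(0,2))`, which makes it easy to see why
the second inner expression must carry `y` (here, `2`) rather than `x`.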