Re: [HACKERS] Slow synchronous logical replication
On Mon, Oct 9, 2017 at 4:37 PM, Konstantin Knizhnik wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
> It will not be so easy to provide a reproducing scenario, because it
> actually involves many components (postgres_fdw, pg_pathman,
> pg_shardman, LR, ...) and requires a multinode installation.
> But let me try to explain what is going on:
> We have implemented sharding -- splitting data between several remote
> tables using pg_pathman and postgres_fdw.
> It means that an insert or update of the parent table causes inserts
> or updates of some derived partitions, which are forwarded by
> postgres_fdw to the corresponding nodes.
> The number of shards is significantly larger than the number of nodes;
> i.e. for 5 nodes we have 50 shards, which means that at each node we
> have 10 shards.
> To provide fault tolerance, each shard is replicated using logical
> replication to one or more nodes. Right now we considered only
> redundancy level 1 -- each shard has exactly one replica.
> So from each node we establish 10 logical replication channels.
>
> We want commit to wait until data is actually stored at all replicas,
> so we are using synchronous replication:
> we set the synchronous_commit option to "on" and include all 10
> subscriptions in the synchronous_standby_names list.
>
> In this setup commit latency is very large (about 100 ms, and most of
> the time is actually spent in commit) and performance is very bad --
> pgbench shows about 300 TPS for the optimal number of clients (about
> 10; for larger numbers performance is almost the same). Without
> logical replication on the same setup we get about 6000 TPS.
>
> I have checked the syncrepl.c file, particularly the
> SyncRepGetSyncRecPtr function.
> Each walsender independently calculates the minimal LSN among all
> synchronous replicas and wakes up backends waiting for this LSN. It
> means that a transaction performing an update of data in one shard
> will actually wait for confirmation from the replication channels for
> all shards.
> If some shard is updated less often than the others, or is not updated
> at all (for example because the communication channel to its node is
> broken), then all backends will get stuck.
> Also, all backends are competing for the single SyncRepLock, which can
> be another contention point.

IIUC, I guess you meant to say that in the current synchronous logical
replication a transaction has to wait for updated table data to be
replicated even on servers that don't subscribe to the table. If we
changed it so that a transaction needs to wait only for the servers
that are subscribing to the table, it would be more efficient, at least
for your use case. We send at least the begin and commit data to all
subscriptions and then wait for the reply from them, but can we skip
waiting for them, for example, when the walsender actually didn't send
any data modified by the transaction?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar wrote:
> Hi all,
>
> When you fire a query in PostgreSQL, it will first parse the query and
> create the data structures for storing various aspects of the query
> and executing it (like RangeTblEntry, PlannerInfo, RelOptInfo, etc.).
>
> I want to know how PostgreSQL stores the join predicates of a query,
> i.e. which data structure is used to store the join predicates.
>
> How can we find the join predicates applied to a relation from a
> relid, Oid or RangeTblEntry?

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) they are in baserestrictinfo. The applicable
join predicates are in joininfo. You can get the RelOptInfo of a given
simple table using find_base_rel().

HTH.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] show precise repos version for dev builds?
On Sun, Oct 1, 2017 at 8:10 AM, Tom Lane wrote:
>
> configure --with-extra-version=whateveryouwant

I see that this build option has been around since 9.4; is anyone using
it to mark patched production builds? EnterpriseDB or 2ndQuadrant? How
about the cloud providers?

-Jeremy

--
http://about.me/jeremy_schneider
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, On 2017-10-03 13:58:37 -0400, Robert Haas wrote: > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > > Makes sense? > > Yes. Here's an updated version of this patchset. Changes: - renamed pq_send$type_pre to pq_write*type - renamed pq_beginmessage_pre/pq_beginmessage_keep to the _reuse suffix - removed unaligned memory access issues by again using memcpy - gcc and clang both successfully optimize it away - moved permanent buffer for SendRowDescriptionMessage to postgres.c, and have other callers use already pre-existing buffers. - replace all pq_sendint with pq_sendint$width in core - converted applicable pq_begin/endmessage in printtup.c users to use DR_printtup->buf. - added comments explaining restrict usage - expanded commit messages considerably - Small stuff. The naming I'd discussed a bit back and forth with Robert over IM, thanks! - Andres >From 89e301384c9fbc071de19c1517ffe29371fc6f36 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 20 Sep 2017 13:01:22 -0700 Subject: [PATCH 1/6] Add configure infrastructure to detect support for C99's restrict. Will be used in later commits improving performance for a few key routines where information about aliasing allows for significantly better code generation. This allows to use the C99 'restrict' keyword without breaking C89, or for that matter C++, compilers. If not supported it's defined to be empty. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbek...@alap3.anarazel.de --- configure | 46 +++ configure.in | 1 + src/include/pg_config.h.in| 14 + src/include/pg_config.h.win32 | 6 ++ 4 files changed, 67 insertions(+) diff --git a/configure b/configure index b0582657bf4..ca54242d5d7 100755 --- a/configure +++ b/configure @@ -11545,6 +11545,52 @@ _ACEOF ;; esac +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C/C++ restrict keyword" >&5 +$as_echo_n "checking for C/C++ restrict keyword... 
" >&6; } +if ${ac_cv_c_restrict+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_cv_c_restrict=no + # The order here caters to the fact that C++ does not require restrict. + for ac_kw in __restrict __restrict__ _Restrict restrict; do + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +typedef int * int_ptr; + int foo (int_ptr $ac_kw ip) { + return ip[0]; + } +int +main () +{ +int s[1]; + int * $ac_kw t = s; + t[0] = 0; + return foo(t) + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_c_restrict=$ac_kw +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + test "$ac_cv_c_restrict" != no && break + done + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_restrict" >&5 +$as_echo "$ac_cv_c_restrict" >&6; } + + case $ac_cv_c_restrict in + restrict) ;; + no) $as_echo "#define restrict /**/" >>confdefs.h + ;; + *) cat >>confdefs.h <<_ACEOF +#define restrict $ac_cv_c_restrict +_ACEOF + ;; + esac + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for printf format archetype" >&5 $as_echo_n "checking for printf format archetype... " >&6; } if ${pgac_cv_printf_archetype+:} false; then : diff --git a/configure.in b/configure.in index 4548db0dd3c..ab990d69f4c 100644 --- a/configure.in +++ b/configure.in @@ -1299,6 +1299,7 @@ fi m4_defun([AC_PROG_CC_STDC], []) dnl We don't want that. AC_C_BIGENDIAN AC_C_INLINE +AC_C_RESTRICT PGAC_PRINTF_ARCHETYPE AC_C_FLEXIBLE_ARRAY_MEMBER PGAC_C_SIGNED diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index b0298cca19c..80ee37dd622 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -923,6 +923,20 @@ if such a type exists, and if the system does not define it. */ #undef intptr_t +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. 
*/ +#undef restrict +/* Work around a bug in Sun C++: it does not support _Restrict or + __restrict__, even though the corresponding Sun C compiler ends up with + "#define restrict _Restrict" or "#define restrict __restrict__" in the + previous line. Perhaps some future version of Sun C++ will work with + restrict; if so, hopefully it defines __RESTRICT like Sun C does. */ +#if defined __SUNPRO_CC && !defined __RESTRICT +# define _Restrict +# define __restrict__ +#endif + /* Define to empty if the C compiler does not understand signed types. */ #undef signed diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index b76aad02676..b96e93328fe 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -681,6 +681,12 @@ #define inline __inline #endif +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. */ +/* wor
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan wrote:
> I found one glitch with our merge of the original dup row fix. With
> that corrected AND Alvaro's Friday fix, things are solid.
> No dups. No index corruption.
>
> Thanks so much.

Nice to hear that! You guys seem to be doing extensive testing and
actually report back about it, which is really nice to see.
--
Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
I found one glitch with our merge of the original dup row fix. With
that corrected AND Alvaro's Friday fix, things are solid.
No dups. No index corruption.

Thanks so much.

On 10/10/17, 7:25 PM, "Michael Paquier" wrote:

On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem. I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened. Maybe I made a mistake last week and
> ran an unfixed version. I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows, consider the code in the block in
>> heapam.c that starts with the comment "replace multi by update xid".
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array; however, the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned, this does precede cutoff_xid, and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly, the real xid of the updater is 122944 and the
>> cutoff_xid is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice not to live long with such a sword of
Damocles hanging over us.
--
Michael
Re: [HACKERS] pg_regress help output
On 10/10/2017 07:06 PM, Tom Lane wrote:
> Joe Conway writes:
>> I have been annoyed at least twice now by the lack of pg_regress
>> command line help output for the "--bindir=" option. In passing I
>> noted that there was no output for "--help" or "--version" options
>> either.
>
>> Any objections to the attached?
>
> +1 for documenting it, but the phrasing seems a bit awkward:
>
> !   printf(_("  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
> !   printf(_("                            if BINPATH empty, use PATH from the environment\n"));
>
> Maybe just "if empty, use PATH ..." ?

Ok, so like this?

8<--
  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
                            if empty, use PATH from the environment\n"));
8<--

> Also, why is the patch apparently changing whitespace in all the help
> lines? Seems like that will create a lot of make-work for translators.

I debated with myself about that. In other cases, e.g. initdb or psql,
where we mix short+long options and long-only options, we indent the
long-only options in the output to match up with the long options of
the mixed lines (whew, hopefully that is clear). Previously we were not
showing mixed short+long options for pg_regress at all, and hence only
indented the long-only options minimally. But with the addition of -h
and -V (again, consistent with other programs we ship), it made sense
to be consistent in the way we indent.

But I am fine with leaving the original lines' output the way it was,
if preferred.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem. I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened. Maybe I made a mistake last week and
> ran an unfixed version. I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows, consider the code in the block in
>> heapam.c that starts with the comment "replace multi by update xid".
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array; however, the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned, this does precede cutoff_xid, and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly, the real xid of the updater is 122944 and the
>> cutoff_xid is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice not to live long with such a sword of
Damocles hanging over us.
--
Michael
Re: [HACKERS] pg_regress help output
Joe Conway writes:
> I have been annoyed at least twice now by the lack of pg_regress
> command line help output for the "--bindir=" option. In passing I
> noted that there was no output for "--help" or "--version" options
> either.

> Any objections to the attached?

+1 for documenting it, but the phrasing seems a bit awkward:

!   printf(_("  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
!   printf(_("                            if BINPATH empty, use PATH from the environment\n"));

Maybe just "if empty, use PATH ..." ?

Also, why is the patch apparently changing whitespace in all the help
lines? Seems like that will create a lot of make-work for translators.

regards, tom lane
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 10:46 AM, Andres Freund wrote:
> I'm not following. The "D" is in the 'dispchar' field, not the value
> field, no? The default value is NULL?

Oops, yes. I misread the code. The other debug options are not
documented, so it's fine with me not to provide any documentation then.
--
Michael
Re: [HACKERS] Is it time to kill support for very old servers?
Hi,

On 2017-10-11 10:40:11 +0900, Michael Paquier wrote:
> >> + if (conn->forced_protocol_version != NULL)
> >> + {
> >> +     conn->pversion = atoi(conn->forced_protocol_version);
> >> + }
> >> This should check for strlen > 0 as well.
> >
> > Why? Note that we don't do so elsewhere in fe-connect.c.
>
> Because it seems to me that the default value of the parameter should
> be an empty string instead of D. Feels more consistent with the
> others.

I'm not following. The "D" is in the 'dispchar' field, not the value
field, no? The default value is NULL?

Greetings,

Andres Freund
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 10:14 AM, Andres Freund wrote:
> Hi,
>
> On 2017-10-11 10:09:34 +0900, Michael Paquier wrote:
>> On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
>> > On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
>> >> Coverage of the relevant files is a good bit higher afterwards.
>> >> Although our libpq coverage is generally pretty damn awful.
>> >
>> > Any opinions on this? Obviously this needs some cleanup, but I'd
>> > like to know whether we have consensus on adding a connection
>> > option for this goal before investing more time into this.
>> >
>> > A nearby thread [1] whacks around some of the v2 code, which
>> > triggered me to look into this. I obviously can just use these
>> > patches to test those patches during development, but it seems
>> > better to keep coverage.
>>
>> FWIW, I think that moving forward with such a possibility is a good
>> thing, including having a connection parameter. This would pay off
>> in the long term if a new protocol version is added.
>
>> 0001 should document the new parameter.
>
> I'm actually inclined not to, and to keep this as an undocumented
> debugging option. Limiting the use of this option to people willing
> to read the code seems like a good idea to me.

It seems important to me to document things present. There is a section
in the docs listing developer-only parameters for runtime
configuration, so why not separate "Parameter Key Words" into two
sections then? One for the main parameters and one for developer-only
parameters.

>> + if (conn->forced_protocol_version != NULL)
>> + {
>> +     conn->pversion = atoi(conn->forced_protocol_version);
>> + }
>> This should check for strlen > 0 as well.
>
> Why? Note that we don't do so elsewhere in fe-connect.c.

Because it seems to me that the default value of the parameter should
be an empty string instead of D. Feels more consistent with the others.
--
Michael
Re: [HACKERS] Is it time to kill support for very old servers?
Hi,

On 2017-10-11 10:09:34 +0900, Michael Paquier wrote:
> On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
> > On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
> >> Coverage of the relevant files is a good bit higher afterwards.
> >> Although our libpq coverage is generally pretty damn awful.
> >
> > Any opinions on this? Obviously this needs some cleanup, but I'd
> > like to know whether we have consensus on adding a connection
> > option for this goal before investing more time into this.
> >
> > A nearby thread [1] whacks around some of the v2 code, which
> > triggered me to look into this. I obviously can just use these
> > patches to test those patches during development, but it seems
> > better to keep coverage.
>
> FWIW, I think that moving forward with such a possibility is a good
> thing, including having a connection parameter. This would pay off in
> the long term if a new protocol version is added.

> 0001 should document the new parameter.

I'm actually inclined not to, and to keep this as an undocumented
debugging option. Limiting the use of this option to people willing to
read the code seems like a good idea to me.

> + if (conn->forced_protocol_version != NULL)
> + {
> +     conn->pversion = atoi(conn->forced_protocol_version);
> + }
> This should check for strlen > 0 as well.

Why? Note that we don't do so elsewhere in fe-connect.c.

Greetings,

Andres Freund
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
> On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
>> Coverage of the relevant files is a good bit higher afterwards.
>> Although our libpq coverage is generally pretty damn awful.
>
> Any opinions on this? Obviously this needs some cleanup, but I'd like
> to know whether we have consensus on adding a connection option for
> this goal before investing more time into this.
>
> A nearby thread [1] whacks around some of the v2 code, which triggered
> me to look into this. I obviously can just use these patches to test
> those patches during development, but it seems better to keep
> coverage.

FWIW, I think that moving forward with such a possibility is a good
thing, including having a connection parameter. This would pay off in
the long term if a new protocol version is added.

0001 should document the new parameter.

+ if (conn->forced_protocol_version != NULL)
+ {
+     conn->pversion = atoi(conn->forced_protocol_version);
+ }
This should check for strlen > 0 as well.
--
Michael
[HACKERS] pg_regress help output
I have been annoyed at least twice now by the lack of pg_regress command line help output for the "--bindir=" option. In passing I noted that there was no output for "--help" or "--version" options either. Any objections to the attached? It could be argued that it ought to be back-patched too, but I won't bother unless someone cares enough to request it. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7c628df..f3dcc1f 100644 *** a/src/test/regress/pg_regress.c --- b/src/test/regress/pg_regress.c *** help(void) *** 2004,2040 printf(_("Usage:\n %s [OPTION]... [EXTRA-TEST]...\n"), progname); printf(_("\n")); printf(_("Options:\n")); ! printf(_(" --config-auth=DATADIR update authentication settings for DATADIR\n")); ! printf(_(" --create-role=ROLEcreate the specified role before testing\n")); ! printf(_(" --dbname=DB use database DB (default \"regression\")\n")); ! printf(_(" --debug turn on debug mode in programs that are run\n")); ! printf(_(" --dlpath=DIR look for dynamic libraries in DIR\n")); ! printf(_(" --encoding=ENCODING use ENCODING as the encoding\n")); ! printf(_(" --inputdir=DIRtake input files from DIR (default \".\")\n")); ! printf(_(" --launcher=CMDuse CMD as launcher of psql\n")); ! printf(_(" --load-extension=EXT load the named extension before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --load-language=LANG load the named language before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --max-connections=N maximum number of concurrent connections\n")); ! printf(_("(default is 0, meaning unlimited)\n")); ! printf(_(" --max-concurrent-tests=N maximum number of concurrent tests in schedule\n")); ! printf(_("(default is 0, meaning unlimited)\n")); ! 
printf(_(" --outputdir=DIR place output files in DIR (default \".\")\n")); ! printf(_(" --schedule=FILE use test ordering schedule from FILE\n")); ! printf(_("(can be used multiple times to concatenate)\n")); ! printf(_(" --temp-instance=DIR create a temporary instance in DIR\n")); ! printf(_(" --use-existinguse an existing installation\n")); printf(_("\n")); printf(_("Options for \"temp-instance\" mode:\n")); ! printf(_(" --no-locale use C locale\n")); ! printf(_(" --port=PORT start postmaster on PORT\n")); ! printf(_(" --temp-config=FILEappend contents of FILE to temporary config\n")); printf(_("\n")); printf(_("Options for using an existing installation:\n")); ! printf(_(" --host=HOST use postmaster running on HOST\n")); ! printf(_(" --port=PORT use postmaster running at PORT\n")); ! printf(_(" --user=USER connect as USER\n")); printf(_("\n")); printf(_("The exit status is 0 if all tests passed, 1 if some tests failed, and 2\n")); printf(_("if the tests could not be run for some reason.\n")); --- 2004,2044 printf(_("Usage:\n %s [OPTION]... [EXTRA-TEST]...\n"), progname); printf(_("\n")); printf(_("Options:\n")); ! printf(_(" --bindir=BINPATH use BINPATH for programs that are run;\n")); ! printf(_("if BINPATH empty, use PATH from the environment\n")); ! printf(_(" --config-auth=DATADIR update authentication settings for DATADIR\n")); ! printf(_(" --create-role=ROLEcreate the specified role before testing\n")); ! printf(_(" --dbname=DB use database DB (default \"regression\")\n")); ! printf(_(" --debug turn on debug mode in programs that are run\n")); ! printf(_(" --dlpath=DIR look for dynamic libraries in DIR\n")); ! printf(_(" --encoding=ENCODING use ENCODING as the encoding\n")); ! printf(_(" -h, --helpshow this help, then exit\n")); ! printf(_(" --inputdir=DIRtake input files from DIR (default \".\")\n")); ! printf(_(" --launcher=CMDuse CMD as launcher of psql\n")); ! printf(_(" --load-extension=EXT load the named extension before running the\n")); ! 
printf(_("tests; can appear multiple times\n")); ! printf(_(" --load-language=LANG load the named language before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --max-connections=N maximum number of concurrent connections\n")); ! printf(_("
Re: [HACKERS] Is it time to kill support for very old servers?
On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
> On 2017-09-18 02:53:03 -0700, Andres Freund wrote:
> > On 2017-09-13 23:39:21 -0400, Tom Lane wrote:
> > > The real problem in this area, to my mind, is that we're not
> > > testing that code --- either end of it --- in any systematic way.
> > > If it's broken it could take us quite a while to notice.
> >
> > Independent of the thrust of my question - why aren't we adding a
> > 'force-v2' option to libpq? A test that basically does something
> > like
> > postgres[22923][1]=# \setenv PGFORCEV2 1
> > postgres[22923][1]=# \c
> > You are now connected to database "postgres" as user "andres".
> > postgres[22924][1]=>
> > seems easy enough to add; in fact I tested the above.
> >
> > And the protocol coverage of the v2 protocol seems small enough
> > that a single not-too-large file ought to cover most of it quite
> > easily.
>
> Here's what I roughly was thinking of. I don't quite like the name,
> and the way the version is specified for libpq (basically just the
> "raw" integer). Not sure if others have an opinion on that. I
> personally would lean towards not documenting this option...
>
> There's a few things that I couldn't find easy ways to test:
> - the v2-specific binary protocol - I don't quite see how we could
>   test that without writing C
> - version error checks - pg_regress/psql errors out in
>   non-interactive mode if a connection fails to be established. This
>   we could verify with a simple tap test.
>
> Coverage of the relevant files is a good bit higher afterwards.
> Although our libpq coverage is generally pretty damn awful.

Any opinions on this? Obviously this needs some cleanup, but I'd like
to know whether we have consensus on adding a connection option for
this goal before investing more time into this.

A nearby thread [1] whacks around some of the v2 code, which triggered
me to look into this. I obviously can just use these patches to test
those patches during development, but it seems better to keep coverage.

Thanks,

Andres

[1] https://postgr.es/m/20170914063418.sckdzgjfrsbek...@alap3.anarazel.de
Re: [HACKERS] On markers of changed data
Alvaro,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Greg Stark wrote:
> > The general shape of what I would like to see is some log which
> > lists where each checkpoint starts and ends and what blocks are
> > modified since the previous checkpoint. Then to generate an
> > incremental backup from any point in time to the current one, you
> > union all the block lists between them and fetch those blocks.
> > There are other ways of using this aside from incremental backups
> > on disk too -- you could imagine a replica that has fallen behind
> > requesting the block lists and then fetching just those blocks
> > instead of needing to receive and apply all the wal.
>
> Hmm, this sounds pretty clever. And we already have the blocks
> touched by each record thanks to the work for pg_rewind (so we don't
> have to do any nasty tricks like the stuff Suzuki-san did for
> pg_lesslog, where each WAL record had to be processed individually to
> know what blocks it referenced), so it shouldn't be *too* difficult ...

Yeah, it sounds interesting, but I was just chatting w/ David about it
and we were thinking about how often checkpoints are actually done --
you end up with quite a few of these lists being out there. Now, if the
lists were always kept in sorted order, then perhaps we would be able
to essentially merge-sort them all back together and de-dup that way,
but even then, you're talking about an awful lot if you're looking at
daily incrementals: that's 288 standard 5-minute checkpoints, each with
some 128k pages changed, assuming max_wal_size of 1GB, and I think we
can all agree that the default max_wal_size is for rather small
systems. That ends up being something around 2MB per checkpoint to
store the page list, or half a gig per day, just to keep track of the
pages which changed in each checkpoint across that day. There's a bit
of hand-waving in there, but I don't think it takes all that much to
reach the conclusion that this might not be the best approach.

David and I were kicking around the notion of a 'last LSN' which is
kept on a per-relation basis, but, of course, that ends up not really
being granular enough, and it would likely be a source of contention
unless we could work out a way to make it "lazily" updated somehow, or
similar.

> > It would also be useful for going in the reverse direction: look up
> > all the records (or just the last record) that modified a given
> > block.
>
> Well, an LSN map is what I was suggesting.

Not sure I entirely followed what you were getting at here..?

Thanks!

Stephen
Re: [HACKERS] On markers of changed data
Greg Stark wrote:
> The general shape of what I would like to see is some log which lists
> where each checkpoint starts and ends and what blocks are modified
> since the previous checkpoint. Then to generate an incremental backup
> from any point in time to the current you union all the block lists
> between them and fetch those blocks. There are other ways of using
> this aside from incremental backups on disk too -- you could imagine a
> replica that has fallen behind requesting the block lists and then
> fetching just those blocks instead of needing to receive and apply all
> the wal.

Hmm, this sounds pretty clever. And we already have the blocks touched by each record thanks to the work for pg_rewind (so we don't have to do any nasty tricks like the stuff Suzuki-san did for pg_lesslog, where each WAL record had to be processed individually to know what blocks it referenced), so it shouldn't be *too* difficult ...

> It would also be useful for going in the reverse direction: look up
> all the records (or just the last record) that modified a given block.

Well, a LSN map is what I was suggesting.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Re: Extended statistics is not working on Vars hidden under a RelabelType
On 10/10/2017 05:03 AM, David Rowley wrote:
> Basically, $subject is causing us not to properly find matching
> extended stats in this case.
>
> The attached patch fixes it.
>
> The following test case is an example of the misbehaviour. Note
> rows=1 vs rows=98 in the Gather node.

Thanks for noticing this. The patch seems fine to me.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] On markers of changed data
On 8 October 2017 at 08:52, Andrey Borodin wrote:
>
> 1. Any other marker would be better (It can be WAL scan during archiving,
> some new LSN-based mechanics* et c.)

The general shape of what I would like to see is some log which lists where each checkpoint starts and ends and what blocks are modified since the previous checkpoint. Then to generate an incremental backup from any point in time to the current you union all the block lists between them and fetch those blocks. There are other ways of using this aside from incremental backups on disk too -- you could imagine a replica that has fallen behind requesting the block lists and then fetching just those blocks instead of needing to receive and apply all the wal. Or possibly even making a cost-based decision between the two depending on which would be faster.

It would also be useful for going in the reverse direction: look up all the records (or just the last record) that modified a given block. Instead of having to scan all the wal you would only need to scan the checkpoint eras that had touched that block.

--
greg
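The reverse-direction lookup mentioned above, restricting a scan to the checkpoint eras that touched a given block, amounts to inverting the per-checkpoint block lists. A toy sketch under assumed data shapes (hypothetical layout, not PostgreSQL code):

```python
from collections import defaultdict

def build_block_index(checkpoints):
    """Invert per-checkpoint block lists into a map of
    block -> checkpoint ids that modified it, so a lookup only has to
    scan the checkpoint eras that actually touched that block."""
    index = defaultdict(list)
    for ckpt_id, blocks in checkpoints:
        for blk in blocks:
            index[blk].append(ckpt_id)
    return index

# Checkpoint eras with the (relfilenode, blockno) pairs they modified.
ckpts = [(1, [(16384, 0), (16384, 7)]),
         (2, [(16384, 7), (16385, 2)])]
idx = build_block_index(ckpts)
print(idx[(16384, 7)])      # all eras that touched the block
# [1, 2]
print(idx[(16384, 7)][-1])  # or just the last one
# 2
```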
Re: [HACKERS] Help required to debug pg_repack breaking logical replication
On 08/10/17 15:21, Craig Ringer wrote:
> On 8 October 2017 at 02:37, Daniele Varrazzo wrote:
>> Hello,
>>
>> we have been reported, and I have experienced a couple of times,
>> pg_repack breaking logical replication.
>>
>> - https://github.com/reorg/pg_repack/issues/135
>> - https://github.com/2ndQuadrant/pglogical/issues/113
>
> Yeah, I was going to say I've seen reports of this with pglogical, but
> I see you've linked to them.
>
> I haven't had a chance to look into it though, and haven't had a
> suitable reproducible test case.
>
>> In the above issue #113, Petr Jelinek commented:
>>
>>> From quick look at pg_repack, the way it does table rewrite is almost guaranteed
>>> to break logical decoding unless there is zero unconsumed changes for a given table
>>> as it does not build the necessary mappings info for logical decoding that standard
>>> heap rewrite in postgres does.
>>
>> unfortunately he didn't follow up to further details requests.
>
> At a guess he's referring to src/backend/access/heap/rewriteheap.c .
>
> I'd explain better if I understood what was going on myself, but I
> haven't really understood the logical decoding parts of that code.
>
>> - Is Petr diagnosis right and freezing of logical replication is to be
>> blamed to missing mapping?
>> - Can you suggest a test to reproduce the issue reliably?
>> - What are mapped relations anyway?
>
> I can't immediately give you the answers you seek, but start by
> studying src/backend/access/heap/rewriteheap.c . Notably
> logical_end_heap_rewrite, logical_rewrite_heap_tuple,
> logical_begin_heap_rewrite.

Yes, that's exactly it. When a table is rewritten, we need to create a mapping for every tuple that was created or removed (i.e., an insert, update, or delete happened on it) since the oldest replication slot's xmin, for logical decoding to continue to work on that table after the rewrite. And pg_repack doesn't create that mapping.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On 10/10/17 17:22, Aleksander Alekseev wrote: > Hi Petr, > >> let me start by saying that my view is that this is simply a >> documentation bug. Meaning that I didn't document that it does not work, >> but I also never intended it to work. Main reason is that we can't >> support the semantics of "UPDATE OF" correctly (see bellow). And I think >> it's better to not support something at all rather than making it behave >> differently in different cases. >> >> Now about the proposed patch, I don't think this is correct way to >> support this as it will only work when either PRIMARY KEY column was >> changed or when REPLICA IDENTITY is set to FULL for the table. And even >> then it will have very different semantics from how it works when the >> row is updated by SQL statement (all non-toasted columns will be >> reported as changed regardless of actually being changed or not). >> >> The more proper way to do this would be to run data comparison of the >> new tuple and the existing tuple values which a) will have different >> behavior than normal "UPDATE OF" triggers have because we still can't >> tell what columns were mentioned in the original query and b) will not >> exactly help performance (and perhaps c) one can write the trigger to >> behave this way already without impacting all other use-cases). > > I personally think that solution proposed by Masahiko is much better > than current behavior. Well the proposed patch adds inconsistent behavior - if in your test you'd change the trigger to be UPDATE OF v instead of k and updated v, it would still not be triggered even with this patch. That's IMHO worse than current behavior which at least consistently doesn't work. > We could document current limitations you've > mentioned and probably add a corresponding warning during execution of > ALTER TABLE ... ENABLE REPLICA TRIGGER. I don't insist on this > particular approach though. 
>
> What I really don't like is that PostgreSQL allows to create something
> that supposedly should work but in fact doesn't. Such behavior is
> obviously a bug. So as an alternative we could just return an error in
> response to ALTER TABLE ... ENABLE REPLICA TRIGGER query for triggers
> that we know will never be executed.
>

Well, ENABLE REPLICA is not specific to built-in logical replication. It only depends on the value of the session_replication_role GUC. Any session can set that, and if that session executes SQL, the UPDATE OF triggers will work as expected.

It's a similar situation to statement triggers, IMHO: we allow ENABLE REPLICA statement triggers, but they are not triggered by logical replication because there is no statement. And given that UPDATE OF also depends on a statement to work as expected (it's specified to be triggered for columns listed in the statement, not for what was actually changed by the execution), I don't really see the difference between this and statement triggers, except that the statement trigger behavior is documented.

I understand that it's somewhat confusing and I am not saying it's ideal, but I don't see any other behavior that would work for what your test tries to do and still be consistent with the rest of the system.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote:
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RelOptInfo etc.)
>
> I want to know how postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid
> or RangeTblEntry ?
>
> I want to construct a join graph for a given query, for which I need the
> join predicates between two relations.

In the usingClause or quals fields of a JoinExpr. See src/backend/parser/gram.y, search for join_qual. Of course, WHERE clauses have to be inspected as well; those go into the whereClause of a SelectStmt. Search for where_clause in src/backend/parser/gram.y.

Nico
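Once the predicates have been collected that way, from JoinExpr quals plus the WHERE clause, the join graph asked about above is just relations as nodes and predicates as edge labels. A minimal sketch (illustrative only; the input triples are assumed to have been extracted from the parse tree beforehand, and all names are hypothetical):

```python
from collections import defaultdict

def build_join_graph(predicates):
    """Build an undirected join graph: each node is a relation, each
    edge carries the list of predicates joining the two relations.
    `predicates` is a list of (rel_a, rel_b, qual_text) triples, as one
    might collect while walking JoinExpr quals and the WHERE clause."""
    graph = defaultdict(lambda: defaultdict(list))
    for a, b, qual in predicates:
        graph[a][b].append(qual)
        graph[b][a].append(qual)  # undirected: store the edge both ways
    return graph

preds = [("t1", "t2", "t1.a = t2.a"),
         ("t2", "t3", "t2.b = t3.b")]
g = build_join_graph(preds)
print(sorted(g["t2"]))  # t2 joins to both t1 and t3
# ['t1', 't3']
print(g["t1"]["t2"])
# ['t1.a = t2.a']
```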
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
Hi Petr, > let me start by saying that my view is that this is simply a > documentation bug. Meaning that I didn't document that it does not work, > but I also never intended it to work. Main reason is that we can't > support the semantics of "UPDATE OF" correctly (see bellow). And I think > it's better to not support something at all rather than making it behave > differently in different cases. > > Now about the proposed patch, I don't think this is correct way to > support this as it will only work when either PRIMARY KEY column was > changed or when REPLICA IDENTITY is set to FULL for the table. And even > then it will have very different semantics from how it works when the > row is updated by SQL statement (all non-toasted columns will be > reported as changed regardless of actually being changed or not). > > The more proper way to do this would be to run data comparison of the > new tuple and the existing tuple values which a) will have different > behavior than normal "UPDATE OF" triggers have because we still can't > tell what columns were mentioned in the original query and b) will not > exactly help performance (and perhaps c) one can write the trigger to > behave this way already without impacting all other use-cases). I personally think that solution proposed by Masahiko is much better than current behavior. We could document current limitations you've mentioned and probably add a corresponding warning during execution of ALTER TABLE ... ENABLE REPLICA TRIGGER. I don't insist on this particular approach though. What I really don't like is that PostgreSQL allows to create something that supposedly should work but in fact doesn't. Such behavior is obviously a bug. So as an alternative we could just return an error in response to ALTER TABLE ... ENABLE REPLICA TRIGGER query for triggers that we know will never be executed. -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera > wrote: > > 2. create one index for each existing partition. These would be > >identical to what would happen if you created the index directly on > >each partition, except that there is an additional dependency to the > >parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. I agree it's a problem that needs to be addressed directly. > An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". 
> Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the abstract index should be marked indisvalid=false unless a substitute index already exists; and on ATTACH when indisvalid=false we verify that all local indexes exist, and if so we can flip indisvalid. That way, we can continue to rely on the parent index always having all its children when the flag is set. I'm not clear on a syntax that creates the main index and hopes to later have the sub-indexes created. Another approach is to do it the other way around, i.e. create the children first, then once they're all in place create the main one normally, which merely verifies that all the requisite children exist. This is related to what I proposed to occur when a regular table is joined as a partition of the partitioned table: we run a verification that an index matching the parent's abstract indexes exists, and if not we raise an error. (Alternatively we could allow the case, and mark the abstract index as indisvalid=false, but that seems to violate POLA). > Another thing that would let you do is CREATE INDEX CONCURRENTLY > replacement_concrete_index; ALTER INDEX abstract_index DETACH > PARTITION old_concrete_index, ATTACH PARTITION > replacement_concrete_index; DROP INDEX CONCURRENTLY > old_concrete_index, which seems like a thing someone might want to do. Yeah, this is a point I explicitly mentioned, and this proposal seems like a good way. 
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Wood, Dan wrote:
> I’m unclear on what is being repro’d in 9.6. Are you getting the
> duplicate rows problem or just the reindex problem? Are you testing
> with asserts enabled (I’m not)?

I was seeing just the reindex problem. I don't see any more dups. But I've tried to reproduce it afresh now, and let it run for a long time and nothing happened. Maybe I made a mistake last week and ran an unfixed version. I don't see any more problems now.

> If you are getting the dup rows consider the code in the block in
> heapam.c that starts with the comment “replace multi by update xid”.
>
> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
> There is an updater in the multixact array however the status is
> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
> assume this is a preliminary status before the following row in the
> hot chain has its multixact set to NoKeyUpdate.

Yes, the "For" version is the locker version rather than the actual update. That lock is acquired by EvalPlanQual locking the row just before doing the update. I think GetUpdateXid has no reason to return such an Xid, since it's not an update.

> Since a 0 is returned this does precede cutoff_xid and
> TransactionIdDidCommit(0) will return false. This ends up aborting
> the multixact on the row even though the real xid is committed. This
> sets XMAX to 0 and that row becomes visible as one of the dups.
> Interestingly the real xid of the updater is 122944 and the cutoff_xid
> is 122945.

I haven't seen this effect. Please keep us updated if you're able to verify corruption this way.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] How does postgres store the join predicate for a relation in a given query
Hi all,

When you fire a query in postgresql, it will first parse the query and create the data structures for storing various aspects of the query and executing the query. (Like RangeTblEntry, PlannerInfo, RelOptInfo etc.)

I want to know how postgresql stores the join predicates of a query. Like which data structure is used to store the join predicates.

How can we find the join predicates applied on a relation from relid, Oid or RangeTblEntry ?

I want to construct a join graph for a given query, for which I need the join predicates between two relations.

--
Thanks,
Gourav Kumar
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On 10/10/17 09:53, Masahiko Sawada wrote: > On Tue, Oct 10, 2017 at 11:29 AM, Masahiko Sawada > wrote: >> On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev >> wrote: >>> Hi hackers, >>> >>> I've found something that looks like a bug. >>> >>> Steps to reproduce >>> -- >>> >>> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There >>> is a table `test` on every instance: >>> >>> ``` >>> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT); >>> ``` >>> >>> Both inst1 and inst2 have `allpub` publication: >>> >>> ``` >>> CREATE PUBLICATION allpub FOR ALL TABLES; >>> ``` >>> >>> ... and inst3 is subscribed for both publications: >>> >>> ``` >>> CREATE SUBSCRIPTION allsub1 >>> CONNECTION 'host=10.128.0.16 user=eax dbname=eax' >>> PUBLICATION allpub; >>> >>> CREATE SUBSCRIPTION allsub2 >>> CONNECTION 'host=10.128.0.26 user=eax dbname=eax' >>> PUBLICATION allpub; >>> ``` >>> >>> So basically it's two masters, one replica configuration. To resolve >>> insert/update conflicts I've created the following triggers on inst3: >>> >>> ``` >>> CREATE OR REPLACE FUNCTION test_before_insert() >>> RETURNS trigger AS $$ >>> BEGIN >>> >>> RAISE NOTICE 'test_before_insert trigger executed'; >>> >>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>>RAISE NOTICE 'test_before_insert trigger - merging data'; >>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>>RETURN NULL; >>> END IF; >>> >>> RETURN NEW; >>> >>> END >>> $$ LANGUAGE plpgsql; >>> >>> >>> CREATE OR REPLACE FUNCTION test_before_update() >>> RETURNS trigger AS $$ >>> BEGIN >>> >>> RAISE NOTICE 'test_before_update trigger executed'; >>> >>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>>RAISE NOTICE 'test_before_update trigger - merging data'; >>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>>DELETE FROM test where k = old.k; >>>RETURN NULL; >>> END IF; >>> >>> RETURN NEW; >>> >>> END >>> $$ LANGUAGE plpgsql; >>> >>> create trigger test_before_insert_trigger >>> before insert on test 
>>> for each row execute procedure test_before_insert(); >>> >>> create trigger test_before_update_trigger >>> before update of k on test >>> for each row execute procedure test_before_update(); >>> >>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger; >>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger; >>> ``` >>> >>> The INSERT trigger works just as expected, however the UPDATE trigger >>> doesn't. On inst1: >>> >>> ``` >>> insert into test values ('k1', 'v1'); >>> ``` >>> >>> In inst2: >>> >>> ``` >>> insert into test values ('k4', 'v4'); >>> update test set k = 'k1' where k = 'k4'; >>> ``` >>> >>> Now on inst3: >>> >>> ``` >>> select * from test; >>> ``` >>> >>> Expected result >>> --- >>> >>> Rows are merged to: >>> >>> ``` >>> k | v >>> +--- >>> k1 | v1;v4 >>> ``` >>> >>> This is what would happen if all insert/update queries would have been >>> executed on one instance. >>> >>> Actual result >>> - >>> >>> Replication fails, log contains: >>> >>> ``` >>> [3227] ERROR: duplicate key value violates unique constraint "test_pkey" >>> [3227] DETAIL: Key (k)=(k1) already exists. >>> [3176] LOG: worker process: logical replication worker for subscription >>> 16402 (PID 3227) exited with exit code 1 >>> ``` >>> >>> What do you think? >>> >> >> I think the cause of this issue is that the apply worker doesn't set >> updatedCols of RangeTblEntry when applying updates. So TriggerEnabled >> always ends up with false. I'll make a patch and submit. >> > > Attached patch store the updated columns bitmap set to RangeTblEntry. > In my environment this bug seems to be fixed by the patch. > Hi, let me start by saying that my view is that this is simply a documentation bug. Meaning that I didn't document that it does not work, but I also never intended it to work. Main reason is that we can't support the semantics of "UPDATE OF" correctly (see bellow). 
And I think it's better to not support something at all rather than making it behave differently in different cases.

Now, about the proposed patch: I don't think this is the correct way to support this, as it will only work when either a PRIMARY KEY column was changed or when REPLICA IDENTITY is set to FULL for the table. And even then it will have very different semantics from how it works when the row is updated by an SQL statement (all non-toasted columns will be reported as changed regardless of actually being changed or not).

The more proper way to do this would be to run a data comparison of the new tuple and the existing tuple values, which a) will have different behavior than normal "UPDATE OF" triggers have because we still can't tell what columns were mentioned in the original query, and b) will not exactly help performance (and perhaps c) one can write the trigger to behave this way already without impacting all other use-cases).

--
Petr Jelinek http://www.2ndQuadrant.com/
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Tue, Sep 26, 2017 at 11:09 AM, Michael Paquier wrote: > On Mon, Sep 25, 2017 at 11:22 PM, Peter Eisentraut > wrote: >> I think the channel-binding negotiation on the client side is wrong. >> The logic in the patch is >> >> +#ifdef USE_SSL >> + if (state->ssl_in_use) >> + appendPQExpBuffer(&buf, "p=%s", SCRAM_CHANNEL_TLS_UNIQUE); >> + else >> + appendPQExpBuffer(&buf, "y"); >> +#else >> + appendPQExpBuffer(&buf, "n"); >> +#endif >> >> But if SSL is compiled in but not used, the client does not in fact >> support channel binding (for that connection), so it should send "n". > > For others, details about this flag are here: >gs2-cbind-flag = ("p=" cb-name) / "n" / "y" > ;; "n" -> client doesn't support channel binding. > ;; "y" -> client does support channel binding > ;;but thinks the server does not. > ;; "p" -> client requires channel binding. > ;; The selected channel binding follows "p=". > > And channel binding description is here: > https://tools.ietf.org/html/rfc5802#section-6 Changed the patch to do that. Note that if the client enforces the SASL mechanism to SCRAM-SHA-256-PLUS in a non-SSL context then libpq complains. This can be done in libpq using pgsaslname. >> The "y" flag should be sent if ssl_in_use but the client did not see the >> server advertise SCRAM-SHA256-PLUS. That logic is missing entirely in >> this patch. > > Okay. I think I get your point here. I agree that the client is > deficient here. This needs some more work. Hm. I take back the argument that we can use the backend version here. conn->sversion is only filled at the end of authentication. So the best thing I think can be done here is to check if channel binding has been advertised or not, and send "y" if the client thinks that the server should do support it. If the client has chosen not to use channel binding, by for example enforcing pgsaslname in libpq to SCRAM-SHA-256, then "n" should be sent. 
I have changed the patch accordingly, doing at the same time some refactoring in pg_SASL_init. pg_fe_scram_init() gets initialized only when the mechanism is selected. >> You have the server reject a client that does not support channel >> binding ("n") on all SSL connections. I don't think this is correct. >> It is up to the client to use channel binding or not, even on SSL >> connections. > > It seems that I got confused with the meaning of "y" mixed with > ssl_in_use. The server should reject "y" instead only if SSL is in > use. Okay. In order to address that, what just needs to be done is to remove the error message that I have added in the backend when "n" is sent by the client. So this thing is ripped off. This way, when a v10 libpq connects to a v11 backend, the connection is able to work even with SSL connections. >> We should update pg_hba.conf to allow a method specification of >> "scram-sha256-plus", i.e., only advertise the channel binding variant to >> the client. Then you could make policy decisions like rejecting clients >> that do not use channel binding on SSL connections. This could be a >> separate patch later. > > OK, I agree that there could be some value in that. This complicates a > bit hba rule checks, but nothing really complicated either. This would be useful, but I have let it for now to not complicate the series more. >> The error message in the "p" case if SSL is not in use is a bit >> confusing: "but the server does not need it". I think this could be >> left at the old message "but it is not supported". This ties into my >> interpretation from above that whether channel binding is "supported" >> depends on whether SSL is in use for a particular connection. > > Check. 
Okay, I switched the error message to that (diff with previous patch series): @@ -850,7 +850,7 @@ read_client_first_message(scram_state *state, char *input) if (!state->ssl_in_use) ereport(ERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), -errmsg("client supports SCRAM channel binding, but server does not need it for non-SSL connections"))); +errmsg("client requires SCRAM channel binding, but it is not supported"))); >> Some small code things: >> - prefer to use size_t over int for length (tls_finish_len etc.) >> - tls_finish should be tls_finished >> - typos: certificate_bash -> certificate_hash > > Yes, thanks for spotting those. Fixed. >> In the patch for tls-server-end-point, I think the selection of the hash >> function deviates slightly from the RFC. The RFC only says to >> substitute MD5 and SHA-1. It doesn't say to turn SHA-224 into SHA-256, >> for example. There is also the problem that the code as written will >> turn any unrecognized hash method into SHA-256. If think the code >> should single out MD5 and SHA-1 only and then use EVP_get_digestbynid() >> for
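The client-side negotiation being discussed reduces to choosing the gs2-cbind-flag from RFC 5802, quoted earlier in the thread: "n" when the connection cannot use channel binding (no SSL), "y" when it could but the server did not advertise the -PLUS mechanism, and "p=<cb-name>" when channel binding is actually used. A minimal sketch of that decision (function name and parameters are illustrative, not libpq API):

```python
def gs2_cbind_flag(ssl_in_use, server_offers_plus, cb_name="tls-unique"):
    """Choose the SCRAM gs2-cbind-flag per RFC 5802 section 6:
    'n'      -> client doesn't support channel binding (here: no SSL)
    'y'      -> client supports it but thinks the server does not
    'p=name' -> client requires channel binding with mechanism `name`."""
    if not ssl_in_use:
        # SSL compiled in but not used still means "n" for this connection.
        return "n"
    if not server_offers_plus:
        # SSL is up, but the server didn't advertise SCRAM-SHA-256-PLUS.
        return "y"
    return "p=%s" % cb_name

print(gs2_cbind_flag(True, True))    # SSL + server advertised -PLUS
# p=tls-unique
print(gs2_cbind_flag(True, False))   # SSL, but no -PLUS advertised
# y
print(gs2_cbind_flag(False, False))  # no SSL on this connection
# n
```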
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:42 PM, Ashutosh Bapat wrote:
> On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote:
>
>>> +hash_part? true : key->parttypbyval[j],
>>> +key->parttyplen[j]);
>>> parttyplen is the length of partition key attribute, whereas what you want here
>>> is the length of type of modulus and remainder. Is that correct? Probably we
>>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>> Unless I am missing something, I don't think we should worry about parttyplen
>> because in the datumCopy() when the datatype is pass-by-value then typelen
>> is ignored.
>
> That's true, but it's ugly, passing typbyvalue of one type and len of other.

How about the attached patch (0003)? Also, the dim variable is renamed to natts.

Regards,
Amul

0001-partition_bounds_copy-code-refactoring-v1.patch Description: Binary data
0002-hash-partitioning_another_design-v24.patch Description: Binary data
0003-Enable-partition-wise-join-support-v3.patch Description: Binary data
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
Hi Masahiko, > > I think the cause of this issue is that the apply worker doesn't set > > updatedCols of RangeTblEntry when applying updates. So TriggerEnabled > > always ends up with false. I'll make a patch and submit. > > > > Attached patch store the updated columns bitmap set to RangeTblEntry. > In my environment this bug seems to be fixed by the patch. Thanks a lot for a quick response. I can confirm that your patch fixes the issue and passes all tests. Hopefully someone will merge it shortly. Here is another patch from me. It adds a corresponding TAP test. Before applying your patch: ``` t/001_rep_changes.pl .. ok t/002_types.pl ok t/003_constraints.pl .. 1/5 # Failed test 'check replica trigger with specified list of affected columns applied on subscriber' # at t/003_constraints.pl line 151. # got: 'k2|v1' # expected: 'k2|v1 # triggered|true' # Looks like you failed 1 test of 5. t/003_constraints.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/5 subtests t/004_sync.pl . ok t/005_encoding.pl . ok t/006_rewrite.pl .. ok t/007_ddl.pl .. ok ``` After: ``` t/001_rep_changes.pl .. ok t/002_types.pl ok t/003_constraints.pl .. ok t/004_sync.pl . ok t/005_encoding.pl . ok t/006_rewrite.pl .. ok t/007_ddl.pl .. ok All tests successful. 
```

--
Best regards,
Aleksander Alekseev

diff --git a/src/test/subscription/t/003_constraints.pl b/src/test/subscription/t/003_constraints.pl
index 06863aef84..f1fc5ae863 100644
--- a/src/test/subscription/t/003_constraints.pl
+++ b/src/test/subscription/t/003_constraints.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 5;

 # Initialize publisher node
 my $node_publisher = get_new_node('publisher');
@@ -21,6 +21,9 @@ $node_publisher->safe_psql('postgres',
 $node_publisher->safe_psql('postgres',
 "CREATE TABLE tab_fk_ref (id int PRIMARY KEY, bid int REFERENCES tab_fk (bid));"
 );
+$node_publisher->safe_psql('postgres',
+"CREATE TABLE tab_upd_tst (k TEXT PRIMARY KEY, v TEXT);"
+);

 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres',
@@ -28,6 +31,9 @@ $node_subscriber->safe_psql('postgres',
 $node_subscriber->safe_psql('postgres',
 "CREATE TABLE tab_fk_ref (id int PRIMARY KEY, bid int REFERENCES tab_fk (bid));"
 );
+$node_subscriber->safe_psql('postgres',
+"CREATE TABLE tab_upd_tst (k TEXT PRIMARY KEY, v TEXT);"
+);

 # Setup logical replication
 my $publisher_connstr = $node_publisher->connstr .
 ' dbname=postgres';
@@ -112,5 +118,38 @@ $result = $node_subscriber->safe_psql('postgres',
 "SELECT count(*), min(bid), max(bid) FROM tab_fk_ref;");
 is($result, qq(2|1|2), 'check replica trigger applied on subscriber');

+# Add replica trigger with specified list of affected columns
+$node_subscriber->safe_psql(
+	'postgres', qq{
+CREATE OR REPLACE FUNCTION upd_fn()
+RETURNS trigger AS \$\$
+BEGIN
+INSERT INTO tab_upd_tst VALUES ('triggered', 'true');
+RETURN NEW;
+END
+\$\$ LANGUAGE plpgsql;
+
+CREATE TRIGGER upd_trg
+BEFORE UPDATE OF k ON tab_upd_tst
+FOR EACH ROW EXECUTE PROCEDURE upd_fn();
+
+ALTER TABLE tab_upd_tst ENABLE REPLICA TRIGGER upd_trg;
+});
+
+# Insert data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab_upd_tst (k, v) VALUES ('k1', 'v1');");
+$node_publisher->safe_psql('postgres',
+	"UPDATE tab_upd_tst SET k = 'k2' WHERE k = 'k1';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+	or die "Timed out while waiting for subscriber to catch up";
+
+# Trigger should be executed
+$result =
+	$node_subscriber->safe_psql('postgres', "SELECT k, v FROM tab_upd_tst ORDER BY k");
+is( $result, qq(k2|v1
+triggered|true), 'check replica trigger with specified list of affected columns applied on subscriber');
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');

signature.asc
Description: PGP signature
[HACKERS] More stats about skipped vacuums
Hello. Once in a while I am asked about table bloat. In most cases the cause is long-lasting transactions, and in some cases vacuum cancellation. Whatever the cause, users don't have enough clues as to why their tables are bloated.

At the top of the annoyances list would be that users cannot tell whether autovacuum has decided that a table needs vacuuming or not. I suppose that could be shown in pg_stat_*_tables:

  n_mod_since_analyze | 2
+ vacuum_required     | true
  last_vacuum         | 2017-10-10 17:21:54.380805+09

If vacuum_required remains true for a certain time, it means that vacuuming stopped halfway or that someone is killing it repeatedly. That status could be shown in the same view:

  n_mod_since_analyze    | 2
+ vacuum_required        | true
  last_vacuum            | 2017-10-10 17:21:54.380805+09
  last_autovacuum        | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status | Killed by lock conflict

where "Killed by lock conflict" would be one of the following:

- Completed (oldest xmin = 8023)
- May not be fully truncated (yielded at 1324 of 6447 expected)
- Truncation skipped
- Skipped by lock failure
- Killed by lock conflict

If we want a more formal representation, we can show the values in the following shape, and adding some more values could be useful:

  n_mod_since_analyze          | 2
+ vacuum_required              | true
+ last_vacuum_oldest_xid       | 8023
+ last_vacuum_left_to_truncate | 5123
+ last_vacuum_truncated        | 387
  last_vacuum                  | 2017-10-10 17:21:54.380805+09
  last_autovacuum              | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status       | Killed by lock conflict
  ...
  autovacuum_count             | 128
+ incomplete_autovacuum_count  | 53   # this last one might be needless

where the status would be one of the following:

- Completed
- Truncation skipped
- Partially truncated
- Skipped
- Killed by lock conflict

This seems enough to find the cause of table bloat. The same discussion could apply to analyze, but that might be a separate issue. There may also be a better way to indicate vacuum soundness.
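For the plain "needs vacuum" part, a vacuum_required column would essentially surface the documented autovacuum trigger rule: a table is picked up once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A minimal sketch of that rule (Python, using the default GUC values):

```python
# Sketch of the documented autovacuum trigger rule that a hypothetical
# "vacuum_required" column would expose; defaults match the stock GUCs.
def vacuum_required(n_dead_tup, reltuples,
                    autovacuum_vacuum_threshold=50,
                    autovacuum_vacuum_scale_factor=0.2):
    """True when autovacuum would decide the table needs vacuuming."""
    threshold = autovacuum_vacuum_threshold + \
        autovacuum_vacuum_scale_factor * reltuples
    return n_dead_tup > threshold

# A 10,000-row table needs more than 50 + 0.2 * 10000 = 2050 dead
# tuples before autovacuum picks it up.
print(vacuum_required(2000, 10000))   # False
print(vacuum_required(2100, 10000))   # True
```

The "stopped halfway" statuses in the proposal would then explain why the flag stays true even though autovacuum keeps firing.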
Any opinions and suggestions are welcome. I'm going to make a patch to do the 'formal' one for the time being. regards. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:40 PM, amul sul wrote:
>> natts represents the number of attributes, but for the hash partition bound we
>> are not dealing with the attribute so that I have used short-form of dimension,
>> thoughts?
>
> Okay, I think the dimension(dim) is also unfit here. Any suggestions?

I think natts is ok, since we are dealing with the number of attributes in the pack of datums; esp. when ndatums is already taken.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote:
>> +hash_part? true : key->parttypbyval[j],
>> +key->parttyplen[j]);
>> parttyplen is the length of partition key attribute, whereas what you want here
>> is the length of type of modulus and remainder. Is that correct? Probably we
>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>> call to partition_bounds_equal() from build_joinrel_partition_info().
>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.

That's true, but it's ugly, passing the typbyval of one type and the len of another.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
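Amul's point about datumCopy() ignoring typlen for pass-by-value types can be modeled abstractly. This is a hypothetical Python rendering of the C behavior, not the server's actual code:

```python
# Abstract model of datumCopy(): pass-by-value datums are returned as-is
# and typlen is never consulted; only pass-by-reference datums copy
# typlen bytes (variable-length cases are ignored in this sketch).
def datum_copy(value, typbyval, typlen):
    if typbyval:
        return value                      # typlen ignored, as amul notes
    return bytes(value[:typlen])          # fixed-length by-reference copy

# int4-like datum: by-value, so a "wrong" typlen is harmless here.
print(datum_copy(42, True, 8))            # 42
# fixed-length by-reference datum: typlen actually matters.
print(datum_copy(b"abcdef", False, 4))    # b'abcd'
```

This illustrates both sides of the exchange: the call works today because the by-value branch never reads typlen, but passing a mismatched typlen is still fragile if the type is ever pass-by-reference.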
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote: > On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat > wrote: >> On Mon, Oct 9, 2017 at 4:44 PM, amul sul wrote: >> > > Thanks Ashutosh for your review, please find my comment inline. > >> >>> 0002 few changes in partition-wise join code to support >>> hash-partitioned table as well & regression tests. >> >> +switch (key->strategy) >> +{ >> +case PARTITION_STRATEGY_HASH: >> +/* >> + * Indexes array is same as the greatest modulus. >> + * See partition_bounds_equal() for more explanation. >> + */ >> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]); >> +break; >> This logic is duplicated at multiple places. I think it's time we >> consolidate >> these changes in a function/macro and call it from the places where we have >> to >> calculate number of indexes based on the information in partition descriptor. >> Refactoring existing code might be a separate patch and then add hash >> partitioning case in hash partitioning patch. >> > > Make sense, added get_partition_bound_num_indexes() to get number of index > elements in 0001 & get_greatest_modulus() as name suggested to get the > greatest > modulus of the hash partition bound in 0002. > >> +intdim = hash_part? 2 : partnatts; >> Call the variable as natts_per_datum or just natts? >> > > natts represents the number of attributes, but for the hash partition bound we > are not dealing with the attribute so that I have used short-form of > dimension, > thoughts? Okay, I think the dimension(dim) is also unfit here. Any suggestions? > >> +hash_part? true : key->parttypbyval[j], >> +key->parttyplen[j]); >> parttyplen is the length of partition key attribute, whereas what you want >> here >> is the length of type of modulus and remainder. Is that correct? Probably we >> need some special handling wherever parttyplen and parttypbyval is used e.g. >> in >> call to partition_bounds_equal() from build_joinrel_partition_info(). 
>>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.
>
> Regards,
> Amul
Re: [HACKERS] [POC] hash partitioning
On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat wrote: > On Mon, Oct 9, 2017 at 4:44 PM, amul sul wrote: > Thanks Ashutosh for your review, please find my comment inline. > >> 0002 few changes in partition-wise join code to support >> hash-partitioned table as well & regression tests. > > +switch (key->strategy) > +{ > +case PARTITION_STRATEGY_HASH: > +/* > + * Indexes array is same as the greatest modulus. > + * See partition_bounds_equal() for more explanation. > + */ > +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]); > +break; > This logic is duplicated at multiple places. I think it's time we consolidate > these changes in a function/macro and call it from the places where we have to > calculate number of indexes based on the information in partition descriptor. > Refactoring existing code might be a separate patch and then add hash > partitioning case in hash partitioning patch. > Make sense, added get_partition_bound_num_indexes() to get number of index elements in 0001 & get_greatest_modulus() as name suggested to get the greatest modulus of the hash partition bound in 0002. > +intdim = hash_part? 2 : partnatts; > Call the variable as natts_per_datum or just natts? > natts represents the number of attributes, but for the hash partition bound we are not dealing with the attribute so that I have used short-form of dimension, thoughts? > +hash_part? true : key->parttypbyval[j], > +key->parttyplen[j]); > parttyplen is the length of partition key attribute, whereas what you want > here > is the length of type of modulus and remainder. Is that correct? Probably we > need some special handling wherever parttyplen and parttypbyval is used e.g. > in > call to partition_bounds_equal() from build_joinrel_partition_info(). > Unless I am missing something, I don't think we should worry about parttyplen because in the datumCopy() when the datatype is pass-by-value then typelen is ignored. 
Regards,
Amul

0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data

0002-hash-partitioning_another_design-v24.patch
Description: Binary data

0003-Enable-partition-wise-join-support-v2.patch
Description: Binary data
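The "Indexes array is same as the greatest modulus" comment discussed in this thread can be illustrated with a small sketch. This is hypothetical Python, not the patch's C code: each hash bound (modulus, remainder) claims every slot congruent to its remainder modulo its modulus, the array length equals the greatest modulus, and a row is routed via hash % greatest_modulus:

```python
# Hypothetical sketch of mapping hash-partition bounds (modulus, remainder)
# onto an indexes array whose length is the greatest modulus.
def build_indexes(bounds):
    """bounds: list of (modulus, remainder), one per partition."""
    greatest_modulus = max(m for m, _ in bounds)
    indexes = [-1] * greatest_modulus
    for part, (modulus, remainder) in enumerate(bounds):
        # This partition owns every slot r with r % modulus == remainder.
        for slot in range(remainder, greatest_modulus, modulus):
            indexes[slot] = part
    return indexes

# Three partitions: two with modulus 4, one with modulus 2
# (every modulus must divide the greatest modulus).
bounds = [(4, 0), (4, 2), (2, 1)]
print(build_indexes(bounds))  # [0, 2, 1, 2]
```

A tuple with hash value h then lands in partition indexes[h % greatest_modulus], which is why the greatest modulus keeps showing up wherever the number of index elements is needed.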
Re: [HACKERS] Partition-wise aggregation/grouping
On Tue, Oct 10, 2017 at 1:31 PM, David Rowley wrote:
>
> I don't think there's any need to invent any new GUC. You could just
> divide cpu_tuple_cost by something.
>
> I did a quick benchmark on my laptop to see how much Append really
> costs, and with the standard costs the actual cost seems to be about
> cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
> realistic. create_set_projection_path() does something similar and
> brincostestimate() does some similar magic and applies 0.1 *
> cpu_operator_cost to the total cost.
>
> # -- How does that compare to the cpu_tuple_cost?
> # select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
> ?column?
> 2.400209476818
> (1 row)
>
> Maybe it's worth trying with different row counts to see if the
> additional cost is consistent, but it's probably not worth being too
> critical here.

This looks good to me. I think it should be a separate, yet very small patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] GUC for cleanup indexes threshold.
Hello, Darafei. You wrote:

DP> The following review has been posted through the commitfest application:
DP> make installcheck-world: tested, passed
DP> Implements feature: tested, passed
DP> Spec compliant: tested, passed
DP> Documentation: tested, passed
DP>
DP> We're using Postgres with this patch for some time.
DP> In our use case we've got a quickly growing large table with events from our users.
DP> Table has a structure of (user_id, ts, ). Events are append only, each user
DP> generates events in small predictable time frame, mostly each second.
DP> From time to time we need to read this table in fashion of WHERE
DP> ts BETWEEN a AND b AND user_id=c.
DP> Such query leads to enormous amount of seeks, as records of each
DP> user are scattered across relation and there are no pages that
DP> contain two events from same user.
DP> To fight it, we created a btree index on (user_id, ts, ). Plan switched to
DP> index only scans, but heap fetches and execution times were still the same.
DP> Manual
DP> We noticed that autovacuum skips scanning the relation and freezing the Visibility Map.
DP> We started frequently performing VACUUM manually on the relation.
DP> This helped with freezing the Visibility Map.
DP> However, we found out that VACUUM makes a full scan over the index.
DP> As index does not fit into memory, this means that each run
DP> flushes all the disk caches and eats up Amazon IOPS credits.
DP> With this patch behavior is much better for us - VACUUM finishes real quick.
DP> As a future improvement, a similar improvement for other index types will be useful.
DP> After it happens, I'm looking forward to autovacuum kicking in on
DP> append-only tables, to freeze the Visibility Map.
DP>
DP> The new status of this patch is: Ready for Committer

It seems we may be about to hit this as well, and it would be great to have this vacuum issue solved for the next PG version.
--
With best wishes,
Pavel
mailto:pa...@gf.microolap.com
Re: [HACKERS] Columnar storage support
Unfortunately C-Store doesn't let you take full advantage of a columnar store: you are still not able to perform vector operations. C-Store reduces the amount of data read from disk thanks to 1. fetching only the columns used in the query, and 2. data compression. This gives some benefit in query execution speed for cold data (when it is not loaded in cache). For warm data there is almost no difference (except for very large tables that cannot fit in memory). But the main advantage of the vertical data format - vector data processing - is possible only with a specialized executor.

There is a prototype of a vector executor for C-Store:
https://github.com/citusdata/postgres_vectorization_test
It provides a 3-4x speedup on some queries, but it is really a prototype and research project, far from practical usage.

I have also developed two columnar store extensions for Postgres:
IMCS (In-Memory Columnar Store): https://github.com/knizhnik/imcs.git
VOPS (Vectorized Operations): https://github.com/postgrespro/vops.git

The first one is more oriented toward in-memory databases (although it supports spilling data to disk) and requires special functions to manipulate the columnar data. In this case the columnar store is a copy of the main (horizontal) store (normal Postgres tables). VOPS is more recent work that allows more or less normal SQL (using a foreign data wrapper and user-defined types/operators). In VOPS, data is stored inside normal Postgres tables, but using vectors (tiles) instead of scalars.

Both IMCS and VOPS provide a 10-100x speed improvement on queries like Q1 in TPC-H (sequential scan with filtering and aggregation). In queries involving joins there is almost no benefit compared with normal Postgres.
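The row-at-a-time versus tile-at-a-time distinction described above can be sketched in plain Python. This is purely illustrative (VOPS and the vectorization prototype implement it in C, where amortizing per-call executor overhead across a tile is what produces the reported speedups):

```python
# Toy contrast between scalar and tiled (vectorized) filtered aggregation.
TILE = 64  # illustrative tile size; VOPS tiles are a fixed small size

def sum_where_scalar(values, flags):
    # Row-at-a-time: conceptually one "executor call" per value.
    total = 0
    for v, f in zip(values, flags):
        if f:
            total += v
    return total

def sum_where_tiled(values, flags):
    # Tile-at-a-time: one call per TILE values, as in a vectorized executor.
    total = 0
    for i in range(0, len(values), TILE):
        vtile = values[i:i + TILE]
        ftile = flags[i:i + TILE]
        total += sum(v for v, f in zip(vtile, ftile) if f)
    return total

values = list(range(1000))
flags = [v % 2 == 0 for v in values]
assert sum_where_scalar(values, flags) == sum_where_tiled(values, flags)
```

Both functions compute the same answer; the point is only the data layout and call granularity, which is where a real vectorized executor recovers its 3-4x (and, with filtering plus aggregation over compressed columns, 10-100x) wins.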
There is also a columnar storage extension developed by Fujitsu:
https://www.postgresql.org/message-id/cajrrpgfac7wc9nk6ptty6yn-nn+hcy8xolah2doyhvg5d6h...@mail.gmail.com
But the published patch is only a first step in this direction, and it is possible neither to use it in practice nor to perform experiments measuring the potential performance improvement.

On 09.10.2017 23:06, Joshua D. Drake wrote:
On 10/09/2017 01:03 PM, legrand legrand wrote:
Is there a chance that pluggable storage permits creation of a columnar RDBMS like MonetDB in PostgreSQL? Thanks in advance for the answer.
The extension C-Store from Citus is probably what you are looking for.
jD
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Partition-wise aggregation/grouping
On 10 October 2017 at 17:57, Ashutosh Bapat wrote:
> Append node just returns the result of ExecProcNode(). Charging
> cpu_tuple_cost may make it too expensive. In other places where we
> charge cpu_tuple_cost there's some processing done to the tuple like
> ExecStoreTuple() in SeqNext(). May be we need some other measure for
> Append's processing of the tuple.

I don't think there's any need to invent any new GUC. You could just divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really costs, and with the standard costs the actual cost seems to be about cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be realistic. create_set_projection_path() does something similar and brincostestimate() does some similar magic and applies 0.1 * cpu_operator_cost to the total cost.

# create table p (a int, b int);
# create table p1 () inherits (p);
# insert into p1 select generate_series(1,100);
# vacuum analyze p1;
# \q
$ echo "select count(*) from p1;" > p1.sql
$ echo "select count(*) from p;" > p.sql
$ pgbench -T 60 -f p1.sql -n
latency average = 58.567 ms
$ pgbench -T 60 -f p.sql -n
latency average = 72.984 ms
$ psql
psql (11devel)
Type "help" for help.

# -- check the cost of the plan.
# explain select count(*) from p1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Seq Scan on p1  (cost=0.00..14425.00 rows=100 width=0)
(2 rows)

# -- selecting from the parent is the same due to zero Append cost.
# explain select count(*) from p;
                              QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Append  (cost=0.00..14425.00 rows=101 width=0)
         ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=0)
         ->  Seq Scan on p1  (cost=0.00..14425.00 rows=100 width=0)
(4 rows)

# -- extrapolate the additional time taken for the Append scan and work out what the planner
# -- should add to the plan's cost, then divide by the number of rows in p1 to work out the
# -- tuple cost of pulling a row through the append.
# select (16925.01 * (72.984 / 58.567) - 16925.01) / 100;
        ?column?
------------------------
 0.00416630302337493743
(1 row)

# show cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01
(1 row)

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
    ?column?
----------------
 2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the additional cost is consistent, but it's probably not worth being too critical here.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
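The derivation above can be replayed outside psql. One caveat: the archived message appears to have lost digits from the row count (generate_series(1,100) and a divisor of 100 cannot produce the quoted result), so the sketch below assumes 1,000,000 rows, which does reproduce the quoted figures:

```python
# Replay David's arithmetic: the extra runtime of pulling rows through
# the Append, expressed first as an extra plan cost and then per tuple.
plan_cost = 16925.01        # Aggregate plan cost on p1
lat_p1 = 58.567             # pgbench avg latency on p1 (ms)
lat_p = 72.984              # pgbench avg latency on p (ms)
nrows = 1_000_000           # assumed row count (truncated in the digest)
cpu_tuple_cost = 0.01       # default GUC value

extra_cost = plan_cost * (lat_p / lat_p1) - plan_cost
per_tuple = extra_cost / nrows
print(per_tuple)                     # matches the quoted 0.00416630...
print(cpu_tuple_cost / per_tuple)    # matches the quoted 2.4002...
```

The ratio of roughly 2.4 is what motivates the suggestion of charging cpu_tuple_cost / 2 per tuple for the Append.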
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On Tue, Oct 10, 2017 at 11:29 AM, Masahiko Sawada wrote: > On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev > wrote: >> Hi hackers, >> >> I've found something that looks like a bug. >> >> Steps to reproduce >> -- >> >> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There >> is a table `test` on every instance: >> >> ``` >> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT); >> ``` >> >> Both inst1 and inst2 have `allpub` publication: >> >> ``` >> CREATE PUBLICATION allpub FOR ALL TABLES; >> ``` >> >> ... and inst3 is subscribed for both publications: >> >> ``` >> CREATE SUBSCRIPTION allsub1 >> CONNECTION 'host=10.128.0.16 user=eax dbname=eax' >> PUBLICATION allpub; >> >> CREATE SUBSCRIPTION allsub2 >> CONNECTION 'host=10.128.0.26 user=eax dbname=eax' >> PUBLICATION allpub; >> ``` >> >> So basically it's two masters, one replica configuration. To resolve >> insert/update conflicts I've created the following triggers on inst3: >> >> ``` >> CREATE OR REPLACE FUNCTION test_before_insert() >> RETURNS trigger AS $$ >> BEGIN >> >> RAISE NOTICE 'test_before_insert trigger executed'; >> >> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>RAISE NOTICE 'test_before_insert trigger - merging data'; >>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>RETURN NULL; >> END IF; >> >> RETURN NEW; >> >> END >> $$ LANGUAGE plpgsql; >> >> >> CREATE OR REPLACE FUNCTION test_before_update() >> RETURNS trigger AS $$ >> BEGIN >> >> RAISE NOTICE 'test_before_update trigger executed'; >> >> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>RAISE NOTICE 'test_before_update trigger - merging data'; >>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>DELETE FROM test where k = old.k; >>RETURN NULL; >> END IF; >> >> RETURN NEW; >> >> END >> $$ LANGUAGE plpgsql; >> >> create trigger test_before_insert_trigger >> before insert on test >> for each row execute procedure test_before_insert(); >> >> create trigger test_before_update_trigger >> before update of 
k on test >> for each row execute procedure test_before_update(); >> >> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger; >> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger; >> ``` >> >> The INSERT trigger works just as expected, however the UPDATE trigger >> doesn't. On inst1: >> >> ``` >> insert into test values ('k1', 'v1'); >> ``` >> >> In inst2: >> >> ``` >> insert into test values ('k4', 'v4'); >> update test set k = 'k1' where k = 'k4'; >> ``` >> >> Now on inst3: >> >> ``` >> select * from test; >> ``` >> >> Expected result >> --- >> >> Rows are merged to: >> >> ``` >> k | v >> +--- >> k1 | v1;v4 >> ``` >> >> This is what would happen if all insert/update queries would have been >> executed on one instance. >> >> Actual result >> - >> >> Replication fails, log contains: >> >> ``` >> [3227] ERROR: duplicate key value violates unique constraint "test_pkey" >> [3227] DETAIL: Key (k)=(k1) already exists. >> [3176] LOG: worker process: logical replication worker for subscription >> 16402 (PID 3227) exited with exit code 1 >> ``` >> >> What do you think? >> > > I think the cause of this issue is that the apply worker doesn't set > updatedCols of RangeTblEntry when applying updates. So TriggerEnabled > always ends up with false. I'll make a patch and submit. > Attached patch store the updated columns bitmap set to RangeTblEntry. In my environment this bug seems to be fixed by the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center set_updated_columns.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
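The failure mode Sawada describes can be made concrete: a BEFORE UPDATE OF col trigger fires only when its column list intersects the statement's updatedCols set, so an apply worker that leaves updatedCols empty never fires such triggers. A simplified model (hypothetical Python, not the server's C implementation of TriggerEnabled):

```python
# Simplified model of the column-list check behind BEFORE UPDATE OF
# triggers: the trigger fires only when its OF-list intersects the
# set of columns the statement actually updated (updatedCols).
def trigger_enabled(trigger_cols, updated_cols):
    if not trigger_cols:                 # no OF-list: fire on any UPDATE
        return True
    return bool(trigger_cols & updated_cols)

# Normal backend: updatedCols is populated, so a trigger ON UPDATE OF k fires.
print(trigger_enabled({"k"}, {"k"}))     # True
# Pre-patch apply worker: updatedCols left empty, so the trigger never fires.
print(trigger_enabled({"k"}, set()))     # False
```

This is why the merge trigger in the reproduction was silently skipped, letting the raw UPDATE hit the unique constraint; the patch populates the bitmap so the check can succeed.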
[HACKERS] Omission in GRANT documentation
grant.sgml says that the default privileges granted to PUBLIC are as follows: CONNECT and CREATE TEMP TABLE for databases; EXECUTE privilege for functions; and USAGE privilege for languages. But types also have the USAGE privilege for PUBLIC by default:

test=> CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');
CREATE TYPE
test=> GRANT USAGE ON TYPE bug_status TO duff;
GRANT
test=> REVOKE USAGE ON TYPE bug_status FROM duff;
REVOKE
test=> \dT+ bug_status
                        List of data types
 Schema |    Name    | ... |  Owner  | Access privileges | ...
--------+------------+-----+---------+-------------------+-----
 public | bug_status |     | laurenz | =U/laurenz       +|
        |            |     |         | laurenz=U/laurenz |
(1 row)

Hence I propose the attached documentation patch.

Yours,
Laurenz Albe

From e1213e1e91cd0c45fcca8df492f1017f2eacc4bc Mon Sep 17 00:00:00 2001
From: Laurenz Albe
Date: Tue, 10 Oct 2017 09:21:36 +0200
Subject: [PATCH] Fix documentation of default privileges for types

Document that PUBLIC has USAGE privileges on newly created types.
---
 doc/src/sgml/ref/grant.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index c63252c..8936963 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -161,7 +161,7 @@ GRANT role_name [, ...] TO
 PUBLIC are as follows:
 CONNECT and CREATE TEMP TABLE for
 databases; EXECUTE privilege for functions; and
-USAGE privilege for languages.
+USAGE privilege for languages and types.
 The object owner can, of course, REVOKE
 both default and expressly granted privileges. (For maximum
 security, issue the REVOKE in the same transaction that
--
2.9.5