Re: [HACKERS] Slow synchronous logical replication
On Mon, Oct 9, 2017 at 4:37 PM, Konstantin Knizhnik wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
> It will not be so easy to provide a reproducing scenario, because it
> actually involves many components (postgres_fdw, pg_pathman,
> pg_shardman, LR, ...) and requires a multinode installation.
> But let me try to explain what is going on:
> We have implemented sharding -- splitting data between several remote
> tables using pg_pathman and postgres_fdw.
> It means that an insert or update of the parent table causes inserts
> or updates of some derived partitions, which are forwarded by
> postgres_fdw to the corresponding nodes.
> The number of shards is significantly larger than the number of nodes;
> i.e. for 5 nodes we have 50 shards, which means that at each node we
> have 10 shards.
> To provide fault tolerance, each shard is replicated using logical
> replication to one or more nodes. Right now we considered only
> redundancy level 1 -- each shard has exactly one replica.
> So from each node we establish 10 logical replication channels.
>
> We want commit to wait until data is actually stored at all replicas,
> so we are using synchronous replication:
> we set the synchronous_commit option to "on" and include all 10
> subscriptions in the synchronous_standby_names list.
>
> In this setup commit latency is very large (about 100 ms, and most of
> the time is actually spent in commit) and performance is very bad --
> pgbench shows about 300 TPS for the optimal number of clients (about
> 10; for larger numbers performance is almost the same). Without
> logical replication on the same setup we get about 6000 TPS.
>
> I have checked the syncrepl.c file, particularly the
> SyncRepGetSyncRecPtr function.
> Each walsender independently calculates the minimal LSN among all
> synchronous replicas and wakes up backends waiting for this LSN. It
> means that a transaction performing an update of data in one shard
> will actually wait for confirmation from the replication channels for
> all shards.
> If some shard is updated less often than the others, or is not updated
> at all (for example because the communication channel to its node is
> broken), then all backends will get stuck.
> Also, all backends are competing for the single SyncRepLock, which can
> be another contention point.

IIUC, I guess you meant to say that in the current synchronous logical
replication a transaction has to wait for updated table data to be
replicated even on servers that don't subscribe to the table. If we
changed it so that a transaction needs to wait only for the servers
that are subscribing to the table, it would be more efficient, at least
for your use case. We send at least the begin and commit data to all
subscriptions and then wait for the reply from them, but can we skip
waiting for them, for example, when the walsender actually didn't send
any data modified by the transaction?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar wrote:
> Hi all,
>
> When you fire a query in PostgreSQL, it will first parse the query and
> create the data structures for storing various aspects of the query
> and executing it (like RangeTblEntry, PlannerInfo, RelOptInfo, etc.).
>
> I want to know how PostgreSQL stores the join predicates of a query,
> i.e. which data structure is used to store the join predicates.
>
> How can we find the join predicates applied to a relation from a
> relid, Oid or RangeTblEntry?

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) they are in baserestrictinfo. The applicable
join predicates are in joininfo. You can get the RelOptInfo of a given
simple table using find_base_rel().

HTH.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] show precise repos version for dev builds?
On Sun, Oct 1, 2017 at 8:10 AM, Tom Lane wrote:
>
> configure --with-extra-version=whateveryouwant

I see that this build option has been around since 9.4; is anyone using
it to mark patched production builds? EnterpriseDB or 2ndQuadrant? How
about the cloud providers?

-Jeremy

--
http://about.me/jeremy_schneider
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Hi, On 2017-10-03 13:58:37 -0400, Robert Haas wrote: > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund wrote: > > Makes sense? > > Yes. Here's an updated version of this patchset. Changes: - renamed pq_send$type_pre to pq_write*type - renamed pq_beginmessage_pre/pq_beginmessage_keep to the _reuse suffix - removed unaligned memory access issues by again using memcpy - gcc and clang both successfully optimize it away - moved permanent buffer for SendRowDescriptionMessage to postgres.c, and have other callers use already pre-existing buffers. - replace all pq_sendint with pq_sendint$width in core - converted applicable pq_begin/endmessage in printtup.c users to use DR_printtup->buf. - added comments explaining restrict usage - expanded commit messages considerably - Small stuff. The naming I'd discussed a bit back and forth with Robert over IM, thanks! - Andres >From 89e301384c9fbc071de19c1517ffe29371fc6f36 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 20 Sep 2017 13:01:22 -0700 Subject: [PATCH 1/6] Add configure infrastructure to detect support for C99's restrict. Will be used in later commits improving performance for a few key routines where information about aliasing allows for significantly better code generation. This allows to use the C99 'restrict' keyword without breaking C89, or for that matter C++, compilers. If not supported it's defined to be empty. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbek...@alap3.anarazel.de --- configure | 46 +++ configure.in | 1 + src/include/pg_config.h.in| 14 + src/include/pg_config.h.win32 | 6 ++ 4 files changed, 67 insertions(+) diff --git a/configure b/configure index b0582657bf4..ca54242d5d7 100755 --- a/configure +++ b/configure @@ -11545,6 +11545,52 @@ _ACEOF ;; esac +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C/C++ restrict keyword" >&5 +$as_echo_n "checking for C/C++ restrict keyword... 
" >&6; } +if ${ac_cv_c_restrict+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_cv_c_restrict=no + # The order here caters to the fact that C++ does not require restrict. + for ac_kw in __restrict __restrict__ _Restrict restrict; do + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +typedef int * int_ptr; + int foo (int_ptr $ac_kw ip) { + return ip[0]; + } +int +main () +{ +int s[1]; + int * $ac_kw t = s; + t[0] = 0; + return foo(t) + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_c_restrict=$ac_kw +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + test "$ac_cv_c_restrict" != no && break + done + +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_restrict" >&5 +$as_echo "$ac_cv_c_restrict" >&6; } + + case $ac_cv_c_restrict in + restrict) ;; + no) $as_echo "#define restrict /**/" >>confdefs.h + ;; + *) cat >>confdefs.h <<_ACEOF +#define restrict $ac_cv_c_restrict +_ACEOF + ;; + esac + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for printf format archetype" >&5 $as_echo_n "checking for printf format archetype... " >&6; } if ${pgac_cv_printf_archetype+:} false; then : diff --git a/configure.in b/configure.in index 4548db0dd3c..ab990d69f4c 100644 --- a/configure.in +++ b/configure.in @@ -1299,6 +1299,7 @@ fi m4_defun([AC_PROG_CC_STDC], []) dnl We don't want that. AC_C_BIGENDIAN AC_C_INLINE +AC_C_RESTRICT PGAC_PRINTF_ARCHETYPE AC_C_FLEXIBLE_ARRAY_MEMBER PGAC_C_SIGNED diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index b0298cca19c..80ee37dd622 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -923,6 +923,20 @@ if such a type exists, and if the system does not define it. */ #undef intptr_t +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. 
*/ +#undef restrict +/* Work around a bug in Sun C++: it does not support _Restrict or + __restrict__, even though the corresponding Sun C compiler ends up with + "#define restrict _Restrict" or "#define restrict __restrict__" in the + previous line. Perhaps some future version of Sun C++ will work with + restrict; if so, hopefully it defines __RESTRICT like Sun C does. */ +#if defined __SUNPRO_CC && !defined __RESTRICT +# define _Restrict +# define __restrict__ +#endif + /* Define to empty if the C compiler does not understand signed types. */ #undef signed diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index b76aad02676..b96e93328fe 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -681,6 +681,12 @@ #define inline __inline #endif +/* Define to the equivalent of the C99 'restrict' keyword, or to + nothing if this is not supported. Do not define if restrict is + supported directly. */ +/* wor
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan wrote:
> I found one glitch with our merge of the original dup row fix. With
> that corrected AND Alvaro's Friday fix, things are solid.
> No dups. No index corruption.
>
> Thanks so much.

Nice to hear that! You guys seem to be doing extensive testing and
actually report back about it, which is really nice to see.
--
Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
I found one glitch with our merge of the original dup row fix. With
that corrected AND Alvaro's Friday fix, things are solid.
No dups. No index corruption.

Thanks so much.

On 10/10/17, 7:25 PM, "Michael Paquier" wrote:

On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem. I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened. Maybe I made a mistake last week and
> ran an unfixed version. I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows, consider the code in the block in
>> heapam.c that starts with the comment "replace multi by update xid".
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array; however, the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned, this does precede cutoff_xid, and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly, the real xid of the updater is 122944 and the
>> cutoff_xid is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice not to live long with such a sword of
Damocles hanging over us.
--
Michael
Re: [HACKERS] pg_regress help output
On 10/10/2017 07:06 PM, Tom Lane wrote:
> Joe Conway writes:
>> I have been annoyed at least twice now by the lack of pg_regress
>> command line help output for the "--bindir=" option. In passing I
>> noted that there was no output for "--help" or "--version" options
>> either.
>
>> Any objections to the attached?
>
> +1 for documenting it, but the phrasing seems a bit awkward:
>
> !   printf(_("  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
> !   printf(_("                            if BINPATH empty, use PATH from the environment\n"));
>
> Maybe just "if empty, use PATH ..." ?

Ok, so like this?

8<--
  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
                            if empty, use PATH from the environment\n"));
8<--

> Also, why is the patch apparently changing whitespace in all the help
> lines? Seems like that will create a lot of make-work for translators.

I debated with myself about that. In other cases, e.g. initdb or psql,
where we mix short+long options and long-only options, we indent the
long-only options in the output to match up with the long options of
the mixed lines (whew, hopefully that is clear). Previously we were not
showing mixed short+long options for pg_regress at all, and hence only
indented the long-only options minimally. But with the addition of -h
and -V (again, consistent with other programs we ship), it made sense
to be consistent in the way we indent.

But I am fine with leaving the original lines' output the way it was,
if preferred.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem. I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened. Maybe I made a mistake last week and
> ran an unfixed version. I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows, consider the code in the block in
>> heapam.c that starts with the comment "replace multi by update xid".
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array; however, the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned, this does precede cutoff_xid, and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly, the real xid of the updater is 122944 and the
>> cutoff_xid is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice not to live long with such a sword of
Damocles hanging over us.
--
Michael
Re: [HACKERS] pg_regress help output
Joe Conway writes:
> I have been annoyed at least twice now by the lack of pg_regress
> command line help output for the "--bindir=" option. In passing I
> noted that there was no output for "--help" or "--version" options
> either.

> Any objections to the attached?

+1 for documenting it, but the phrasing seems a bit awkward:

!   printf(_("  --bindir=BINPATH          use BINPATH for programs that are run;\n"));
!   printf(_("                            if BINPATH empty, use PATH from the environment\n"));

Maybe just "if empty, use PATH ..." ?

Also, why is the patch apparently changing whitespace in all the help
lines? Seems like that will create a lot of make-work for translators.

regards, tom lane
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 10:46 AM, Andres Freund wrote:
> I'm not following. The "D" is in the 'dispchar' field, not the value
> field, no? The default value is NULL?

Oops, yes. I misread the code. The other debug options are not
documented, so it's fine with me not to provide any documentation then.
--
Michael
Re: [HACKERS] Is it time to kill support for very old servers?
Hi,

On 2017-10-11 10:40:11 +0900, Michael Paquier wrote:
> >> + if (conn->forced_protocol_version != NULL)
> >> + {
> >> +     conn->pversion = atoi(conn->forced_protocol_version);
> >> + }
> >> This should check for strlen > 0 as well.
> >
> > Why? Note that we don't do so elsewhere in fe-connect.c.
>
> Because it seems to me that the default value of the parameter should
> be an empty string instead of D. Feels more consistent with the
> others.

I'm not following. The "D" is in the 'dispchar' field, not the value
field, no? The default value is NULL?

Greetings,

Andres Freund
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 10:14 AM, Andres Freund wrote:
> Hi,
>
> On 2017-10-11 10:09:34 +0900, Michael Paquier wrote:
>> On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
>> > On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
>> >> Coverage of the relevant files is a good bit higher afterwards.
>> >> Although our libpq coverage is generally pretty damn awful.
>> >
>> > Any opinions on this? Obviously this needs some cleanup, but I'd
>> > like to know whether we have consensus on adding a connection
>> > option for this goal before investing more time into this.
>> >
>> > A nearby thread [1] whacks around some of the v2 code, which
>> > triggered me to look into this. I obviously can just use these
>> > patches to test those patches during development, but it seems
>> > better to keep coverage.
>>
>> FWIW, I think that moving forward with such a possibility is a good
>> thing, including having a connection parameter. This would pay off
>> in the long term if a new protocol version is added.
>
>> 0001 should document the new parameter.
>
> I'm actually inclined not to, and to keep this as an undocumented
> debugging option. Limiting the use of this option to people willing
> to read the code seems like a good idea to me.

It seems important to me to document things present. There is a section
in the docs listing developer-only parameters for runtime
configuration, so why not separate "Parameter Key Words" into two
sections then? One for the main parameters and one for developer-only
parameters.

>> + if (conn->forced_protocol_version != NULL)
>> + {
>> +     conn->pversion = atoi(conn->forced_protocol_version);
>> + }
>> This should check for strlen > 0 as well.
>
> Why? Note that we don't do so elsewhere in fe-connect.c.

Because it seems to me that the default value of the parameter should
be an empty string instead of D. Feels more consistent with the others.
--
Michael
Re: [HACKERS] Is it time to kill support for very old servers?
Hi,

On 2017-10-11 10:09:34 +0900, Michael Paquier wrote:
> On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
> > On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
> >> Coverage of the relevant files is a good bit higher afterwards.
> >> Although our libpq coverage is generally pretty damn awful.
> >
> > Any opinions on this? Obviously this needs some cleanup, but I'd
> > like to know whether we have consensus on adding a connection
> > option for this goal before investing more time into this.
> >
> > A nearby thread [1] whacks around some of the v2 code, which
> > triggered me to look into this. I obviously can just use these
> > patches to test those patches during development, but it seems
> > better to keep coverage.
>
> FWIW, I think that moving forward with such a possibility is a good
> thing, including having a connection parameter. This would pay off in
> the long term if a new protocol version is added.

> 0001 should document the new parameter.

I'm actually inclined not to, and to keep this as an undocumented
debugging option. Limiting the use of this option to people willing to
read the code seems like a good idea to me.

> + if (conn->forced_protocol_version != NULL)
> + {
> +     conn->pversion = atoi(conn->forced_protocol_version);
> + }
> This should check for strlen > 0 as well.

Why? Note that we don't do so elsewhere in fe-connect.c.

Greetings,

Andres Freund
Re: [HACKERS] Is it time to kill support for very old servers?
On Wed, Oct 11, 2017 at 9:39 AM, Andres Freund wrote:
> On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
>> Coverage of the relevant files is a good bit higher afterwards.
>> Although our libpq coverage is generally pretty damn awful.
>
> Any opinions on this? Obviously this needs some cleanup, but I'd like
> to know whether we have consensus on adding a connection option for
> this goal before investing more time into this.
>
> A nearby thread [1] whacks around some of the v2 code, which triggered
> me to look into this. I obviously can just use these patches to test
> those patches during development, but it seems better to keep
> coverage.

FWIW, I think that moving forward with such a possibility is a good
thing, including having a connection parameter. This would pay off in
the long term if a new protocol version is added.

0001 should document the new parameter.

+ if (conn->forced_protocol_version != NULL)
+ {
+     conn->pversion = atoi(conn->forced_protocol_version);
+ }
This should check for strlen > 0 as well.
--
Michael
[HACKERS] pg_regress help output
I have been annoyed at least twice now by the lack of pg_regress command line help output for the "--bindir=" option. In passing I noted that there was no output for "--help" or "--version" options either. Any objections to the attached? It could be argued that it ought to be back-patched too, but I won't bother unless someone cares enough to request it. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7c628df..f3dcc1f 100644 *** a/src/test/regress/pg_regress.c --- b/src/test/regress/pg_regress.c *** help(void) *** 2004,2040 printf(_("Usage:\n %s [OPTION]... [EXTRA-TEST]...\n"), progname); printf(_("\n")); printf(_("Options:\n")); ! printf(_(" --config-auth=DATADIR update authentication settings for DATADIR\n")); ! printf(_(" --create-role=ROLEcreate the specified role before testing\n")); ! printf(_(" --dbname=DB use database DB (default \"regression\")\n")); ! printf(_(" --debug turn on debug mode in programs that are run\n")); ! printf(_(" --dlpath=DIR look for dynamic libraries in DIR\n")); ! printf(_(" --encoding=ENCODING use ENCODING as the encoding\n")); ! printf(_(" --inputdir=DIRtake input files from DIR (default \".\")\n")); ! printf(_(" --launcher=CMDuse CMD as launcher of psql\n")); ! printf(_(" --load-extension=EXT load the named extension before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --load-language=LANG load the named language before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --max-connections=N maximum number of concurrent connections\n")); ! printf(_("(default is 0, meaning unlimited)\n")); ! printf(_(" --max-concurrent-tests=N maximum number of concurrent tests in schedule\n")); ! printf(_("(default is 0, meaning unlimited)\n")); ! 
printf(_(" --outputdir=DIR place output files in DIR (default \".\")\n")); ! printf(_(" --schedule=FILE use test ordering schedule from FILE\n")); ! printf(_("(can be used multiple times to concatenate)\n")); ! printf(_(" --temp-instance=DIR create a temporary instance in DIR\n")); ! printf(_(" --use-existinguse an existing installation\n")); printf(_("\n")); printf(_("Options for \"temp-instance\" mode:\n")); ! printf(_(" --no-locale use C locale\n")); ! printf(_(" --port=PORT start postmaster on PORT\n")); ! printf(_(" --temp-config=FILEappend contents of FILE to temporary config\n")); printf(_("\n")); printf(_("Options for using an existing installation:\n")); ! printf(_(" --host=HOST use postmaster running on HOST\n")); ! printf(_(" --port=PORT use postmaster running at PORT\n")); ! printf(_(" --user=USER connect as USER\n")); printf(_("\n")); printf(_("The exit status is 0 if all tests passed, 1 if some tests failed, and 2\n")); printf(_("if the tests could not be run for some reason.\n")); --- 2004,2044 printf(_("Usage:\n %s [OPTION]... [EXTRA-TEST]...\n"), progname); printf(_("\n")); printf(_("Options:\n")); ! printf(_(" --bindir=BINPATH use BINPATH for programs that are run;\n")); ! printf(_("if BINPATH empty, use PATH from the environment\n")); ! printf(_(" --config-auth=DATADIR update authentication settings for DATADIR\n")); ! printf(_(" --create-role=ROLEcreate the specified role before testing\n")); ! printf(_(" --dbname=DB use database DB (default \"regression\")\n")); ! printf(_(" --debug turn on debug mode in programs that are run\n")); ! printf(_(" --dlpath=DIR look for dynamic libraries in DIR\n")); ! printf(_(" --encoding=ENCODING use ENCODING as the encoding\n")); ! printf(_(" -h, --helpshow this help, then exit\n")); ! printf(_(" --inputdir=DIRtake input files from DIR (default \".\")\n")); ! printf(_(" --launcher=CMDuse CMD as launcher of psql\n")); ! printf(_(" --load-extension=EXT load the named extension before running the\n")); ! 
printf(_("tests; can appear multiple times\n")); ! printf(_(" --load-language=LANG load the named language before running the\n")); ! printf(_("tests; can appear multiple times\n")); ! printf(_(" --max-connections=N maximum number of concurrent connections\n")); ! printf(_("
Re: [HACKERS] Is it time to kill support for very old servers?
On 2017-09-20 01:32:36 -0700, Andres Freund wrote:
> On 2017-09-18 02:53:03 -0700, Andres Freund wrote:
> > On 2017-09-13 23:39:21 -0400, Tom Lane wrote:
> > > The real problem in this area, to my mind, is that we're not
> > > testing that code --- either end of it --- in any systematic way.
> > > If it's broken it could take us quite a while to notice.
> >
> > Independent of the thrust of my question - why aren't we adding a
> > 'force-v2' option to libpq? A test that basically does something
> > like
> > postgres[22923][1]=# \setenv PGFORCEV2 1
> > postgres[22923][1]=# \c
> > You are now connected to database "postgres" as user "andres".
> > postgres[22924][1]=>
> > seems easy enough to add; in fact I tested the above.
> >
> > And the protocol coverage of the v2 protocol seems small enough
> > that a single not-too-large file ought to cover most of it quite
> > easily.
>
> Here's what I roughly was thinking of. I don't quite like the name,
> and the way the version is specified for libpq (basically just the
> "raw" integer). Not sure if others have an opinion on that. I
> personally would lean towards not documenting this option...
>
> There's a few things that I couldn't find easy ways to test:
> - the v2-specific binary protocol - I don't quite see how we could
>   test that without writing C
> - version error checks - pg_regress/psql errors out in
>   non-interactive mode if a connection fails to be established. This
>   we could verify with a simple tap test.
>
> Coverage of the relevant files is a good bit higher afterwards.
> Although our libpq coverage is generally pretty damn awful.

Any opinions on this? Obviously this needs some cleanup, but I'd like
to know whether we have consensus on adding a connection option for
this goal before investing more time into this.

A nearby thread [1] whacks around some of the v2 code, which triggered
me to look into this. I obviously can just use these patches to test
those patches during development, but it seems better to keep coverage.

Thanks,

Andres

[1] https://postgr.es/m/20170914063418.sckdzgjfrsbek...@alap3.anarazel.de
Re: [HACKERS] On markers of changed data
Alvaro,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Greg Stark wrote:
> > The general shape of what I would like to see is some log which
> > lists where each checkpoint starts and ends and what blocks are
> > modified since the previous checkpoint. Then to generate an
> > incremental backup from any point in time to the current one, you
> > union all the block lists between them and fetch those blocks.
> > There are other ways of using this aside from incremental backups
> > on disk too -- you could imagine a replica that has fallen behind
> > requesting the block lists and then fetching just those blocks
> > instead of needing to receive and apply all the wal.
>
> Hmm, this sounds pretty clever. And we already have the blocks
> touched by each record thanks to the work for pg_rewind (so we don't
> have to do any nasty tricks like the stuff Suzuki-san did for
> pg_lesslog, where each WAL record had to be processed individually to
> know what blocks it referenced), so it shouldn't be *too* difficult ...

Yeah, it sounds interesting, but I was just chatting w/ David about it
and we were thinking about how often checkpoints are actually done --
you end up with quite a few of these lists being out there. Now, if the
lists were always kept in sorted order, then perhaps we would be able
to essentially merge-sort them all back together and de-dup that way,
but even then, you're talking about an awful lot if you're looking at
daily incrementals: that's 288 standard 5-minute checkpoints, each with
some 128k pages changed, assuming max_wal_size of 1GB, and I think we
can all agree that the default max_wal_size is for rather small
systems. That ends up being something around 2MB per checkpoint to
store the page list, or half a gig per day, just to keep track of the
pages which changed in each checkpoint across that day. There's a bit
of hand-waving in there, but I don't think it takes all that much to
reach the conclusion that this might not be the best approach.

David and I were kicking around the notion of a 'last LSN' which is
kept on a per-relation basis, but, of course, that ends up not really
being granular enough, and it would likely be a source of contention
unless we could work out a way to make it "lazily" updated somehow, or
similar.

> > It would also be useful for going in the reverse direction: look up
> > all the records (or just the last record) that modified a given
> > block.
>
> Well, an LSN map is what I was suggesting.

Not sure I entirely followed what you were getting at here..?

Thanks!

Stephen
Re: [HACKERS] On markers of changed data
Greg Stark wrote:
> The general shape of what I would like to see is some log which lists
> where each checkpoint starts and ends and what blocks are modified
> since the previous checkpoint. Then to generate an incremental backup
> from any point in time to the current you union all the block lists
> between them and fetch those blocks. There are other ways of using
> this aside from incremental backups on disk too -- you could imagine a
> replica that has fallen behind requesting the block lists and then
> fetching just those blocks instead of needing to receive and apply all
> the wal.

Hmm, this sounds pretty clever. And we already have the blocks touched by each record thanks to the work for pg_rewind (so we don't have to do any nasty tricks like the stuff Suzuki-san did for pg_lesslog, where each WAL record had to be processed individually to know what blocks it referenced), so it shouldn't be *too* difficult ...

> It would also be useful for going in the reverse direction: look up
> all the records (or just the last record) that modified a given block.

Well, a LSN map is what I was suggesting.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Re: Extended statistics is not working on Vars hidden under a RelabelType
On 10/10/2017 05:03 AM, David Rowley wrote:
> Basically, $subject is causing us not to properly find matching
> extended stats in this case.
>
> The attached patch fixes it.
>
> The following test case is an example of the misbehaviour. Note
> rows=1 vs rows=98 in the Gather node.

Thanks for noticing this. The patch seems fine to me.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] On markers of changed data
On 8 October 2017 at 08:52, Andrey Borodin wrote:
>
> 1. Any other marker would be better (It can be WAL scan during archiving,
> some new LSN-based mechanics* et c.)

The general shape of what I would like to see is some log which lists where each checkpoint starts and ends and what blocks are modified since the previous checkpoint. Then to generate an incremental backup from any point in time to the current you union all the block lists between them and fetch those blocks. There are other ways of using this aside from incremental backups on disk too -- you could imagine a replica that has fallen behind requesting the block lists and then fetching just those blocks instead of needing to receive and apply all the wal. Or possibly even making a cost-based decision between the two depending on which would be faster.

It would also be useful for going in the reverse direction: look up all the records (or just the last record) that modified a given block. Instead of having to scan all the wal you would only need to scan the checkpoint eras that had touched that block.

--
greg
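The reverse-direction lookup mentioned above, restricting a scan to the checkpoint eras that touched a given block, amounts to inverting the per-checkpoint block lists. A toy sketch under assumed data shapes (hypothetical layout, not PostgreSQL code):

```python
from collections import defaultdict

def build_block_index(checkpoints):
    """Invert per-checkpoint block lists into a map of
    block -> checkpoint ids that modified it, so a lookup only has to
    scan the checkpoint eras that actually touched that block."""
    index = defaultdict(list)
    for ckpt_id, blocks in checkpoints:
        for blk in blocks:
            index[blk].append(ckpt_id)
    return index

# Checkpoint eras with the (relfilenode, blockno) pairs they modified.
ckpts = [(1, [(16384, 0), (16384, 7)]),
         (2, [(16384, 7), (16385, 2)])]
idx = build_block_index(ckpts)
print(idx[(16384, 7)])      # all eras that touched the block
# [1, 2]
print(idx[(16384, 7)][-1])  # or just the last one
# 2
```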
Re: [HACKERS] Help required to debug pg_repack breaking logical replication
On 08/10/17 15:21, Craig Ringer wrote:
> On 8 October 2017 at 02:37, Daniele Varrazzo wrote:
>> Hello,
>>
>> we have been reported, and I have experienced a couple of times,
>> pg_repack breaking logical replication.
>>
>> - https://github.com/reorg/pg_repack/issues/135
>> - https://github.com/2ndQuadrant/pglogical/issues/113
>
> Yeah, I was going to say I've seen reports of this with pglogical, but
> I see you've linked to them.
>
> I haven't had a chance to look into it though, and haven't had a
> suitable reproducible test case.
>
>> In the above issue #113, Petr Jelinek commented:
>>
>>> From quick look at pg_repack, the way it does table rewrite is almost guaranteed
>>> to break logical decoding unless there is zero unconsumed changes for a given table
>>> as it does not build the necessary mappings info for logical decoding that standard
>>> heap rewrite in postgres does.
>>
>> unfortunately he didn't follow up to further details requests.
>
> At a guess he's referring to src/backend/access/heap/rewriteheap.c .
>
> I'd explain better if I understood what was going on myself, but I
> haven't really understood the logical decoding parts of that code.
>
>> - Is Petr diagnosis right and freezing of logical replication is to be
>> blamed to missing mapping?
>> - Can you suggest a test to reproduce the issue reliably?
>> - What are mapped relations anyway?
>
> I can't immediately give you the answers you seek, but start by
> studying src/backend/access/heap/rewriteheap.c . Notably
> logical_end_heap_rewrite, logical_rewrite_heap_tuple,
> logical_begin_heap_rewrite.

Yes, that's exactly it. When a table is rewritten, we need to create a mapping for every tuple that was created or removed (i.e., an insert, update, or delete happened on it) since the oldest replication slot's xmin, for logical decoding to continue to work on that table after the rewrite. And pg_repack doesn't create that mapping.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On 10/10/17 17:22, Aleksander Alekseev wrote: > Hi Petr, > >> let me start by saying that my view is that this is simply a >> documentation bug. Meaning that I didn't document that it does not work, >> but I also never intended it to work. Main reason is that we can't >> support the semantics of "UPDATE OF" correctly (see bellow). And I think >> it's better to not support something at all rather than making it behave >> differently in different cases. >> >> Now about the proposed patch, I don't think this is correct way to >> support this as it will only work when either PRIMARY KEY column was >> changed or when REPLICA IDENTITY is set to FULL for the table. And even >> then it will have very different semantics from how it works when the >> row is updated by SQL statement (all non-toasted columns will be >> reported as changed regardless of actually being changed or not). >> >> The more proper way to do this would be to run data comparison of the >> new tuple and the existing tuple values which a) will have different >> behavior than normal "UPDATE OF" triggers have because we still can't >> tell what columns were mentioned in the original query and b) will not >> exactly help performance (and perhaps c) one can write the trigger to >> behave this way already without impacting all other use-cases). > > I personally think that solution proposed by Masahiko is much better > than current behavior. Well the proposed patch adds inconsistent behavior - if in your test you'd change the trigger to be UPDATE OF v instead of k and updated v, it would still not be triggered even with this patch. That's IMHO worse than current behavior which at least consistently doesn't work. > We could document current limitations you've > mentioned and probably add a corresponding warning during execution of > ALTER TABLE ... ENABLE REPLICA TRIGGER. I don't insist on this > particular approach though. 
>
> What I really don't like is that PostgreSQL allows to create something
> that supposedly should work but in fact doesn't. Such behavior is
> obviously a bug. So as an alternative we could just return an error in
> response to ALTER TABLE ... ENABLE REPLICA TRIGGER query for triggers
> that we know will never be executed.
>

Well, ENABLE REPLICA is not specific to built-in logical replication. It only depends on the value of the session_replication_role GUC. Any session can set that, and if that session executes SQL, the UPDATE OF triggers will work as expected.

It's a similar situation to statement triggers, IMHO: we allow ENABLE REPLICA statement triggers, but they are not triggered by logical replication because there is no statement. And given that UPDATE OF also depends on a statement to work as expected (it's specified to be triggered for columns listed in the statement, not for what was actually changed by the execution), I don't really see the difference between this and statement triggers, except that the statement trigger behavior is documented.

I understand that it's somewhat confusing and I am not saying it's ideal, but I don't see any other behavior that would work for what your test tries to do and still be consistent with the rest of the system.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote:
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RelOptInfo etc.)
>
> I want to know how postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid
> or RangeTblEntry ?
>
> I want to construct a join graph for a given query, for which I need the
> join predicates between two relations.

In the usingClause or quals fields of a JoinExpr. See src/backend/parser/gram.y, search for join_qual. Of course, WHERE clauses have to be inspected as well; those go into the whereClause of a SelectStmt. Search for where_clause in src/backend/parser/gram.y.

Nico
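Once the predicates have been collected that way, from JoinExpr quals plus the WHERE clause, the join graph asked about above is just relations as nodes and predicates as edge labels. A minimal sketch (illustrative only; the input triples are assumed to have been extracted from the parse tree beforehand, and all names are hypothetical):

```python
from collections import defaultdict

def build_join_graph(predicates):
    """Build an undirected join graph: each node is a relation, each
    edge carries the list of predicates joining the two relations.
    `predicates` is a list of (rel_a, rel_b, qual_text) triples, as one
    might collect while walking JoinExpr quals and the WHERE clause."""
    graph = defaultdict(lambda: defaultdict(list))
    for a, b, qual in predicates:
        graph[a][b].append(qual)
        graph[b][a].append(qual)  # undirected: store the edge both ways
    return graph

preds = [("t1", "t2", "t1.a = t2.a"),
         ("t2", "t3", "t2.b = t3.b")]
g = build_join_graph(preds)
print(sorted(g["t2"]))  # t2 joins to both t1 and t3
# ['t1', 't3']
print(g["t1"]["t2"])
# ['t1.a = t2.a']
```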
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
Hi Petr, > let me start by saying that my view is that this is simply a > documentation bug. Meaning that I didn't document that it does not work, > but I also never intended it to work. Main reason is that we can't > support the semantics of "UPDATE OF" correctly (see bellow). And I think > it's better to not support something at all rather than making it behave > differently in different cases. > > Now about the proposed patch, I don't think this is correct way to > support this as it will only work when either PRIMARY KEY column was > changed or when REPLICA IDENTITY is set to FULL for the table. And even > then it will have very different semantics from how it works when the > row is updated by SQL statement (all non-toasted columns will be > reported as changed regardless of actually being changed or not). > > The more proper way to do this would be to run data comparison of the > new tuple and the existing tuple values which a) will have different > behavior than normal "UPDATE OF" triggers have because we still can't > tell what columns were mentioned in the original query and b) will not > exactly help performance (and perhaps c) one can write the trigger to > behave this way already without impacting all other use-cases). I personally think that solution proposed by Masahiko is much better than current behavior. We could document current limitations you've mentioned and probably add a corresponding warning during execution of ALTER TABLE ... ENABLE REPLICA TRIGGER. I don't insist on this particular approach though. What I really don't like is that PostgreSQL allows to create something that supposedly should work but in fact doesn't. Such behavior is obviously a bug. So as an alternative we could just return an error in response to ALTER TABLE ... ENABLE REPLICA TRIGGER query for triggers that we know will never be executed. -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera > wrote: > > 2. create one index for each existing partition. These would be > >identical to what would happen if you created the index directly on > >each partition, except that there is an additional dependency to the > >parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. I agree it's a problem that needs to be addressed directly. > An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". 
> Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the abstract index should be marked indisvalid=false unless a substitute index already exists; and on ATTACH when indisvalid=false we verify that all local indexes exist, and if so we can flip indisvalid. That way, we can continue to rely on the parent index always having all its children when the flag is set. I'm not clear on a syntax that creates the main index and hopes to later have the sub-indexes created. Another approach is to do it the other way around, i.e. create the children first, then once they're all in place create the main one normally, which merely verifies that all the requisite children exist. This is related to what I proposed to occur when a regular table is joined as a partition of the partitioned table: we run a verification that an index matching the parent's abstract indexes exists, and if not we raise an error. (Alternatively we could allow the case, and mark the abstract index as indisvalid=false, but that seems to violate POLA). > Another thing that would let you do is CREATE INDEX CONCURRENTLY > replacement_concrete_index; ALTER INDEX abstract_index DETACH > PARTITION old_concrete_index, ATTACH PARTITION > replacement_concrete_index; DROP INDEX CONCURRENTLY > old_concrete_index, which seems like a thing someone might want to do. Yeah, this is a point I explicitly mentioned, and this proposal seems like a good way. 
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Wood, Dan wrote:
> I’m unclear on what is being repro’d in 9.6. Are you getting the
> duplicate rows problem or just the reindex problem? Are you testing
> with asserts enabled (I’m not)?

I was seeing just the reindex problem. I don't see any more dups. But I've tried to reproduce it afresh now, and let it run for a long time and nothing happened. Maybe I made a mistake last week and ran an unfixed version. I don't see any more problems now.

> If you are getting the dup rows consider the code in the block in
> heapam.c that starts with the comment “replace multi by update xid”.
>
> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
> There is an updater in the multixact array however the status is
> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
> assume this is a preliminary status before the following row in the
> hot chain has its multixact set to NoKeyUpdate.

Yes, the "For" version is the locker version rather than the actual update. That lock is acquired by EvalPlanQual locking the row just before doing the update. I think GetUpdateXid has no reason to return such an Xid, since it's not an update.

> Since a 0 is returned this does precede cutoff_xid and
> TransactionIdDidCommit(0) will return false. This ends up aborting
> the multixact on the row even though the real xid is committed. This
> sets XMAX to 0 and that row becomes visible as one of the dups.
> Interestingly the real xid of the updater is 122944 and the cutoff_xid
> is 122945.

I haven't seen this effect. Please keep us updated if you're able to verify corruption this way.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] How does postgres store the join predicate for a relation in a given query
Hi all,

When you fire a query in postgresql, it will first parse the query and create the data structures for storing various aspects of the query and executing the query. (Like RangeTblEntry, PlannerInfo, RelOptInfo etc.)

I want to know how postgresql stores the join predicates of a query. Like which data structure is used to store the join predicates.

How can we find the join predicates applied on a relation from relid, Oid or RangeTblEntry ?

I want to construct a join graph for a given query, for which I need the join predicates between two relations.

--
Thanks,
Gourav Kumar
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On 10/10/17 09:53, Masahiko Sawada wrote: > On Tue, Oct 10, 2017 at 11:29 AM, Masahiko Sawada > wrote: >> On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev >> wrote: >>> Hi hackers, >>> >>> I've found something that looks like a bug. >>> >>> Steps to reproduce >>> -- >>> >>> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There >>> is a table `test` on every instance: >>> >>> ``` >>> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT); >>> ``` >>> >>> Both inst1 and inst2 have `allpub` publication: >>> >>> ``` >>> CREATE PUBLICATION allpub FOR ALL TABLES; >>> ``` >>> >>> ... and inst3 is subscribed for both publications: >>> >>> ``` >>> CREATE SUBSCRIPTION allsub1 >>> CONNECTION 'host=10.128.0.16 user=eax dbname=eax' >>> PUBLICATION allpub; >>> >>> CREATE SUBSCRIPTION allsub2 >>> CONNECTION 'host=10.128.0.26 user=eax dbname=eax' >>> PUBLICATION allpub; >>> ``` >>> >>> So basically it's two masters, one replica configuration. To resolve >>> insert/update conflicts I've created the following triggers on inst3: >>> >>> ``` >>> CREATE OR REPLACE FUNCTION test_before_insert() >>> RETURNS trigger AS $$ >>> BEGIN >>> >>> RAISE NOTICE 'test_before_insert trigger executed'; >>> >>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>>RAISE NOTICE 'test_before_insert trigger - merging data'; >>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>>RETURN NULL; >>> END IF; >>> >>> RETURN NEW; >>> >>> END >>> $$ LANGUAGE plpgsql; >>> >>> >>> CREATE OR REPLACE FUNCTION test_before_update() >>> RETURNS trigger AS $$ >>> BEGIN >>> >>> RAISE NOTICE 'test_before_update trigger executed'; >>> >>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>>RAISE NOTICE 'test_before_update trigger - merging data'; >>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>>DELETE FROM test where k = old.k; >>>RETURN NULL; >>> END IF; >>> >>> RETURN NEW; >>> >>> END >>> $$ LANGUAGE plpgsql; >>> >>> create trigger test_before_insert_trigger >>> before insert on test 
>>> for each row execute procedure test_before_insert(); >>> >>> create trigger test_before_update_trigger >>> before update of k on test >>> for each row execute procedure test_before_update(); >>> >>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger; >>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger; >>> ``` >>> >>> The INSERT trigger works just as expected, however the UPDATE trigger >>> doesn't. On inst1: >>> >>> ``` >>> insert into test values ('k1', 'v1'); >>> ``` >>> >>> In inst2: >>> >>> ``` >>> insert into test values ('k4', 'v4'); >>> update test set k = 'k1' where k = 'k4'; >>> ``` >>> >>> Now on inst3: >>> >>> ``` >>> select * from test; >>> ``` >>> >>> Expected result >>> --- >>> >>> Rows are merged to: >>> >>> ``` >>> k | v >>> +--- >>> k1 | v1;v4 >>> ``` >>> >>> This is what would happen if all insert/update queries would have been >>> executed on one instance. >>> >>> Actual result >>> - >>> >>> Replication fails, log contains: >>> >>> ``` >>> [3227] ERROR: duplicate key value violates unique constraint "test_pkey" >>> [3227] DETAIL: Key (k)=(k1) already exists. >>> [3176] LOG: worker process: logical replication worker for subscription >>> 16402 (PID 3227) exited with exit code 1 >>> ``` >>> >>> What do you think? >>> >> >> I think the cause of this issue is that the apply worker doesn't set >> updatedCols of RangeTblEntry when applying updates. So TriggerEnabled >> always ends up with false. I'll make a patch and submit. >> > > Attached patch store the updated columns bitmap set to RangeTblEntry. > In my environment this bug seems to be fixed by the patch. > Hi, let me start by saying that my view is that this is simply a documentation bug. Meaning that I didn't document that it does not work, but I also never intended it to work. Main reason is that we can't support the semantics of "UPDATE OF" correctly (see bellow). 
And I think it's better to not support something at all rather than making it behave differently in different cases.

Now, about the proposed patch: I don't think this is the correct way to support this, as it will only work when either a PRIMARY KEY column was changed or when REPLICA IDENTITY is set to FULL for the table. And even then it will have very different semantics from how it works when the row is updated by an SQL statement (all non-toasted columns will be reported as changed regardless of actually being changed or not).

The more proper way to do this would be to run a data comparison of the new tuple and the existing tuple values, which a) will have different behavior than normal "UPDATE OF" triggers have because we still can't tell what columns were mentioned in the original query, and b) will not exactly help performance (and perhaps c) one can write the trigger to behave this way already without impacting all other use-cases).

--
Petr Jelinek http://www.2ndQuadrant.com/
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Tue, Sep 26, 2017 at 11:09 AM, Michael Paquier wrote: > On Mon, Sep 25, 2017 at 11:22 PM, Peter Eisentraut > wrote: >> I think the channel-binding negotiation on the client side is wrong. >> The logic in the patch is >> >> +#ifdef USE_SSL >> + if (state->ssl_in_use) >> + appendPQExpBuffer(&buf, "p=%s", SCRAM_CHANNEL_TLS_UNIQUE); >> + else >> + appendPQExpBuffer(&buf, "y"); >> +#else >> + appendPQExpBuffer(&buf, "n"); >> +#endif >> >> But if SSL is compiled in but not used, the client does not in fact >> support channel binding (for that connection), so it should send "n". > > For others, details about this flag are here: >gs2-cbind-flag = ("p=" cb-name) / "n" / "y" > ;; "n" -> client doesn't support channel binding. > ;; "y" -> client does support channel binding > ;;but thinks the server does not. > ;; "p" -> client requires channel binding. > ;; The selected channel binding follows "p=". > > And channel binding description is here: > https://tools.ietf.org/html/rfc5802#section-6 Changed the patch to do that. Note that if the client enforces the SASL mechanism to SCRAM-SHA-256-PLUS in a non-SSL context then libpq complains. This can be done in libpq using pgsaslname. >> The "y" flag should be sent if ssl_in_use but the client did not see the >> server advertise SCRAM-SHA256-PLUS. That logic is missing entirely in >> this patch. > > Okay. I think I get your point here. I agree that the client is > deficient here. This needs some more work. Hm. I take back the argument that we can use the backend version here. conn->sversion is only filled at the end of authentication. So the best thing I think can be done here is to check if channel binding has been advertised or not, and send "y" if the client thinks that the server should do support it. If the client has chosen not to use channel binding, by for example enforcing pgsaslname in libpq to SCRAM-SHA-256, then "n" should be sent. 
I have changed the patch accordingly, doing at the same time some refactoring in pg_SASL_init. pg_fe_scram_init() gets initialized only when the mechanism is selected. >> You have the server reject a client that does not support channel >> binding ("n") on all SSL connections. I don't think this is correct. >> It is up to the client to use channel binding or not, even on SSL >> connections. > > It seems that I got confused with the meaning of "y" mixed with > ssl_in_use. The server should reject "y" instead only if SSL is in > use. Okay. In order to address that, what just needs to be done is to remove the error message that I have added in the backend when "n" is sent by the client. So this thing is ripped off. This way, when a v10 libpq connects to a v11 backend, the connection is able to work even with SSL connections. >> We should update pg_hba.conf to allow a method specification of >> "scram-sha256-plus", i.e., only advertise the channel binding variant to >> the client. Then you could make policy decisions like rejecting clients >> that do not use channel binding on SSL connections. This could be a >> separate patch later. > > OK, I agree that there could be some value in that. This complicates a > bit hba rule checks, but nothing really complicated either. This would be useful, but I have let it for now to not complicate the series more. >> The error message in the "p" case if SSL is not in use is a bit >> confusing: "but the server does not need it". I think this could be >> left at the old message "but it is not supported". This ties into my >> interpretation from above that whether channel binding is "supported" >> depends on whether SSL is in use for a particular connection. > > Check. 
Okay, I switched the error message to that (diff with previous patch series): @@ -850,7 +850,7 @@ read_client_first_message(scram_state *state, char *input) if (!state->ssl_in_use) ereport(ERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), -errmsg("client supports SCRAM channel binding, but server does not need it for non-SSL connections"))); +errmsg("client requires SCRAM channel binding, but it is not supported"))); >> Some small code things: >> - prefer to use size_t over int for length (tls_finish_len etc.) >> - tls_finish should be tls_finished >> - typos: certificate_bash -> certificate_hash > > Yes, thanks for spotting those. Fixed. >> In the patch for tls-server-end-point, I think the selection of the hash >> function deviates slightly from the RFC. The RFC only says to >> substitute MD5 and SHA-1. It doesn't say to turn SHA-224 into SHA-256, >> for example. There is also the problem that the code as written will >> turn any unrecognized hash method into SHA-256. If think the code >> should single out MD5 and SHA-1 only and then use EVP_get_digestbynid() >> for
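The client-side negotiation being discussed reduces to choosing the gs2-cbind-flag from RFC 5802, quoted earlier in the thread: "n" when the connection cannot use channel binding (no SSL), "y" when it could but the server did not advertise the -PLUS mechanism, and "p=<cb-name>" when channel binding is actually used. A minimal sketch of that decision (function name and parameters are illustrative, not libpq API):

```python
def gs2_cbind_flag(ssl_in_use, server_offers_plus, cb_name="tls-unique"):
    """Choose the SCRAM gs2-cbind-flag per RFC 5802 section 6:
    'n'      -> client doesn't support channel binding (here: no SSL)
    'y'      -> client supports it but thinks the server does not
    'p=name' -> client requires channel binding with mechanism `name`."""
    if not ssl_in_use:
        # SSL compiled in but not used still means "n" for this connection.
        return "n"
    if not server_offers_plus:
        # SSL is up, but the server didn't advertise SCRAM-SHA-256-PLUS.
        return "y"
    return "p=%s" % cb_name

print(gs2_cbind_flag(True, True))    # SSL + server advertised -PLUS
# p=tls-unique
print(gs2_cbind_flag(True, False))   # SSL, but no -PLUS advertised
# y
print(gs2_cbind_flag(False, False))  # no SSL on this connection
# n
```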
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:42 PM, Ashutosh Bapat wrote:
> On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote:
>
>>> +hash_part? true : key->parttypbyval[j],
>>> +key->parttyplen[j]);
>>> parttyplen is the length of partition key attribute, whereas what you want here
>>> is the length of type of modulus and remainder. Is that correct? Probably we
>>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>> Unless I am missing something, I don't think we should worry about parttyplen
>> because in the datumCopy() when the datatype is pass-by-value then typelen
>> is ignored.
>
> That's true, but it's ugly, passing typbyvalue of one type and len of other.

How about the attached patch (0003)? Also, the dim variable is renamed to natts.

Regards,
Amul

0001-partition_bounds_copy-code-refactoring-v1.patch Description: Binary data
0002-hash-partitioning_another_design-v24.patch Description: Binary data
0003-Enable-partition-wise-join-support-v3.patch Description: Binary data
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
Hi Masahiko, > > I think the cause of this issue is that the apply worker doesn't set > > updatedCols of RangeTblEntry when applying updates. So TriggerEnabled > > always ends up with false. I'll make a patch and submit. > > > > Attached patch store the updated columns bitmap set to RangeTblEntry. > In my environment this bug seems to be fixed by the patch. Thanks a lot for a quick response. I can confirm that your patch fixes the issue and passes all tests. Hopefully someone will merge it shortly. Here is another patch from me. It adds a corresponding TAP test. Before applying your patch: ``` t/001_rep_changes.pl .. ok t/002_types.pl ok t/003_constraints.pl .. 1/5 # Failed test 'check replica trigger with specified list of affected columns applied on subscriber' # at t/003_constraints.pl line 151. # got: 'k2|v1' # expected: 'k2|v1 # triggered|true' # Looks like you failed 1 test of 5. t/003_constraints.pl .. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/5 subtests t/004_sync.pl . ok t/005_encoding.pl . ok t/006_rewrite.pl .. ok t/007_ddl.pl .. ok ``` After: ``` t/001_rep_changes.pl .. ok t/002_types.pl ok t/003_constraints.pl .. ok t/004_sync.pl . ok t/005_encoding.pl . ok t/006_rewrite.pl .. ok t/007_ddl.pl .. ok All tests successful. 
```

--
Best regards,
Aleksander Alekseev

diff --git a/src/test/subscription/t/003_constraints.pl b/src/test/subscription/t/003_constraints.pl
index 06863aef84..f1fc5ae863 100644
--- a/src/test/subscription/t/003_constraints.pl
+++ b/src/test/subscription/t/003_constraints.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 5;

 # Initialize publisher node
 my $node_publisher = get_new_node('publisher');
@@ -21,6 +21,9 @@ $node_publisher->safe_psql('postgres',
 $node_publisher->safe_psql('postgres',
 "CREATE TABLE tab_fk_ref (id int PRIMARY KEY, bid int REFERENCES tab_fk (bid));"
 );
+$node_publisher->safe_psql('postgres',
+"CREATE TABLE tab_upd_tst (k TEXT PRIMARY KEY, v TEXT);"
+);

 # Setup structure on subscriber
 $node_subscriber->safe_psql('postgres',
@@ -28,6 +31,9 @@ $node_subscriber->safe_psql('postgres',
 $node_subscriber->safe_psql('postgres',
 "CREATE TABLE tab_fk_ref (id int PRIMARY KEY, bid int REFERENCES tab_fk (bid));"
 );
+$node_subscriber->safe_psql('postgres',
+"CREATE TABLE tab_upd_tst (k TEXT PRIMARY KEY, v TEXT);"
+);

 # Setup logical replication
 my $publisher_connstr = $node_publisher->connstr .
 ' dbname=postgres';
@@ -112,5 +118,38 @@ $result = $node_subscriber->safe_psql('postgres',
 "SELECT count(*), min(bid), max(bid) FROM tab_fk_ref;");
 is($result, qq(2|1|2), 'check replica trigger applied on subscriber');

+# Add replica trigger with specified list of affected columns
+$node_subscriber->safe_psql(
+	'postgres', qq{
+CREATE OR REPLACE FUNCTION upd_fn()
+RETURNS trigger AS \$\$
+BEGIN
+INSERT INTO tab_upd_tst VALUES ('triggered', 'true');
+RETURN NEW;
+END
+\$\$ LANGUAGE plpgsql;
+
+CREATE TRIGGER upd_trg
+BEFORE UPDATE OF k ON tab_upd_tst
+FOR EACH ROW EXECUTE PROCEDURE upd_fn();
+
+ALTER TABLE tab_upd_tst ENABLE REPLICA TRIGGER upd_trg;
+});
+
+# Insert data
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab_upd_tst (k, v) VALUES ('k1', 'v1');");
+$node_publisher->safe_psql('postgres',
+	"UPDATE tab_upd_tst SET k = 'k2' WHERE k = 'k1';");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+	or die "Timed out while waiting for subscriber to catch up";
+
+# Trigger should be executed
+$result =
+	$node_subscriber->safe_psql('postgres', "SELECT k, v FROM tab_upd_tst ORDER BY k");
+is( $result, qq(k2|v1
+triggered|true), 'check replica trigger with specified list of affected columns applied on subscriber');
+
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');

signature.asc
Description: PGP signature
[HACKERS] More stats about skipped vacuums
Hello. Once in a while I am asked about table bloat. In most cases the cause is long-lasting transactions, and in some cases vacuum cancellation. Whatever the cause, users don't have enough clues as to why their tables are bloated.

At the top of the annoyances list would be that users cannot tell whether autovacuum has decided that a table needs vacuuming or not. I suppose that could be shown in pg_stat_*_tables:

  n_mod_since_analyze | 2
+ vacuum_required     | true
  last_vacuum         | 2017-10-10 17:21:54.380805+09

If vacuum_required remains true for a certain time, it means that vacuuming stopped halfway or that someone is killing it repeatedly. That status could be shown in the same view:

  n_mod_since_analyze    | 2
+ vacuum_required        | true
  last_vacuum            | 2017-10-10 17:21:54.380805+09
  last_autovacuum        | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status | Killed by lock conflict

where "Killed by lock conflict" would be one of the following:

- Completed (oldest xmin = 8023)
- May not be fully truncated (yielded at 1324 of 6447 expected)
- Truncation skipped
- Skipped by lock failure
- Killed by lock conflict

If we want a more formal representation, we can show the values in the following shape, and adding some more values could be useful:

  n_mod_since_analyze          | 2
+ vacuum_required              | true
+ last_vacuum_oldest_xid       | 8023
+ last_vacuum_left_to_truncate | 5123
+ last_vacuum_truncated        | 387
  last_vacuum                  | 2017-10-10 17:21:54.380805+09
  last_autovacuum              | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status       | Killed by lock conflict
  ...
  autovacuum_count             | 128
+ incomplete_autovacuum_count  | 53   # this last one might be needless

where the status would be one of the following:

- Completed
- Truncation skipped
- Partially truncated
- Skipped
- Killed by lock conflict

This seems enough to find the cause of table bloat. The same discussion could apply to analyze, but that might be a separate issue. There may also be a better way to indicate vacuum soundness.
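For the plain "needs vacuum" part, a vacuum_required column would essentially surface the documented autovacuum trigger rule: a table is picked up once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A minimal sketch of that rule (Python, using the default GUC values):

```python
# Sketch of the documented autovacuum trigger rule that a hypothetical
# "vacuum_required" column would expose; defaults match the stock GUCs.
def vacuum_required(n_dead_tup, reltuples,
                    autovacuum_vacuum_threshold=50,
                    autovacuum_vacuum_scale_factor=0.2):
    """True when autovacuum would decide the table needs vacuuming."""
    threshold = autovacuum_vacuum_threshold + \
        autovacuum_vacuum_scale_factor * reltuples
    return n_dead_tup > threshold

# A 10,000-row table needs more than 50 + 0.2 * 10000 = 2050 dead
# tuples before autovacuum picks it up.
print(vacuum_required(2000, 10000))   # False
print(vacuum_required(2100, 10000))   # True
```

The "stopped halfway" statuses in the proposal would then explain why the flag stays true even though autovacuum keeps firing.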
Any opinions and suggestions are welcome. I'm going to make a patch to do the 'formal' one for the time being. regards. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:40 PM, amul sul wrote:
>> natts represents the number of attributes, but for the hash partition bound we
>> are not dealing with the attribute so that I have used short-form of dimension,
>> thoughts?
>
> Okay, I think the dimension(dim) is also unfit here. Any suggestions?

I think natts is ok, since we are dealing with the number of attributes in the pack of datums; esp. when ndatums is already taken.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote:
>> +hash_part? true : key->parttypbyval[j],
>> +key->parttyplen[j]);
>> parttyplen is the length of partition key attribute, whereas what you want here
>> is the length of type of modulus and remainder. Is that correct? Probably we
>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>> call to partition_bounds_equal() from build_joinrel_partition_info().
>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.

That's true, but it's ugly, passing the typbyval of one type and the len of another.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
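Amul's point about datumCopy() ignoring typlen for pass-by-value types can be modeled abstractly. This is a hypothetical Python rendering of the C behavior, not the server's actual code:

```python
# Abstract model of datumCopy(): pass-by-value datums are returned as-is
# and typlen is never consulted; only pass-by-reference datums copy
# typlen bytes (variable-length cases are ignored in this sketch).
def datum_copy(value, typbyval, typlen):
    if typbyval:
        return value                      # typlen ignored, as amul notes
    return bytes(value[:typlen])          # fixed-length by-reference copy

# int4-like datum: by-value, so a "wrong" typlen is harmless here.
print(datum_copy(42, True, 8))            # 42
# fixed-length by-reference datum: typlen actually matters.
print(datum_copy(b"abcdef", False, 4))    # b'abcd'
```

This illustrates both sides of the exchange: the call works today because the by-value branch never reads typlen, but passing a mismatched typlen is still fragile if the type is ever pass-by-reference.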
Re: [HACKERS] [POC] hash partitioning
On Tue, Oct 10, 2017 at 3:32 PM, amul sul wrote: > On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat > wrote: >> On Mon, Oct 9, 2017 at 4:44 PM, amul sul wrote: >> > > Thanks Ashutosh for your review, please find my comment inline. > >> >>> 0002 few changes in partition-wise join code to support >>> hash-partitioned table as well & regression tests. >> >> +switch (key->strategy) >> +{ >> +case PARTITION_STRATEGY_HASH: >> +/* >> + * Indexes array is same as the greatest modulus. >> + * See partition_bounds_equal() for more explanation. >> + */ >> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]); >> +break; >> This logic is duplicated at multiple places. I think it's time we >> consolidate >> these changes in a function/macro and call it from the places where we have >> to >> calculate number of indexes based on the information in partition descriptor. >> Refactoring existing code might be a separate patch and then add hash >> partitioning case in hash partitioning patch. >> > > Make sense, added get_partition_bound_num_indexes() to get number of index > elements in 0001 & get_greatest_modulus() as name suggested to get the > greatest > modulus of the hash partition bound in 0002. > >> +intdim = hash_part? 2 : partnatts; >> Call the variable as natts_per_datum or just natts? >> > > natts represents the number of attributes, but for the hash partition bound we > are not dealing with the attribute so that I have used short-form of > dimension, > thoughts? Okay, I think the dimension(dim) is also unfit here. Any suggestions? > >> +hash_part? true : key->parttypbyval[j], >> +key->parttyplen[j]); >> parttyplen is the length of partition key attribute, whereas what you want >> here >> is the length of type of modulus and remainder. Is that correct? Probably we >> need some special handling wherever parttyplen and parttypbyval is used e.g. >> in >> call to partition_bounds_equal() from build_joinrel_partition_info(). 
>>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.
>
> Regards,
> Amul
Re: [HACKERS] [POC] hash partitioning
On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat wrote: > On Mon, Oct 9, 2017 at 4:44 PM, amul sul wrote: > Thanks Ashutosh for your review, please find my comment inline. > >> 0002 few changes in partition-wise join code to support >> hash-partitioned table as well & regression tests. > > +switch (key->strategy) > +{ > +case PARTITION_STRATEGY_HASH: > +/* > + * Indexes array is same as the greatest modulus. > + * See partition_bounds_equal() for more explanation. > + */ > +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]); > +break; > This logic is duplicated at multiple places. I think it's time we consolidate > these changes in a function/macro and call it from the places where we have to > calculate number of indexes based on the information in partition descriptor. > Refactoring existing code might be a separate patch and then add hash > partitioning case in hash partitioning patch. > Make sense, added get_partition_bound_num_indexes() to get number of index elements in 0001 & get_greatest_modulus() as name suggested to get the greatest modulus of the hash partition bound in 0002. > +intdim = hash_part? 2 : partnatts; > Call the variable as natts_per_datum or just natts? > natts represents the number of attributes, but for the hash partition bound we are not dealing with the attribute so that I have used short-form of dimension, thoughts? > +hash_part? true : key->parttypbyval[j], > +key->parttyplen[j]); > parttyplen is the length of partition key attribute, whereas what you want > here > is the length of type of modulus and remainder. Is that correct? Probably we > need some special handling wherever parttyplen and parttypbyval is used e.g. > in > call to partition_bounds_equal() from build_joinrel_partition_info(). > Unless I am missing something, I don't think we should worry about parttyplen because in the datumCopy() when the datatype is pass-by-value then typelen is ignored. 
Regards,
Amul

0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data

0002-hash-partitioning_another_design-v24.patch
Description: Binary data

0003-Enable-partition-wise-join-support-v2.patch
Description: Binary data
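The "Indexes array is same as the greatest modulus" comment discussed in this thread can be illustrated with a small sketch. This is hypothetical Python, not the patch's C code: each hash bound (modulus, remainder) claims every slot congruent to its remainder modulo its modulus, the array length equals the greatest modulus, and a row is routed via hash % greatest_modulus:

```python
# Hypothetical sketch of mapping hash-partition bounds (modulus, remainder)
# onto an indexes array whose length is the greatest modulus.
def build_indexes(bounds):
    """bounds: list of (modulus, remainder), one per partition."""
    greatest_modulus = max(m for m, _ in bounds)
    indexes = [-1] * greatest_modulus
    for part, (modulus, remainder) in enumerate(bounds):
        # This partition owns every slot r with r % modulus == remainder.
        for slot in range(remainder, greatest_modulus, modulus):
            indexes[slot] = part
    return indexes

# Three partitions: two with modulus 4, one with modulus 2
# (every modulus must divide the greatest modulus).
bounds = [(4, 0), (4, 2), (2, 1)]
print(build_indexes(bounds))  # [0, 2, 1, 2]
```

A tuple with hash value h then lands in partition indexes[h % greatest_modulus], which is why the greatest modulus keeps showing up wherever the number of index elements is needed.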
Re: [HACKERS] Partition-wise aggregation/grouping
On Tue, Oct 10, 2017 at 1:31 PM, David Rowley wrote:
>
> I don't think there's any need to invent any new GUC. You could just
> divide cpu_tuple_cost by something.
>
> I did a quick benchmark on my laptop to see how much Append really
> costs, and with the standard costs the actual cost seems to be about
> cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
> realistic. create_set_projection_path() does something similar and
> brincostestimate() does some similar magic and applies 0.1 *
> cpu_operator_cost to the total cost.
>
> # -- How does that compare to the cpu_tuple_cost?
> # select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
> ?column?
> 2.400209476818
> (1 row)
>
> Maybe it's worth trying with different row counts to see if the
> additional cost is consistent, but it's probably not worth being too
> critical here.

This looks good to me. I think it should be a separate, yet very small patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] GUC for cleanup indexes threshold.
Hello, Darafei. You wrote:

DP> The following review has been posted through the commitfest application:
DP> make installcheck-world: tested, passed
DP> Implements feature: tested, passed
DP> Spec compliant: tested, passed
DP> Documentation: tested, passed
DP>
DP> We're using Postgres with this patch for some time.
DP> In our use case we've got a quickly growing large table with events from our users.
DP> Table has a structure of (user_id, ts, ). Events are append only, each user
DP> generates events in small predictable time frame, mostly each second.
DP> From time to time we need to read this table in fashion of WHERE
DP> ts BETWEEN a AND b AND user_id=c.
DP> Such query leads to enormous amount of seeks, as records of each
DP> user are scattered across relation and there are no pages that
DP> contain two events from same user.
DP> To fight it, we created a btree index on (user_id, ts, ). Plan switched to
DP> index only scans, but heap fetches and execution times were still the same.
DP> Manual
DP> We noticed that autovacuum skips scanning the relation and freezing the Visibility Map.
DP> We started frequently performing VACUUM manually on the relation.
DP> This helped with freezing the Visibility Map.
DP> However, we found out that VACUUM makes a full scan over the index.
DP> As index does not fit into memory, this means that each run
DP> flushes all the disk caches and eats up Amazon IOPS credits.
DP> With this patch behavior is much better for us - VACUUM finishes real quick.
DP> As a future improvement, a similar improvement for other index types will be useful.
DP> After it happens, I'm looking forward to autovacuum kicking in on
DP> append-only tables, to freeze the Visibility Map.
DP>
DP> The new status of this patch is: Ready for Committer

It seems we may be about to hit this as well, and it would be great to have this vacuum issue solved for the next PG version.
--
With best wishes,
Pavel
mailto:pa...@gf.microolap.com
Re: [HACKERS] Columnar storage support
Unfortunately C-Store doesn't let you take full advantage of a columnar store: you are still not able to perform vector operations. C-Store reduces the amount of data read from disk thanks to 1. fetching only the columns used in the query, and 2. data compression. This gives some benefit in query execution speed for cold data (when it is not loaded in cache). For warm data there is almost no difference (except for very large tables that cannot fit in memory). But the main advantage of the vertical data format - vector data processing - is possible only with a specialized executor.

There is a prototype of a vector executor for C-Store:
https://github.com/citusdata/postgres_vectorization_test
It provides a 3-4x speedup on some queries, but it is really a prototype and research project, far from practical usage.

I have also developed two columnar store extensions for Postgres:
IMCS (In-Memory Columnar Store): https://github.com/knizhnik/imcs.git
VOPS (Vectorized Operations): https://github.com/postgrespro/vops.git

The first one is more oriented toward in-memory databases (although it supports spilling data to disk) and requires special functions to manipulate the columnar data. In this case the columnar store is a copy of the main (horizontal) store (normal Postgres tables). VOPS is more recent work that allows more or less normal SQL (using a foreign data wrapper and user-defined types/operators). In VOPS, data is stored inside normal Postgres tables, but using vectors (tiles) instead of scalars.

Both IMCS and VOPS provide a 10-100x speed improvement on queries like Q1 in TPC-H (sequential scan with filtering and aggregation). In queries involving joins there is almost no benefit compared with normal Postgres.
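The row-at-a-time versus tile-at-a-time distinction described above can be sketched in plain Python. This is purely illustrative (VOPS and the vectorization prototype implement it in C, where amortizing per-call executor overhead across a tile is what produces the reported speedups):

```python
# Toy contrast between scalar and tiled (vectorized) filtered aggregation.
TILE = 64  # illustrative tile size; VOPS tiles are a fixed small size

def sum_where_scalar(values, flags):
    # Row-at-a-time: conceptually one "executor call" per value.
    total = 0
    for v, f in zip(values, flags):
        if f:
            total += v
    return total

def sum_where_tiled(values, flags):
    # Tile-at-a-time: one call per TILE values, as in a vectorized executor.
    total = 0
    for i in range(0, len(values), TILE):
        vtile = values[i:i + TILE]
        ftile = flags[i:i + TILE]
        total += sum(v for v, f in zip(vtile, ftile) if f)
    return total

values = list(range(1000))
flags = [v % 2 == 0 for v in values]
assert sum_where_scalar(values, flags) == sum_where_tiled(values, flags)
```

Both functions compute the same answer; the point is only the data layout and call granularity, which is where a real vectorized executor recovers its 3-4x (and, with filtering plus aggregation over compressed columns, 10-100x) wins.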
There is also a columnar storage extension developed by Fujitsu:
https://www.postgresql.org/message-id/cajrrpgfac7wc9nk6ptty6yn-nn+hcy8xolah2doyhvg5d6h...@mail.gmail.com
But the published patch is only a first step in this direction, and it is possible neither to use it in practice nor to perform experiments measuring the potential performance improvement.

On 09.10.2017 23:06, Joshua D. Drake wrote:
On 10/09/2017 01:03 PM, legrand legrand wrote:
Is there a chance that pluggable storage permits creation of a columnar RDBMS like MonetDB in PostgreSQL? Thanks in advance for the answer.
The extension C-Store from Citus is probably what you are looking for.
jD
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Partition-wise aggregation/grouping
On 10 October 2017 at 17:57, Ashutosh Bapat wrote:
> Append node just returns the result of ExecProcNode(). Charging
> cpu_tuple_cost may make it too expensive. In other places where we
> charge cpu_tuple_cost there's some processing done to the tuple like
> ExecStoreTuple() in SeqNext(). May be we need some other measure for
> Append's processing of the tuple.

I don't think there's any need to invent any new GUC. You could just divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really costs, and with the standard costs the actual cost seems to be about cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be realistic. create_set_projection_path() does something similar and brincostestimate() does some similar magic and applies 0.1 * cpu_operator_cost to the total cost.

# create table p (a int, b int);
# create table p1 () inherits (p);
# insert into p1 select generate_series(1,100);
# vacuum analyze p1;
# \q
$ echo "select count(*) from p1;" > p1.sql
$ echo "select count(*) from p;" > p.sql
$ pgbench -T 60 -f p1.sql -n
latency average = 58.567 ms
$ pgbench -T 60 -f p.sql -n
latency average = 72.984 ms
$ psql
psql (11devel)
Type "help" for help.

# -- check the cost of the plan.
# explain select count(*) from p1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Seq Scan on p1  (cost=0.00..14425.00 rows=100 width=0)
(2 rows)

# -- selecting from the parent is the same due to zero Append cost.
# explain select count(*) from p;
                              QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
   ->  Append  (cost=0.00..14425.00 rows=101 width=0)
         ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=0)
         ->  Seq Scan on p1  (cost=0.00..14425.00 rows=100 width=0)
(4 rows)

# -- extrapolate the additional time taken for the Append scan and work out what the planner
# -- should add to the plan's cost, then divide by the number of rows in p1 to work out the
# -- tuple cost of pulling a row through the append.
# select (16925.01 * (72.984 / 58.567) - 16925.01) / 100;
        ?column?
------------------------
 0.00416630302337493743
(1 row)

# show cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01
(1 row)

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
    ?column?
----------------
 2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the additional cost is consistent, but it's probably not worth being too critical here.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
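The derivation above can be replayed outside psql. One caveat: the archived message appears to have lost digits from the row count (generate_series(1,100) and a divisor of 100 cannot produce the quoted result), so the sketch below assumes 1,000,000 rows, which does reproduce the quoted figures:

```python
# Replay David's arithmetic: the extra runtime of pulling rows through
# the Append, expressed first as an extra plan cost and then per tuple.
plan_cost = 16925.01        # Aggregate plan cost on p1
lat_p1 = 58.567             # pgbench avg latency on p1 (ms)
lat_p = 72.984              # pgbench avg latency on p (ms)
nrows = 1_000_000           # assumed row count (truncated in the digest)
cpu_tuple_cost = 0.01       # default GUC value

extra_cost = plan_cost * (lat_p / lat_p1) - plan_cost
per_tuple = extra_cost / nrows
print(per_tuple)                     # matches the quoted 0.00416630...
print(cpu_tuple_cost / per_tuple)    # matches the quoted 2.4002...
```

The ratio of roughly 2.4 is what motivates the suggestion of charging cpu_tuple_cost / 2 per tuple for the Append.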
[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger
On Tue, Oct 10, 2017 at 11:29 AM, Masahiko Sawada wrote: > On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev > wrote: >> Hi hackers, >> >> I've found something that looks like a bug. >> >> Steps to reproduce >> -- >> >> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There >> is a table `test` on every instance: >> >> ``` >> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT); >> ``` >> >> Both inst1 and inst2 have `allpub` publication: >> >> ``` >> CREATE PUBLICATION allpub FOR ALL TABLES; >> ``` >> >> ... and inst3 is subscribed for both publications: >> >> ``` >> CREATE SUBSCRIPTION allsub1 >> CONNECTION 'host=10.128.0.16 user=eax dbname=eax' >> PUBLICATION allpub; >> >> CREATE SUBSCRIPTION allsub2 >> CONNECTION 'host=10.128.0.26 user=eax dbname=eax' >> PUBLICATION allpub; >> ``` >> >> So basically it's two masters, one replica configuration. To resolve >> insert/update conflicts I've created the following triggers on inst3: >> >> ``` >> CREATE OR REPLACE FUNCTION test_before_insert() >> RETURNS trigger AS $$ >> BEGIN >> >> RAISE NOTICE 'test_before_insert trigger executed'; >> >> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>RAISE NOTICE 'test_before_insert trigger - merging data'; >>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>RETURN NULL; >> END IF; >> >> RETURN NEW; >> >> END >> $$ LANGUAGE plpgsql; >> >> >> CREATE OR REPLACE FUNCTION test_before_update() >> RETURNS trigger AS $$ >> BEGIN >> >> RAISE NOTICE 'test_before_update trigger executed'; >> >> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN >>RAISE NOTICE 'test_before_update trigger - merging data'; >>UPDATE test SET v = v || ';' || new.v WHERE k = new.k; >>DELETE FROM test where k = old.k; >>RETURN NULL; >> END IF; >> >> RETURN NEW; >> >> END >> $$ LANGUAGE plpgsql; >> >> create trigger test_before_insert_trigger >> before insert on test >> for each row execute procedure test_before_insert(); >> >> create trigger test_before_update_trigger >> before update of 
k on test >> for each row execute procedure test_before_update(); >> >> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger; >> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger; >> ``` >> >> The INSERT trigger works just as expected, however the UPDATE trigger >> doesn't. On inst1: >> >> ``` >> insert into test values ('k1', 'v1'); >> ``` >> >> In inst2: >> >> ``` >> insert into test values ('k4', 'v4'); >> update test set k = 'k1' where k = 'k4'; >> ``` >> >> Now on inst3: >> >> ``` >> select * from test; >> ``` >> >> Expected result >> --- >> >> Rows are merged to: >> >> ``` >> k | v >> +--- >> k1 | v1;v4 >> ``` >> >> This is what would happen if all insert/update queries would have been >> executed on one instance. >> >> Actual result >> - >> >> Replication fails, log contains: >> >> ``` >> [3227] ERROR: duplicate key value violates unique constraint "test_pkey" >> [3227] DETAIL: Key (k)=(k1) already exists. >> [3176] LOG: worker process: logical replication worker for subscription >> 16402 (PID 3227) exited with exit code 1 >> ``` >> >> What do you think? >> > > I think the cause of this issue is that the apply worker doesn't set > updatedCols of RangeTblEntry when applying updates. So TriggerEnabled > always ends up with false. I'll make a patch and submit. > Attached patch store the updated columns bitmap set to RangeTblEntry. In my environment this bug seems to be fixed by the patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center set_updated_columns.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
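The failure mode Sawada describes can be made concrete: a BEFORE UPDATE OF col trigger fires only when its column list intersects the statement's updatedCols set, so an apply worker that leaves updatedCols empty never fires such triggers. A simplified model (hypothetical Python, not the server's C implementation of TriggerEnabled):

```python
# Simplified model of the column-list check behind BEFORE UPDATE OF
# triggers: the trigger fires only when its OF-list intersects the
# set of columns the statement actually updated (updatedCols).
def trigger_enabled(trigger_cols, updated_cols):
    if not trigger_cols:                 # no OF-list: fire on any UPDATE
        return True
    return bool(trigger_cols & updated_cols)

# Normal backend: updatedCols is populated, so a trigger ON UPDATE OF k fires.
print(trigger_enabled({"k"}, {"k"}))     # True
# Pre-patch apply worker: updatedCols left empty, so the trigger never fires.
print(trigger_enabled({"k"}, set()))     # False
```

This is why the merge trigger in the reproduction was silently skipped, letting the raw UPDATE hit the unique constraint; the patch populates the bitmap so the check can succeed.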
[HACKERS] Omission in GRANT documentation
grant.sgml says that the default privileges granted to PUBLIC are as follows: CONNECT and CREATE TEMP TABLE for databases; EXECUTE privilege for functions; and USAGE privilege for languages. But types also have the USAGE privilege for PUBLIC by default:

test=> CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');
CREATE TYPE
test=> GRANT USAGE ON TYPE bug_status TO duff;
GRANT
test=> REVOKE USAGE ON TYPE bug_status FROM duff;
REVOKE
test=> \dT+ bug_status
                        List of data types
 Schema |    Name    | ... |  Owner  | Access privileges | ...
--------+------------+-----+---------+-------------------+-----
 public | bug_status |     | laurenz | =U/laurenz       +|
        |            |     |         | laurenz=U/laurenz |
(1 row)

Hence I propose the attached documentation patch.

Yours,
Laurenz Albe

From e1213e1e91cd0c45fcca8df492f1017f2eacc4bc Mon Sep 17 00:00:00 2001
From: Laurenz Albe
Date: Tue, 10 Oct 2017 09:21:36 +0200
Subject: [PATCH] Fix documentation of default privileges for types

Document that PUBLIC has USAGE privileges on newly created types.
---
 doc/src/sgml/ref/grant.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index c63252c..8936963 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -161,7 +161,7 @@ GRANT role_name [, ...] TO
 PUBLIC are as follows:
 CONNECT and CREATE TEMP TABLE for
 databases; EXECUTE privilege for functions; and
-USAGE privilege for languages.
+USAGE privilege for languages and types.
 The object owner can, of course, REVOKE
 both default and expressly granted privileges. (For maximum
 security, issue the REVOKE in the same transaction that
--
2.9.5