RE: speeding up planning with partitions

2019-02-07 Thread Tsunakawa, Takayuki
From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Sent: Friday, February 08, 2019 3:52 PM
> Hmm, I had rebased v20 over HEAD as of yesterday evening.  CF bot seemed
> to be happy with it too:
> 
> http://cfbot.cputube.org/amit-langote.html
> 
> Also, I rebased the patches again on the latest HEAD as of this morning
> and there were no issues.

There seem to have been some modifications between the latest snapshot tarball 
and HEAD.  I've just managed to apply your v20 to HEAD.  Thanks.


Regards
Takayuki Tsunakawa





Re: libpq compression

2019-02-07 Thread Konstantin Knizhnik




On 08.02.2019 10:14, Andres Freund wrote:

Hi,

On 2018-03-30 15:53:39 +0300, Konstantin Knizhnik wrote:

Taking into account that a vulnerability was found in SSL compression and
SSL compression is therefore considered deprecated and insecure
(http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html),
it would be nice to have some alternative mechanism for reducing libpq
traffic.

I have implemented a prototype of it (patch is attached).
To use zstd compression, Postgres should be configured with --with-zstd.
Otherwise compression will use zlib unless it is disabled by the
--without-zlib option.
I have added a compression=on/off parameter to the connection string and a
-Z option to the psql and pgbench utilities.
Below are some results:

I think compression is pretty useful, and I'm not convinced that the
threat model underlying the attacks on SSL really applies to postgres. But
having said that, have you done any analysis of whether your
implementation has the same issues?


Sorry, I am not an expert in the security area, so I cannot analyze
whether using compression in the SSL protocol is vulnerable and whether
that is really applicable to libpq communication between a Postgres
client and server.
The main idea of implementing compression at the libpq level was not to
solve this possible vulnerability (I am also not convinced that this
kind of attack is applicable to Postgres client-server communication),
but to reduce traffic without requiring SSL (which may not be possible
or convenient for many other reasons, not only ones related to the
potential vulnerability). Also, I believe (although I have not performed
this test yet) that zstd compression is much more efficient than the one
used in SSL, both in speed and compression ratio.
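
For reference, the per-buffer compression step this would involve at the libpq
level might look roughly like the following zstd streaming sketch; it is
illustrative only and is not code from the attached patch:

#include <sys/types.h>
#include <zstd.h>

/*
 * Illustrative only: compress one outgoing buffer with zstd's streaming
 * API.  The caller owns "cstream" (created with ZSTD_createCStream and
 * ZSTD_initCStream) so the compression context spans the whole connection,
 * and "dst" is assumed to be large enough for the compressed output.
 * Returns the number of compressed bytes written, or -1 on error.
 */
static ssize_t
compress_chunk(ZSTD_CStream *cstream,
			   const void *src, size_t srclen,
			   void *dst, size_t dstlen)
{
	ZSTD_inBuffer in = {src, srclen, 0};
	ZSTD_outBuffer out = {dst, dstlen, 0};

	while (in.pos < in.size)
	{
		size_t		rc = ZSTD_compressStream(cstream, &out, &in);

		if (ZSTD_isError(rc))
			return -1;
	}

	/* Flush so the peer can decode what has been produced so far. */
	if (ZSTD_isError(ZSTD_flushStream(cstream, &out)))
		return -1;

	return (ssize_t) out.pos;
}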



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: use Getopt::Long for catalog scripts

2019-02-07 Thread John Naylor
On Thu, Feb 7, 2019 at 6:53 PM David Fetter  wrote:
> [some style suggestions]

I've included these in v2, and also some additional tweaks for consistency.

-- 
John Naylor                 https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0bf2b3ffdaa16d85856074dda8e15c09fc99d0fe Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 7 Feb 2019 20:10:50 +0100
Subject: [PATCH v2] Use Getopt::Long for catalog scripts

Replace hand-rolled option parsing with the Getopt module. This is
shorter and easier to read. In passing, make some cosmetic adjustments
for consistency.
---
 src/backend/catalog/Makefile |  2 +-
 src/backend/catalog/genbki.pl| 49 +---
 src/backend/utils/Gen_fmgrtab.pl | 38 -
 src/backend/utils/Makefile   |  2 +-
 4 files changed, 28 insertions(+), 63 deletions(-)

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index d9f92ba701..b06ee49ca8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -89,7 +89,7 @@ generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 # version number when it changes.
 bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
 	$(PERL) -I $(catalogdir) $< \
-		-I $(top_srcdir)/src/include/ --set-version=$(MAJORVERSION) \
+		--include-path=$(top_srcdir)/src/include/ --set-version=$(MAJORVERSION) \
 		$(POSTGRES_BKI_SRCS)
 	touch $@
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index be81094ffb..4935e00fb2 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -16,6 +16,7 @@
 
 use strict;
 use warnings;
+use Getopt::Long;
 
 use File::Basename;
 use File::Spec;
@@ -23,43 +24,21 @@ BEGIN  { use lib File::Spec->rel2abs(dirname(__FILE__)); }
 
 use Catalog;
 
-my @input_files;
 my $output_path = '';
 my $major_version;
 my $include_path;
 
-# Process command line switches.
-while (@ARGV)
-{
-	my $arg = shift @ARGV;
-	if ($arg !~ /^-/)
-	{
-		push @input_files, $arg;
-	}
-	elsif ($arg =~ /^-I/)
-	{
-		$include_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^-o/)
-	{
-		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^--set-version=(.*)$/)
-	{
-		$major_version = $1;
-		die "Invalid version string.\n"
-		  if !($major_version =~ /^\d+$/);
-	}
-	else
-	{
-		usage();
-	}
-}
+GetOptions(
+	'output:s'   => \$output_path,
+	'set-version:s'  => \$major_version,
+	'include-path:s' => \$include_path) || usage();
 
 # Sanity check arguments.
-die "No input files.\n" if !@input_files;
-die "--set-version must be specified.\n" if !defined $major_version;
-die "-I, the header include path, must be specified.\n" if !$include_path;
+die "No input files.\n" unless @ARGV;
+die "--set-version must be specified.\n" unless $major_version;
+die "Invalid version string: $major_version\n"
+  unless $major_version =~ /^\d+$/;
+die "--include-path must be specified.\n" unless $include_path;
 
 # Make sure paths end with a slash.
 if ($output_path ne '' && substr($output_path, -1) ne '/')
@@ -79,7 +58,7 @@ my @toast_decls;
 my @index_decls;
 my %oidcounts;
 
-foreach my $header (@input_files)
+foreach my $header (@ARGV)
 {
 	$header =~ /(.+)\.h$/
 	  or die "Input files need to be header files.\n";
@@ -917,12 +896,12 @@ sub form_pg_type_symbol
 sub usage
 {
 	die <<EOM;
-Usage: genbki.pl [options] header...
+Usage: genbki.pl [--output/-o <path>] [--include-path/-i <path>] header...
 
 Options:
--I   include path
--o   output path
+--output         Output directory (default '.')
 --set-version    PostgreSQL version number for initdb cross-check
+--include-path   Include path in source tree
 
 genbki.pl generates BKI files and symbol definition
 headers from specially formatted header files and .dat
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index f17a78ebcd..2ffacae666 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -18,32 +18,14 @@ use Catalog;
 
 use strict;
 use warnings;
+use Getopt::Long;
 
-# Collect arguments
-my @input_files;
 my $output_path = '';
 my $include_path;
 
-while (@ARGV)
-{
-	my $arg = shift @ARGV;
-	if ($arg !~ /^-/)
-	{
-		push @input_files, $arg;
-	}
-	elsif ($arg =~ /^-o/)
-	{
-		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^-I/)
-	{
-		$include_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	else
-	{
-		usage();
-	}
-}
+GetOptions(
+	'output:s'   => \$output_path,
+	'include-path:s' => \$include_path) || usage();
 
 # Make sure output_path ends in a slash.
 if ($output_path ne '' && substr($output_path, -1) ne '/')
@@ -52,8 +34,8 @@ if ($output_path ne '' && substr($output_path, -1) ne '/')
 }
 
 # Sanity check arguments.
-die "No input files.\n"   if !@input_files;
-die "No include path; you must 

Re: ON SELECT rule on a table without columns

2019-02-07 Thread Andres Freund
Hi,

On 2019-02-08 12:18:32 +0530, Ashutosh Sharma wrote:
> When "ON SELECT" rule is created on a table without columns, it
> successfully converts a table into the view. However, when the same is
> done using CREATE VIEW command, it fails with an error saying: "view
> must have at least one column". Here is what I'm trying to say:
> 
> -- create table t1 without columns
> create table t1();
> 
> -- create table t2 without columns
> create table t2();
> 
> -- create ON SELECT rule on t1 - this would convert t1 from table to view
> create rule "_RETURN" as on select to t1 do instead select * from t2;
> 
> -- now check the definition of t1
> \d t1
> 
> postgres=# \d+ t1
> View "public.t1"
>  Column | Type | Collation | Nullable | Default | Storage | Description
> +--+---+--+-+-+-
> View definition:
>  SELECT
>FROM t2;
> 
> The output of "\d+ t1" shows the definition of converted view t1 which
> doesn't have any columns in the select query.
> 
> Now, when i try creating another view with the same definition using
> CREATE VIEW command, it fails with the error -> ERROR:  view must have
> at least one column. See below
> 
> postgres=# create view v1 as select from t2;
> ERROR:  view must have at least one column
> 
> OR,
> 
> postgres=# create view v1 as select * from t2;
> ERROR:  view must have at least one column
> 
> Isn't that a bug in create rule command or am i missing something here ?
> 
> If it is a bug, then, attached is the patch that fixes it.
> 
> -- 
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com

> diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
> index 3496e6f..cb51955 100644
> --- a/src/backend/rewrite/rewriteDefine.c
> +++ b/src/backend/rewrite/rewriteDefine.c
> @@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
>  						 errmsg("could not convert table \"%s\" to a view because it has row security enabled",
>  								RelationGetRelationName(event_relation))));
>  
> +			if (event_relation->rd_rel->relnatts == 0)
> +				ereport(ERROR,
> +						(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +						 errmsg("view must have at least one column")));
> +
>  			if (relation_has_policies(event_relation))
>  				ereport(ERROR,
>  						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Maybe I'm missing something, but why do we want to forbid this? Given
that we allow SELECTs without columns these days, I see no reason to
require this for views.  The view error check long predates allowing
SELECT and CREATE TABLE without columns. I think its existence is just
an oversight.  Tom, you relaxed the permissive cases; any opinion?

Greetings,

Andres Freund



Re: libpq compression

2019-02-07 Thread Andres Freund
Hi,

On 2018-03-30 15:53:39 +0300, Konstantin Knizhnik wrote:
> Taking into account that a vulnerability was found in SSL compression and
> SSL compression is therefore considered deprecated and insecure
> (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html),
> it would be nice to have some alternative mechanism for reducing libpq
> traffic.
> 
> I have implemented a prototype of it (patch is attached).
> To use zstd compression, Postgres should be configured with --with-zstd.
> Otherwise compression will use zlib unless it is disabled by the
> --without-zlib option.
> I have added a compression=on/off parameter to the connection string and a
> -Z option to the psql and pgbench utilities.
> Below are some results:

I think compression is pretty useful, and I'm not convinced that the
threat model underlying the attacks on SSL really applies to postgres. But
having said that, have you done any analysis of whether your
implementation has the same issues?

Greetings,

Andres Freund



Re: ON SELECT rule on a table without columns

2019-02-07 Thread Rushabh Lathia
On Fri, Feb 8, 2019 at 12:18 PM Ashutosh Sharma 
wrote:

> Hi All,
>
> When "ON SELECT" rule is created on a table without columns, it
> successfully converts a table into the view. However, when the same is
> done using CREATE VIEW command, it fails with an error saying: "view
> must have at least one column". Here is what I'm trying to say:
>
> -- create table t1 without columns
> create table t1();
>
> -- create table t2 without columns
> create table t2();
>
> -- create ON SELECT rule on t1 - this would convert t1 from table to view
> create rule "_RETURN" as on select to t1 do instead select * from t2;
>
> -- now check the definition of t1
> \d t1
>
> postgres=# \d+ t1
> View "public.t1"
>  Column | Type | Collation | Nullable | Default | Storage | Description
> +--+---+--+-+-+-
> View definition:
>  SELECT
>FROM t2;
>
> The output of "\d+ t1" shows the definition of converted view t1 which
> doesn't have any columns in the select query.
>
> Now, when i try creating another view with the same definition using
> CREATE VIEW command, it fails with the error -> ERROR:  view must have
> at least one column. See below
>
> postgres=# create view v1 as select from t2;
> ERROR:  view must have at least one column
>
> OR,
>
> postgres=# create view v1 as select * from t2;
> ERROR:  view must have at least one column
>
> Isn't that a bug in create rule command or am i missing something here ?
>

Yes, it looks like a bug to me.

>
> If it is a bug, then, attached is the patch that fixes it.
>

I had a quick glance at the patch - here are a few comments:

1)

+if (event_relation->rd_rel->relnatts == 0)

Can't use relnatts directly - we need to consider attisdropped.
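
For illustration, a check that ignores dropped columns might look roughly like
this sketch (the helper name and its placement are illustrative, not part of
the posted patch; it assumes the usual backend headers such as postgres.h and
utils/rel.h are included):

/*
 * Illustrative only: count attributes that are still live, i.e. skip
 * columns marked attisdropped, instead of trusting relnatts alone.
 */
static int
count_live_attributes(Relation rel)
{
	TupleDesc	tupdesc = RelationGetDescr(rel);
	int			i;
	int			live = 0;

	for (i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (!att->attisdropped)
			live++;
	}

	return live;
}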

2)
I think you may want to change the error message to be in line with
the other error messages in the same code area.

Maybe something like:
"could not convert table \"%s\" to a view because table does not have any
column"


Regards,
Rushabh Lathia
www.EnterpriseDB.com


Re: libpq compression

2019-02-07 Thread Andres Freund
Hi,

On 2019-02-08 07:01:01 +, Iwata, Aya wrote:
> > I am still not sure whether it is a good idea to make it possible for the
> > user to explicitly specify the compression algorithm.
> > Right now the streaming compression algorithm used is hardcoded and depends
> > on the --use-zstd or --use-zlib configuration options.
> > If client and server were built with the same options, then they are able
> > to use compression.
> I understand that the compression algorithm is hardcoded in your proposal.
> However, given the possibility of future implementations, I think
> it would be better to have the flexibility to choose the compression library.

Agreed. I think that's a hard requirement. Making interfaces that are
explicitly not forward compatible is a bad idea.

Greetings,

Andres Freund



RE: libpq compression

2019-02-07 Thread Iwata, Aya
Hi, 

I am sorry for my late reply.

> I fixed all the issues you reported except using a list of supported
> compression algorithms.
Sure. I confirmed that.

> It will require an extra round of communication between client and server to
> make a decision about the compression algorithm to use.
At the beginning of this thread, Robbie Harwood said that no extra communication
is needed.
I think so, too.

> I am still not sure whether it is a good idea to make it possible for the
> user to explicitly specify the compression algorithm.
> Right now the streaming compression algorithm used is hardcoded and depends on
> the --use-zstd or --use-zlib configuration options.
> If client and server were built with the same options, then they are able
> to use compression.
I understand that the compression algorithm is hardcoded in your proposal.
However, given the possibility of future implementations, I think
it would be better to have the flexibility to choose the compression library.

src/backend/libpq/pqcomm.c:
In the current Postgres source code, pq_recvbuf() calls secure_read()
and pq_getbyte_if_available() also calls secure_read().
That means these functions are on the same level.
However, with your change, pq_getbyte_if_available() calls pq_recvbuf(),
and pq_recvbuf() calls secure_read(). The level of these functions is now
different.

I think the purpose of pq_getbyte_if_available() is to get a character if one
exists, while the purpose of pq_recvbuf() is to acquire data up to the
expected length.
With your change, pq_getbyte_if_available() may end up doing unnecessary
waiting or other work.

So how about changing your code like this?
The part that checks whether the stream is compressed is implemented as a
#define macro (as in fe-misc.c), and pq_recvbuf() and pq_getbyte_if_available()
change only slightly, like this:

-   r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-                   PQ_RECV_BUFFER_SIZE - PqRecvLength);
+   r = SOME_DEFINE_NAME_();
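
For illustration, such a macro might be shaped roughly like the following;
PqStream and pq_compressed_read() are hypothetical placeholder names here, not
names taken from the posted patch:

/*
 * Illustrative sketch only: read through the compression stream when one
 * is active, otherwise read directly from the socket via secure_read().
 * PqStream and pq_compressed_read() are hypothetical placeholders.
 */
#define PQ_RECV_SOCKET_READ() \
	(PqStream != NULL ? \
	 pq_compressed_read(PqStream, PqRecvBuffer + PqRecvLength, \
						PQ_RECV_BUFFER_SIZE - PqRecvLength) : \
	 secure_read(MyProcPort, PqRecvBuffer + PqRecvLength, \
				 PQ_RECV_BUFFER_SIZE - PqRecvLength))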

configure:
Adding the following message above the zlib check in configure:
```
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with zstd support" >&5
$as_echo_n "checking whether to build with zstd support... " >&6; }
```

Regards,
Aya Iwata


Re: speeding up planning with partitions

2019-02-07 Thread Amit Langote
Tsunakawa-san,

On 2019/02/08 15:40, Tsunakawa, Takayuki wrote:
> Hi Amit,
> 
> I'm afraid v20-0001 fails to apply to the current HEAD (precisely, 
> ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz).  Could you 
> check it?

Hmm, I had rebased v20 over HEAD as of yesterday evening.  CF bot seemed
to be happy with it too:

http://cfbot.cputube.org/amit-langote.html

Also, I rebased the patches again on the latest HEAD as of this morning
and there were no issues.

Thanks,
Amit




ON SELECT rule on a table without columns

2019-02-07 Thread Ashutosh Sharma
Hi All,

When "ON SELECT" rule is created on a table without columns, it
successfully converts a table into the view. However, when the same is
done using CREATE VIEW command, it fails with an error saying: "view
must have at least one column". Here is what I'm trying to say:

-- create table t1 without columns
create table t1();

-- create table t2 without columns
create table t2();

-- create ON SELECT rule on t1 - this would convert t1 from table to view
create rule "_RETURN" as on select to t1 do instead select * from t2;

-- now check the definition of t1
\d t1

postgres=# \d+ t1
View "public.t1"
 Column | Type | Collation | Nullable | Default | Storage | Description
+--+---+--+-+-+-
View definition:
 SELECT
   FROM t2;

The output of "\d+ t1" shows the definition of converted view t1 which
doesn't have any columns in the select query.

Now, when i try creating another view with the same definition using
CREATE VIEW command, it fails with the error -> ERROR:  view must have
at least one column. See below

postgres=# create view v1 as select from t2;
ERROR:  view must have at least one column

OR,

postgres=# create view v1 as select * from t2;
ERROR:  view must have at least one column

Isn't that a bug in create rule command or am i missing something here ?

If it is a bug, then, attached is the patch that fixes it.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 3496e6f..cb51955 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
 		 errmsg("could not convert table \"%s\" to a view because it has row security enabled",
 RelationGetRelationName(event_relation;
 
+			if (event_relation->rd_rel->relnatts == 0)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+		 errmsg("view must have at least one column")));
+
 			if (relation_has_policies(event_relation))
 ereport(ERROR,
 		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),


RE: speeding up planning with partitions

2019-02-07 Thread Tsunakawa, Takayuki
Hi Amit,

I'm afraid v20-0001 fails to apply to the current HEAD (precisely, 
ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz).  Could you 
check it?

I'm trying to reproduce what Imai-san hit with my patch.  His environment is 
master@Jan/28 + v18 of your patches.  When he added my patch there, CREATE 
TABLE crashed.

On the other hand, the current master + my patch succeeds.  So, I wanted to 
test with the current HEAD + v20 of your patch + my patch.


Regards
Takayuki Tsunakawa







Re: Allow some recovery parameters to be changed with reload

2019-02-07 Thread Andres Freund
Hi,

On 2019-02-08 09:19:31 +0900, Michael Paquier wrote:
> On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:
> > Probably right.  I figured it would be useful to see what the outcome is
> > with primary_conninfo, so they can be treated similarly.
> 
> The interactions with waiting for WAL to be available and the WAL
> receiver stresses me a bit for restore_command, as you could finish
> with the startup process switching to use restore_command with a WAL
> receiver still working behind and overwriting partially the recovered
> segment, which could lead to corruption.  We should be *very* careful
> about that.

I'm not clear on the precise mechanics you're imagining here, could you
expand a bit? We kill the walreceiver when switching from receiver to
restore command, and wait for it to acknowledge that, no?
C.F. ShutdownWalRcv() call in the lastSourceFailed branch of
WaitForWALToBecomeAvailable().
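
For context, a simplified sketch (not verbatim code) of the branch being
referenced in WaitForWALToBecomeAvailable():

	/* Simplified sketch of the branch referenced above, not verbatim code. */
	if (lastSourceFailed)
	{
		switch (currentSource)
		{
			case XLOG_FROM_STREAM:

				/*
				 * Streaming failed: stop the walreceiver and wait for it to
				 * go away before falling back to restore_command, so it
				 * cannot keep writing into the segment being restored.
				 */
				ShutdownWalRcv();
				break;

				/* other sources elided */
		}
	}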

Greetings,

Andres Freund



Re: Synchronize with imath upstream

2019-02-07 Thread Tom Lane
Noah Misch  writes:
> On Thu, Feb 07, 2019 at 10:12:05AM -0500, Tom Lane wrote:
>> I have no particular opinion on whether pgindent should be part of the
>> mix for imath, but I do strongly recommend setting up and documenting a
>> reproducible import process, as I did for src/timezone.

> Good idea.  I've modified the imath.c header comment to take the form of an
> import process; see attached imath1.29-pgedits-v2.patch.

Maybe easier to keep the instructions in a separate README file?
(I don't see a need to put a PG copyright in this file, when the
mods from upstream are this small.)

Looks good otherwise.

regards, tom lane



RE: speeding up planning with partitions

2019-02-07 Thread Imai, Yoshikazu
Amit-san,

On Thu, Feb 7, 2019 at 10:22 AM, Amit Langote wrote:
> Rebased over bdd9a99aac.

I did a code review of 0001 and I have some suggestions. Could you check them?

1.
0001: line 418
+* add_inherit_target_child_root() would've added only those 
that are

add_inherit_target_child_root() no longer exists, so the above comment needs to
be modified.

2.
0001: line 508-510

In set_inherit_target_rel_pathlists():
+   /* Nothing to do if all the children were excluded. */
+   if (IS_DUMMY_REL(rel))
+   return;

This code isn't needed, or can be replaced by an Assert, because
set_inherit_target_rel_pathlists() is only called from set_rel_pathlist(), which
checks IS_DUMMY_REL(rel) before calling set_inherit_target_rel_pathlists(), as
follows.

set_rel_pathlist(...)
{
...
if (IS_DUMMY_REL(rel))
{
/* We already proved the relation empty, so nothing more to do */
}
else if (rte->inh)
{
/*
 * If it's a target relation, set the pathlists of children instead.
 * Otherwise, we'll append the outputs of children, so process it as
 * an "append relation".
 */
if (root->inherited_update && root->parse->resultRelation == rti) 
{
inherited_update = true;
set_inherit_target_rel_pathlists(root, rel, rti, rte);

3.
0001: line 1919-1920

-   case CONSTRAINT_EXCLUSION_ON:
-   break;  /* always try to 
exclude */

CONSTRAINT_EXCLUSION_ON is no longer used, so should we also remove it from the
GUC parameters?

4.
0001:

Can we remove enum InheritanceKind which is no longer used?


I also looked at the patch from the perspective of separating code from 0001
that is not essential to overhauling inheritance update/delete planning.
Almost all of the code is related to each other, but I found the two things
below that can be moved to another patch.

---
0001: line 550-608

This section seems to be just a refinement of set_append_rel_size().
So can we separate this from 0001 into another patch?

---
0001: line 812-841, 940-947, 1525-1536, 1938-1947 

This code is related to the removal of InheritanceKind from
relation_excluded_by_constraints(), so I think it is something like cleanup of
unneeded code. Can we separate this into a patch such as
some-code-cleanups-of-0001.patch? Of course, we can do that only if removing
this code from 0001 does not break "make check" for 0001.
I also think that what I pointed out in 3. and 4. above could be included in
some-code-cleanups-of-0001.patch.

What do you think?


--
Yoshikazu Imai



Re: Synchronize with imath upstream

2019-02-07 Thread Noah Misch
On Thu, Feb 07, 2019 at 10:12:05AM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Wed, Feb 06, 2019 at 10:15:24AM -0500, Tom Lane wrote:
> >> I don't object to keeping imported code in a form that matches upstream
> >> as best we can.  (Should we also exclude such files from pgindent'ing?)
> 
> > I think it depends on how much time one spends merging upstream changes 
> > versus
> > making PostgreSQL-specific edits.  For IMath, both amounts are too small to
> > get excited about.  Does pgindent materially complicate src/timezone merges?
> 
> My practice with src/timezone is to pgindent the upstream code and then
> diff it; given that extra step, it's not really any more complex (and
> maybe less so, as this hides minor whitespace changes for instance).

Let's continue to pgindent src/timezone and IMath, then.

> There are some other deltas to worry about as well, see
> src/timezone/README.
> 
> I have no particular opinion on whether pgindent should be part of the
> mix for imath, but I do strongly recommend setting up and documenting a
> reproducible import process, as I did for src/timezone.

Good idea.  I've modified the imath.c header comment to take the form of an
import process; see attached imath1.29-pgedits-v2.patch.
diff --git a/contrib/pgcrypto/imath.c b/contrib/pgcrypto/imath.c
index b94a51b..96be668 100644
--- a/contrib/pgcrypto/imath.c
+++ b/contrib/pgcrypto/imath.c
@@ -1,90 +1,69 @@
-/* imath version 1.3 */
 /*
-  Name:imath.c
-  Purpose: Arbitrary precision integer arithmetic routines.
-  Author:  M. J. Fromberger 
-  Info:Id: imath.c 21 2006-04-02 18:58:36Z sting
-
-  Copyright (C) 2002 Michael J. Fromberger, All Rights Reserved.
-
-  Permission is hereby granted, free of charge, to any person
-  obtaining a copy of this software and associated documentation files
-  (the "Software"), to deal in the Software without restriction,
-  including without limitation the rights to use, copy, modify, merge,
-  publish, distribute, sublicense, and/or sell copies of the Software,
-  and to permit persons to whom the Software is furnished to do so,
-  subject to the following conditions:
-
-  The above copyright notice and this permission notice shall be
-  included in all copies or substantial portions of the Software.
-
-  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-  NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
-  BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-  ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-  CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+  Name: imath.c
+  Purpose:  Arbitrary precision integer arithmetic routines.
+  Author:   M. J. Fromberger
+
+  Copyright (C) 2002-2007 Michael J. Fromberger, All Rights Reserved.
+
+  Permission is hereby granted, free of charge, to any person obtaining a copy
+  of this software and associated documentation files (the "Software"), to deal
+  in the Software without restriction, including without limitation the rights
+  to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+  copies of the Software, and to permit persons to whom the Software is
+  furnished to do so, subject to the following conditions:
+
+  The above copyright notice and this permission notice shall be included in
+  all copies or substantial portions of the Software.
+
+  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   SOFTWARE.
  */
-/* contrib/pgcrypto/imath.c */
 
-#include "postgres.h"
-#include "px.h"
 #include "imath.h"
 
-#undef assert
-#define assert(TEST) Assert(TEST)
-#define TRACEABLE_CLAMP 0
-#define TRACEABLE_FREE 0
-
-/* {{{ Constants */
+#include 
+#include 
+#include 
+#include 
 
 const mp_result MP_OK = 0; /* no error, all is well  */
-const mp_result MP_FALSE = 0;  /* boolean false  */
-const mp_result MP_TRUE = -1;  /* boolean true   */
-const mp_result MP_MEMORY = -2; /* out of memory */
+const mp_result MP_FALSE = 0;  /* boolean false  */
+const mp_result MP_TRUE = -1;  /* boolean true   */
+const mp_result MP_MEMORY = -2; /* out of memory  */
 const mp_result MP_RANGE = -3; /* argument out of range  */
-const mp_result MP_UNDEF = -4; /* result undefined   */
-const mp_result MP_TRUNC = -5; /* output truncated  

Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2019-02-07 Thread Etsuro Fujita

(2019/02/08 10:09), Michael Paquier wrote:

On Thu, Feb 07, 2019 at 09:55:18PM +0900, Etsuro Fujita wrote:

I 100% agree with Tom, and actually, I tried to address his comments, but I
haven't come up with a clear solution for that yet.  I really want to
address this, but I won't have much time to work on that at least until
after this development cycle, so what I'm thinking is to mark this as
Returned with feedback, or if possible, to move this to the 2019-07 CF.


Simply marking it as returned with feedback does not seem appropriate to
me, as we may lose track of it.  Moving it to the future CF would make
more sense in my opinion.


OK, I have moved this to the 2019-07 CF, keeping Waiting on Author.

Best regards,
Etsuro Fujita




RE: Commit Fest 2019-01 is now closed

2019-02-07 Thread Tsunakawa, Takayuki
From: Gavin Flower [mailto:gavinflo...@archidevsys.co.nz]
> Remember also that about 1 in 12 men are colour blind.

Thank you for referring to it.  I'm one of them -- I'm visually impaired and 
use screen reader software.  I didn't notice any change in the 2019-03 CF page. 
 I thought "Ver" is the new column.


Regards
Takayuki Tsunakawa



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-07 Thread Peter Geoghegan
On Thu, Feb 7, 2019 at 7:35 PM Tom Lane  wrote:
> I have a working patch now, but I think I'm too tired to explain it,
> so I'm going to post it tomorrow instead.  It's a big enough change
> that it might be advisable for someone to review it --- are you
> interested?

I'd be happy to review it.

-- 
Peter Geoghegan



RE: speeding up planning with partitions

2019-02-07 Thread Tsunakawa, Takayuki
Hi Amit,

From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> Maybe I chose the subject line of this thread poorly when I began
> working on it.  It should perhaps have been something like "speeding up
> planning of point-lookup queries with many partitions" or something like
> that.  There are important use cases beyond point lookup even with
> partitioned tables (or maybe more so with partitioned tables), but perhaps
> unsurprisingly, the bottlenecks in those cases are not *just* in the
> planner.

No, it's simply my fault.  I wasn't aware of the CF Bot and the CF entry page 
that act on the latest submitted patch.  I'm relieved to see you have submitted 
the revised patch.


Regards
Takayuki Tsunakawa



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-07 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Feb 6, 2019 at 9:15 PM Tom Lane  wrote:
>> I have a mostly-working patch along these lines that I hope to
>> finish up and post tomorrow.

> Do you think that you'll end up pushing the HEAD-only fix shortly?

I have a working patch now, but I think I'm too tired to explain it,
so I'm going to post it tomorrow instead.  It's a big enough change
that it might be advisable for someone to review it --- are you
interested?

regards, tom lane



Re: ALTER TABLE on system catalogs

2019-02-07 Thread Kyotaro HORIGUCHI
At Fri, 08 Feb 2019 12:03:31 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190208.120331.167280496.horiguchi.kyot...@lab.ntt.co.jp>
> By the way, I'm confused to see that attributes that don't want
> to go external are marked as 'x' in system catalogs. Currently
> (putting aside its necessity) the following operation ends up
> successfully attaching a new TOAST relation, which we really don't
> want.
> 
> ALTER TABLE pg_attribute ALTER COLUMN attrelid SET STORAGE plain;
> 
> Might be silly, but we may need another storage class, say,
> Compression, which means try to compress but don't go external.
> 
> The attached patch does that.
> 
> - All varlen fields of pg_class and pg_attribute are marked as
>   'c'.  (Catalog.pm, genbki.pl, genbki.h, pg_attribute/class.h)
> 
> - Try compress but don't try external for 'c' storage.
>   (tuptoaster.c, toasting.c)
> 
> 
> We could remove the toast relation when no toastable attribute
> remains after ALTER TABLE ALTER COLUMN SET STORAGE, but it doesn't
> seem that simple. So the storage class is for internal use only.

I found that "ALTER TABLE.. SET STORAGE plain" doesn't remove
toast relation, so it's not so bad even if compress does the
same?

The attached 0001 is fixed from the previous version. 0002 is the
syntax.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 14312f2b53edb053281b80cf3df16851ef474525 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 8 Feb 2019 11:22:57 +0900
Subject: [PATCH 1/2] Explicitly mark some attributes in catalog as no need for
 toast relation

Currently, there are some catalog attributes whose storage class is
'x' but that really don't need to be toasted. This causes several
sorts of unwanted things.  This patch adds a new storage class 'c'
(compress), which means "try to compress in-line, but don't go
external", then sets the storage class of those attributes to it.
---
 src/backend/access/heap/tuptoaster.c | 4 +++-
 src/backend/catalog/Catalog.pm   | 6 +-
 src/backend/catalog/genbki.pl| 5 -
 src/backend/catalog/toasting.c   | 2 +-
 src/backend/commands/tablecmds.c | 2 ++
 src/include/catalog/genbki.h | 2 ++
 src/include/catalog/pg_attribute.h   | 8 
 src/include/catalog/pg_class.h   | 6 +++---
 8 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index cd921a4600..9718d15487 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -888,13 +888,15 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		 */
 		for (i = 0; i < numAttrs; i++)
 		{
+			Form_pg_attribute att = TupleDescAttr(tupleDesc, i);
+
 			if (toast_action[i] != ' ')
 continue;
 			if (VARATT_IS_EXTERNAL(DatumGetPointer(toast_values[i])))
 continue;		/* can't happen, toast_action would be 'p' */
 			if (VARATT_IS_COMPRESSED(DatumGetPointer(toast_values[i])))
 continue;
-			if (TupleDescAttr(tupleDesc, i)->attstorage != 'm')
+			if (att->attstorage != 'm' && att->attstorage != 'c' )
 continue;
 			if (toast_sizes[i] > biggest_size)
 			{
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 3bf308fe3b..e6e127645f 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -169,7 +169,7 @@ sub ParseHeader
 
 $column{type}  = $atttype;
 $column{name}  = $attname;
-$column{is_varlen} = 1 if $is_varlen;
+$column{is_varlen} = 1 if ($is_varlen);
 
 foreach my $attopt (@attopts)
 {
@@ -198,6 +198,10 @@ sub ParseHeader
 	{
 		$column{lookup} = $1;
 	}
+	elsif ($attopt =~ /BKI_STORAGE\((\w)\)/)
+	{
+		$column{storage} = $1;
+	}
 	else
 	{
 		die
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index be81094ffb..1d6818c96f 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -734,13 +734,16 @@ sub morph_row_for_pgattr
 
 	$row->{attname} = $attname;
 
+	# copy explicitly specified storage
+	$row->{attstorage} = $attr->{storage} if ($attr->{storage});
+
 	# Copy the type data from pg_type, and add some type-dependent items
 	my $type = $types{$atttype};
 
 	$row->{atttypid}   = $type->{oid};
 	$row->{attlen} = $type->{typlen};
 	$row->{attbyval}   = $type->{typbyval};
-	$row->{attstorage} = $type->{typstorage};
+	$row->{attstorage} = $type->{typstorage} if (!$row->{attstorage});
 	$row->{attalign}   = $type->{typalign};
 
 	# set attndims if it's an array type
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 77be19175a..ac45c51286 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -435,7 +435,7 @@ needs_toast_table(Relation rel)
 maxlength_unknown = true;
 			else
 data_length += maxlen;
-			if (att->attstorage != 'p')
+			if (att->attstorage != 'p' && att->attstorage != 'c')
 

Re: ALTER TABLE on system catalogs

2019-02-07 Thread Kyotaro HORIGUCHI
Hi, I got redirected here by a kind suggestion by Tom.

At Fri, 28 Sep 2018 22:58:36 +0200, Peter Eisentraut 
 wrote in 
<61666008-d1cd-7a66-33c8-215449f5d...@2ndquadrant.com>
> On 21/08/2018 17:24, Andres Freund wrote:
> >> Attached is a patch that instead moves those special cases into
> >> needs_toast_table(), which was one of the options suggested by Andres.
> >> There were already similar checks there, so I guess this makes a bit of
> >> sense.
> > The big difference is that that then only takes effect when called with
> > check=true. The callers without it, importantly NewHeapCreateToastTable
> > & NewRelationCreateToastTable, then won't run into this check. But still
> > into the error (see below).
> 
> I don't follow.  The call to needs_toast_table() is not conditional on
> the check argument.  The check argument only checks that the correct
> lock level is passed in.
> 
> >> @@ -145,21 +146,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid 
> >> toastIndexOid,
> >>ObjectAddress baseobject,
> >>toastobject;
> >>  
> >> -  /*
> >> -   * Toast table is shared if and only if its parent is.
> >> -   *
> >> -   * We cannot allow toasting a shared relation after initdb (because
> >> -   * there's no way to mark it toasted in other databases' pg_class).
> >> -   */
> >> -  shared_relation = rel->rd_rel->relisshared;
> >> -  if (shared_relation && !IsBootstrapProcessingMode())
> >> -  ereport(ERROR,
> >> -  
> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >> -   errmsg("shared tables cannot be toasted after 
> >> initdb")));
> > This error check imo shouldn't be removed, but moved down.
> 
> We could keep it, but it would probably be dead code since
> needs_toast_table() would return false for all shared tables anyway.

FWIW I agree to this point.

By the way, I'm confused to see that attributes that don't want
to go external are marked as 'x' in system catalogs. Currently
(putting aside its necessity) the following operation ends up
successfully attaching a new TOAST relation, which we really don't
want.

ALTER TABLE pg_attribute ALTER COLUMN attrelid SET STORAGE plain;

Might be silly, but couldn't we have another storage class? Say,
Compression, which means try to compress but don't go external.

The attached patch does that.

- All varlen fields of pg_class and pg_attribute are marked as
  'c'.  (Catalog.pm, genbki.pl, genbki.h, pg_attribute/class.h)

- Try compress but don't try external for 'c' storage.
  (tuptoaster.c, toasting.c)


We could remove the toast relation when no toastable attribute
remains after ALTER TABLE ALTER COLUMN SET STORAGE, but it doesn't
seem that simple. So the storage class is for internal use only.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0849c0f1c5cbbba2bedd0dc841b564f67a32b612 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 8 Feb 2019 11:22:57 +0900
Subject: [PATCH] Explicitly mark some attributes in catalog as no need for
 toast relation

Currently, there are some catalog attributes whose storage class is
'x' but that really don't need to be toasted. This causes several
sorts of unwanted things.  This patch adds a new storage class 'c'
(compress), which means "try to compress in-line, but don't go
external", then sets the storage class of those attributes to it.
---
 src/backend/access/heap/tuptoaster.c | 4 +++-
 src/backend/catalog/Catalog.pm   | 6 +-
 src/backend/catalog/genbki.pl| 5 -
 src/backend/catalog/toasting.c   | 2 +-
 src/include/catalog/genbki.h | 2 ++
 src/include/catalog/pg_attribute.h   | 8 
 src/include/catalog/pg_class.h   | 6 +++---
 7 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index cd921a4600..9718d15487 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -888,13 +888,15 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		 */
 		for (i = 0; i < numAttrs; i++)
 		{
+			Form_pg_attribute att = TupleDescAttr(tupleDesc, i);
+
 			if (toast_action[i] != ' ')
 continue;
 			if (VARATT_IS_EXTERNAL(DatumGetPointer(toast_values[i])))
 continue;		/* can't happen, toast_action would be 'p' */
 			if (VARATT_IS_COMPRESSED(DatumGetPointer(toast_values[i])))
 continue;
-			if (TupleDescAttr(tupleDesc, i)->attstorage != 'm')
+			if (att->attstorage != 'm' && att->attstorage != 'c' )
 continue;
 			if (toast_sizes[i] > biggest_size)
 			{
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 3bf308fe3b..e6e127645f 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -169,7 +169,7 @@ sub ParseHeader
 
 $column{type}  = $atttype;
 $column{name}  = $attname;
-$column{is_varlen} = 1 if $is_varlen;
+$column{is_varlen} = 1 if ($is_varlen);
 
 

Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-07 Thread Peter Geoghegan
On Wed, Feb 6, 2019 at 9:15 PM Tom Lane  wrote:
> I have a mostly-working patch along these lines that I hope to
> finish up and post tomorrow.

Do you think that you'll end up pushing the HEAD-only fix shortly? It
would be nice to avoid immediate bitrot of an upcoming revision of the
nbtree patch series.

Thanks
-- 
Peter Geoghegan



Re: speeding up planning with partitions

2019-02-07 Thread Amit Langote
Tsunakawa-san,

On 2019/02/08 10:50, Tsunakawa, Takayuki wrote:
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
>> It's good to see work being done to try and improve this, but I think
>> it's best to do it on another thread. I think there was some agreement
>> upthread about this not being Amit's patch's problem. Doing it here
>> will make keeping track of this more complex than it needs to be.
>> There's also Amit's issue of keeping his patch series up to date. The
>> CFbot is really useful to alert patch authors when that's required,
>> but having other patches posted to the same thread can cause the CFbot
>> to check the wrong patch.
> 
> OK, you're right.  I'll continue this on another thread.

Thank you.  I do appreciate that Imai-san has persistently tried to find
interesting problems to solve beyond the patches we're working on here.
Maybe I chose the subject line of this thread poorly when I began
working on it.  It should perhaps have been something like "speeding up
planning of point-lookup queries with many partitions" or something like
that.  There are important use cases beyond point lookup even with
partitioned tables (or maybe more so with partitioned tables), but perhaps
unsurprisingly, the bottlenecks in those cases are not *just* in the planner.

Thanks,
Amit




Re: dsa_allocate() faliure

2019-02-07 Thread Thomas Munro
On Fri, Feb 8, 2019 at 4:49 AM Thomas Munro
 wrote:
> I don't have the answer yet but I have some progress: I finally
> reproduced the "could not find %d free pages" error by running lots of
> concurrent parallel queries.  Will investigate.

Sometimes FreePageManagerPutInternal() returns a
number-of-contiguous-pages-created-by-this-insertion that is too large
by one.  If this happens to be a new max-number-of-contiguous-pages,
it causes trouble some arbitrary time later because the max is wrong
and this FPM cannot satisfy a request that large, and it may not be
recomputed for some time because the incorrect value prevents
recomputation.  Not sure yet if this is due to the lazy computation
logic or a plain old fence-post error in the btree consolidation code
or something else.
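
As a toy illustration of that failure mode (simplified stand-alone code, not
the real freepage.c logic): when the cached largest-run value is one page too
big, a request sized to the cached value cannot actually be satisfied, giving
the symptom above.

#include <stdio.h>

#define NPAGES 16

static int	free_map[NPAGES];	/* 1 = free, 0 = allocated */

/* Recompute the true longest run of contiguous free pages. */
static int
actual_max_run(void)
{
	int			best = 0;
	int			run = 0;
	int			i;

	for (i = 0; i < NPAGES; i++)
	{
		run = free_map[i] ? run + 1 : 0;
		if (run > best)
			best = run;
	}
	return best;
}

int
main(void)
{
	int			cached_max_run;
	int			request;
	int			i;

	/* Pages 3..6 are free, so the real longest run is 4. */
	for (i = 3; i <= 6; i++)
		free_map[i] = 1;

	/* The off-by-one described above: the cached value is too large. */
	cached_max_run = actual_max_run() + 1;

	/* A request sized to the (wrong) cache cannot actually be satisfied. */
	request = cached_max_run;
	if (request > actual_max_run())
		printf("could not find %d free pages\n", request);

	return 0;
}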

-- 
Thomas Munro
http://www.enterprisedb.com



RE: speeding up planning with partitions

2019-02-07 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> It's good to see work being done to try and improve this, but I think
> it's best to do it on another thread. I think there was some agreement
> upthread about this not being Amit's patch's problem. Doing it here
> will make keeping track of this more complex than it needs to be.
> There's also Amit's issue of keeping his patch series up to date. The
> CFbot is really useful to alert patch authors when that's required,
> but having other patches posted to the same thread can cause the CFbot
> to check the wrong patch.

OK, you're right.  I'll continue this on another thread.


Regards
Takayuki Tsunakawa



Re: speeding up planning with partitions

2019-02-07 Thread David Rowley
On Fri, 8 Feb 2019 at 14:34, Imai, Yoshikazu
 wrote:
>
> On Wed, Feb 6, 2019 at 2:04 AM, Tsunakawa, Takayuki wrote:
> > Can you compare the performance of auto and force_custom_plan again with
> > the attached patch?  It uses PGPROC's LOCALLOCK list instead of the hash
> > table.
>
> Thanks for the patch, but it seems to have some problems.
> When I executed create/drop/select commands against tables with many
> partitions, like more than 512 partitions, the backend died unexpectedly.
> Since I could see a difference between the performance of auto and
> force_custom_plan when the number of partitions is large, the patch needs
> to be fixed so that we can check whether performance is improved or not.

It's good to see work being done to try and improve this, but I think
it's best to do it on another thread. I think there was some agreement
upthread about this not being Amit's patch's problem. Doing it here
will make keeping track of this more complex than it needs to be.
There's also Amit's issue of keeping his patch series up to date. The
CFbot is really useful to alert patch authors when that's required,
but having other patches posted to the same thread can cause the CFbot
to check the wrong patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



RE: speeding up planning with partitions

2019-02-07 Thread Imai, Yoshikazu
Tsunakawa-san,

On Wed, Feb 6, 2019 at 2:04 AM, Tsunakawa, Takayuki wrote:
> Can you compare the performance of auto and force_custom_plan again with
> the attached patch?  It uses PGPROC's LOCALLOCK list instead of the hash
> table.

Thanks for the patch, but it seems to have some problems.
When I executed create/drop/select commands against tables with many partitions,
like more than 512 partitions, the backend died unexpectedly. Since I could see
a difference between the performance of auto and force_custom_plan when the
number of partitions is large, the patch needs to be fixed so that we can check
whether performance is improved or not.

Thanks
--
Yoshikazu Imai



Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-02-07 Thread Haribabu Kommi
On Thu, Feb 7, 2019 at 9:31 PM Haribabu Kommi 
wrote:

> Hi Hackers,
>
> Does an increase in transaction commits per second mean good query
> performance?
> The reason I ask is that many monitoring tools display the number of
> transactions per second in their dashboard (including pgAdmin).
>
> While testing a bunch of queries with different sets of configurations,
> I observed that the TPS of one particular configuration increased
> compared to the default server configuration, but the overall query
> execution performance decreased when comparing the run time of all the
> queries.
>
> This is because of a larger xact_commit value than with the default
> configuration.  The changed server configuration leads to more parallel
> workers being launched, and every parallel worker operation is treated
> as an extra commit; for this reason the total number of commits
> increased, but the overall query performance decreased.
>
> Is there any relation between transaction commits and performance?
>
> Is there any specific reason to count parallel worker activity as a
> transaction commit?  Notably, in my observation, if we didn't count
> parallel worker activity as separate commits, the test wouldn't show an
> increase in transaction commits.
>

The following statements show the increase in the xact_commit value with
parallel workers. I can understand the workers updating the seq_scan stats,
as they performed the seq scan. Does the same reasoning apply to parallel
worker transaction commits as well?

The transaction commit counter is updated by all the internal operations
like autovacuum, checkpoints, etc.  The increase in the counters from these
operations is not that visible compared to parallel workers.

Is the spike in TPS with parallel workers fine to consider?

postgres=# select relname, seq_scan from pg_stat_user_tables where relname
= 'tbl';
 relname | seq_scan
-+--
 tbl |   16
(1 row)

postgres=# begin;
BEGIN
postgres=# select xact_commit from pg_stat_database where datname =
'postgres';
 xact_commit
-
 524
(1 row)

postgres=# explain analyze select * from tbl where f1 = 1000;
QUERY PLAN

---
 Gather  (cost=0.00..3645.83 rows=1 width=214) (actual time=1.703..79.736
rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..3645.83 rows=1 width=214)
(actual time=28.180..51.672 rows=0 loops=3)
 Filter: (f1 = 1000)
 Rows Removed by Filter: 3
 Planning Time: 0.090 ms
 Execution Time: 79.776 ms
(8 rows)

postgres=# commit;
COMMIT
postgres=# select xact_commit from pg_stat_database where datname =
'postgres';
 xact_commit
-
 531
(1 row)

postgres=# select relname, seq_scan from pg_stat_user_tables where relname
= 'tbl';
 relname | seq_scan
-+--
 tbl |   19
(1 row)

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 09:55:18PM +0900, Etsuro Fujita wrote:
> I 100% agree with Tom, and actually, I tried to address his comments, but I
> haven't come up with a clear solution for that yet.  I really want to
> address this, but I won't have much time to work on that at least until
> after this development cycle, so what I'm thinking is to mark this as
> Returned with feedback, or if possible, to move this to the 2019-07 CF.

Simply marking it as returned with feedback does not seem appropriate to
me, as we may lose track of it.  Moving it to the future CF would make
more sense in my opinion.
--
Michael




Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread David Rowley
On Fri, 8 Feb 2019 at 13:04, Michael Paquier  wrote:
>
> On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote:
> > Correct.  The idea was to maintain readability while making the regex a bit
> > better, without any claims to make it perfect.
>
> Agreed with your position.  I see no point is making all the
> expressions unreadable for little gain.  What Daniel proposed upthread
> had a better balance in my opinion than the previous behavior, without
> sacrifying the readability of anything and still improving the error
> detection.

FWIW, this was the first time I'd really worked with TAP tests.  I had
been looking into it because I needed to add a new test.

I was surprised to see it working this way and not just doing a diff
with some known good output.  I'm maybe just missing the reason that's
not possible, but to me, it seems a bit less error-prone than trying
to ensure an exact match with a bunch of regular expressions.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 12:07:01PM -0300, Alvaro Herrera wrote:
> On 2019-Feb-07, Peter Eisentraut wrote:
>> Another thing I was thinking of: We need some database-global tests.
>> For example, at some point during development, I had broken some variant
>> of REINDEX DATABASE.  Where could we put those tests?  Maybe with reindexdb?
> 
> What's wrong with a new reindex.sql in regress?

Depending on the number of objects created and remaining around
before the script is run in the main suite, this would be costly.  I
think that this approach would not scale well in the long term.
Having a TAP test with reindexdb gives you access to a full instance
with its contents always fully known at test time.
--
Michael




Re: Connection slots reserved for replication

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 04:16:22PM +0100, Oleksii Kliukin wrote:
> On 7. Feb 2019, at 07:51, Michael Paquier  wrote:
> I’ve noticed you returned the 'see max_connections’ parameter there. As 
> noticed
> previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
> are supposed to see there, perhaps it makes sense to elaborate (besides, the
> comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
> max_connections, which is not true anymore).

After sleeping on that, it seems that I overdid this part.  I think
that I was trying to document the relationship between the multiple
parameters.  Now it is true that it is easy enough to grep for the GUC
strings and find them.  It may be better to avoid spreading the
calculations in the check functions across multiple lines as well.

>> It seems to me that it would be a good idea to have an
>> autovacuum-worker-related message in the context of InitProcess() for
>> a failed backend creation, but we can deal with that later if
>> necessary.
> 
> Hm.. I am wondering how the autovacuum workers can run out of slots
> there?

Normally they cannot, if you look at the way the launcher decides on new
workers.  Still, if one begins to hack the autovacuum code for a fork
or a new feature, I think that it would be useful to display a
context-related message instead of the confusing "too many clients" in
this case.  This way it is possible to debug things properly.

>> With all that reviewed, I finish with the attached that I am
>> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
>> reminder because xl_parameter_change gets an upgrade, and I am most
>> likely going to forget it.
>> 
>> Please let me know if you have comments.  I am still planning to do an
>> extra pass on it.
> 
> Apart from the comment-related notes above your changes look good to
> me, thank you.

Thanks for the confirmation.  I am not planning to finish wrapping
this one today anyway, and next week should be fine.
--
Michael




Re: bug tracking system

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 12:20:35AM -0500, Tom Lane wrote:
> No, that'd be additional effort on top, which I'm not sure I see
> a reason for.  Nobody's given a plausible reason why we need
> a machine-readable way to identify the bug reporter's name from
> the commit log.  And we get a fair number of reports with no name
> or an obvious pseudonym, too, so how would you handle that?

Hm.  Or just rely on Backpatch-through?  A decent bug tracker usually
displays nicely which versions are impacted, so this field should
be parsed anyway.
--
Michael




Re: Synchronize with imath upstream

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 09:23:43PM +0100, Daniel Gustafsson wrote:
> On 7 Feb 2019, at 16:12, Tom Lane  wrote:
>> .. I do strongly recommend setting up and documenting a
>> reproducible import process, as I did for src/timezone.
> 
> Strong +1 on this.

+1.
--
Michael




Re: Location of pg_rewind/RewindTest.pm and ssl/ServerSetup.pm

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 04:10:35PM -0500, Andrew Dunstan wrote:
> Also, SSLServer.pm contains a bunch of lines like this:
>        copy_files("ssl/server-*.crt", $pgdata);
> 
> I think if we're going to export it maybe we should have a method to set
> up the ssl directory with the required certificates.

Very good point.  Let's see if there is an actual use case for
shipping them.  Personally I have no actual use for them, but perhaps
somebody else has an argument good enough to justify it.
--
Michael




Re: Allow some recovery parameters to be changed with reload

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 11:06:27PM +0100, Peter Eisentraut wrote:
> Probably right.  I figured it would be useful to see what the outcome is
> with primary_conninfo, so they can be treated similarly.

The interactions with waiting for WAL to be available and the WAL
receiver stresses me a bit for restore_command, as you could finish
with the startup process switching to use restore_command with a WAL
receiver still working behind and overwriting partially the recovered
segment, which could lead to corruption.  We should be *very* careful
about that.
--
Michael




Re: Inconsistent error handling in the openssl init code

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 10:03:30AM +0100, Daniel Gustafsson wrote:
> Doh, managed to completely overlook that.  The attached updated patch also
> fixes the comment, thanks!

That looks fine to me.  Could you just add it to the next CF as a bug
fix so as we don't forget?  I am pretty sure that Peter will look at
that soon.
--
Michael




Re: Add pg_partition_root to get top-most parent of a partition tree

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 12:11:49PM -0300, Alvaro Herrera wrote:
> I looked at it briefly a few weeks ago and it seemed sound, though I
> haven't yet tried to write the constraint display query for psql using
> it -- but that should be straightforward anyway.  Let's get it
> committed so we have one less thing to worry about.

item_to_worry_about--;

Thanks for the successive reviews.
--
Michael




Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 03:33:54PM +0100, Daniel Gustafsson wrote:
> Correct.  The idea was to maintain readability while making the regex a bit
> better, without any claims to make it perfect.

Agreed with your position.  I see no point in making all the
expressions unreadable for little gain.  What Daniel proposed upthread
had a better balance in my opinion than the previous behavior, without
sacrificing the readability of anything while still improving the error
detection.
--
Michael




Re: Commit Fest 2019-01 is now closed

2019-02-07 Thread Robbie Harwood
Andres Freund  writes:

> There's plenty stuff that's chugging along in development but ought to
> be processed at less urgency / by different people, than the stuff
> targeted to be committed soon. It's already frustrating to contribute
> to postgresql for new people, but if they don't get feedback for half
> a year because they submitted around December / January it's almost
> guaranteed that they vanish.  Additionally, there's an increasing
> amount of development projects that are too large to complete in a
> single cycle, and if we just stop looking at them for half a year,
> they'll also not succeed.

I definitely did this - and I don't think success can be declared in my
case yet.  Would like to talk about that for a moment if that's alright.

GSSAPI encryption was first submitted 2015-07-02.  Discussion on it
continued until 2016-08-01, when I vanished.  Discussion included 118
messages, 49 of which I sent myself, and 13 separate revisions.  At that
point I had gone way over the allotted time to spend on this, and had to
move on to other things.  The account tracking on the commitfest app
wasn't as good then, but this corresponds to 4 commitfests.

Second push started 2018-05-23 and is ongoing.  Discussion has been much
quieter - 30 messages, 10 mine - and 7 revisions (mostly due to cfbot).
Since the commitfest webpage supports github login now, the count for
second push is accurate: 5 commitfests.

So I'm at 9 commitfests total (over 2 years).  The total amount of time
spent on this is incredibly daunting.  And this isn't an isolated case;
for instance, at the top of the current commitfest is
https://commitfest.postgresql.org/22/528/ which has been in 14
commitfests.  14 - this'll be 3 years next month.  Others have 11, 10,
10, 8...  (I didn't dig into the tracking on this, so they might be
higher for the same reason my count is higher than is reflected.)

postgresql is an amazing piece of software, and it's really cool to have
something to contribute to it.  And I think that the reviews I've
received have been from people who care genuinely that it keeps being
amazing.  But if I weren't regularly dealing with open source and quirky
upstreams for my day job, I would be long gone.  Even so, no contributor
has unlimited time; even for us corporate contributors, management will
eventually decide to solve the problem a different way.

Thanks for letting me get that off my chest.  Happy hacking!

Thanks,
--Robbie




Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2019-02-07 Thread Peter Eisentraut
On 30/01/2019 07:16, Takashi Menjo wrote:
> Sorry but I found that the patchset v2 had a bug in managing WAL segment
> file offset.  I fixed it and updated a patchset as v3 (attached).

I'm concerned with how this would affect the future maintenance of this
code.  You are introducing a whole separate code path for PMDK beside
the normal file path (and it doesn't seem very well separated either).
Now everyone who wants to do some surgery in the WAL code needs to take
that into account.  And everyone who wants to do performance work in the
WAL code needs to check that the PMDK path doesn't regress.  AFAICT,
this hardware isn't very popular at the moment, so it would be very hard
to peer review any work in this area.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Commit Fest 2019-01 is now closed

2019-02-07 Thread Peter Geoghegan
On Thu, Feb 7, 2019 at 4:02 AM Andres Freund  wrote:
> On 2019-02-07 12:53:39 +0100, Peter Eisentraut wrote:
> > What is the meaning of this?  If something is meant for 13, shouldn't it
> > be moved to the next commit fest?
>
> Why? There's plenty stuff that's chugging along in development but ought
> to be processed at less urgency / by different people, than the stuff
> targeted to be committed soon. It's already frustrating to contribute to
> postgresql for new people, but if they don't get feedback for half a
> year because they submitted around December / January it's almost
> guaranteed that they vanish.

+1. Let's not disincentivize acknowledging that v13 is the target release.

I also don't think it's a good idea to force patch authors to take a
position. Novice authors are particularly likely to make the wrong
call.


--
Peter Geoghegan



Re: Allow some recovery parameters to be changed with reload

2019-02-07 Thread Peter Eisentraut
On 07/02/2019 09:14, Sergei Kornilov wrote:
> We have some possible trouble with restore_command? As far i know it also 
> only looked at by the startup process.

Probably right.  I figured it would be useful to see what the outcome is
with primary_conninfo, so they can be treated similarly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: phase out ossp-uuid?

2019-02-07 Thread Andres Freund
Hi,

On 2019-02-07 09:03:06 +, Dave Page wrote:
> On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
>  wrote:
> >
> > I'm wondering whether we should phase out the use of the ossp-uuid
> > library? (not the uuid-ossp extension)  We have had preferred
> > alternatives for a while now, so it shouldn't be necessary to use this
> > anymore, except perhaps in some marginal circumstances?  As we know,
> > ossp-uuid isn't maintained anymore, and a few weeks ago the website was
> > gone altogether, but it seems to be back now.
> >
> > I suggest we declare it deprecated in PG12 and remove it altogether in PG13.
> 
> Much as I'd like to get rid of it, we don't have an alternative for
> Windows do we? The docs for 11 imply it's required for UUID support
> (though the wording isn't exactly clear, saying it's required for
> UUID-OSSP support!):
> https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8

Given that we've now integrated strong crypto support, and are relying
on it for security (scram), perhaps we should just add a core uuidv4?
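
Just to illustrate the scale of it, an untested sketch of mine (not a
committed implementation), relying only on the existing uuid type
(pg_uuid_t, UUID_LEN) and pg_strong_random():

Datum
gen_random_uuid(PG_FUNCTION_ARGS)
{
	pg_uuid_t  *uuid = palloc(UUID_LEN);

	if (!pg_strong_random(uuid, UUID_LEN))
		ereport(ERROR,
				(errcode(ERRCODE_INTERNAL_ERROR),
				 errmsg("could not generate random values")));

	/*
	 * Set the version field to 4 and the two most significant bits of
	 * the clock_seq_hi_and_reserved field to 10, per RFC 4122.
	 */
	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;
	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;

	PG_RETURN_UUID_P(uuid);
}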

Greetings,

Andres Freund



Re: phase out ossp-uuid?

2019-02-07 Thread Peter Eisentraut
On 07/02/2019 10:03, Dave Page wrote:
> Much as I'd like to get rid of it, we don't have an alternative for
> Windows do we?

Yes, that appears to be a significant problem, so we'll have to keep it
for the time being.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Commit Fest 2019-01 is now closed

2019-02-07 Thread Gavin Flower

On 08/02/2019 00:53, Peter Eisentraut wrote:
>
> On 06/02/2019 21:09, Magnus Hagander wrote:
>>
>> This has now been pushed and is available. I've set it up with stable,
>> 12 and 13 as possible versions for now, but I have not added any tags to
>> the existing patches (except for one, in order to test it).
>
> What is the meaning of this?  If something is meant for 13, shouldn't it
> be moved to the next commit fest?
>
> I find the current display a bit hard to read.  If we are going to use
> the white-on-color style, perhaps each version could have a different
> color.  Or just make it plain black-on-white text.

If your eyesight is poor, then odd colour combinations like white on
grey, or other colours, are hard to read.


Ten years ago, before about 7 eye operations, poor contrast would have
been extremely difficult, if not impossible, for me to read.


Remember also that about 1 in 12 men are colour blind.


Cheers,
Gavin




Re: Location of pg_rewind/RewindTest.pm and ssl/ServerSetup.pm

2019-02-07 Thread Andrew Dunstan


On 2/7/19 4:01 PM, Andrew Dunstan wrote:
> On 2/6/19 8:44 PM, Michael Paquier wrote:
>> On Wed, Feb 06, 2019 at 08:34:11PM -0500, Andrew Dunstan wrote:
>>> The obvious solution is to perform the same surgery for the ssl and
>>> pg_rewind tests that we performed for genbki.pl and other scripts, but
>>> in these cases the used modules are not in the same directory as the
>>> using scripts, but in their parent directories. I can't think of a good
>>> reason for this. They clearly relate to the TAP test suites, and nothing
>>> else, so it seems to me that they should go into the same directory as
>>> the TAP test scripts. There should be no danger that "prove" will try to
>>> run them, as it is only set up to run files with a ".pl" extension.
>>>
>>> While we're at it, I think ServerSetup.pm should possibly be renamed to
>>> something like SSLServer.pm (and the perl code adjusted accordingly)
>>>
>>> Does anyone object to me making these changes?
>> Not really if this eases the buildfarm coverage.  At the same time, it
>> seems to me that it could be a good idea to install those sub-modules.
>
>
> I have made the changes, and the last errors should be clearing up now.
> I think installing them is a separate project, although doing so is
> probably reasonable given that we install other TAP test packages. OTOH
> these are rather more specialised.



Also, SSLServer.pm contains a bunch of lines like this:


       copy_files("ssl/server-*.crt", $pgdata);


I think if we're going to export it maybe we should have a method to set
up the ssl directory with the required certificates.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: shared-memory based stats collector

2019-02-07 Thread Andres Freund
Hi,

On 2018-11-12 20:10:42 +0900, Kyotaro HORIGUCHI wrote:
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 7eed5866d2..e52ae54821 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -8587,9 +8587,9 @@ LogCheckpointEnd(bool restartpoint)
>   _secs, _usecs);
>  
>   /* Accumulate checkpoint timing summary data, in milliseconds. */
> - BgWriterStats.m_checkpoint_write_time +=
> + BgWriterStats.checkpoint_write_time +=
>   write_secs * 1000 + write_usecs / 1000;
> - BgWriterStats.m_checkpoint_sync_time +=
> + BgWriterStats.checkpoint_sync_time +=
>   sync_secs * 1000 + sync_usecs / 1000;

Why does this patch do renames like this in the same entry as actual
functional changes?


> @@ -1273,16 +1276,22 @@ do_start_worker(void)
>   break;
>   }
>   }
> - if (skipit)
> - continue;
> + if (!skipit)
> + {
> + /* Remember the db with oldest autovac time. */
> + if (avdb == NULL ||
> + tmp->adw_entry->last_autovac_time <
> + avdb->adw_entry->last_autovac_time)
> + {
> + if (avdb)
> + pfree(avdb->adw_entry);
> + avdb = tmp;
> + }
> + }
>  
> - /*
> -  * Remember the db with oldest autovac time.  (If we are here, 
> both
> -  * tmp->entry and db->entry must be non-null.)
> -  */
> - if (avdb == NULL ||
> - tmp->adw_entry->last_autovac_time < 
> avdb->adw_entry->last_autovac_time)
> - avdb = tmp;
> + /* Immediately free it if not used */
> + if(avdb != tmp)
> + pfree(tmp->adw_entry);
>   }

This looks like another unrelated refactoring. Don't do this.


>   /* Transfer stats counts into pending pgstats message */
> - BgWriterStats.m_buf_written_backend += 
> CheckpointerShmem->num_backend_writes;
> - BgWriterStats.m_buf_fsync_backend += 
> CheckpointerShmem->num_backend_fsync;
> + BgWriterStats.buf_written_backend += 
> CheckpointerShmem->num_backend_writes;
> + BgWriterStats.buf_fsync_backend += CheckpointerShmem->num_backend_fsync;

More unrelated renaming.  I'll stop mentioning this in the rest of this
patch, but really don't do this; it makes such a large patch
unnecessarily harder to review.


pgstat.c needs a header comment explaining the architecture of the
approach.


> +/*
> + * Operation mode of pgstat_get_db_entry.
> + */
> +#define PGSTAT_FETCH_SHARED  0
> +#define PGSTAT_FETCH_EXCLUSIVE   1
> +#define  PGSTAT_FETCH_NOWAIT 2
> +
> +typedef enum
> +{

Please don't create anonymous enums that are then typedef'd to a
name. The underlying name and the one not scoped to typedefs should be
the same.
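
For clarity, the shape being asked for is the usual one where the enum
tag and the typedef name match (the name below is just for
illustration, not a suggestion for the patch):

typedef enum PgStat_FetchMode
{
	PGSTAT_FETCH_SHARED,
	PGSTAT_FETCH_EXCLUSIVE,
	PGSTAT_FETCH_NOWAIT
} PgStat_FetchMode;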

> +/*
> + *  report withholding facility.
> + *
> + *  some report items are withholded if required lock is not acquired
> + *  immediately.
> + */

This comment needs polishing. The variables are named _pending_, but the
comment talks about withholding - which doesn't seem like an apt name.

>  /*
>   * Structures in which backends store per-table info that's waiting to be
> @@ -189,18 +189,14 @@ typedef struct TabStatHashEntry
>   * Hash table for O(1) t_id -> tsa_entry lookup
>   */
>  static HTAB *pgStatTabHash = NULL;
> +static HTAB *pgStatPendingTabHash = NULL;
>  
>  /*
>   * Backends store per-function info that's waiting to be sent to the 
> collector
>   * in this hash table (indexed by function OID).
>   */
>  static HTAB *pgStatFunctions = NULL;
> -
> -/*
> - * Indicates if backend has some function stats that it hasn't yet
> - * sent to the collector.
> - */
> -static bool have_function_stats = false;
> +static HTAB *pgStatPendingFunctions = NULL;

So this patch leaves us with a pgStatFunctions that has a comment
explaining it's about "waiting to be sent" stats, and then additionally
a pgStatPendingFunctions?



>  /* 
>   * Public functions used by backends follow
> @@ -802,41 +436,107 @@ allow_immediate_pgstat_restart(void)
>   * pgstat_report_stat() -
>   *
>   *   Must be called by processes that performs DML: tcop/postgres.c, logical
> - *   receiver processes, SPI worker, etc. to send the so far collected
> - *   per-table and function usage statistics to the collector.  Note that 
> this
> - *   is called only when not within a transaction, so it is fair to use
> - *   transaction stop time as an approximation of current time.
> - * --
> + *   receiver processes, SPI worker, etc. to apply the so far collected
> + *   per-table and function 

Re: Location of pg_rewind/RewindTest.pm and ssl/ServerSetup.pm

2019-02-07 Thread Andrew Dunstan


On 2/6/19 8:44 PM, Michael Paquier wrote:
> On Wed, Feb 06, 2019 at 08:34:11PM -0500, Andrew Dunstan wrote:
>> The obvious solution is to perform the same surgery for the ssl and
>> pg_rewind tests that we performed for genbki.pl and other scripts, but
>> in these cases the used modules are not in the same directory as the
>> using scripts, but in their parent directories. I can't think of a good
>> reason for this. They clearly relate to the TAP test suites, and nothing
>> else, so it seems to me that they should go into the same directory as
>> the TAP test scripts. There should be no danger that "prove" will try to
>> run them, as it is only set up to run files with a ".pl" extension.
>>
>> While we're at it, I think ServerSetup.pm should possibly be renamed to
>> something like SSLServer.pm (and the perl code adjusted accordingly)
>>
>> Does anyone object to me making these changes?
> Not really if this eases the buildfarm coverage.  At the same time, it
> seems to me that it could be a good idea to install those sub-modules.



I have made the changes, and the last errors should be clearing up now.
I think installing them is a separate project, although doing so is
probably reasonable given that we install other TAP test packages. OTOH
these are rather more specialised.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Race condition in dependency searches

2019-02-07 Thread Tom Lane
While I've been staring at dependency.c, I've realized that this bit
in findDependentObjects is unsafe:

/*
 * First, release caller's lock on this object and get
 * deletion lock on the owning object.  (We must release
 * caller's lock to avoid deadlock against a concurrent
 * deletion of the owning object.)
 */
ReleaseDeletionLock(object);
AcquireDeletionLock(&otherObject, 0);

/*
 * The owning object might have been deleted while we waited
 * to lock it; if so, neither it nor the current object are
 * interesting anymore.  We test this by checking the
 * pg_depend entry (see notes below).
 */
if (!systable_recheck_tuple(scan, tup))
{
systable_endscan(scan);
ReleaseDeletionLock(&otherObject);
return;
}

/*
 * Okay, recurse to the owning object instead of proceeding.

The unstated assumption here is that if the pg_depend entry we are looking
at has gone away, then both the current object and its owning object must
be gone too.  That was a safe assumption when this code was written,
fifteen years ago, because nothing except object deletion could cause
a pg_depend entry to disappear.  Since then, however, we have merrily
handed people a bunch of foot-guns with which they can munge pg_depend
like mad, for example ALTER EXTENSION ... DROP.

Hence, if the pg_depend entry is gone, that might only mean that somebody
randomly decided to remove the dependency.  Now, I think it's legit to
decide that we needn't remove the previously-owning object in that case.
But it's not okay to just pack up shop and return, because if the current
object is still there, proceeding with deletion of whatever we were
deleting would be bad.  That could leave us with scenarios like triggers
whose function is gone, views referring to a deceased table, indexes
dependent on a vanished datatype or opclass, yadda yadda.

It seems like what ought to happen here, if systable_recheck_tuple fails,
is to reacquire the deletion lock that we gave up on "object", then
check to see if "object" is still there, and if so continue with deleting
it.  Only if it in fact isn't there is it OK to conclude that 
findDependentObjects needn't do any more work at this recursion level.

I do not think we have any off-the-shelf way of asking whether an
ObjectAddress's referent still exists.  We could probably teach
objectaddress.c to provide one --- it seems to have enough infrastructure
already to know where to find the object's (main) catalog tuple.
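
To sketch the shape of the fix (object_address_exists() being the
hypothetical helper just mentioned, and glossing over the details of
the surrounding scan loop and lock bookkeeping):

	if (!systable_recheck_tuple(scan, tup))
	{
		/* The pg_depend entry is gone -- but is the current object gone too? */
		ReleaseDeletionLock(&otherObject);
		AcquireDeletionLock(object, 0);

		if (!object_address_exists(object))		/* hypothetical helper */
		{
			/* Current object is gone as well; nothing more to do here. */
			systable_endscan(scan);
			ReleaseDeletionLock(object);
			return;
		}

		/*
		 * Somebody merely removed the dependency (e.g. ALTER EXTENSION
		 * ... DROP); keep the lock and carry on deleting the current
		 * object, just without recursing to the former owning object.
		 */
	}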

This issue seems independent of the partition problems I'm working
on right now, so I plan to leave it for later.

regards, tom lane



Re: Synchronize with imath upstream

2019-02-07 Thread Daniel Gustafsson
> On 7 Feb 2019, at 16:12, Tom Lane  wrote:

> .. I do strongly recommend setting up and documenting a
> reproducible import process, as I did for src/timezone.

Strong +1 on this.

cheers ./daniel



Re: propagating replica identity to partitions

2019-02-07 Thread Dmitry Dolgov
> On Tue, Feb 5, 2019 at 12:54 AM Alvaro Herrera  
> wrote:
>
> Actually, index-based replica identities failed in pg_dump: we first
> create the index ONLY on the partitioned table (marking it as invalid),
> so when we immediately do the ALTER TABLE/REPLICA IDENTITY, it rejects
> the invalid index.  If we change the code to allow invalid indexes on
> partitioned tables to become replica identities, we hit the problem that
> the index didn't exist when processing the partition list.  In order to
> fix that, I added a flag so that partitions are allowed not to have the
> index, in hopes that the missing index are created in subsequent
> commands; those indexes should become valid & identity afterwards.
>
> There's a small emerging problem, which is that if you have an invalid
> index marked as replica identity, you cannot create any more partitions;
> the reason is that we want to propagate the replica identity to the
> partition, but the index is not there (since invalid indexes are not
> propagated).  I don't think this case is worth supporting; it can be
> fixed but requires some work[1].
>
> New patch attached.

Could there be a race condition somewhere? The idea and the code look
reasonable, but when testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I got the following error on
ATTACH PARTITION:

ERROR:  42P17: replica index does not exist in partition "test373"
LOCATION:  MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable; another time I got a deadlock instead. And I don't
observe this behaviour on the master.



Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread Daniel Gustafsson
> On 7 Feb 2019, at 18:20, David Fetter  wrote:
> 
> On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
>> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>>> Correct.  One could argue that the regex is still suboptimal since “COMMENT 
>>> ON
>>> DATABASE postgres IS  ;” will be matched as well, but there I think the 
>>> tradeoff
>>> for readability wins.
>> 
>> Okay, that looks like an improvement anyway, so committed after going
>> over the tests for similar problems, and there was one for CREATE
>> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
>> at least one non-whitespace character, but from what I can see these
>> don't really improve the readability.
> 
> Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or…?

Personally I feel that expanding these test regexes to catch more low-risk
bugs, at the cost of readability, is a slippery slope towards implementing a
SQL parser in regexes.  That was my $0.02 for not attempting to make these
bulletproof.

cheers ./daniel


Re: PG_RE_THROW is mandatory (was Re: jsonpath)

2019-02-07 Thread Tom Lane
Alvaro Herrera  writes:
> So,

> "The error recovery code can either do PG_RE_THROW to propagate the
> error outwards, or do a (sub)transaction abort.  Failure to do so may
> leave the system in an inconsistent state for further processing."

WFM.

regards, tom lane



Re: PG_RE_THROW is mandatory (was Re: jsonpath)

2019-02-07 Thread Alvaro Herrera
> On 2019-02-06 13:09:59 -0300, Alvaro Herrera wrote:

> > This is obviously wrong; while we have a couple of codesites that omit
> > it, it's not a generally available coding pattern.  I think we should
> > amend that comment.  I propose: "The error recovery code must normally
> > do PG_RE_THROW() to propagate the error outwards; failure to do so may
> > leave the system in an inconsistent state for further processing."

On 2019-Feb-06, Andres Freund wrote:

> Well, but it's ok not to rethrow if you do a [sub]transaction
> rollback. I assume that's why it's framed as optional. We probably
> should reference that fact?

On 2019-Feb-06, Tom Lane wrote:

> Well, it can either do PG_RE_THROW or do a (sub)transaction abort.
> Some level of throw-catching code has to do the latter eventually.

So,

"The error recovery code can either do PG_RE_THROW to propagate the
error outwards, or do a (sub)transaction abort.  Failure to do so may
leave the system in an inconsistent state for further processing."
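
Schematically, the two acceptable recovery patterns look like this (an
illustrative sketch of mine; do_something_risky() and
cleanup_local_state() are placeholders, and real callers also need to
save/restore the memory context and error data as appropriate):

	/* Pattern 1: clean up, then propagate the error outwards. */
	PG_TRY();
	{
		do_something_risky();
	}
	PG_CATCH();
	{
		cleanup_local_state();
		PG_RE_THROW();
	}
	PG_END_TRY();

	/* Pattern 2: swallow the error by aborting a subtransaction. */
	BeginInternalSubTransaction(NULL);
	PG_TRY();
	{
		do_something_risky();
		ReleaseCurrentSubTransaction();
	}
	PG_CATCH();
	{
		/* Roll back the subtransaction instead of re-throwing. */
		RollbackAndReleaseCurrentSubTransaction();
		FlushErrorState();
	}
	PG_END_TRY();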

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: use Getopt::Long for catalog scripts

2019-02-07 Thread Alvaro Herrera
Why is this script talking to me?

On 2019-Feb-07, David Fetter wrote:

> Similarly,
> 
> die "-I, the header include path, must be specified.\n" unless $include_path;

But why must thee, oh mighty header include path, be specified?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: bug tracking system

2019-02-07 Thread Tom Lane
Nathan Wagner  writes:
> On Wed, Feb 06, 2019 at 10:50:51PM -0500, Tom Lane wrote:
>> I do have a modest proposal for improving things going forward.  How
>> about, if a commit purports to fix a particular bug, that we say
>> "Fixes: https://postgr.es/m/" in place of our current
>> habit of saying "Discussion: ...".  For bugs that have come in through
>> the bug form, the bug number is trivially extractable from the
>> message-id these days;

> The bug number would only be extractable from the message-id of the
> first message.  This proposal would require finding the message-id of
> the original message, rather than just looking at the subject of any
> message in the thread.  That seems like more work than is really
> necessary.

The existing convention is already to cite the message-id of the start
of the thread.  I proposed this exactly because it's no more work than
before for the committer.

> A bigger question, at least for me is do people actually want to use the
> system I've set up?

Yeah, that's really the bottom line here --- there's been a lot of
"if you build it they will come" theorizing about bug trackers,
but we have little evidence either way about how people would really
use one.

regards, tom lane



Handling of ORDER BY by postgres_fdw

2019-02-07 Thread Antonin Houska
While reviewing [1] it occurred to me that it might not be clean that
postgresGetForeignUpperPaths() adds ORDER BY to the remote query when it
receives stage=UPPERREL_GROUP_AGG. Shouldn't that only happen for
UPPERREL_ORDERED? Thus the processing of UPPERREL_ORDERED would have to be
able to handle grouping too, so some refactoring would be needed. Do I
misunderstand anything?

[1] https://commitfest.postgresql.org/22/1950/

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: use Getopt::Long for catalog scripts

2019-02-07 Thread David Fetter
On Thu, Feb 07, 2019 at 10:09:08AM +0100, John Naylor wrote:
> This was suggested in
> 
> https://www.postgresql.org/message-id/11766.1545942853%40sss.pgh.pa.us
> 
> I also adjusted usage() to match. There might be other scripts that
> could use this treatment, but I figure this is a good start.
> 
> -- 
> John Naylorhttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

> diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
> index d9f92ba701..8c4bd0c7ae 100644
> --- a/src/backend/catalog/Makefile
> +++ b/src/backend/catalog/Makefile
> @@ -89,7 +89,8 @@ generated-header-symlinks: 
> $(top_builddir)/src/include/catalog/header-stamp
>  # version number when it changes.
>  bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) 
> $(top_srcdir)/configure.in
>   $(PERL) -I $(catalogdir) $< \
> - -I $(top_srcdir)/src/include/ --set-version=$(MAJORVERSION) \
> + --include-path=$(top_srcdir)/src/include/ \
> + --set-version=$(MAJORVERSION) \
>   $(POSTGRES_BKI_SRCS)
>   touch $@
>  
> diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
> index be81094ffb..48ba69cd0a 100644
> --- a/src/backend/catalog/genbki.pl
> +++ b/src/backend/catalog/genbki.pl
> @@ -16,6 +16,7 @@
>  
>  use strict;
>  use warnings;
> +use Getopt::Long;
>  
>  use File::Basename;
>  use File::Spec;
> @@ -23,41 +24,20 @@ BEGIN  { use lib File::Spec->rel2abs(dirname(__FILE__)); }
>  
>  use Catalog;
>  
> -my @input_files;
>  my $output_path = '';
>  my $major_version;
>  my $include_path;
>  
> -# Process command line switches.
> -while (@ARGV)
> -{
> - my $arg = shift @ARGV;
> - if ($arg !~ /^-/)
> - {
> - push @input_files, $arg;
> - }
> - elsif ($arg =~ /^-I/)
> - {
> - $include_path = length($arg) > 2 ? substr($arg, 2) : shift 
> @ARGV;
> - }
> - elsif ($arg =~ /^-o/)
> - {
> - $output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
> - }
> - elsif ($arg =~ /^--set-version=(.*)$/)
> - {
> - $major_version = $1;
> - die "Invalid version string.\n"
> -   if !($major_version =~ /^\d+$/);
> - }
> - else
> - {
> - usage();
> - }
> -}
> +GetOptions(
> + 'output:s'=> \$output_path,
> + 'include-path:s'  => \$include_path,
> + 'set-version:s'   => \$major_version) || usage();
> +
> +die "Invalid version string.\n"
> +  if !($major_version =~ /^\d+$/);

Maybe this would be clearer as:

die "Invalid version string.\n"
unless $major_version =~ /^\d+$/;

>  # Sanity check arguments.
> -die "No input files.\n" if !@input_files;
> +die "No input files.\n" if !@ARGV;
>  die "--set-version must be specified.\n" if !defined $major_version;
>  die "-I, the header include path, must be specified.\n" if !$include_path;

Similarly,

die "No input files.\n" unless @ARGV;
die "--set-version must be specified.\n" unless defined $major_version;
die "-I, the header include path, must be specified.\n" unless $include_path;

>  
> @@ -79,7 +59,7 @@ my @toast_decls;
>  my @index_decls;
>  my %oidcounts;
>  
> -foreach my $header (@input_files)
> +foreach my $header (@ARGV)
>  {
>   $header =~ /(.+)\.h$/
> or die "Input files need to be header files.\n";
> @@ -917,11 +897,11 @@ sub form_pg_type_symbol
>  sub usage
>  {
>   die < -Usage: genbki.pl [options] header...
> +Usage: perl -I [directory of Catalog.pm] genbki.pl [--output/-o ] 
> [--include-path/-i include path] header...
>  
>  Options:
> --I   include path
> --o   output path
> +--output Output directory (default '.')
> +--include-path   Include path for source directory
>  --set-versionPostgreSQL version number for initdb cross-check
>  
>  genbki.pl generates BKI files and symbol definition
> diff --git a/src/backend/utils/Gen_fmgrtab.pl 
> b/src/backend/utils/Gen_fmgrtab.pl
> index f17a78ebcd..9aa8714840 100644
> --- a/src/backend/utils/Gen_fmgrtab.pl
> +++ b/src/backend/utils/Gen_fmgrtab.pl
> @@ -18,32 +18,14 @@ use Catalog;
>  
>  use strict;
>  use warnings;
> +use Getopt::Long;
>  
> -# Collect arguments
> -my @input_files;
>  my $output_path = '';
>  my $include_path;
>  
> -while (@ARGV)
> -{
> - my $arg = shift @ARGV;
> - if ($arg !~ /^-/)
> - {
> - push @input_files, $arg;
> - }
> - elsif ($arg =~ /^-o/)
> - {
> - $output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
> - }
> - elsif ($arg =~ /^-I/)
> - {
> - $include_path = length($arg) > 2 ? substr($arg, 2) : shift 
> @ARGV;
> - }
> - else
> - {
> - usage();
> - }
> -}
> +GetOptions(
> + 'output:s'=> \$output_path,
> + 'include:s'   => \$include_path) || usage();
>  
>  # Make sure output_path ends in 

Re: dsa_allocate() faliure

2019-02-07 Thread Thomas Munro
On Thu, Feb 7, 2019 at 9:10 PM Jakub Glapa  wrote:
> > Do you have query logging enabled ?  If not, could you consider it on at 
> > least
> one of those servers ?  I'm interested to know what ELSE is running at the 
> time
> that query failed.
>
> Ok, I have configured that and will enable in the time window when the errors 
> usually occur. I'll report as soon as I have something.

I don't have the answer yet but I have some progress: I finally
reproduced the "could not find %d free pages" error by running lots of
concurrent parallel queries.  Will investigate.

Set up:

create table foo (p int, a int, b int) partition by list (p);
create table foo_1 partition of foo for values in (1);
create table foo_2 partition of foo for values in (2);
create table foo_3 partition of foo for values in (3);
alter table foo_1 set (parallel_workers = 4);
alter table foo_2 set (parallel_workers = 4);
alter table foo_3 set (parallel_workers = 4);
insert into foo
select generate_series(1, 1000)::int % 3 + 1,
   generate_series(1, 1000)::int % 50,
   generate_series(1, 1000)::int % 50;
create index on foo_1(a);
create index on foo_2(a);
create index on foo_3(a);
create index on foo_1(b);
create index on foo_2(b);
create index on foo_3(b);
analyze;

Then I ran three copies of :

#!/bin/sh
(
  echo "set max_parallel_workers_per_gather = 4;"
  for I in `seq 1 10`; do
    echo "explain analyze select count(*) from foo where a between 5 and 6 or b between 5 and 6;"
  done
) | psql postgres

-- 
Thomas Munro
http://www.enterprisedb.com



Re: ToDo: show size of partitioned table

2019-02-07 Thread Pavel Stehule
On Thu, Feb 7, 2019 at 16:03, Alvaro Herrera wrote:

> On 2019-Feb-07, Pavel Stehule wrote:
>
> > Your example
> >
> > postgres=# \dPtn+
> >  List of partitioned tables
> > ┌────────┬────────┬───────┬─────────────┬────────────┬─────────────┐
> > │ Schema │  Name  │ Owner │ Parent name │    Size    │ Description │
> > ╞════════╪════════╪═══════╪═════════════╪════════════╪═════════════╡
> > │ public │ p      │ pavel │             │ 8192 bytes │             │
> > │ public │ p_1    │ pavel │ p           │ 8192 bytes │             │
> > │ public │ p_1_bc │ pavel │ p_1         │ 8192 bytes │             │
> > └────────┴────────┴───────┴─────────────┴────────────┴─────────────┘
> > (3 rows)
> >
> > I hope so the interpretation is clean .. there are three partitioned
> tables
> > (two are subpartitioned tables). Any partitioned table has assigned 8KB
> of
> > data.
> >
> > We can introduce new column "size with sub partitions" where these
> numbers
> > can be counted together. But for term "size" I expect valid rule S1+S2+..
> > SN = total size.
>
> Right now, a partitioned table has no size of its own; its only storage
> is in its children.  But that might change in the future, for example if
> we implement global indexes.  Then it will be useful to show these sizes
> separately.
>
> I suggest that we should have one column for the aggregated size of its
> children, and another column for the "local" size.  So currently the
> local size would always be zero for partitioned tables.  The other
> column (maybe "Children Size") would be the sum of the sizes of all its
> partitions.
>
> So I think this should be:
>

So now we have three proposals :-)

Can I recapitulate it?

1. My proposal - the "size" = sum(partitions on level1), "size with nested
partitioned tables" = sum(partitions on all levels)
2. Amit's proposal - the "size" = sum(partitions on all levels)
3. Alvaro's proposal - the "size" = empty now, "children size" =
sum(partitions on level1)

Every proposal is valid, makes some sense, and displays some information
from some perspective.

So first, we should not use the one-word term "size" in this context,
because there is no agreement on it.

Can we find some terminology for two *sizes*?

1. size of all partitions with nesting level = 1
2. size of all partitions without nesting level check.

maybe there can be third variant - size of all partitions with nesting
level > 1

I am not sure "children size" is a good term, because this structure is
a tree and the depth is not specified.

Maybe "Immediate partitions size" for @1, "total partitions size" for @2,
and "indirect partitions size" for @3.

Then we can display @1, @2 or @1 or @3.

I hope I mentioned all possible variants.

Comments, notes?

Pavel

>
>                              List of partitioned tables
> ┌────────┬────────┬───────┬─────────────┬────────────┬───────────────┬─────────────┐
> │ Schema │  Name  │ Owner │ Parent name │    Size    │ Children Size │ Description │
> ╞════════╪════════╪═══════╪═════════════╪════════════╪═══════════════╪═════════════╡
> │ public │ p      │ pavel │             │            │ 8192 bytes    │             │
> │ public │ p_1    │ pavel │ p           │            │ 8192 bytes    │             │
> │ public │ p_1_bc │ pavel │ p_1         │ 8192 bytes │               │             │
> └────────┴────────┴───────┴─────────────┴────────────┴───────────────┴─────────────┘
>
>
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread David Fetter
On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
> > Correct.  One could argue that the regex is still suboptimal since “COMMENT 
> > ON
> > DATABASE postgres IS  ;” will be matched as well, but there I think the 
> > tradeoff
> > for readability wins.
> 
> Okay, that looks like an improvement anyway, so committed after going
> over the tests for similar problems, and there was one for CREATE
> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
> at least one non-whitespace character, but from what I can see these
> don't really improve the readability.

Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or...?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Problems with plan estimates in postgres_fdw

2019-02-07 Thread Antonin Houska
Etsuro Fujita  wrote:

> (2018/12/28 15:50), Etsuro Fujita wrote:
> > Attached is a new version of the
> > patch.
> 
> Here is an updated version of the patch set.  Changes are:

I've spent some time reviewing the patches.

First, I wonder if the information on LIMIT needs to be passed to the
FDW at all. Cannot the FDW routines use root->tuple_fraction? I think
(although I have not verified it experimentally) that the problem
reported in [1] is not limited to the LIMIT clause. The same problem can
happen if a client uses a cursor and sets cursor_tuple_fraction very
low. Since the remote node is not aware of this setting when running
EXPLAIN, the low fraction can also make postgres_fdw think that
retrieval of the whole set is cheaper than it will turn out to be during
the actual execution.

Some comments on coding:

0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
-

* I'm not sure if UPPERREL_ORDERED should deal with LIMIT at all. Note that
grouping_planner() does not call create_limit_path() until it's done with
create_ordered_paths(), and create_ordered_paths() is where the FDW is
requested to add its paths to the UPPERREL_ORDERED relation.

* add_foreign_ordered_paths()

Expression root->upper_targets[UPPERREL_FINAL] is used to access the target.

I think create_ordered_paths() should actually set the target so that
postgresGetForeignUpperPaths() can simply find it in
output_rel->reltarget. This is how postgres_fdw already retrieves the target
for grouped paths. Small patch is attached that makes this possible.

* regression tests: I think test(s) should be added for queries that have
  ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET)
  clause. I haven't noticed such tests.


0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch
---

* add_foreign_final_paths()

Similar problem I reported for add_foreign_ordered_paths() above.

* adjust_limit_rows_costs()

Doesn't this function address more generic problem than the use of
postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the
function is only used in pathnode.c, so it should be declared static.

Both patches:


One thing that makes me confused when I try to understand details is that the
new functions add_foreign_ordered_paths() and add_foreign_final_paths() both
pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size():

estimate_path_cost_size(root, input_rel, ...)

whereas the existing add_foreign_grouping_paths() passes "grouped_rel":

estimate_path_cost_size(root, grouped_rel, ...)

Can you please make this consistent? If you do, you'll probably need to
reconsider your changes to estimate_path_cost_size().

And maybe related problem: why should FDW care about the effect of
apply_scanjoin_target_to_paths() like you claim to do here?

/*
 * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the
 * final scan/join relation, the costs obtained from the cache
 * wouldn't yet contain the eval cost for the final scan/join target,
 * which would have been updated by apply_scanjoin_target_to_paths();
 * add the eval cost now.
 */
if (fpextra && !IS_UPPER_REL(foreignrel))
{
/* The costs should have been obtained from the cache. */
Assert(fpinfo->rel_startup_cost >= 0 &&
   fpinfo->rel_total_cost >= 0);

startup_cost += foreignrel->reltarget->cost.startup;
run_cost += foreignrel->reltarget->cost.per_tuple * rows;
}

I think it should not, whether "foreignrel" means "input_rel" or "output_rel"
from the perspective of postgresGetForeignUpperPaths().


[1] 
https://www.postgresql.org/message-id/87pnz1aby9@news-spur.riddles.org.uk

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2239728cf..17e983f11e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2035,6 +2035,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 * of the corresponding upperrels might not be needed for this query.
 		 */
 		root->upper_targets[UPPERREL_FINAL] = final_target;
+		root->upper_targets[UPPERREL_ORDERED] = final_target;
 		root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
 		root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
 
@@ -2118,6 +2119,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
 
 	/*
+	 * Set reltarget so that it's consistent with the paths. Also it's more
+	 * convenient for FDW to find the target here.
+	 */
+	final_rel->reltarget = final_target;
+

Re: bug tracking system

2019-02-07 Thread Nathan Wagner
On Wed, Feb 06, 2019 at 10:50:51PM -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Feb-06, Tom Lane wrote:
> >> That will have caught exactly none of my own commits.
> 
> > Well, what text do you use?  I see "Per bug #XYZ" in the free-form text
> > of your commit messages, though that doesn't explicitly say that the bug
> > is fixed.  If we agree that that phrase indicates that the bug is fixed,
> > it seems fair to mark those bugs as fixed in Nathan's system.
> 
> There are a couple of problems here.
> 
> One is that we haven't really got an agreed-to formula for saying that
> this commit fixes that bug.  It's not that uncommon for a commit message
> to reference a bug that it doesn't fix --- I did that just today, for
> example.  So I'm worried that a regex that tries to capture all of the
> former will capture some of the latter too.

I don't think any reasonable regex would have caused a false positive
from the commit message in commit bdd9a99aac3.

> The other problem is that not all bugs have got bug numbers to begin
> with.  We just had some discussion about trying to label all
> pgsql-bugs traffic with bug numbers, but it wasn't sounding promising.
> 
> I do have a modest proposal for improving things going forward.  How
> about, if a commit purports to fix a particular bug, that we say
> "Fixes: https://postgr.es/m/" in place of our current
> habit of saying "Discussion: ...".  For bugs that have come in through
> the bug form, the bug number is trivially extractable from the
> message-id these days;

The bug number would only be extractable from the message-id of the
first message.  This proposal would require finding the message-id of
the original message, rather than just looking at the subject of any
message in the thread.  That seems like more work than is really
necessary.

Furthermore, this only works if you know in advance that the message-id
is a message id generated by the bug submission form, otherwise if a
message-id has the same form, it might look like a bug id.  What I'm
dimly attempting to express is that I think this method would be more
subject to false positives than just quoting the bug number directly.

But I'm happy to work with whatever syntax people want to use.  I'm even
happy to write a different regex for each person.  I can easily write a
script that would look for log messages where Tom Lane was the committer
and look for a string of the form above.

A bigger question, at least for me is do people actually want to use the
system I've set up?  What do people think of it?  If people aren't
interested in it, and won't use it, then it's not worth putting a lot
more work into it.  I'll keep it going for myself even if bug statuses
never get updated because I occasionally find the text search useful.  A
brief look though indicates that there is already a way to search the
mailing list archives.  So, if the general sense is that what I've set
up is sort of pointless, that's useful information as well.

-- 
nw



Re: An out-of-date comment in nodeIndexonlyscan.c

2019-02-07 Thread Thomas Munro
On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas  wrote:
> On 14/05/18 02:15, Thomas Munro wrote:
> > Since commit cdf91edb (2012), nodeIndexonlyscan.c says:
> >
> > /*
> >  * Predicate locks for index-only scans must be acquired at the page
> >  * level when the heap is not accessed, since tuple-level predicate
> >  * locks need the tuple's xmin value.  If we had to visit the tuple
> >  * anyway, then we already have the tuple-level lock and can skip the
> >  * page lock.
> >  */
> > if (tuple == NULL)
> >     PredicateLockPage(scandesc->heapRelation,
> >                       ItemPointerGetBlockNumber(tid),
> >                       estate->es_snapshot);
> >
> > The first sentence of that comment is no longer true as of commit
> > c01262a8 (2013).  As for whether it's necessary to predicate-lock the
> > whole heap page (rather than the heap tuple) anyway because of HOT
> > update chains, I don't know, so I'm not sure what wording to propose
> > instead.
>
> Hmm. If there are any updated tuples, HOT or not, the visibility map bit
> will not be set, and we won't reach this code. So I think we could
> acquire the tuple lock here.

Right.  CC Kevin.  I think we should at least fix the comment.  As for
changing the lock level, PredicateLockTuple() wants a heap tuple that
we don't have, so we'd probably need to add a PredicateLockTid()
function.
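
Presumably such a function could mirror the tail end of
PredicateLockTuple(), something like this untested sketch (the
"already locked by our own write" shortcut based on the tuple's xmin
would simply be skipped, since all we have is a TID):

void
PredicateLockTid(Relation relation, ItemPointer tid, Snapshot snapshot)
{
	PREDICATELOCKTARGETTAG tag;

	if (!SerializationNeededForRead(relation, snapshot))
		return;

	SET_PREDICATELOCKTARGETTAG_TUPLE(tag,
									 relation->rd_node.dbNode,
									 relation->rd_id,
									 ItemPointerGetBlockNumber(tid),
									 ItemPointerGetOffsetNumber(tid));
	PredicateLockAcquire(&tag);
}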

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Connection slots reserved for replication

2019-02-07 Thread Oleksii Kliukin


> On 7. Feb 2019, at 07:51, Michael Paquier  wrote:
> 
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
>> Oh, I have just noticed this patch when doing my vacuum homework.  I
>> have just added my name as committer, just wait a bit until the CF is
>> closed before I begin looking at it..
> 
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.

Thank you for picking it up!

> 
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKEND, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

Good catch, thank you. 
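
(For readers following along, the guard in question is the usual
MAX_BACKENDS check done by the GUC check hooks; roughly like the
following, with max_wal_senders now added into the sum -- my own
paraphrase of the idea, not the committed wording:)

static bool
check_max_wal_senders(int *newval, void **extra, GucSource source)
{
	if (MaxConnections + *newval + autovacuum_max_workers + 1 +
		max_worker_processes > MAX_BACKENDS)
		return false;
	return true;
}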

I’ve noticed you returned the 'see max_connections’ parameter there. As noticed
previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
are supposed to see there, perhaps it makes sense to elaborate (besides, the
comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
max_connections, which is not true anymore).

> 
> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straight-forward to follow so that's not really an issue in my
opinion.  The potential backward-incompatible issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

+1

> 
> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

Hm.. I am wondering how the autovacuum workers can run out of slots there?

> 
> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.
> 
> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

Apart from the comment-related notes above your changes look good to me, thank
you.

Cheers,
Oleksii


Re: Synchronize with imath upstream

2019-02-07 Thread Tom Lane
Noah Misch  writes:
> On Wed, Feb 06, 2019 at 10:15:24AM -0500, Tom Lane wrote:
>> I don't object to keeping imported code in a form that matches upstream
>> as best we can.  (Should we also exclude such files from pgindent'ing?)

> I think it depends on how much time one spends merging upstream changes versus
> making PostgreSQL-specific edits.  For IMath, both amounts are too small to
> get excited about.  Does pgindent materially complicate src/timezone merges?

My practice with src/timezone is to pgindent the upstream code and then
diff it; given that extra step, it's not really any more complex (and
maybe less so, as this hides minor whitespace changes for instance).
There are some other deltas to worry about as well, see
src/timezone/README.

I have no particular opinion on whether pgindent should be part of the
mix for imath, but I do strongly recommend setting up and documenting a
reproducible import process, as I did for src/timezone.

regards, tom lane



Re: Add pg_partition_root to get top-most parent of a partition tree

2019-02-07 Thread Alvaro Herrera
On 2019-Feb-07, Michael Paquier wrote:

> On Thu, Feb 07, 2019 at 01:34:15PM +0900, Amit Langote wrote:
> > Yeah, I agree.  Should also check with Alvaro maybe?
> 
> Good idea.  Let's wait for his input.

I looked at it briefly a few weeks ago and it seemed sound, though I
haven't yet tried to write the constraint display query for psql using
it -- but that should be straightforward anyway.  Let's get it
committed so we have one less thing to worry about.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ToDo: show size of partitioned table

2019-02-07 Thread Alvaro Herrera
On 2019-Feb-07, Pavel Stehule wrote:

> Your example
> 
> postgres=# \dPtn+
>  List of partitioned tables
> ┌────────┬────────┬───────┬─────────────┬────────────┬─────────────┐
> │ Schema │  Name  │ Owner │ Parent name │    Size    │ Description │
> ╞════════╪════════╪═══════╪═════════════╪════════════╪═════════════╡
> │ public │ p      │ pavel │             │ 8192 bytes │             │
> │ public │ p_1    │ pavel │ p           │ 8192 bytes │             │
> │ public │ p_1_bc │ pavel │ p_1         │ 8192 bytes │             │
> └────────┴────────┴───────┴─────────────┴────────────┴─────────────┘
> (3 rows)
> 
> I hope so the interpretation is clean .. there are three partitioned tables
> (two are subpartitioned tables). Any partitioned table has assigned 8KB of
> data.
> 
> We can introduce new column "size with sub partitions" where these numbers
> can be counted together. But for term "size" I expect valid rule S1+S2+..
> SN = total size.

Right now, a partitioned table has no size of its own; its only storage
is in its children.  But that might change in the future, for example if
we implement global indexes.  Then it will be useful to show these sizes
separately.

I suggest that we should have one column for the aggregated size of its
children, and another column for the "local" size.  So currently the
local size would always be zero for partitioned tables.  The other
column (maybe "Children Size") would be the sum of the sizes of all its
partitions.

So I think this should be:

                             List of partitioned tables
┌────────┬────────┬───────┬─────────────┬────────────┬───────────────┬─────────────┐
│ Schema │  Name  │ Owner │ Parent name │    Size    │ Children Size │ Description │
╞════════╪════════╪═══════╪═════════════╪════════════╪═══════════════╪═════════════╡
│ public │ p      │ pavel │             │            │ 8192 bytes    │             │
│ public │ p_1    │ pavel │ p           │            │ 8192 bytes    │             │
│ public │ p_1_bc │ pavel │ p_1         │ 8192 bytes │               │             │
└────────┴────────┴───────┴─────────────┴────────────┴───────────────┴─────────────┘



-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: phase out ossp-uuid?

2019-02-07 Thread Tom Lane
Peter Eisentraut  writes:
> I'm wondering whether we should phase out the use of the ossp-uuid
> library?

It's not really costing us any maintenance effort that I've noticed,
so I vote no.  Whether or not there are any people who can't use
another alternative, it would be more work to rip out that code than
to (continue to) ignore it.

regards, tom lane



Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread Daniel Gustafsson
> On 7 Feb 2019, at 13:55, Tels  wrote:
> On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:
>> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>>> Correct.  One could argue that the regex is still suboptimal since
>>> “COMMENT ON
>>> DATABASE postgres IS  ;” will be matched as well, but there I think the
>>> tradeoff
>>> for readability wins.
>> 
>> Okay, that looks like an improvement anyway, so committed after going
>> over the tests for similar problems, and there was one for CREATE
>> DATABASE and DROP ROLE.  It is possible to have a regex which tells as
>> at least one non-whitespace character, but from what I can see these
>> don't really improve the readability.
> 
> Sorry for being late to the party, but it just occured to me that while
> ".+"  is definitely an improvement over ".*", it isn't foolproof either,
> as it also matches ";”.

Correct.  The idea was to maintain readability while making the regex a bit
better, without any claims to make it perfect.  One can argue whether or not
that’s enough, but IMO keeping the tests as readable as possible is preferable
when it comes to the tests in question, as they aren’t matching against
user-supplied SQL but machine-generated output.

cheers ./daniel


Re: REINDEX CONCURRENTLY 2.0

2019-02-07 Thread Sergei Kornilov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Hello
Sorry for the late response. I reviewed the latest patch version.

I didn't find any new behavior bugs.

reindex concurrently can end up in a deadlock with another reindex
concurrently (or a manual vacuum (maybe with wraparound autovacuum) or
create index concurrently on the same table). But I think this is not an
issue for this feature; two create index concurrently commands can
deadlock too.

Just one small note for documentation:

> +Indexes:
> +"idx" btree (col)
> +"idx_cct" btree (col) INVALID

Second index should be idx_ccnew (or idx_ccold), right?

Code looks good to me.

regards, Sergei

RE: Shared Memory: How to use SYSV rather than MMAP ?

2019-02-07 Thread REIX, Tony
Hi Thomas,


Thanks for your help,


Here are my experiments on the AIX 7.2 machine.

That sounds good !


About "huge", we have plans for AIX. But it is not urgent. Let's go with this 
patch.


Regards,


Tony



Buffers for SharedMemory PSIZ has been extended by:

   ldedit -btextpsize=64k -bdatapsize=64k -bstackpsize=64k 
/opt/freeware/bin/postgres_64



1) shm: mmap / huge: try


$PGDATA/postgresql.conf :

dynamic_shared_memory_type = mmap
shared_memory_type = mmap
huge_pages = try


$ psql -U postgres -x -f pg_showSystVParams.sql
-[ RECORD 1 ]--+-
shared_memory_type | mmap

-[ RECORD 1 ]--+-
dynamic_shared_memory_type | mmap

-[ RECORD 1 ]---
huge_pages | try


Procmap :


Start-ADD End-ADD   SIZE MODE  PSIZ  TYPE   VSID
 MAPPED OBJECT

+ grep SMMAP /tmp/PG.procmap
a00   a0008dca000145192K rw-   smSMMAP  8ce86c
+ grep MAIN /tmp/PG.procmap
1 10090c8839266K r-x   m MAINTEXT   8a62ea  
 postgres_64
119ea 1100f7500 986K rw-   m MAINDATA   836822  
 postgres_64
+ grep SHM /tmp/PG.procmap
a000100   a00010164K rw-   m SHM81b5e1  
 shmid:138413056



2) shm: mmap / huge: on


$PGDATA/postgresql.conf :

dynamic_shared_memory_type = mmap
shared_memory_type = mmap
huge_pages = on


$ pg_ctl start :
FATAL:  huge pages not supported on this platform



3) shm: mmap / huge: try


$PGDATA/postgresql.conf :

dynamic_shared_memory_type = mmap
shared_memory_type = mmap
huge_pages = try


$ pg_ctl start : OK  - No message



4) shm: sysv / huge: off


dynamic_shared_memory_type = sysv
shared_memory_type = sysv
huge_pages = off

$ psql -U postgres -x -f pg_showSystVParams.sql
-[ RECORD 1 ]--+-
shared_memory_type | sysv

-[ RECORD 1 ]--+-
dynamic_shared_memory_type | sysv

-[ RECORD 1 ]---
huge_pages | off

Procmap :

+ grep SMMAP /tmp/PG.procmap
+ grep MAIN /tmp/PG.procmap
1 10090c8839266K r-x   m MAINTEXT   886229  
 postgres_64
119ea 1100f7500 986K rw-   m MAINDATA   8ee2ce  
 postgres_64
+ grep SHM /tmp/PG.procmap
a00   a0008dd145216K rw-   m SHM8745c7  
 shmid:139461632
a001000   a00100164K rw-   m SHM80b380  
 shmid:685769729




5) shm: sysv / huge: on

FATAL:  huge pages not supported on this platform


6) shm: sysv / huge: try


$ pg_ctl start : OK  - No message





# cat procmapcheck.sh
PID=` ps -edf | grep /opt/freeware/bin/postgres | grep "   1" | awk '{print $2}'`
procmap -nfX > /tmp/PG.procmap $PID
grep SMMAP /tmp/PG.procmap
grep MAIN  /tmp/PG.procmap
grep SHM   /tmp/PG.procmap



$ cat pg_showSystVParams.sql
SHOW shared_memory_type;
SHOW dynamic_shared_memory_type;
SHOW huge_pages;
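
For readers unfamiliar with the two modes, here is a bare-bones standalone
sketch of what shared_memory_type = sysv boils down to (shmget/shmat), as
opposed to the anonymous-mmap path.  Error handling, huge pages and the AIX
64k page-size handling above are omitted, and the 1MB size is only for the
demo.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int
main(void)
{
    size_t size = 1024 * 1024;          /* 1MB segment, just for the demo */
    int    shmid;
    void  *addr;

    /* create a private System V segment, readable/writable by the owner */
    shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid < 0)
    {
        perror("shmget");
        return 1;
    }

    /* attach it at a kernel-chosen address */
    addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1)
    {
        perror("shmat");
        return 1;
    }

    memset(addr, 0, size);
    printf("attached SysV segment shmid=%d at %p\n", shmid, addr);

    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);      /* mark the segment for removal */
    return 0;
}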



Regards,

Tony Reix

tony.r...@atos.net

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net



From: Thomas Munro 
Sent: Thursday, February 7, 2019 03:30
To: REIX, Tony
Cc: EMPEREUR-MOT, SYLVIE
Subject: Re: Shared Memory: How to use SYSV rather than MMAP ?

On Thu, Feb 7, 2019 at 4:08 AM REIX, Tony  wrote:
> I've been able to build/install/test the version 11.1 with your patch, on AIX 
> 7.2 .
>
>
> I've changed the postgresql.conf file we use for our benchmark, and I've 
> checked that, when starting postgres, it reads this file.
>
> However, I'm not sure that it takes into account the values that I have set. 
> Or maybe the "postgres --describe-config" command does not do what I am 
> expecting (print the value of all the parameters set in the postgresql.conf 
> file)?

SHOW shared_memory_type;
SHOW dynamic_shared_memory_type;

Maybe you can also see a difference in the output of "procmap" for a
backend process?  I am not sure about that.

--
Thomas Munro
http://www.enterprisedb.com


Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2019-02-07 Thread Etsuro Fujita

(2019/02/02 10:21), Michael Paquier wrote:

On Fri, Nov 16, 2018 at 01:35:15PM -0500, Tom Lane wrote:

I wonder whether we'd be better off thinking of a way to let FDWs
invent additional system column IDs for their tables, so that
something like a remote table OID could be represented in the
natural way as a Var with negative varattno.  This'd potentially
also be a win for FDWs whose underlying storage has a row identifier,
but it's not of type "tid".  Instead of trying to shoehorn their
row ID into SelfItemPointerAttributeNumber, they could define a
new system column that has a more appropriate data type.  Admittedly
there'd be some infrastructure work to do to make this happen, maybe
a lot of it; but it's a bullet we really need to bite at some point.


This patch got zero input for the last couple of months.  As it is
classified as bug fix, I have moved it to next CF, waiting on author.
Fujita-san, are you planning to look at it?


I 100% agree with Tom, and actually, I tried to address his comments, 
but I haven't come up with a clear solution for that yet.  I really want 
to address this, but I won't have much time to work on that at least 
until after this development cycle, so what I'm thinking is to mark this 
as Returned with feedback, or if possible, to move this to the 2019-07 CF.


My apologies for the late reply.

Best regards,
Etsuro Fujita




Re: Tighten up a few overly lax regexes in pg_dump's tap tests

2019-02-07 Thread Tels
Moin,

On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:
> On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
>> Correct.  One could argue that the regex is still suboptimal since
>> “COMMENT ON
>> DATABASE postgres IS  ;” will be matched as well, but there I think the
>> tradeoff
>> for readability wins.
>
> Okay, that looks like an improvement anyway, so committed after going
> over the tests for similar problems, and there was one for CREATE
> DATABASE and DROP ROLE.  It is possible to have a regex which checks for
> at least one non-whitespace character, but from what I can see these
> don't really improve the readability.

Sorry for being late to the party, but it just occurred to me that while
".+"  is definitely an improvement over ".*", it isn't foolproof either,
as it also matches ";".

Thus:

  regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,

matches things like:

  'COMMENT ON DATABASE postgres IS ;;'
  'COMMENT ON DATABASE postgres IS  ;'
  'COMMENT ON DATABASE postgres IS --;'

etc.

I'm not sure it is really necessary to deal with these cases, but one
possibility would be to pre-compute regexps:

  $QR_COMMENT = qr/[^ ;]+/;
  $QR_IDENTIFIER = qr/[^ ;]+/; # etc

or whatever is the thing that should actually be matched here and use them
like so:

  regexp => qr/^COMMENT ON DATABASE postgres IS $QR_COMMENT;/m,

That way it is easily changable and quite readable.

Oh, one more question. Shouldn't these regexps that start with "^" also
end with "$"? Or can there be output like:

  'COMMENT ON DATABASE postgres IS $QR_IDENTIFIER; SELECT 1;'

?

Best regards,

Tels




Re: Protect syscache from bloating with negative cache entries

2019-02-07 Thread Kyotaro HORIGUCHI
At Thu, 07 Feb 2019 15:24:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190207.152418.139132570.horiguchi.kyot...@lab.ntt.co.jp>
> I'm going to retake numbers with search-only queries.

Yeah, I was stupid.

I made a rerun of the benchmark using "-S -T 30" on a server built
with no assertions and -O2. The numbers are the best of three
successive attempts.  The patched version is running with
cache_target_memory = 0, cache_prune_min_age = 600 and
cache_entry_limit = 0, but pruning doesn't happen with this workload.

master: 13393 tps
v12   : 12625 tps (-6%)

Significant degradation is found.

Reducing the frequency of dlist_move_tail by requiring a 1ms interval
between two successive updates of the same entry made the degradation
disappear.

patched  : 13720 tps (+2%)

I think there's still no need for such frequency. It is 100ms in
the attached patch.

# I'm not sure the name LRU_IGNORANCE_INTERVAL makes sense..
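
As an illustration, here is a small standalone sketch of the throttled LRU
promotion: the entry is moved to the MRU end only when more than
LRU_IGNORANCE_INTERVAL has passed since its last move.  The list and entry
types are simplified stand-ins, not the real dlist/catcache structures, and
the millisecond clock is passed in explicitly instead of being taken from the
statement start timestamp.

#include <stdint.h>
#include <stdio.h>

#define LRU_IGNORANCE_INTERVAL_MS 100   /* assumption: 100ms, as in the mail */

typedef struct Entry
{
    struct Entry *prev;
    struct Entry *next;
    uint64_t      lru_touched_at;       /* last time we moved it to the tail */
    const char   *name;
} Entry;

typedef struct
{
    Entry head;                         /* circular list, head.next = oldest */
} LruList;

static void
lru_init(LruList *l)
{
    l->head.prev = l->head.next = &l->head;
}

static void
lru_push_tail(LruList *l, Entry *e)
{
    e->prev = l->head.prev;
    e->next = &l->head;
    e->prev->next = e;
    l->head.prev = e;
}

static void
lru_unlink(Entry *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/*
 * Called on every cache hit.  The entry is promoted to the MRU end only if
 * more than LRU_IGNORANCE_INTERVAL_MS has passed since it was last moved,
 * so hot entries do not pay the list manipulation cost on every access.
 */
static void
lru_touch(LruList *l, Entry *e, uint64_t clock_ms)
{
    if (clock_ms - e->lru_touched_at < LRU_IGNORANCE_INTERVAL_MS)
        return;                         /* accessed recently enough; skip */
    lru_unlink(e);
    lru_push_tail(l, e);
    e->lru_touched_at = clock_ms;
}

int
main(void)
{
    LruList l;
    Entry a = {0}, b = {0};
    a.name = "a";
    b.name = "b";

    lru_init(&l);
    lru_push_tail(&l, &a);
    lru_push_tail(&l, &b);

    lru_touch(&l, &a, 50);      /* ignored: within the interval */
    lru_touch(&l, &a, 200);     /* promoted to the MRU end */

    for (Entry *e = l.head.next; e != &l.head; e = e->next)
        printf("%s\n", e->name);        /* prints: b, then a */
    return 0;
}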

The attached 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 72a569703662b93fb57c55c337b16107ebccfce3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/4] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
-- 
2.16.3

>From 429001e7cbbb88710cfc5589bc46e2490f93d216 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 2/4] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. This patch adds consideration of
removing entries that haven't been used for a certain time before
enlarging the hash array.

This also can put a hard limit on the number of catcache entries.
---
 doc/src/sgml/config.sgml  |  38 +
 src/backend/access/transam/xact.c |   5 +
 src/backend/utils/cache/catcache.c| 205 +-
 src/backend/utils/misc/guc.c  |  63 
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/utils/catcache.h  |  33 -
 6 files changed, 338 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9b7a7388d5..d0d2374944 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1662,6 +1662,44 @@ include_dir 'conf.d'
   
  
 
+ 
+  syscache_memory_target (integer)
+  
+   syscache_memory_target configuration parameter
+  
+  
+  
+   
+Specifies the maximum amount of memory to which syscache is expanded
+without pruning. The value defaults to 0, indicating that pruning is
+always considered. After exceeding this size, syscache pruning is
+considered according to
+. If you need to keep
+certain amount of syscache entries with intermittent usage, try
+increase this setting.
+   
+  
+ 
+
+ 
+  syscache_prune_min_age (integer)
+  
+   syscache_prune_min_age configuration parameter
+  
+  
+  
+   
+Specifies the minimum amount of unused time in seconds at which a
+syscache entry is considered to be removed. -1 indicates that syscache
+pruning is disabled entirely. The value defaults to 600 seconds
+(10 minutes). The syscache entries that are not
+used for the duration can be removed to prevent syscache bloat. This
+behavior is suppressed until the size of syscache exceeds
+.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 92bda87804..ddc433c59e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -734,7 +734,12 @@ void
 SetCurrentStatementStartTimestamp(void)
 {
 	if (!IsParallelWorker())
+	{
 		stmtStartTimestamp = GetCurrentTimestamp();
+
+		/* Set this timestamp as the approximate current time */
+		SetCatCacheClock(stmtStartTimestamp);
+	}
 	

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 12:49:43PM +0100, Peter Eisentraut wrote:
> Anyway, that's all cosmetics.  Are there any more functional or
> correctness issues to be addressed?

Not that I can think of.  At least this evening.

> Another thing I was thinking of: We need some database-global tests.
> For example, at some point during development, I had broken some variant
> of REINDEX DATABASE.  Where could we put those tests?  Maybe with
> reindexdb?

Having some coverage in the TAP tests of reindexdb is a good idea.
--
Michael


signature.asc
Description: PGP signature


Re: Commit Fest 2019-01 is now closed

2019-02-07 Thread Andres Freund
Hi,

On 2019-02-07 12:53:39 +0100, Peter Eisentraut wrote:
> On 06/02/2019 21:09, Magnus Hagander wrote:
> > This has now been pushed and is available. I've set it up with stable,
> > 12 and 13 as possible versions for now, but I have not added any tags to
> > the existing patches (except for one, in order to test it).
> 
> What is the meaning of this?  If something is meant for 13, shouldn't it
> be moved to the next commit fest?

Why? There's plenty of stuff that's chugging along in development but ought
to be processed with less urgency / by different people than the stuff
targeted to be committed soon. It's already frustrating for new people to
contribute to postgresql, but if they don't get feedback for half a
year because they submitted around December / January it's almost
guaranteed that they vanish.  Additionally, there's an increasing number
of development projects that are too large to complete in a single
cycle, and if we just stop looking at them for half a year, they'll also
not succeed.

Greetings,

Andres Freund



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2019-02-07 Thread Peter Eisentraut
On 30/01/2019 06:16, Michael Paquier wrote:
> On Tue, Jan 29, 2019 at 09:51:35PM +0100, Peter Eisentraut wrote:
>> On 16/01/2019 09:27, Michael Paquier wrote:
>>> index_create does not actually need its extra argument with the tuple
>>> descriptor.  I think that we had better grab the column name list from
>>> indexInfo and just pass that down to index_create() (patched on my
>>> local branch), so it is an overkill to take a full copy of the index's
>>> TupleDesc.
>>
>> Please send a fixup patch.
> 
> Sure.  Attached is a patch which can be applied on top of what you
> sent last, based on what I noticed at review, here and there.  You
> also forgot to switch two heap_open() to table_open() in pg_depend.c.

OK, applied most of that.

I didn't take your function renaming.  I had deliberately named the
functions index_concurrently_${task}, so their relationship is more
easily visible.

Anyway, that's all cosmetics.  Are there any more functional or
correctness issues to be addressed?

Another thing I was thinking of: We need some database-global tests.
For example, at some point during development, I had broken some variant
of REINDEX DATABASE.  Where could we put those tests?  Maybe with reindexdb?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5ea43abfdb81269157576ff722396475c8881d2c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Feb 2019 12:43:09 +0100
Subject: [PATCH v8] REINDEX CONCURRENTLY

Discussion: 
https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
---
 doc/src/sgml/mvcc.sgml|   1 +
 doc/src/sgml/ref/reindex.sgml | 185 +++-
 src/backend/catalog/index.c   | 544 ++-
 src/backend/catalog/pg_depend.c   | 143 +++
 src/backend/commands/indexcmds.c  | 877 +++---
 src/backend/commands/tablecmds.c  |  32 +-
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/parser/gram.y |  22 +-
 src/backend/tcop/utility.c|  10 +-
 src/bin/psql/common.c |  16 +
 src/bin/psql/tab-complete.c   |  18 +-
 src/include/catalog/dependency.h  |   5 +
 src/include/catalog/index.h   |  16 +
 src/include/commands/defrem.h |   6 +-
 src/include/nodes/parsenodes.h|   1 +
 .../expected/reindex-concurrently.out |  78 ++
 src/test/isolation/isolation_schedule |   1 +
 .../isolation/specs/reindex-concurrently.spec |  40 +
 src/test/regress/expected/create_index.out|  95 ++
 src/test/regress/sql/create_index.sql |  61 ++
 21 files changed, 1986 insertions(+), 167 deletions(-)
 create mode 100644 src/test/isolation/expected/reindex-concurrently.out
 create mode 100644 src/test/isolation/specs/reindex-concurrently.spec

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index bedd9a008d..9b7ef8bf09 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -926,6 +926,7 @@ Table-level Lock Modes
 
  Acquired by VACUUM (without FULL),
 ANALYZE, CREATE INDEX CONCURRENTLY,
+ REINDEX CONCURRENTLY,
  CREATE STATISTICS, and certain ALTER
  INDEX and ALTER TABLE variants (for full
  details see  and 
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
 
  
 
@@ -67,10 +67,7 @@ Description
  
   An index build with the CONCURRENTLY option failed, leaving
   an invalid index. Such indexes are useless but it can be
-  convenient to use REINDEX to rebuild them. Note that
-  REINDEX will not perform a concurrent build. To build the
-  index without interfering with production you should drop the index and
-  reissue the CREATE INDEX CONCURRENTLY command.
+  convenient to use REINDEX to rebuild them.
  
 
 
@@ -151,6 +148,21 @@ Parameters
 

 
+   
+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will rebuild the
+  index without taking any locks that prevent concurrent inserts,
+  updates, or deletes on the table; whereas a standard reindex build
+  locks out writes (but not reads) on the table until it's done.
+  There are several caveats to be aware of when using this option
+   see .
+ 
+
+   
+

 VERBOSE
 
@@ -241,6 +253,161 @@ Notes
Each individual partition can be reindexed separately instead.
   
 
+  
+   Rebuilding Indexes Concurrently
+
+   
+index
+rebuilding concurrently
+   
+
+   
+Rebuilding an index can interfere with regular operation of a database.
+Normally PostgreSQL locks 

Re: ToDo: show size of partitioned table

2019-02-07 Thread Pavel Stehule
čt 7. 2. 2019 v 11:25 odesílatel Amit Langote 
napsal:

> Hi,
>
> On 2019/02/07 18:08, Pavel Stehule wrote:
> > čt 7. 2. 2019 v 9:51 odesílatel Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> >> \dPn seems to work fine, but I don't quite understand why \dPn+ should
> >> show the sizes only for nested partitions of level.  Consider the
>
> (correcting words of my previous email: ... of level 1.)
>
> > Show nested objects in rectangular output is a problem. I prefer a design
> > where any times the sum of displayed sizes is same like total size.
> >
> > So if I have partitions on level1 of size 16KB, and on level 2 8KB, then
> I
> > would to display 16 and 8, and not 24 and 8. If I remember, this rule is
> > modified, when filter is used.
>
> Just to recap, the originally proposed feature is to show the size of a
> partitioned table by summing the sizes of *all* of its (actually leaf)
> partitions, which \dP[t|i]+ gives us.  As you mentioned, a limitation of
> the feature as initially proposed is that it only shows partitioned tables
> that are roots of their respective partition trees.  That is, there is no
> way to see the sizes of the intermediate partitioned tables using any of
> psql's \d commands.  So, you introduced the "n" modifier, whereby
> \dP[t|i]n+ now shows *also* the intermediate partitioned tables with their
> sizes.  But it only considers the directly attached partitions of each
> partitioned table that's shown.  So, only those partitioned tables that
> have leaf partitions directly attached to them are shown with non-0 size (if
> leaf partitions are non-empty) and others with size 0 (root partitioned
> tables in most cases where nested partitioned tables are involved).  But I
> think that means the "n" modifier is changing the behavior of the base
> command (\dP+) which is meant to show the total size of *all* partitions
> under a given partitioned table.  Maybe, the "n" modifier should only
> result in including the nested/intermediate partitioned tables and nothing
> more than that.
>

It was Michael's request to see the whole hierarchy, and I think it has some
benefits.


> I see your point that all these tables are appearing in the display as one
> flat list and so the sizes of same leaf partitions may be multiply
> counted, but it's not totally a flat representation given that you have
> added "Parent name" column.  We could document that the size of a nested
> partitioned table shown in the display is also counted in the size of its
> parent partitioned table.  That I think may be easier to understand than
> that the size of each partitioned table shown in the display only
> considers the sizes of leaf partitions that are directly attached to it.
>
>
Personally I don't agree - a) who reads the documentation, b) it is really a
violation of some relational principles. It is clear if we have only one
table, but if we see a report with more tables, then multiple size
calculations can be messy.

It is not a problem if you have a clean schema like P1, P2 tables (when the
tree is balanced). But when some tables are assigned to P1 and some to P2
(the tree is not balanced), then it is not clear what the size of the directly
attached tables is and what the size of the subpartitions is. So it is better
not to sum apples and oranges.

\dPn shows all subroots and the related minimal size. I think this is a very
clear definition.

Your example

postgres=# \dPtn+
 List of partitioned tables
┌┬┬───┬─┬┬─┐
│ Schema │  Name  │ Owner │ Parent name │Size│ Description │
╞╪╪═══╪═╪╪═╡
│ public │ p  │ pavel │ │ 8192 bytes │ │
│ public │ p_1│ pavel │ p   │ 8192 bytes │ │
│ public │ p_1_bc │ pavel │ p_1 │ 8192 bytes │ │
└┴┴───┴─┴┴─┘
(3 rows)

I hope the interpretation is clear .. there are three partitioned tables
(two of them subpartitioned tables). Each partitioned table has 8KB of data
assigned directly to it.

We can introduce a new column "size with sub partitions" where these numbers
are counted together. But for the term "size" I expect the rule S1+S2+..+
SN = total size to hold.

Is it acceptable for you?





> Thoughts? Any more opinions on this?
>




>
> Thanks,
> Amit
>
>


Re: phase out ossp-uuid?

2019-02-07 Thread Jose Luis Tallon

On 7/2/19 10:03, Dave Page wrote:

On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
 wrote:

I'm wondering whether we should phase out the use of the ossp-uuid
library? (not the uuid-ossp extension)


Hmm... FWIW, just get it in core altogether? Seems small and useful 
enough... if it carries the opfamily with it, UUID would become really 
convenient to use for distributed applications.


(been using that functionality for a while already)


   We have had preferred
alternatives for a while now, so it shouldn't be necessary to use this
anymore, except perhaps in some marginal circumstances?  As we know,
ossp-uuid isn't maintained anymore, and a few weeks ago the website was
gone altogether, but it seems to be back now.

I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

Much as I'd like to get rid of it, we don't have an alternative for
Windows do we? The docs for 11 imply it's required for UUID support
(though the wording isn't exactly clear, saying it's required for
UUID-OSSP support!):
https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8


AFAIR, Windows has its own DCE/v4 UUID generation support. UUID v5 can
be generated using built-in crypto hashes. v1 UUIDs are the ones that are
(potentially) more cumbersome to generate, plus they are the least useful IMHO.
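
For what it's worth, a version-4 UUID needs nothing more than 16 strong random
bytes plus two bit fix-ups, so it does not depend on ossp-uuid.  A minimal
standalone sketch follows; it assumes the random bytes would come from
something like pg_strong_random() in the backend or BCryptGenRandom() on
Windows, and uses a fixed buffer here only as a stand-in.

#include <stdint.h>
#include <stdio.h>

static void
format_uuid_v4(uint8_t buf[16], char out[37])
{
    buf[6] = (buf[6] & 0x0F) | 0x40;    /* version 4 */
    buf[8] = (buf[8] & 0x3F) | 0x80;    /* RFC 4122 variant */

    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
             buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7],
             buf[8], buf[9], buf[10], buf[11], buf[12], buf[13], buf[14], buf[15]);
}

int
main(void)
{
    /* stand-in for real random bytes; never use a fixed buffer in practice */
    uint8_t bytes[16] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0,
                         0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef};
    char uuid[37];

    format_uuid_v4(bytes, uuid);
    printf("%s\n", uuid);
    return 0;
}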



Just my .02€

Thanks,

    / J.L.





Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-02-07 Thread Haribabu Kommi
Hi Hackers,

Does an increase in transaction commits per second mean good query
performance?
The reason I ask is that many monitoring tools display the number of
transactions
per second on their dashboard (including pgAdmin).

While testing a bunch of queries with different sets of
configurations, I observed that
the TPS of one particular configuration increased compared to the default
server configuration, but the overall query execution performance
decreased after comparing all query run times.

This is because of a larger xact_commit value than with the default
configuration.
The changed server configuration leads to more parallel workers being
generated, and every parallel worker operation is treated as an extra commit;
because of this, the total number of commits increased, but the
overall query performance decreased.

Is there any relation between transaction commits and performance?

Is there any specific reason to count the parallel worker activity
as a transaction commit as well? Especially in my observation, if we didn't
count the parallel worker activity as separate commits, the test doesn't
show an increase in transaction commits.

Suggestions?

Regards,
Haribabu Kommi
Fujitsu Australia


Re: ToDo: show size of partitioned table

2019-02-07 Thread Amit Langote
Hi,

On 2019/02/07 18:08, Pavel Stehule wrote:
> čt 7. 2. 2019 v 9:51 odesílatel Amit Langote 
>> \dPn seems to work fine, but I don't quite understand why \dPn+ should
>> show the sizes only for nested partitions of level.  Consider the

(correcting words of my previous email: ... of level 1.)

> Show nested objects in rectangular output is a problem. I prefer a design
> where any times the sum of displayed sizes is same like total size.
> 
> So if I have partitions on level1 of size 16KB, and on level 2 8KB, then I
> would to display 16 and 8, and not 24 and 8. If I remember, this rule is
> modified, when filter is used.

Just to recap, the originally proposed feature is to show the size of a
partitioned table by summing the sizes of *all* of its (actually leaf)
partitions, which \dP[t|i]+ gives us.  As you mentioned, a limitation of
the feature as initially proposed is that it only shows partitioned tables
that are roots of their respective partition trees.  That is, there is no
way to see the sizes of the intermediate partitioned tables using any of
psql's \d commands.  So, you introduced the "n" modifier, whereby
\dP[t|i]n+ now shows *also* the intermediate partitioned tables with their
sizes.  But it only considers the directly attached partitions of each
partitioned table that's shown.  So, only those partitioned tables that
have leaf partitions directly attached to them are shown with non-0 size (if
leaf partitions are non-empty) and others with size 0 (root partitioned
tables in most cases where nested partitioned tables are involved).  But I
think that means the "n" modifier is changing the behavior of the base
command (\dP+) which is meant to show the total size of *all* partitions
under a given partitioned table.  Maybe, the "n" modifier should only
result in including the nested/intermediate partitioned tables and nothing
more than that.

I see your point that all these tables are appearing in the display as one
flat list and so the sizes of same leaf partitions may be multiply
counted, but it's not totally a flat representation given that you have
added "Parent name" column.  We could document that the size of a nested
partitioned table shown in the display is also counted in the size of its
parent partitioned table.  That I think may be easier to understand than
that the size of each partitioned table shown in the display only
considers the sizes of leaf partitions that are directly attached to it.

Thoughts? Any more opinions on this?

Thanks,
Amit




Re: dsa_allocate() failure

2019-02-07 Thread Jakub Glapa
> Do you have query logging enabled ?  If not, could you consider it on at
least
one of those servers ?  I'm interested to know what ELSE is running at the
time
that query failed.

Ok, I have configured that and will enable in the time window when the
errors usually occur. I'll report as soon as I have something.


--
regards,
pozdrawiam,
Jakub Glapa


On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby  wrote:

> Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
>
> On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> > I see the error showing up every night on 2 different servers. But it's a
> > bit of a heisenbug because If I go there now it won't be reproducible.
>
> Do you have query logging enabled ?  If not, could you consider it on at
> least
> one of those servers ?  I'm interested to know what ELSE is running at the
> time
> that query failed.
>
> Perhaps you could enable query logging JUST for the interval of time that
> the
> server usually errors ?  The CSV logs can be imported to postgres for
> analysis.
> You might do something like SELECT
> left(message,99),COUNT(1),max(session_id) FROM postgres_log WHERE log_time
> BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
> And just maybe there'd be a query there that only runs once per day which
> would
> allow reproducing the error at will.  Or utility command like vacuum..
>
> I think ideally you'd set:
>
> log_statement= all
> log_min_messages = info
> log_destination  = 'stderr,csvlog'
> # stderr isn't important for this purpose, but I keep it set to capture
> crash messages, too
>
> You should set these to something that works well at your site:
>
> log_rotation_age= '2min'
> log_rotation_size   = '32MB'
>
> I would normally set these, and I don't see any reason why you wouldn't set
> them too:
>
> log_checkpoints = on
> log_lock_waits  = on
> log_temp_files  = on
> log_min_error_statement = notice
> log_temp_files  = 0
> log_min_duration_statement  = '9sec'
> log_autovacuum_min_duration = '999sec'
>
> And I would set these too but maybe you'd prefer to do something else:
>
> log_directory   = /var/log/postgresql
> log_file_mode   = 0640
> log_filename= postgresql-%Y-%m-%d_%H%M%S.log
>
> Justin
>


RE: Protect syscache from bloating with negative cache entries

2019-02-07 Thread Ideriha, Takeshi
Hi, thanks for recent rapid work. 

>From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>At Tue, 5 Feb 2019 19:05:26 -0300, Alvaro Herrera 
>wrote in <20190205220526.GA1442@alvherre.pgsql>
>> On 2019-Feb-05, Tomas Vondra wrote:
>>
>> > I don't think we need to remove the expired entries right away, if
>> > there are only very few of them. The cleanup requires walking the
>> > hash table, which means significant fixed cost. So if there are only
>> > few expired entries (say, less than 25% of the cache), we can just
>> > leave them around and clean them if we happen to stumble on them
>> > (although that may not be possible with dynahash, which has no
>> > concept of expiration) of before enlarging the hash table.
>>
>> I think seqscanning the hash table is going to be too slow;
>> Ideriha-san idea of having a dlist with the entries in LRU order
>> (where each entry is moved to head of list when it is touched) seemed
>> good: it allows you to evict older ones when the time comes, without
>> having to scan the rest of the entries.  Having a dlist means two more
>> pointers on each cache entry AFAIR, so it's not a huge amount of memory.
>
>Ah, I had a separate list in my mind. Sounds reasonable to have pointers in 
>cache entry.
>But I'm not sure how much additional
>dlist_* impact.

Thank you for picking up my comment, Alvaro.
That's what I was thinking about.

>The attached is the new version with the following properties:
>
>- Both prune-by-age and hard limiting feature.
>  (Merged into single function, single scan)
>  Debug tracking feature in CatCacheCleanupOldEntries is removed
>  since it no longer runs a full scan.
It seems to me that adding a hard-limit strategy choice besides the
prune-by-age one is good, to help the variety of (contradictory) cases which
have been discussed in this thread. I need a hard limit as well.

The hard limit is currently represented as a number of cache entries,
controlled by both cache_entry_limit and cache_entry_limit_prune_ratio.
Why don't we change it to an amount of memory (bytes)?
An amount of memory is a more direct parameter for a customer who wants to
set the hard limit, and easier to tune than a number of cache entries.
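
A trivial standalone sketch of what a byte-based limit could look like, for
discussion only; cache_memory_limit and prune_ratio are hypothetical names and
not what the current patch uses.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static size_t cache_memory_limit = 8 * 1024;  /* assumption: 8kB budget */
static double prune_ratio = 0.25;             /* free 25% once over budget */
static size_t cache_memory_used = 0;

/* Called after a new entry of 'sz' bytes has been inserted. */
static bool
cache_over_budget(size_t sz)
{
    cache_memory_used += sz;
    return cache_memory_limit > 0 && cache_memory_used > cache_memory_limit;
}

/* How many bytes the pruning pass should try to release. */
static size_t
prune_target(void)
{
    return (size_t) (cache_memory_used * prune_ratio);
}

int
main(void)
{
    for (int i = 0; i < 100; i++)
    {
        if (cache_over_budget(200))
        {
            printf("used=%zu, prune about %zu bytes\n",
                   cache_memory_used, prune_target());
            cache_memory_used -= prune_target();  /* pretend we pruned */
        }
    }
    return 0;
}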

>- Using LRU to get rid of full scan.
>
>I added new API dlist_move_to_tail which was needed to construct LRU.

I just thought that dlist_move_head() already exists, so new entries could be
on the head side and old ones on the tail side. But that's not an objection to
adding a new API, because depending on the situation putting new entries at the
head could be the more readable code,
and vice versa.

Regards,
Takeshi Ideriha




Re: Connection slots reserved for replication

2019-02-07 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 7 Feb 2019 15:51:46 +0900, Michael Paquier  wrote 
in <20190207065146.gn4...@paquier.xyz>
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
> > Oh, I have just noticed this patch when doing my vacuum homework.  I
> > have just added my name as committer, just wait a bit until the CF is
> > closed before I begin looking at it..
> 
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.
> 
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKEND, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

I don't think it is a good thing, including for the existing checker
functions. But as you (seem to have) written below, it can be
another issue. So I agree with that.

> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straight-forward to follow so that's not really an issue in my

I was hesitating to propose to change it (in InitProcGlobal) but
I like the fixed style.

> opinion.  The potential backward-incompatible issue issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

I believe so.

> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

(Maybe I'm losing the point, but) The GUC validator functions for
max_connections and friends seem more harmful than useful,
since we only see an error for max_connections and errors for the others won't
be seen, and that conceals what is happening. I think we should
remove the validators and have InitializeMaxBackends complain
about it, providing more meaningful information. In another patch?
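
A standalone sketch of the kind of single cross-parameter check meant here, in
the style of InitializeMaxBackends(), with max_wal_senders counted separately
as the patch does.  MAX_BACKENDS below is assumed to match the definition in
postmaster.h, and the error text is only illustrative.

#include <stdio.h>
#include <stdlib.h>

#define MAX_BACKENDS 0x3FFFF        /* assumed upper bound on backend slots */

static int MaxConnections = 100;
static int autovacuum_max_workers = 3;
static int max_worker_processes = 8;
static int max_wal_senders = 10;    /* counted on top of max_connections */

static int
initialize_max_backends(void)
{
    /* the extra "+ 1" is the autovacuum launcher */
    long total = (long) MaxConnections + autovacuum_max_workers + 1 +
                 max_worker_processes + max_wal_senders;

    if (total > MAX_BACKENDS)
    {
        fprintf(stderr,
                "too many backends configured: %ld exceeds the limit of %d\n",
                total, MAX_BACKENDS);
        exit(1);
    }
    return (int) total;
}

int
main(void)
{
    printf("MaxBackends = %d\n", initialize_max_backends());
    return 0;
}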

> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.

Some removed comments are revived but I'm fine with it. Adding
tags in documentation seems fine.

> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

In the end, I (though not the author) am fine with it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ToDo: show size of partitioned table

2019-02-07 Thread Pavel Stehule
čt 7. 2. 2019 v 9:51 odesílatel Amit Langote 
napsal:

> Hi Pavel,
>
> Thanks for sending the updated patch.
>
> On 2018/12/19 15:38, Pavel Stehule wrote:
> > út 18. 12. 2018 v 8:49 odesílatel Amit Langote <
> >> On 2018/12/17 17:48, Pavel Stehule wrote:
> >>> I can imagine new additional flag - line "n" nested - and then we can
> >>> display nested partitioned tables with parent table info. Some like
> >>>
> >>> \dPt - show only root partition tables
> >>> \dPnt or \dPtn - show root and nested partitioned tables
> >>
> >> Too much complication maybe?
> >>
> >
> > I wrote it - the setup query is more complex, but not too much. I fixed
> the
> > size calculation, when nested partitions tables are visible - it
> calculate
> > partitions only from level1 group. Then the displayed size is same as
> total
> > size
>
> \dPn seems to work fine, but I don't quite understand why \dPn+ should
> show the sizes only for nested partitions of level.  Consider the
> following example outputs:
>

Showing nested objects in rectangular output is a problem. I prefer a design
where the sum of the displayed sizes always equals the total size.

So if I have partitions on level 1 of size 16KB, and on level 2 of 8KB, then I
would display 16 and 8, and not 24 and 8. If I remember correctly, this rule
is modified when a filter is used.

Maybe we can introduce more columns where totals from the leaves can be
calculated.

Regards

Pavel


>
> create table p (a int, b char) partition by list (a);
> create table p_1 partition of p for values in (1) partition by list (b);
> create table p_1_a partition of p_1 for values in ('a');
> create table p_1_bc partition of p_1 for values in ('b', 'c') partition by
> list (b);
> create table p_1_b partition of p_1_bc for values in ('b');
> create table p_1_c partition of p_1_bc for values in ('c');
> create table p_2 partition of p for values in (2);
> insert into p values (1, 'a');
> insert into p values (1, 'b');
> insert into p values (2);
>
> \dP+
> List of partitioned relations
>  Schema │ Name │ Owner │ Size  │ Description
> ┼──┼───┼───┼─
>  public │ p│ amit  │ 24 kB │
> (1 row)
>
> -- size of 'p' shown as 8KB, whereas it's actually 24KB as per above
> -- size of 'p_1' shown as 8KB, whereas it's actually 16KB
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 8192 bytes │
>  public │ p_1│ amit  │ p   │ 8192 bytes │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> Also, if the root partitioned table doesn't have a directly attached leaf
> partition, its size is shown as 0 bytes, whereas I think it should
> consider the sizes of its other nested partitions.
>
> drop table p_2;
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 0 bytes│
>  public │ p_1│ amit  │ p   │ 8192 bytes │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> If I remove the following two statements from the patched code:
>
> +if (show_nested_partitions)
> +appendPQExpBuffer(, "\n WHERE d.level = 1");
>
> +/*
> + * Assign size just for directly assigned tables, when nested
> + * partitions are visible
> + */
> +if (show_nested_partitions)
> +appendPQExpBuffer(, "\n WHERE ppt.isleaf AND
> ppt.level = 1");
>
> I get the following output, which I find more intuitive:
>
> create table p_2 partition of p for values in (2);
> insert into p values (2);
>
> \dP+
> List of partitioned relations
>  Schema │ Name │ Owner │ Size  │ Description
> ┼──┼───┼───┼─
>  public │ p│ amit  │ 24 kB │
> (1 row)
>
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 24 kB  │
>  public │ p_1│ amit  │ p   │ 16 kB  │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> drop table p_2;
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 16 kB  │
>  public │ p_1│ amit  │ p   │ 16 kB  │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> Thoughts?
>
>
> Meanwhile, some comments on the patch:
>
> + If modifier n is used, then nested partition
> + tables are displayed too.
>

use Getopt::Long for catalog scripts

2019-02-07 Thread John Naylor
This was suggested in

https://www.postgresql.org/message-id/11766.1545942853%40sss.pgh.pa.us

I also adjusted usage() to match. There might be other scripts that
could use this treatment, but I figure this is a good start.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index d9f92ba701..8c4bd0c7ae 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -89,7 +89,8 @@ generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 # version number when it changes.
 bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
 	$(PERL) -I $(catalogdir) $< \
-		-I $(top_srcdir)/src/include/ --set-version=$(MAJORVERSION) \
+		--include-path=$(top_srcdir)/src/include/ \
+		--set-version=$(MAJORVERSION) \
 		$(POSTGRES_BKI_SRCS)
 	touch $@
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index be81094ffb..48ba69cd0a 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -16,6 +16,7 @@
 
 use strict;
 use warnings;
+use Getopt::Long;
 
 use File::Basename;
 use File::Spec;
@@ -23,41 +24,20 @@ BEGIN  { use lib File::Spec->rel2abs(dirname(__FILE__)); }
 
 use Catalog;
 
-my @input_files;
 my $output_path = '';
 my $major_version;
 my $include_path;
 
-# Process command line switches.
-while (@ARGV)
-{
-	my $arg = shift @ARGV;
-	if ($arg !~ /^-/)
-	{
-		push @input_files, $arg;
-	}
-	elsif ($arg =~ /^-I/)
-	{
-		$include_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^-o/)
-	{
-		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^--set-version=(.*)$/)
-	{
-		$major_version = $1;
-		die "Invalid version string.\n"
-		  if !($major_version =~ /^\d+$/);
-	}
-	else
-	{
-		usage();
-	}
-}
+GetOptions(
+	'output:s'=> \$output_path,
+	'include-path:s'  => \$include_path,
+	'set-version:s'   => \$major_version) || usage();
+
+die "Invalid version string.\n"
+  if !($major_version =~ /^\d+$/);
 
 # Sanity check arguments.
-die "No input files.\n" if !@input_files;
+die "No input files.\n" if !@ARGV;
 die "--set-version must be specified.\n" if !defined $major_version;
 die "-I, the header include path, must be specified.\n" if !$include_path;
 
@@ -79,7 +59,7 @@ my @toast_decls;
 my @index_decls;
 my %oidcounts;
 
-foreach my $header (@input_files)
+foreach my $header (@ARGV)
 {
 	$header =~ /(.+)\.h$/
 	  or die "Input files need to be header files.\n";
@@ -917,11 +897,11 @@ sub form_pg_type_symbol
 sub usage
 {
 	die <] [--include-path/-i include path] header...
 
 Options:
--I   include path
--o   output path
+--output Output directory (default '.')
+--include-path   Include path for source directory
 --set-versionPostgreSQL version number for initdb cross-check
 
 genbki.pl generates BKI files and symbol definition
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index f17a78ebcd..9aa8714840 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -18,32 +18,14 @@ use Catalog;
 
 use strict;
 use warnings;
+use Getopt::Long;
 
-# Collect arguments
-my @input_files;
 my $output_path = '';
 my $include_path;
 
-while (@ARGV)
-{
-	my $arg = shift @ARGV;
-	if ($arg !~ /^-/)
-	{
-		push @input_files, $arg;
-	}
-	elsif ($arg =~ /^-o/)
-	{
-		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	elsif ($arg =~ /^-I/)
-	{
-		$include_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
-	else
-	{
-		usage();
-	}
-}
+GetOptions(
+	'output:s'=> \$output_path,
+	'include:s'   => \$include_path) || usage();
 
 # Make sure output_path ends in a slash.
 if ($output_path ne '' && substr($output_path, -1) ne '/')
@@ -52,7 +34,7 @@ if ($output_path ne '' && substr($output_path, -1) ne '/')
 }
 
 # Sanity check arguments.
-die "No input files.\n"   if !@input_files;
+die "No input files.\n"   if !@ARGV;
 die "No include path; you must specify -I.\n" if !$include_path;
 
 # Read all the input files into internal data structures.
@@ -63,7 +45,7 @@ die "No include path; you must specify -I.\n" if !$include_path;
 # more than one data file.
 my %catalogs;
 my %catalog_data;
-foreach my $datfile (@input_files)
+foreach my $datfile (@ARGV)
 {
 	$datfile =~ /(.+)\.dat$/
 	  or die "Input files need to be data (.dat) files.\n";
@@ -292,7 +274,7 @@ Catalog::RenameTempFile($tabfile,$tmpext);
 sub usage
 {
 	die <

Re: Inconsistent error handling in the openssl init code

2019-02-07 Thread Daniel Gustafsson
> On 7 Feb 2019, at 05:12, Michael Paquier  wrote:
> 
> On Wed, Feb 06, 2019 at 11:18:22PM +0100, Daniel Gustafsson wrote:
>> The errorhandling in be_tls_init(), and functions called from it, set the
>> appropriate elevel by the isServerStart.  ssl_protocol_version_to_openssl() 
>> is
>> however erroring out unconditionally with ERROR on invalid TLS versions.  The
>> attached patch adds isServerStart handling to the TLS version handling as 
>> well,
>> to make be_tls_init() consistent in its errorhandling.
> 
> (Adding Peter Eisentraut in CC)
> 
> Good catch, this is an oversight from commit e73e67c7, which affects
> only HEAD.  The comment at the top of ssl_protocol_version_to_openssl
> becomes incorrect as the function would not throw an error in a reload
> context.

Doh, managed to completely overlook that.  The attached updated patch also
fixes the comment, thanks!
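
For reference, a standalone sketch of the pattern this fix makes consistent: an
invalid setting is fatal at server start but only logged on reload, and -1 is
returned so the caller can keep the existing SSL context.  report() is a
stand-in for ereport(), and the version strings are just examples.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void
report(bool fatal, const char *msg)
{
    fprintf(stderr, "%s: %s\n", fatal ? "FATAL" : "LOG", msg);
    if (fatal)
        exit(1);
}

/* returns a protocol code, or -1 if the setting is unusable */
static int
protocol_version_to_code(const char *setting, bool isServerStart)
{
    if (strcmp(setting, "TLSv1.2") == 0)
        return 12;
    if (strcmp(setting, "TLSv1.3") == 0)
        return 13;

    report(isServerStart, "unrecognized TLS protocol version");
    return -1;
}

int
main(void)
{
    /* reload path: logs the problem and keeps going */
    printf("reload -> %d\n", protocol_version_to_code("SSLv3", false));

    /* startup path: fatal */
    printf("start  -> %d\n", protocol_version_to_code("SSLv3", true));
    return 0;
}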

cheers ./daniel



openssl_tlsver-v2.patch
Description: Binary data


Re: phase out ossp-uuid?

2019-02-07 Thread Dave Page
On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
 wrote:
>
> I'm wondering whether we should phase out the use of the ossp-uuid
> library? (not the uuid-ossp extension)  We have had preferred
> alternatives for a while now, so it shouldn't be necessary to use this
> anymore, except perhaps in some marginal circumstances?  As we know,
> ossp-uuid isn't maintained anymore, and a few weeks ago the website was
> gone altogether, but it seems to be back now.
>
> I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

Much as I'd like to get rid of it, we don't have an alternative for
Windows do we? The docs for 11 imply it's required for UUID support
(though the wording isn't exactly clear, saying it's required for
UUID-OSSP support!):
https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ToDo: show size of partitioned table

2019-02-07 Thread Amit Langote
Hi Pavel,

Thanks for sending the updated patch.

On 2018/12/19 15:38, Pavel Stehule wrote:
> út 18. 12. 2018 v 8:49 odesílatel Amit Langote <
>> On 2018/12/17 17:48, Pavel Stehule wrote:
>>> I can imagine new additional flag - line "n" nested - and then we can
>>> display nested partitioned tables with parent table info. Some like
>>>
>>> \dPt - show only root partition tables
>>> \dPnt or \dPtn - show root and nested partitioned tables
>>
>> Too much complication maybe?
>>
> 
> I wrote it - the setup query is more complex, but not too much. I fixed the
> size calculation, when nested partitions tables are visible - it calculate
> partitions only from level1 group. Then the displayed size is same as total
> size

\dPn seems to work fine, but I don't quite understand why \dPn+ should
show the sizes only for nested partitions of level.  Consider the
following example outputs:

create table p (a int, b char) partition by list (a);
create table p_1 partition of p for values in (1) partition by list (b);
create table p_1_a partition of p_1 for values in ('a');
create table p_1_bc partition of p_1 for values in ('b', 'c') partition by
list (b);
create table p_1_b partition of p_1_bc for values in ('b');
create table p_1_c partition of p_1_bc for values in ('c');
create table p_2 partition of p for values in (2);
insert into p values (1, 'a');
insert into p values (1, 'b');
insert into p values (2);

\dP+
List of partitioned relations
 Schema │ Name │ Owner │ Size  │ Description
┼──┼───┼───┼─
 public │ p│ amit  │ 24 kB │
(1 row)

-- size of 'p' shown as 8KB, whereas it's actually 24KB as per above
-- size of 'p_1' shown as 8KB, whereas it's actually 16KB
\dPn+
  List of partitioned relations
 Schema │  Name  │ Owner │ Parent name │Size│ Description
┼┼───┼─┼┼─
 public │ p  │ amit  │ │ 8192 bytes │
 public │ p_1│ amit  │ p   │ 8192 bytes │
 public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
(3 rows)

Also, if the root partitioned table doesn't have a directly attached leaf
partition, its size is shown as 0 bytes, whereas I think it should
consider the sizes of its other nested partitions.

drop table p_2;

\dPn+
  List of partitioned relations
 Schema │  Name  │ Owner │ Parent name │Size│ Description
┼┼───┼─┼┼─
 public │ p  │ amit  │ │ 0 bytes│
 public │ p_1│ amit  │ p   │ 8192 bytes │
 public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
(3 rows)

If I remove the following two statements from the patched code:

+if (show_nested_partitions)
+appendPQExpBuffer(, "\n WHERE d.level = 1");

+/*
+ * Assign size just for directly assigned tables, when nested
+ * partitions are visible
+ */
+if (show_nested_partitions)
+appendPQExpBuffer(, "\n WHERE ppt.isleaf AND
ppt.level = 1");

I get the following output, which I find more intuitive:

create table p_2 partition of p for values in (2);
insert into p values (2);

\dP+
List of partitioned relations
 Schema │ Name │ Owner │ Size  │ Description
┼──┼───┼───┼─
 public │ p│ amit  │ 24 kB │
(1 row)


\dPn+
  List of partitioned relations
 Schema │  Name  │ Owner │ Parent name │Size│ Description
┼┼───┼─┼┼─
 public │ p  │ amit  │ │ 24 kB  │
 public │ p_1│ amit  │ p   │ 16 kB  │
 public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
(3 rows)

drop table p_2;

\dPn+
  List of partitioned relations
 Schema │  Name  │ Owner │ Parent name │Size│ Description
┼┼───┼─┼┼─
 public │ p  │ amit  │ │ 16 kB  │
 public │ p_1│ amit  │ p   │ 16 kB  │
 public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
(3 rows)

Thoughts?


Meanwhile, some comments on the patch:

+ If modifier n is used, then nested partition
+ tables are displayed too.

Maybe, say "non-root partitioned tables" instead of "nested partition
tables".  Same comment also applies for the same text in the paragraphs
for \dPni and \dPnt too.

+/*
+ * listPartitions()
+ *
+ * handler for \dP, \dPt and \dPi

Maybe mention the 'n' modifier here?

+ */
+bool
+listPartitions(const char *pattern, bool verbose, bool show_indexes, bool
show_tables, bool show_nested_partitions)
+{

I'm not sure if psql's source code formatting guidelines are different
from the backend code, but putting all the arguments on the same line
causes the line to grow well over 78 characters.  Can you wrap maybe?

+if (pattern)
+/* translator: object_name is "index", "table" or "relation" 

Re: Too rigorous assert in reorderbuffer.c

2019-02-07 Thread Arseny Sher


Alvaro Herrera  writes:

> On 2019-Feb-06, Arseny Sher wrote:
>
>>
>> Alvaro Herrera  writes:
>>
>> > note the additional pg_temp_XYZ row in the middle.  This is caused by
>> > the rewrite in ALTER TABLE.  Peter E fixed that in Pg11 in commit
>> > 325f2ec55; I don't think there's much to do in the backbranches other
>> > than hide the pesky record to avoid it breaking the test.
>>
>> Oh, I see. Let's just remove the first insertion then, as in attached.
>> I've tested it on master and on 9.4.
>
> Ah, okay.  Does the test still fail when run without the code fix?

Yes. The problem here is overriding cmax of catalog (pg_attribute in the
test) tuples, so it fails without any data at all.


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Pre-v11 appearances of the word "procedure" in v11 docs

2019-02-07 Thread Peter Eisentraut
On 04/02/2019 23:02, Peter Eisentraut wrote:
> Some things from this thread were left for post-11; see attached patch.
> 
> Specifically, this changes pg_dump and ruleutils.c to use the FUNCTION
> keyword instead of PROCEDURE in trigger and event trigger definitions,
> which was postponed because of the required catversion bump.

done

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



restrict pg_stat_ssl to superuser?

2019-02-07 Thread Peter Eisentraut
As discussed in [0], should we restrict access to pg_stat_ssl to
superusers (and an appropriate pg_ role)?

If so, is there anything in that view that should be made available to
non-superusers?  If not, then we could perhaps do this via a simple
permission change instead of going the route of blanking out individual
columns.


[0]:


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-02-07 Thread Masahiko Sawada
On Thu, Feb 7, 2019 at 9:27 AM Moon, Insung
 wrote:
>
> Dear Ibrar Ahmed.
>
> From: Ibrar Ahmed [mailto:ibrar.ah...@gmail.com]
> Sent: Thursday, February 07, 2019 4:09 AM
> To: Moon, Insung
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key 
> Management Service (KMS)
>
>
> On Tue, Jul 3, 2018 at 5:37 PM Moon, Insung  
> wrote:
> Dear Tom Lane.
>
> > -Original Message-
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> > Sent: Monday, June 18, 2018 11:52 PM
> > To: Robert Haas
> > Cc: Joe Conway; Masahiko Sawada; Moon, Insung; PostgreSQL-development
> > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and 
> > Key Management Service (KMS)
> >
> > Robert Haas  writes:
> > > On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway  wrote:
> > >> Not necessarily. Our pages probably have enough predictable bytes to
> > >> aid cryptanalysis, compared to user data in a column which might not
> > >> be very predicable.
> >
> > > Really?  I would guess that the amount of entropy in a page is WAY
> > > higher than in an individual column value.
> >
> > Depending on the specifics of the encryption scheme, having some amount of 
> > known (or guessable) plaintext may allow breaking
> > the cipher, even if much of the plaintext is not known.  This is cryptology 
> > 101, really.
> >
> > At the same time, having to have a bunch of independently-decipherable 
> > short field values is not real secure either, especially
> > if they're known to all be encrypted with the same key.  But what you know 
> > or can guess about the plaintext in such cases
> > would be target-specific, rather than an attack that could be built once 
> > and used against any PG database.
>
> > > Yes. If there is known to guessable data of encrypted data, maybe there 
> > > is a  possibility of decrypting the encrypted data.
> > >
> > > But would it be safe to use an additional encryption mode such as GCM or 
> > > XFS to solve this problem?
> > > (Do not use the same IV)
> > > Thank you and Best regards.
> > > Moon.
> >
> > >
> > >   regards, tom lane
>
>
>
>
>
> > Hi Moon,
> >
> > Have you done progress on that patch? I am thinking to work on the project 
> > and found that you are already working on it. The last message is almost 
> > six months old. I want to check with you that are you still working on 
> > that, if yes I can help on that by reviewing the patch etc. If you are not 
> > working on that anymore, can you share your done work (if possible)?
> > --
> > Ibrar Ahmed
>
> We are currently developing for TDE and integration KMS.
> So, We will Also be prepared to start a new discussion with the PoC patch as 
> soon as possible.
>
> Currently, we have changed the development direction from a per-table unit
> to a per-tablespace unit.
> Also, we are currently researching how to use the KMIP protocol for the
> encryption key, for integration with KMS.
> We talked about this in the Unconference session of PGConf.ASIA,
> And a week ago, we talked about the development direction of TDE and 
> integration with KMS at FOSDEM PGDAY[1].
>
> We will soon provide PoC with new discussions.
>
> Regards.
>
> [1] TRANSPARENT DATA ENCRYPTION IN POSTGRESQL AND INTEGRATION WITH KEY 
> MANAGEMENT SERVICES
> https://www.postgresql.eu/events/fosdem2019/schedule/session/2307-transparent-data-encryption-in-postgresql-and-integration-with-key-management-services/
>

Let me share the details of progress and current state.

As our presentation slides describe, I've written the PoC code for
transparent encryption that uses a 2-tier key architecture and has a
key rotation feature. We've discussed the design of transparent database
encryption on -hackers so far, and we found a good design
and implementation. I will share them together with our research results. But I
think the design of the integration of PostgreSQL with key management
services (KMS) is more contentious.
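
To make the 2-tier idea concrete, here is a conceptual standalone sketch.  The
XOR "wrap" is a placeholder and not real cryptography; it only illustrates that
rotating the master key means re-wrapping the per-tablespace data keys, while
the data keys themselves (and the relation files encrypted with them) stay
unchanged.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 32

typedef struct
{
    uint8_t wrapped_data_key[KEY_LEN];  /* data key encrypted by master key */
} TablespaceKey;

static void
wrap(uint8_t *out, const uint8_t *key, const uint8_t *master)
{
    for (int i = 0; i < KEY_LEN; i++)
        out[i] = key[i] ^ master[i];    /* placeholder for real wrapping */
}

/* unwrap happens to equal wrap for XOR; with real crypto these would differ */
#define unwrap wrap

static void
rotate_master_key(TablespaceKey *tsk, int n,
                  const uint8_t *old_master, const uint8_t *new_master)
{
    for (int i = 0; i < n; i++)
    {
        uint8_t plain[KEY_LEN];

        unwrap(plain, tsk[i].wrapped_data_key, old_master);
        wrap(tsk[i].wrapped_data_key, plain, new_master);
        /* relation files encrypted with the data key need no rewrite */
    }
}

int
main(void)
{
    uint8_t old_master[KEY_LEN], new_master[KEY_LEN], data_key[KEY_LEN];
    TablespaceKey ts[1];

    memset(old_master, 0xAA, KEY_LEN);
    memset(new_master, 0x5B, KEY_LEN);
    memset(data_key, 0x11, KEY_LEN);

    wrap(ts[0].wrapped_data_key, data_key, old_master);
    rotate_master_key(ts, 1, old_master, new_master);

    uint8_t check[KEY_LEN];
    unwrap(check, ts[0].wrapped_data_key, new_master);
    printf("data key survived rotation: %s\n",
           memcmp(check, data_key, KEY_LEN) == 0 ? "yes" : "no");
    return 0;
}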

For integration with KMS, I'm going to propose adding generic key
management APIs to PostgreSQL core so that it can communicate with
KMSs supporting different interfaces and protocols and can get the
master key (of the 2-tier key architecture) from them. Users can choose a
key management plugin according to their environment.

The integration of PostgreSQL with KMS should be a separate patch from
the TDE patch, and we think that TDE can be done first. But at least
it's essential to provide a way to get the master key from an external
location. Therefore, as the first step we can propose the basic
components of TDE with a simple interface to get the master key from
KMS rather than supporting full key management APIs. The basic
components of TDE that we're going to propose are:

* Transparent encryption at a layer between shared buffer and OS page cache
* Per-tablespace encryption
* 2-tier key architecture
* Key rotation
* System catalogs and temporary files encryption

WAL encryption will follow as an additional feature.
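
To illustrate why key rotation is cheap under the 2-tier scheme, here is a
sketch of what a wrapped per-tablespace key record might contain. The layout
and field names are assumptions for illustration, not part of the PoC.

#include <stdint.h>

/*
 * Hypothetical on-disk record: each tablespace's data key is stored
 * encrypted ("wrapped") under the master key obtained from the KMS.
 * Rotating the master key only re-wraps small records like this one; the
 * table data, encrypted with the unchanged data key, is never rewritten.
 */
typedef struct WrappedTablespaceKey
{
	uint32_t	master_key_id;		/* identifies the wrapping master key */
	uint8_t		iv[12];				/* nonce used when wrapping the data key */
	uint8_t		wrapped_key[32];	/* the AES-256 data key, encrypted */
	uint8_t		tag[16];			/* authentication tag over the fields above */
} WrappedTablespaceKey;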

phase out ossp-uuid?

2019-02-07 Thread Peter Eisentraut
I'm wondering whether we should phase out the use of the ossp-uuid
library? (not the uuid-ossp extension)  We have had preferred
alternatives for a while now, so it shouldn't be necessary to use this
anymore, except perhaps in some marginal circumstances?  As we know,
ossp-uuid isn't maintained anymore, and a few weeks ago the website was
gone altogether, but it seems to be back now.

I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allow some recovery parameters to be changed with reload

2019-02-07 Thread Sergei Kornilov
Hello

>>  I think the recovery parameters
>>
>>  archive_cleanup_command
>
> Only triggered by the checkpointer.
>
>>  promote_trigger_file
>>  recovery_end_command
>>  recovery_min_apply_delay
>
> Only looked at by the startup process.

Do we have possible trouble with restore_command? As far as I know, it is also only
looked at by the startup process.

regards, Sergei
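
For context: a parameter can only be switched to PGC_SIGHUP if the process
that consults it re-reads the configuration file at its interrupt points.
A simplified sketch of that existing pattern follows; ProcessConfigFile() and
PGC_SIGHUP are the real backend symbols, while the surrounding function and
flag here are only illustrative.

/* in a real backend build these headers provide the symbols used below */
#include "postgres.h"
#include "utils/guc.h"
#include <signal.h>

/* set from the SIGHUP signal handler, as in the real startup process */
static volatile sig_atomic_t got_SIGHUP = false;

static void
handle_interrupts(void)
{
	if (got_SIGHUP)
	{
		got_SIGHUP = false;
		/* re-read postgresql.conf so PGC_SIGHUP settings take effect */
		ProcessConfigFile(PGC_SIGHUP);
	}
}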



more unconstify use

2019-02-07 Thread Peter Eisentraut
Here is a patch that makes more use of unconstify() by replacing casts
whose only purpose is to cast away const.  Also a preliminary patch to
remove casts that were useless to begin with.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
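
For readers who have not seen the macro, the shape of the change is roughly
the following; this is a generic illustration, not a hunk from the attached
patch.

/* unconstify() comes from c.h, which postgres.h pulls in */
#include "postgres.h"

static char *
example(const char *s)
{
	/* before: a plain cast silently discards the const qualifier */
	char	   *before = (char *) s;

	/*
	 * after: unconstify() documents the intent and, on compilers that can
	 * check it, refuses to compile unless the expression really is a
	 * "const char *", so the cast cannot hide an unrelated type change.
	 */
	char	   *after = unconstify(char *, s);

	return after ? after : before;
}
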
From e874e9fde3371bedbb8c5301a966c082a8369e2c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 29 Jan 2019 01:12:18 +0100
Subject: [PATCH v1 1/5] Remove useless casts

Some of these were uselessly casting away "const", some were just
nearby, but they were all unnecessary anyway.
---
 contrib/btree_gist/btree_utils_num.c | 24 
 src/backend/catalog/heap.c   |  2 +-
 src/common/unicode_norm.c|  2 +-
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/contrib/btree_gist/btree_utils_num.c b/contrib/btree_gist/btree_utils_num.c
index 29b0faf997..4d10bc93f3 100644
--- a/contrib/btree_gist/btree_utils_num.c
+++ b/contrib/btree_gist/btree_utils_num.c
@@ -185,10 +185,10 @@ gbt_num_union(GBT_NUMKEY *out, const GistEntryVector *entryvec, const gbtree_nin
		c.upper = &cur[tinfo->size];
/* if out->lower > cur->lower, adopt cur as lower */
if (tinfo->f_gt(o.lower, c.lower, flinfo))
-   memcpy((void *) o.lower, (void *) c.lower, tinfo->size);
+   memcpy((void *) o.lower, c.lower, tinfo->size);
/* if out->upper < cur->upper, adopt cur as upper */
if (tinfo->f_lt(o.upper, c.upper, flinfo))
-   memcpy((void *) o.upper, (void *) c.upper, tinfo->size);
+   memcpy((void *) o.upper, c.upper, tinfo->size);
}
 
return out;
@@ -206,10 +206,10 @@ gbt_num_same(const GBT_NUMKEY *a, const GBT_NUMKEY *b, const gbtree_ninfo *tinfo
GBT_NUMKEY_R b1,
b2;
 
-   b1.lower = &(((GBT_NUMKEY *) a)[0]);
-   b1.upper = &(((GBT_NUMKEY *) a)[tinfo->size]);
-   b2.lower = &(((GBT_NUMKEY *) b)[0]);
-   b2.upper = &(((GBT_NUMKEY *) b)[tinfo->size]);
+   b1.lower = &(a[0]);
+   b1.upper = &(a[tinfo->size]);
+   b2.lower = &(b[0]);
+   b2.upper = &(b[tinfo->size]);
 
return (tinfo->f_eq(b1.lower, b2.lower, flinfo) &&
tinfo->f_eq(b1.upper, b2.upper, flinfo));
@@ -227,8 +227,8 @@ gbt_num_bin_union(Datum *u, GBT_NUMKEY *e, const gbtree_ninfo *tinfo, FmgrInfo *
if (!DatumGetPointer(*u))
{
*u = PointerGetDatum(palloc0(tinfo->indexsize));
-		memcpy((void *) &(((GBT_NUMKEY *) DatumGetPointer(*u))[0]), (void *) rd.lower, tinfo->size);
-		memcpy((void *) &(((GBT_NUMKEY *) DatumGetPointer(*u))[tinfo->size]), (void *) rd.upper, tinfo->size);
+		memcpy(&(((GBT_NUMKEY *) DatumGetPointer(*u))[0]), rd.lower, tinfo->size);
+		memcpy(&(((GBT_NUMKEY *) DatumGetPointer(*u))[tinfo->size]), rd.upper, tinfo->size);
}
else
{
@@ -236,10 +236,10 @@ gbt_num_bin_union(Datum *u, GBT_NUMKEY *e, const gbtree_ninfo *tinfo, FmgrInfo *
 
ur.lower = &(((GBT_NUMKEY *) DatumGetPointer(*u))[0]);
ur.upper = &(((GBT_NUMKEY *) DatumGetPointer(*u))[tinfo->size]);
-   if (tinfo->f_gt((void *) ur.lower, (void *) rd.lower, flinfo))
-			memcpy((void *) ur.lower, (void *) rd.lower, tinfo->size);
-		if (tinfo->f_lt((void *) ur.upper, (void *) rd.upper, flinfo))
-			memcpy((void *) ur.upper, (void *) rd.upper, tinfo->size);
+   if (tinfo->f_gt(ur.lower, rd.lower, flinfo))
+   memcpy((void *) ur.lower, rd.lower, tinfo->size);
+   if (tinfo->f_lt(ur.upper, rd.upper, flinfo))
+   memcpy((void *) ur.upper, rd.upper, tinfo->size);
}
 }
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 06d18a1cfb..d0215a5eed 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -777,7 +777,7 @@ AddNewAttributeTuples(Oid new_rel_oid,
{
FormData_pg_attribute attStruct;
 
-			memcpy(&attStruct, (char *) SysAtt[i], sizeof(FormData_pg_attribute));
+			memcpy(&attStruct, SysAtt[i], sizeof(FormData_pg_attribute));
 
			/* Fill in the correct relation OID in the copied tuple */
attStruct.attrelid = new_rel_oid;
diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c
index d5f6d32e0f..6281ff 100644
--- a/src/common/unicode_norm.c
+++ b/src/common/unicode_norm.c
@@ -59,7 +59,7 @@ static pg_unicode_decomposition *
 get_code_entry(pg_wchar code)
 {
return bsearch(&(code),
-  (void *) UnicodeDecompMain,
+  UnicodeDecompMain,