Re: serializable transaction: exclude constraint violation (backed by GIST index) instead of ssi conflict

2019-04-12 Thread Thomas Munro
On Fri, Apr 12, 2019 at 6:01 AM Peter Billen  wrote:
> On Thu, Apr 11, 2019 at 6:12 PM Peter Billen  wrote:
>> I believe we are after multiple issues/improvements:
>>
>> 1. Could we extend 
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fcff8a575198478023ada8a48e13b50f70054766
>>  by adding support for exclude constraints?
>> 2. Fully support gist & constraints in serializable transactions. I did not
>> yet test a unique constraint backed by a gist index, which is also
>> interesting to test I assume. This test would tell us whether btree and
>> gist indexes are currently on an equal footing.
>
> Regarding the remark in (2), I forgot that a unique constraint cannot be 
> backed by a gist index, so forget the test I mentioned.

Yeah, well we can't directly extend the existing work because unique
constraints are *entirely* handled inside the btree code (in fact no
other index types even support unique constraints, yet).  The
exclusion constraint machinery is handled differently: the error handling
you're seeing is raised by generic code in
src/backend/executor/execIndexing.c, but the code that knows how to
actually perform the necessary SSI checks is index-specific, in this
case in gist.c.  To do the moral equivalent of the UCV change, we'll
need to get these two bits of code to communicate across the "index
AM" boundary (the way that index implementations such as GIST are
plugged into Postgres).  The question is how.

One (bad) idea is that we could actually perform the (illegal)
aminsert just before we raise that error!  We know we're going to roll
back anyway, because that's either going to fail when gist.c calls
CheckForSerializableConflictIn(), or if not, when we raise
"conflicting key value violates exclusion constraint ...".  That's a
bit messy though, because it modifies the index unnecessarily and
possibly breaks important invariants.  An improved version of that
idea is to add a new optional index AM interface "amcheckinsert()"
that shares whatever code it needs to share to do all the work that
insert would do except the actual modification.  That way,
check_exclusion_or_unique_constraint() would give every index AM a
chance to raise an SSI error if it wants to.  This seems like it
should work, but I don't want to propose messing around with the index
AM interface lightly.  It wouldn't usually get called, just in the
error path.
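
To make the shape concrete, such a callback might look roughly like
this (a sketch only: "amcheckinsert" does not exist today, and the
signature simply mirrors aminsert minus the actual modification):

    /* optional amapi.h member; NULL if the AM has no insert-time SSI checks */
    typedef void (*amcheckinsert_function) (Relation indexRelation,
                                            Datum *values,
                                            bool *isnull,
                                            ItemPointer heap_tid,
                                            Relation heapRelation,
                                            IndexInfo *indexInfo);

    /*
     * In check_exclusion_or_unique_constraint(), just before raising the
     * exclusion-violation error, give the AM a chance to raise an SSI
     * conflict instead (variable names illustrative):
     */
    if (index->rd_indam->amcheckinsert != NULL)
        index->rd_indam->amcheckinsert(index, values, isnull,
                                       tupleid, heap, indexInfo);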

Anyone got a better idea?

-- 
Thomas Munro
https://enterprisedb.com




Re: pg_dump is broken for partition tablespaces

2019-04-12 Thread Alvaro Herrera
On 2019-Mar-06, David Rowley wrote:

> Over on [1] Andres pointed out that the pg_dump support for the new-to-PG12
> tablespace inheritance feature is broken.  This is the feature
> added in ca4103025dfe26 to allow a partitioned table to have a
> tablespace that acts as the default tablespace for newly attached
> partitions. The idea being that you can periodically change the
> default tablespace for new partitions to a tablespace that sits on a
> disk partition with more free space without affecting the default
> tablespace for normal non-partitioned tables. Anyway...
> 
> pg_dump is broken with this.

Here's a patch to fix the reported problems.  It's somewhat invasive,
and I've spent a long time staring at it, so I very much appreciate eyes
on it.

Guiding principles:

* Partitioned tables can have the database tablespace in their
  reltablespace column, in contrast with non-partitioned relations.
  This is okay and doesn't cause permission checks, since nothing
  is put in the tablespace just yet.

* When creating a partition, the parent's tablespace takes precedence
  over default_tablespace.  This sounds a bit weird at first, but I
  think it's correct.  If you really want your partition to override the
  parent's tablespace specification, you need to add a tablespace
  clause.  (I was initially opposed to this, but on further reflection
  I think it's the right thing to do.)

With these things in mind, I have introduced some behavior changes.  In
particular, both bugs mentioned in this thread have been fixed.

* pg_dump now correctly reproduces the state, still without using any
  TABLESPACE clauses.  (I suppose this is important for portability of
  dumps, as well as the --no-tablespaces option).

* When a partitioned table is created specifying the database tablespace
  in the TABLESPACE clause, and later a partition is created under
  a different non-empty default_tablespace setting, the partition honors
  the parent's tablespace.  (Fixing this bug became one of the
  principles here.)

* When an index is rewritten because of an ALTER TABLE, it no longer
  moves magically to another tablespace.  This worked fine for
  non-partitioned indexes (commit bd673e8e864a fixed it), but it was
  broken for partitioned ones.

* pg_dump was patched to emit straight CREATE TABLE plus ALTER TABLE
  ATTACH for partitions (just like the binary upgrade code does) instead
  of CREATE TABLE PARTITION OF.  This is what lets the partition use a
  straight default_tablespace setting instead of having to conditionally
  attach TABLESPACE clauses, which would create quite a mess.

Making this work required somewhat unusual hacks:

* Nodes IndexStmt and Constraint have gained a new member
  "reset_default_tblspc".  This is not set in the normal code paths, but
  when ALTER TABLE wants to recreate an index, it sets this flag; the
  flag tells index creation to reset default_tablespace to empty (see
  the sketch below).  This is only necessary because ALTER TABLE
  execution resolves (at execution time) the tablespace of the
  artificially generated SQL command to recreate the index.  If
  default_tablespace is left there, it interferes with that.

* GetDefaultTablespace now behaves differently for partitioned tables,
  precisely because we want it not to return InvalidOid when the
  tablespace is specifically selected.
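
For concreteness, the flag could be consumed at index-creation time
roughly like this (a sketch of the mechanism only; set_config_option()
and the GUC constants are the existing ones, but the exact call site in
the patch may differ):

    if (stmt->reset_default_tblspc)
        (void) set_config_option("default_tablespace", "",
                                 PGC_USERSET, PGC_S_SESSION,
                                 GUC_ACTION_SAVE, true, 0, false);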

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 37880110e59..9fa645a8525 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1266,8 +1266,8 @@ WITH ( MODULUS numeric_literal, REM
if the table is temporary.  For
   partitioned tables, since no storage is required for the table itself,
   the tablespace specified here only serves to mark the default tablespace
-  for any newly created partitions when no other tablespace is explicitly
-  specified.
+  for any newly created partitions when no other tablespace is specified,
+  either using an explicit TABLESPACE clause or via default_tablespace.
  
 

diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 6d7e11645d2..4f2587d74af 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -314,6 +314,7 @@ Boot_DeclareIndexStmt:
 	stmt->transformed = false;
 	stmt->concurrent = false;
 	stmt->if_not_exists = false;
+	stmt->reset_default_tblspc = false;
 
 	/* locks and races need not concern us in bootstrap mode */
 	relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -363,6 +364,7 @@ Boot_DeclareUniqueIndexStmt:
 	stmt->transformed = false;
 	stmt->concurrent = false;
 	stmt->if_not_exists = false;
+	stmt->reset_default_tblspc = false;
 
 	/* locks and races need not concern us in bootstrap mode */
 	relationId = 

Re: change password_encryption default to scram-sha-256?

2019-04-12 Thread Bruce Momjian
On Mon, Apr  8, 2019 at 10:08:07AM -0400, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
> > On 4/8/19 8:49 AM, Magnus Hagander wrote:
> >> I think the real question is, is it OK to give them basically 5 months
> >> warning, by right now saying if you don't have a release out in 6
> >> months, things will break.
> 
> > Given the supported libraries all have open pull requests or issues, it
> > should be fairly easy to inquire if they would be able to support it for
> > PG12 vs PG13. If this sounds like a reasonable plan, I'm happy to reach
> > out and see.
> 
> I think that the right course here is to notify these developers that
> we will change the default in PG13, and it'd be good if they put out
> stable releases with SCRAM support well before that.  This discussion
> seems to be talking as though it's okay if we allow zero daylight
> between availability of fixed drivers and release of a PG version that
> defaults to using SCRAM.  That'd be totally unfair to packagers and
> users.  There needs to be a pretty fair-size window for those fixed
> drivers to propagate into the wild.  A year is not too much; IMO it's
> barely enough.

It would be nice to address channel binding as part of this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-04-12 Thread Bruce Momjian
On Sat, Apr  6, 2019 at 03:29:13PM +0200, Antonin Houska wrote:
> Robert Haas  wrote:
> 
> > On Fri, Apr 5, 2019 at 11:22 AM Antonin Houska  wrote:
> > > > Wouldn't Tom's proposal to use a stream cipher fix all this?
> > >
> > > Yes it would make the extra alignment unnecessary, but our solution tries 
> > > to
> > > meet specific requirements of disk encryption. Stream cipher appears to be
> > > incompatible with these requirements:
> > >
> > > https://en.wikipedia.org/wiki/Disk_encryption_theory
> > 
> > Hmm.  Well, I don't know what to do about that, but I think this patch
> > is going to be facing an uphill battle if the encrypted and
> > unencrypted WAL formats use different alignment.
> 
> Once you said this could be a problem, I thought about it a bit more. The link
> above says that a stream cipher is not good for disk encryption because it's
> risky to encrypt different data (e.g. old and new versions of a disk sector) with
> the same key (or rather with the same combination of key and initialization
> vector).
> 
> However WAL is a special case because here we should never need to change the
> data at given position. The only exception is that unused space (sequence of
> zeroes) becomes a valid record. So if we avoid encryption of the unused space,
> it might be o.k. to use the stream cipher for WAL.

Agreed.  Requirement #2 is not needed for WAL:

https://en.wikipedia.org/wiki/Disk_encryption_theory#Problem_definition
2. Data retrieval and storage should both be fast operations, no
matter where on the disk the data is stored.

because you are not writing into random parts of the WAL.

Using the same stream cipher system that SSH uses would make sense,
since it is byte-oriented, and it only encrypts the tail.
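
To illustrate the property being relied on (a conceptual sketch only:
keystream_byte() stands in for any real position-based cipher keyed
elsewhere, e.g. a ChaCha20 block function; it is not a PostgreSQL or
OpenSSL API, and the typedefs are PG-style):

    extern uint8 keystream_byte(uint64 offset);    /* hypothetical */

    /*
     * With a stream cipher, the keystream depends only on (key, IV,
     * byte offset).  If every WAL byte offset is written at most once,
     * and the zeroed tail of a page is left unencrypted, then no
     * keystream byte is ever reused against two different plaintexts.
     */
    static void
    encrypt_wal_span(uint8 *buf, Size used_len, uint64 wal_offset)
    {
        Size    i;

        for (i = 0; i < used_len; i++)   /* only the used bytes */
            buf[i] ^= keystream_byte(wal_offset + i);
        /* bytes past used_len stay zero; when they later become a valid
         * record, that write is their first (and only) encryption */
    }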

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Useless code in RelationCacheInitializePhase3

2019-04-12 Thread Andres Freund
Hi,

On 2019-04-12 14:17:11 -0400, Tom Lane wrote:
> While looking at the pending patch to clean up management of
> rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that
> purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck.
> However, that reload code is dead code, as is easily confirmed by
> checking the code coverage report, because we have no partitioned system
> catalogs.
> 
> Moreover, if somebody tried to add such a catalog, I'd bet a good deal
> of money that this code would not work.  It seems highly unlikely that
> we could run RelationBuildPartitionKey or RelationBuildPartitionDesc
> successfully when we haven't even finished bootstrapping the relcache.

But it sure would be nice if we made it work at some point. Having
e.g. global, permanent + unlogged, and temporary tables' attributes in a
separate pg_attribute would be quite an advantage (and much easier than
a separate pg_class).  Obviously even that is *far* from trivial.


> I don't think that this foolishness is entirely the fault of the
> partitioning work; it's evidently modeled on the adjacent code to reload
> rules, triggers, and row security code.  But that code is all equally
> dead, equally unlikely to work if somebody tried to invoke it, and
> equally likely to be forever unused because there are many other
> problems you'd have to surmount to support something like triggers or
> row security on system catalogs.

I don't see us wanting to go as far as supporting triggers, but I could see us
desiring RLS at some point, to hide rows a user doesn't have access to.


> I am less sure about whether the table-access-method stanza is as silly
> as the rest, but I do see that it's unreached in current testing.
> So I wonder whether there is any thought that we'd realistically support
> catalogs with nondefault AMs, and if there is, does anyone think that
> this code would work?

Right now it definitely won't work, most importantly because there's a
fair bit of catalog-related code that triggers direct
heap_insert/update/delete, and expects systable_getnext() not to need
to allocate memory for the result in the current context (hence the
!shouldFree assert), and just generally because a lot of places
straight up assume the catalog is heap.

Most of that would be fairly easy to fix, however. A lot of rote work,
but technically not hard. The hardest part is probably a bunch of code that
uses xmin for cache validation and such, but that seems solvable.

I don't quite know, however, how we'd use the ability to have a
different AM for catalog tables. One possible thing would be using
different builtin AMs for different catalog tables; that seems
like it'd not be too hard.  But after that it gets harder - e.g. doing
an initdb with a different default AM sounds not impossible, but also
far from easy (we can't do pg_proc lookups before having initialized it,
which is why formrdesc hardcodes GetHeapamTableAmRoutine()).  And having
different AMs per database seems even harder.

I think it probably would work for catalog tables, as it's coded right
now. There are no catalog lookups in RelationInitTableAccessMethod() for
tables where IsCatalogRelation() returns true. In fact, I think we should
apply something like:

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 64f3c2e8870..7ff64b108c4 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1746,6 +1746,7 @@ RelationInitTableAccessMethod(Relation relation)
 * seem prudent to show that in the catalog. So just overwrite it
 * here.
 */
+   Assert(relation->rd_rel->relam == InvalidOid);
relation->rd_amhandler = HEAP_TABLE_AM_HANDLER_OID;
}
else if (IsCatalogRelation(relation))
@@ -1935,8 +1936,7 @@ formrdesc(const char *relationName, Oid relationReltype,
/*
 * initialize the table am handler
 */
-   relation->rd_rel->relam = HEAP_TABLE_AM_OID;
-   relation->rd_tableam = GetHeapamTableAmRoutine();
+   RelationInitTableAccessMethod(relation);
 
/*
 * initialize the rel-has-index flag, using hardwired knowledge

To a) ensure that that is and stays the case, and b) avoid having the
necessary information in multiple places.  Not sure why we didn't end up
doing the thing in the second hunk earlier.  Just using
RelationInitTableAccessMethod() seems cleaner to me.

Greetings,

Andres Freund




Re: hyrax vs. RelationBuildPartitionDesc

2019-04-12 Thread Tom Lane
Robert Haas  writes:
> As a parenthetical note, I observe that relcache.c seems to know
> almost nothing about rd_partcheck.  rd_partkey and rd_partdesc both
> have handling in RelationClearRelation(), but rd_partcheck does not,
> and I suspect that's wrong.  So the problems are probably not confined
> to the relcache-drop-time problem that you observed.

I concluded that that's not parenthetical but pretty relevant...

Attached is a revised version of Amit's patch at [1] that makes these
data structures be treated more similarly.  I also added some Asserts
and comment improvements to address the complaints I made upthread about
under-documentation of all this logic.

I also cleaned up the problem the code had with failing to distinguish
"partcheck list is NIL" from "partcheck list hasn't been computed yet".
For a relation with no such constraints, generate_partition_qual would do
the full pushups every time.  I'm not sure if the case actually occurs
commonly enough that that's a performance problem, but failure to account
for it made my added assertions fall over :-( and I thought fixing it
was better than weakening the assertions.

I haven't made back-patch versions yet.  I'd expect they could be
substantially the same, except the two new fields have to go at the
end of struct RelationData to avoid ABI breaks.

Oh: we might also need some change in RelationCacheInitializePhase3,
depending on the decision about [2].

regards, tom lane

[1] https://www.postgresql.org/message-id/036852f2-ba7f-7a1f-21c6-00bc3515e...@lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/5706.1555093...@sss.pgh.pa.us

diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index e436d1e..4d6595b 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -49,10 +49,13 @@ typedef struct PartitionDirectoryEntry
 
 /*
  * RelationBuildPartitionDesc
- *		Form rel's partition descriptor
+ *		Form rel's partition descriptor, and store in relcache entry
  *
- * Not flushed from the cache by RelationClearRelation() unless changed because
- * of addition or removal of partition.
+ * Note: the descriptor won't be flushed from the cache by
+ * RelationClearRelation() unless it's changed because of
+ * addition or removal of a partition.  Hence, code holding a lock
+ * that's sufficient to prevent that can assume that rd_partdesc
+ * won't change underneath it.
  */
 void
 RelationBuildPartitionDesc(Relation rel)
@@ -173,7 +176,19 @@ RelationBuildPartitionDesc(Relation rel)
 		++i;
 	}
 
-	/* Now build the actual relcache partition descriptor */
+	/* Assert we aren't about to leak any old data structure */
+	Assert(rel->rd_pdcxt == NULL);
+	Assert(rel->rd_partdesc == NULL);
+
+	/*
+	 * Now build the actual relcache partition descriptor.  Note that the
+	 * order of operations here is fairly critical.  If we fail partway
+	 * through this code, we won't have leaked memory because the rd_pdcxt is
+	 * attached to the relcache entry immediately, so it'll be freed whenever
+	 * the entry is rebuilt or destroyed.  However, we don't assign to
+	 * rd_partdesc until the cached data structure is fully complete and
+	 * valid, so that no other code might try to use it.
+	 */
 	rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
 		  "partition descriptor",
 		  ALLOCSET_SMALL_SIZES);
diff --git a/src/backend/utils/cache/partcache.c b/src/backend/utils/cache/partcache.c
index 8f43d68..29b4a76 100644
--- a/src/backend/utils/cache/partcache.c
+++ b/src/backend/utils/cache/partcache.c
@@ -41,11 +41,11 @@ static List *generate_partition_qual(Relation rel);
 
 /*
  * RelationBuildPartitionKey
- *		Build and attach to relcache partition key data of relation
+ *		Build partition key data of relation, and attach to relcache
  *
  * Partitioning key data is a complex structure; to avoid complicated logic to
  * free individual elements whenever the relcache entry is flushed, we give it
- * its own memory context, child of CacheMemoryContext, which can easily be
+ * its own memory context, a child of CacheMemoryContext, which can easily be
  * deleted on its own.  To avoid leaking memory in that context in case of an
  * error partway through this function, the context is initially created as a
  * child of CurTransactionContext and only re-parented to CacheMemoryContext
@@ -144,6 +144,7 @@ RelationBuildPartitionKey(Relation relation)
 		MemoryContextSwitchTo(oldcxt);
 	}
 
+	/* Allocate assorted arrays in the partkeycxt, which we'll fill below */
 	oldcxt = MemoryContextSwitchTo(partkeycxt);
 	key->partattrs = (AttrNumber *) palloc0(key->partnatts * sizeof(AttrNumber));
 	key->partopfamily = (Oid *) palloc0(key->partnatts * sizeof(Oid));
@@ -151,8 +152,6 @@ RelationBuildPartitionKey(Relation relation)
 	key->partsupfunc = (FmgrInfo *) palloc0(key->partnatts * sizeof(FmgrInfo));
 
 	key->partcollation = (Oid *) palloc0(key->partnatts * 

Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

2019-04-12 Thread Tom Lane
Andres Freund  writes:
> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
>> We build Postgres on Power and x86.  With the latest Postgres 11 release
>> (11.2) we get an error on power8 ppc64le (Red Hat and CentOS).  No error
>> on SUSE on power8.

> Any chance you can strace this? Because I don't understand how you'd get
> a permission error here.

What kind of filesystem are the database files on?

regards, tom lane




Re: Minimal logical decoding on standbys

2019-04-12 Thread Andres Freund
Hi,

On 2019-04-12 23:34:02 +0530, Amit Khandekar wrote:
> I tried to see if I can quickly understand what's going on.
> 
> Here, master wal_level is hot_standby, not logical, though slave
> wal_level is logical.

Oh, that's well diagnosed.  Cool.  Also nicely tested - this'd be ugly
in production.

I assume the problem isn't present if you set the primary to wal_level =
logical?


> Not sure why this is happening. On the slave, wal_level is logical, so
> logical records should have tuple data. Not sure what that has to
> do with the wal_level of the master. Everything should be there on the slave
> after it replays the inserts; and also the slave's wal_level is logical.

The standby doesn't write its own WAL, only primaries do. I thought we
forbade running with wal_level=logical on a standby, when the primary is
only set to replica.  But that's not what we do, see
CheckRequiredParameterValues().

I've not yet thought this through, but I think we'll have to somehow
error out in this case.  I guess we could just check at the start of
decoding what ControlFile->wal_level is set to, and then raise an error
in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
wal_level to something lower?
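
A sketch of what that could look like in decode.c (hypothetical code,
not from any patch; the xl_parameter_change fields are as declared in
xlog_internal.h):

    case XLOG_PARAMETER_CHANGE:
        {
            xl_parameter_change *xlrec =
                (xl_parameter_change *) XLogRecGetData(buf->record);

            /*
             * On a standby the WAL contents are dictated by the
             * primary's wal_level, not ours, so error out if the
             * primary stopped emitting logical information.
             */
            if (RecoveryInProgress() &&
                xlrec->wal_level < WAL_LEVEL_LOGICAL)
                ereport(ERROR,
                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                         errmsg("logical decoding on standby requires "
                                "wal_level >= logical on the primary")));
            break;
        }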

Could you try to implement that?

Greetings,

Andres Freund




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Fred .Flintstone
So there is no regression potential.

Who can send the patch, and when, to rename the programs to carry the
pg_ prefixes and create symlinks from the old names?

On Fri, Apr 12, 2019 at 5:19 PM Andreas Karlsson  wrote:
>
> On 4/12/19 5:14 PM, Chris Travers wrote:
> > 1. naming things is surprisingly hard.  How sure are we that we can do
> > this right?  Can we come up with a correct name for initdb?  Maybe
> > pg_createcluster?
>
> The Debian packagers already use pg_createcluster for their script which
> wraps initdb, and while pg_initdb is a bit misleading (it creates a
> cluster rather than a database) it is not that bad.
>
> Andreas




PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

2019-04-12 Thread reiner peterke
Hi All,

We build Postgres on Power and x86.  With the latest Postgres 11 release (11.2)
we get an error on power8 ppc64le (Red Hat and CentOS).  No error on SUSE on
power8.

No error on x86_64 (RH, CentOS and SUSE).

from the log file:
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on IPv6 address "::", port 5432
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-04-09 12:30:10 UTC   pid:204 xid:0 ip: LOG:  database system was shut down at 2019-04-09 12:27:09 UTC
2019-04-09 12:30:10 UTC   pid:203 xid:0 ip: LOG:  database system is ready to accept connections
2019-04-09 12:31:46 UTC   pid:203 xid:0 ip: LOG:  received SIGHUP, reloading configuration files
2019-04-09 12:35:10 UTC   pid:205 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  checkpointer process (PID 205) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  terminating any other active server processes
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: WARNING:  terminating connection because of crash of another server process
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-04-09 12:35:10 UTC   pid:208 xid:0 ip: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  all server processes terminated; reinitializing
2019-04-09 12:35:10 UTC   pid:224 xid:0 ip: LOG:  database system was interrupted; last known up at 2019-04-09 12:30:10 UTC
2019-04-09 12:35:10 UTC   pid:224 xid:0 ip: PANIC:  could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  startup process (PID 224) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  aborting startup due to startup process failure
2019-04-09 12:35:10 UTC   pid:203 xid:0 ip: LOG:  database system is shut down

from pg_config:

BINDIR = /usr/local/postgres/11/bin
DOCDIR = /usr/local/postgres/11/share/doc
HTMLDIR = /usr/local/postgres/11/share/doc
INCLUDEDIR = /usr/local/postgres/11/include
PKGINCLUDEDIR = /usr/local/postgres/11/include
INCLUDEDIR-SERVER = /usr/local/postgres/11/include/server
LIBDIR = /usr/local/postgres/11/lib
PKGLIBDIR = /usr/local/postgres/11/lib
LOCALEDIR = /usr/local/postgres/11/share/locale
MANDIR = /usr/local/postgres/11/share/man
SHAREDIR = /usr/local/postgres/11/share
SYSCONFDIR = /usr/local/postgres/etc
PGXS = /usr/local/postgres/11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-tclconfig=/usr/lib64' '--with-perl' '--with-python' 
'--with-tcl' '--with-openssl' '--with-pam' '--with-gssapi' '--enable-nls' 
'--with-libxml' '--with-libxslt' '--with-ldap' 
'--prefix=/usr/local/postgres/11' 'CFLAGS=-O3 -g -pipe -Wall 
-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 
-DLINUX_OOM_SCORE_ADJ=0' '--with-libs=/usr/lib' '--with-includes=/usr/include' 
'--with-uuid=e2fs' '--sysconfdir=/usr/local/postgres/etc' '--with-llvm' 
'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 
-g -pipe -Wall -D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 
-DLINUX_OOM_SCORE_ADJ=0
CFLAGS_SL = -fPIC
LDFLAGS = -L/usr/local/lib -L/usr/lib -Wl,--as-needed 
-Wl,-rpath,'/usr/local/postgres/11/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lxslt -lxml2 -lpam -lssl -lcrypto 
-lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 11.2

I get the feeling this is related to the fsync() issue.
Why is it happening on Power (RH and CentOS), but not on the other platforms?

Let me know if I need to provide any more information.

Reiner





Useless code in RelationCacheInitializePhase3

2019-04-12 Thread Tom Lane
While looking at the pending patch to clean up management of
rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that
purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck.
However, that reload code is dead code, as is easily confirmed by
checking the code coverage report, because we have no partitioned system
catalogs.

Moreover, if somebody tried to add such a catalog, I'd bet a good deal
of money that this code would not work.  It seems highly unlikely that
we could run RelationBuildPartitionKey or RelationBuildPartitionDesc
successfully when we haven't even finished bootstrapping the relcache.

I don't think that this foolishness is entirely the fault of the
partitioning work; it's evidently modeled on the adjacent code to reload
rules, triggers, and row security code.  But that code is all equally
dead, equally unlikely to work if somebody tried to invoke it, and
equally likely to be forever unused because there are many other
problems you'd have to surmount to support something like triggers or
row security on system catalogs.

I'm inclined to remove almost everything below the comment
"Fix data that isn't saved in relcache cache file", and replace
it with either assertions or test-and-elogs that say that a
relation appearing in the cache file can't have triggers or rules
or row security or be partitioned.

I am less sure about whether the table-access-method stanza is as silly
as the rest, but I do see that it's unreached in current testing.
So I wonder whether there is any thought that we'd realistically support
catalogs with nondefault AMs, and if there is, does anyone think that
this code would work?

regards, tom lane




Re: Minimal logical decoding on standbys

2019-04-12 Thread Amit Khandekar
On Wed, 10 Apr 2019 at 21:39, Andres Freund  wrote:
>
> Hi,
>
> On 2019-04-10 12:11:21 +0530, tushar wrote:
> >
> > On 03/13/2019 08:40 PM, tushar wrote:
> > > Hi ,
> > >
> > > I am getting a server crash on standby while executing
> > > pg_logical_slot_get_changes function   , please refer this scenario
> > >
> > > Master cluster( ./initdb -D master)
> > > set wal_level='hot_standby in master/postgresql.conf file
> > > start the server , connect to  psql terminal and create a physical
> > > replication slot ( SELECT * from
> > > pg_create_physical_replication_slot('p1');)
> > >
> > > perform pg_basebackup using --slot 'p1'  (./pg_basebackup -D slave/ -R
> > > --slot p1 -v))
> > > set wal_level='logical' , hot_standby_feedback=on,
> > > primary_slot_name='p1' in slave/postgresql.conf file
> > > start the server , connect to psql terminal and create a logical
> > > replication slot (  SELECT * from
> > > pg_create_logical_replication_slot('t','test_decoding');)
> > >
> > > run pgbench ( ./pgbench -i -s 10 postgres) on master and select
> > > pg_logical_slot_get_changes on Slave database
> > >
> > > postgres=# select * from pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.274 IST [26817] LOG:  starting logical decoding for
> > > slot "t"
> > > 2019-03-13 20:34:50.274 IST [26817] DETAIL:  Streaming transactions
> > > committing after 0/6C60, reading WAL from 0/6C28.
> > > 2019-03-13 20:34:50.274 IST [26817] STATEMENT:  select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.275 IST [26817] LOG:  logical decoding found
> > > consistent point at 0/6C28
> > > 2019-03-13 20:34:50.275 IST [26817] DETAIL:  There are no running
> > > transactions.
> > > 2019-03-13 20:34:50.275 IST [26817] STATEMENT:  select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
> > > "decode.c", Line: 977)
> > > server closed the connection unexpectedly
> > > This probably means the server terminated abnormally
> > > before or while processing the request.
> > > The connection to the server was lost. Attempting reset: 2019-03-13
> > > 20:34:50.276 IST [26809] LOG:  server process (PID 26817) was terminated
> > > by signal 6: Aborted
> > >
> > Andres - Do you think - this is an issue which needs to  be fixed ?
>
> Yes, it definitely needs to be fixed. I just haven't had sufficient time
> to look into it. Have you reproduced this with Amit's latest version?
>
> Amit, have you spent any time looking into it? I know that you're not
> that deeply steeped into the internals of logical decoding, but perhaps
> there's something obvious going on.

I tried to see if I can quickly understand what's going on.

Here, master wal_level is hot_standby, not logical, though slave
wal_level is logical.

On the slave, when pg_logical_slot_get_changes() is run, in
DecodeMultiInsert(), it does not get any WAL records having
XLH_INSERT_CONTAINS_NEW_TUPLE set. So the data pointer is never
incremented; it remains at tupledata. So at the end of the function,
this assertion fails:
Assert(data == tupledata + tuplelen);
because data is still at tupledata.
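
For reference, the relevant loop has roughly this shape (a simplified
paraphrase of decode.c, not the actual code; tuple_size_for_this_entry
is a placeholder):

    char   *data = tupledata;

    for (i = 0; i < xlrec->ntuples; i++)
    {
        /* ... set up a ReorderBufferChange for this tuple ... */

        if (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE)
        {
            /* parse one tuple header + payload, advancing "data" */
            data += tuple_size_for_this_entry;
        }
    }
    Assert(data == tupledata + tuplelen);

so if the flag is never set, "data" never moves, and the assertion
cannot hold unless tuplelen is zero.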

Not sure why this is happening. On the slave, wal_level is logical, so
logical records should have tuple data. Not sure what that has to
do with the wal_level of the master. Everything should be there on the slave
after it replays the inserts; and also the slave's wal_level is logical.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: Reducing the runtime of the core regression tests

2019-04-12 Thread Peter Geoghegan
On Fri, Apr 12, 2019 at 10:49 AM Peter Geoghegan  wrote:
> It's definitely generally recommended that "-O0" be used, so I think
> that we can agree that that was an improvement, even if it doesn't fix
> the remaining problem that I noticed when I rechecked nbtutils.c.

I'm not sure that we can really assume that "-O0" avoids the behavior
I pointed out. Perhaps this counts as "semantic flattening" or
something, rather than an optimization. I could have easily written
the code in _bt_keep_natts_fast() in the way gcov/gcc/whatever thinks
I ought to have, which would have obscured the distinction anyway.

-- 
Peter Geoghegan




Re: Fix doc bug in logical replication.

2019-04-12 Thread Robert Treat
On Mon, Apr 8, 2019 at 7:19 PM Euler Taveira  wrote:
>
> On Mon, Apr 8, 2019 at 7:38 PM, Robert Treat  wrote:
> >
> > I noticed that the docs currently state "A different order of columns
> > in the target table is allowed, but the column types have to match."
> > This is untrue, as you can replicate between any two data types as
> > long as the data can be coerced into the right format on the
> > subscriber. Attached is a patch that attempts to clarify this, and
> > provides some additional wordsmithing of that section. Patch is
> > against head but the nature of the patch would apply to the docs for
> > 11 and 10, which both have the incorrect information as well, even if
> > the patch itself does not.
> >
> I would say it is inaccurate because the actual instruction works. I
> agree that your words are an improvement but it lacks a comment on the
> slot_[store|modify]_cstrings.
>

It is clear to me that the docs are wrong, but I don't see anything
inherently incorrect about the code itself. Do you have suggestions
for how you would like to see the code comments improved?


Robert Treat
https://xzilla.net




Re: Reducing the runtime of the core regression tests

2019-04-12 Thread Peter Geoghegan
On Fri, Apr 12, 2019 at 10:24 AM Alvaro Herrera  wrote:
> I wonder if it would be useful to add --enable-debug.  I think I
> purposefully removed that, but I don't remember any details about it.

As usual, this stuff is horribly under-documented. I think it's
possible that  --enable-debug would help, since llvm-gcov requires it,
but that doesn't seem particularly likely.

It's definitely generally recommended that "-O0" be used, so I think
that we can agree that that was an improvement, even if it doesn't fix
the remaining problem that I noticed when I rechecked nbtutils.c.

-- 
Peter Geoghegan




Re: Reducing the runtime of the core regression tests

2019-04-12 Thread Alvaro Herrera
On 2019-Apr-12, Peter Geoghegan wrote:

> On Fri, Apr 12, 2019 at 9:31 AM Alvaro Herrera  wrote:
> > Done.  Do you have a preferred spot where the counts were wrong?
> 
> Not really, but I can give you an example.
> 
> Line counts for each of the two "break" statements within
> _bt_keep_natts_fast() are exactly the same. I don't think that this is
> because we actually hit each break exactly the same number of times
> (90,236 times). I think that we see this because the same instruction
> is associated with both break statements in the loop. All of the
> examples I've noticed are a bit like that. Not a huge problem, but
> less useful than the alternative.

Hmm, it's odd, because
https://coverage.postgresql.org/src/backend/access/nbtree/nbtutils.c.gcov.html
still shows that function doing that.  pg_config shows:

$ ./pg_config --configure
'--enable-depend' '--enable-coverage' '--enable-tap-tests' '--enable-nls' 
'--with-python' '--with-perl' '--with-tcl' '--with-openssl' '--with-libxml' 
'--with-ldap' '--with-pam' 'CFLAGS=-O0'

src/Makefile.global says:

CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -fprofile-arcs -ftest-coverage -O0

the compile line for nbtutils is:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -fprofile-arcs 
-ftest-coverage -O0 -I../../../../src/include  -D_GNU_SOURCE 
-I/usr/include/libxml2   -c -o nbtutils.o nbtutils.c -MMD -MP -MF 
.deps/nbtutils.Po

so I suppose there's something else that's affecting this.

I wonder if it would be useful to add --enable-debug.  I think I
purposefully removed that, but I don't remember any details about it.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-04-12 Thread John W Higgins
Could I please ask a couple of questions?

Why does the first answer to everything seem to be "we need to destroy
something to make it better for others"? Why does createdb need to be
removed? Why do we use the "newbie that can't understand whether or not
createdb is for PostgreSQL or MySQL or whatever" and then ignore the fact that
this would be the one person least able to handle a breakage of a 5-year-old
internal script that simply does its job for them each day?

What if someone has a nice little script that is really well written and
fails on warnings because the company policy is that "warnings are to be
respected"? How does that person properly do their job if they need to
break open that script one morning because we've dropped a "warning bomb"
on them without any option but to fix the entire script in one shot with no
option to continue otherwise? What if there is a semi-strict QA policy at
said company and they are placed in a bind due to the volume and nature of
the changes required here because of steps taken that are impossible to
reasonably work around (possibly even outside of the script itself)?

So I would like to offer the beginning of a "framework" of steps that could
accomplish the desired task with the bare minimum of breakage and with
simple steps that can be offered to help people affected by these changes.

1) Any new name is a symlink to the old name. We do not break existing
tooling for any non-obvious reason. Any notion of symlinking the old names
and then discussing "packagers could add a PostgreSQL-Legacy-Symlinks
package" is not ok. We cannot have users breaking because of a missing
package and then have them running around with their head cut off trying to
figure out where that package is for their particular system. We make
across the board changes that are easily explainable.

2) We can certainly add a warning to the old names that warn of future
removal. However we need to offer a simple option to remove these warnings
in a future friendly fashion. Remember the person that is not ok running
around deep inside a 1000 line script.

3) Long term (or even fairly short term) we move the old names back to a
more appropriate location - let's say /opt/pgsql/bin - if someone ignored
the warnings then they are broken - there is nothing that can be done with
that - but we've now accomplished the stated goal - hide names like
"createdb" from standard paths.

However, how do we deal with the VERY bad side of #2/#3? That's what I feel
has been missing here. So let's walk through something.

If someone has a script that breaks on warnings - or they are generally not
someone that is comfortable making many changes to a script - we need a
single line option for them.

WARNING - createdb is no longer the preferred method - please either change
to pg_createdb OR add the following line near the top of your
script/environment

source pg_legacy_environment

(Wording is not my strong suit - bear with me please)

What is pg_legacy_environment? Well it's a provided file that starts its
life as simple as this

export PG_LEGACY_ENVIRONMENT=1

And the warnings that check for usage of the old command names check for
PG_LEGACY_ENVIRONMENT and understand that if that variable exists - the
user has chosen to make the minimal change to their script and should be
respected. We will fix their environment for them as needed to allow them
to continue using old names.

That solves #2 and allows for someone to very quickly remove warnings
without any major changes. A single line change is as simple as one can
imagine or do. If someone cannot accomplish this change - what possibly can
we do for them?

When #3 hits and the old names are removed from the path -
pg_legacy_environment could change to something along these lines

export PATH=$PATH:/opt/pgsql/bin

And now we have a removal of the old names, that does not break anyone that
has followed the warning until this point - and allows for a simple, one
line fix, to anyone that walks in the door and screams FIRE because they
ignored the warning and now have a problem.

I feel the above respects the people that are supposed to be the people we
have empathy for - they are also steps that can be done even fairly quickly
because the fix is handled via modification to the script environment as
opposed to the core workings of a script itself. In fact - one could add
the pg_legacy_environment line to their shell environment and not even
modify a single script at all.

John


Re: Reducing the runtime of the core regression tests

2019-04-12 Thread Peter Geoghegan
On Fri, Apr 12, 2019 at 9:31 AM Alvaro Herrera  wrote:
> Done.  Do you have a preferred spot where the counts were wrong?

Not really, but I can give you an example.

Line counts for each of the two "break" statements within
_bt_keep_natts_fast() are exactly the same. I don't think that this is
because we actually hit each break exactly the same number of times
(90,236 times). I think that we see this because the same instruction
is associated with both break statements in the loop. All of the
examples I've noticed are a bit like that. Not a huge problem, but
less useful than the alternative.
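
A toy example of the effect (illustrative only, not the real
_bt_keep_natts_fast() code):

    int
    first_mismatch(const int *a, const int *b, int n)
    {
        int     i;

        for (i = 0; i < n; i++)
        {
            if (a[i] == 0)
                break;          /* exit A */
            if (a[i] != b[i])
                break;          /* exit B */
        }
        return i;
    }

If the compiler lowers both break statements to the same jump target
and associates one instruction with both source lines, line-based
counters report the same (merged) count for exit A and exit B, even
though the two conditions fire different numbers of times.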

-- 
Peter Geoghegan




Re: Attempt to consolidate reading of XLOG page

2019-04-12 Thread Alvaro Herrera
On 2019-Apr-11, Antonin Houska wrote:

> While working on the instance encryption I found it annoying to apply
> decryption of the XLOG page in three different functions. Attached is a patch
> that tries to merge them all into one function, XLogRead(). The existing
> implementations differ in the way a new segment is opened. So I added a
> pointer to a callback function as a new argument. This callback handles the
> specific way to determine the segment file name and to open the file.
> 
> I can split the patch into multiple diffs to make detailed review easier, but
> first I'd like to hear if anything is seriously wrong about this
> design. Thanks.

I agree that xlog reading is pretty messy.

I think ifdef'ing the way XLogRead reports errors is not great.  Maybe
we can pass a function pointer that is to be called in case of errors?
Not sure about the walsize; maybe it can be a member in XLogReadPos, and
given to XLogReadInitPos()?  (Maybe rename XLogReadPos as
XLogReadContext or something like that, indicating it's not just the
read position.)
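
For illustration, one shape such an API could take, with the walsize
carried in the context as suggested above (all of these names are
hypothetical, not necessarily those in Antonin's patch):

    typedef struct XLogReadContext XLogReadContext;

    /* called on segment boundaries; locates and opens the next file,
     * reporting errors however the caller prefers */
    typedef void (*WALSegmentOpenCB) (XLogReadContext *ctx,
                                      XLogSegNo nextSegNo,
                                      TimeLineID *tli_p);

    struct XLogReadContext
    {
        int         seg_fd;             /* currently open segment, or -1 */
        XLogSegNo   seg_no;             /* segment number of seg_fd */
        int         wal_segment_size;   /* the "walsize" mentioned above */
        WALSegmentOpenCB open_segment;  /* backend/frontend specific */
    };

    /* the single merged reader */
    extern bool XLogRead(XLogReadContext *ctx, char *buf,
                         XLogRecPtr startptr, Size count);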

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




TM format can mix encodings in to_char()

2019-04-12 Thread Juan José Santamaría Flecha
Hackers,

I will use as an example the code in the regression test
'collate.linux.utf8'.
There you can find:

SET lc_time TO 'tr_TR';
SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
   to_char
-------------
 01 NIS 2010
(1 row)

The problem is that the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5),
while the test runs in UTF8. So the following code will raise an error:

SET lc_time TO 'tr_TR';
SELECT to_char(date '2010-02-01', 'DD TMMON YYYY');
ERROR:  invalid byte sequence for encoding "UTF8": 0xde 0x75

The problem seems to be in the code touched in the attached patch.

Regards,

Juan Jose Santamaria Flecha


tm-format-mixes-encodings.patch
Description: Binary data


Re: Reducing the runtime of the core regression tests

2019-04-12 Thread Alvaro Herrera
On 2019-Apr-11, Peter Geoghegan wrote:

> On Thu, Apr 11, 2019 at 11:00 AM Alvaro Herrera  wrote:
> > ./configure --enable-depend --enable-coverage --enable-tap-tests 
> > --enable-nls --with-python --with-perl --with-tcl --with-openssl 
> > --with-libxml --with-ldap --with-pam >> $LOG 2>&1
> >
> > make -j4 >> $LOG 2>&1
> > make -j4 -C contrib >> $LOG 2>&1
> > make check-world PG_TEST_EXTRA="ssl ldap" >> $LOG 2>&1
> > make coverage-html >> $LOG 2>&1
> >
> > There are no environment variables that would affect it.
> 
> Could we add "CFLAGS=-O0"?

Done.  Do you have a preferred spot where the counts were wrong?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonpath

2019-04-12 Thread Tom Lane
Andrew Dunstan  writes:
> OK, I have tried this with an earlier compiler (gcc 7.3.0 vs gcc 8.1.0)
> and the problem disappears. So I think we can put this down as a
> compiler bug. I will downgrade jacana to 7.3.0.

OK.

There is still the matter of those two failures on skate and snapper,
which don't look the same as jacana's issue but nonetheless seem to
be the jsonpath patch's fault.  However, since neither animal has yet
reproduced those failures, I don't know how to investigate further,
or whether it's even worth considering that to be an open item.

For the record, I wasted a fair amount of time last week trying to
duplicate the skate/snapper failures by setting up a sparc qemu VM
with more or less the same Debian version they're using.  No luck.
Recent versions of Debian Wheezy wouldn't boot; I did manage to get
7.6.0 to install and run, er crawl, but the failure didn't manifest
in quite a few tries.

regards, tom lane




Re: jsonpath

2019-04-12 Thread Andrew Dunstan


On 3/28/19 12:43 PM, Andrew Dunstan wrote:
> On 3/28/19 9:50 AM, Tom Lane wrote:
>> Andres Freund  writes:
>>> On March 28, 2019 9:31:14 AM EDT, Tom Lane  wrote:
 Has anybody gotten through a valgrind run on this code yet?
>>> Skink has successfully passed since - but that's x86...
>> Yeah, there is a depressingly high chance that this is somehow specific
>> to the bison version, flex version, and/or compiler in use on jacana.
>>
>>  
>
>
> lousyjack has also passed it (x64).
>
>
> git bisect on jacana blames commit 550b9d26f.



OK, I have tried this with an earlier compiler (gcc 7.3.0 vs gcc 8.1.0)
and the problem disappears. So I think we can put this down as a
compiler bug. I will downgrade jacana to 7.3.0.


cheers


andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Adding Unix domain socket path and port to pg_stat_get_wal_senders()

2019-04-12 Thread Tom Lane
Euler Taveira  writes:
> The question is: what is the problem we want to solve? Ishii-san asked
> for a socket path. If we have already figured out the replica (via
> application_name), use the replica PID to find the socket path. A new
> column as suggested by Tom could show the desired info. Is it *really*
> useful? I mean, how many setups have master and replica on the same
> server?

Yeah, I think that argument is why we didn't cover the case in the
original view design.  This additional column would be useless on
Windows, too.  Still, since Ishii-san is concerned about this,
I suppose he has a plausible use-case in mind.

> For a socket connection, the directory is important, and that
> information I can get from the unix_socket_directories parameter (I've
> never seen a setup with multiple socket directories).

Those are actually pretty common, for example if you use Red Hat's
packaging you will have both /var/run/postgresql and /tmp as socket
directories (since they consider use of /tmp deprecated, but getting
rid of all clients' use of it turns out to be really hard).  However,
it's definitely fair to question whether anyone *cares* which of
the server's socket directories a given connection used.  Aren't
they going to be pretty much all equivalent?

regards, tom lane




Re: improve PQexec documentation

2019-04-12 Thread Fabien COELHO



Hello Alvaro,


>> I'm looking at psql's use of PQexec for implementing some feature.
>>
>> When running with multiple SQL commands, the doc is not very helpful.
>>
>> From the source code I gathered that PQexec returns the first COPY results
>> if any, and if not the last non-empty results, unless all is empty in which
>> case an empty result is returned.
>
> I'm not sure we necessarily want to document this behavior.  If it was
> super helpful for some reason, or if we thought we would never change
> it, then it would make sense to document it in minute detail.  But
> otherwise I think documenting it sets a promise that we would (try to)
> never change it in the future, which I don't necessarily agree with --
> particularly since it's somewhat awkward to use.
>
> I'm inclined to reject this patch.


Hmmm. I obviously agree that PQexec is beyond awkward.

Now I'm not sure how anyone is expected to guess the actual behavior of the
function from the available documentation, and without this knowledge I
cannot see how to write meaningful code for the multiple-query case.


Basically it seems to have been designed for simple queries, and then
accommodated somehow for the multiple-command case, but with a strange,
non-systematic approach.


I think it would have been much simpler and more straightforward to always
return the first result and let the client do whatever it wants
afterwards. However, as it has existed for quite some time, I'm unsure how
likely it is to change, as that would break existing code, so documenting its
behavior seems logical. I'd be all in favor of changing the behavior, but
I'm pessimistic that it could pass. Keeping the current status (not really
documented & awkward behavior) seems rather strange.
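
As a minimal demonstration of the "last non-empty result" part of that
behavior (a sketch; connection parameters are taken from the
environment):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* three commands, one PQexec: only one PGresult comes back,
         * here the one for the final SELECT */
        res = PQexec(conn, "SELECT 1; CREATE TEMP TABLE t(a int); SELECT 2;");
        printf("%s: %s\n",
               PQresStatus(PQresultStatus(res)),
               PQgetvalue(res, 0, 0));       /* PGRES_TUPLES_OK: 2 */

        PQclear(res);
        PQfinish(conn);
        return 0;
    }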


--
Fabien.




Re: hyrax vs. RelationBuildPartitionDesc

2019-04-12 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Apr 10, 2019 at 05:03:21PM +0900, Amit Langote wrote:
>> The problem lies in all branches that have partitioning, so it should be
>> listed under Older Bugs, right?  You may have noticed that I posted
>> patches for all branches down to 10.

> I have noticed.  The message from Tom upthread outlined that an open
> item should be added, but this is not one.  That's what I wanted to
> emphasize.  Thanks for making it clearer.

To clarify a bit: there's more than one problem here.  Amit added an
open item about pre-existing leaks related to rd_partcheck.  (I'm going
to review and hopefully push his fix for that today.)  However, there's
a completely separate leak associated with mismanagement of rd_pdcxt,
as I showed in [1], and it doesn't seem like we have consensus about
what to do about that one.  AFAIK that's a new bug in 12 (caused by
898e5e329) and so it ought to be in the main open-items list.

This thread has discussed a bunch of possible future changes like
trying to replace copying of relcache-provided data structures
with reference-counting, but I don't think any such thing could be
v12 material at this point.  We do need to fix the newly added
leak though.

regards, tom lane

[1] https://www.postgresql.org/message-id/10797.1552679128%40sss.pgh.pa.us




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Andreas Karlsson

On 4/12/19 5:14 PM, Chris Travers wrote:
> 1. naming things is surprisingly hard.  How sure are we that we can do
> this right?  Can we come up with a correct name for initdb?  Maybe
> pg_createcluster?


The Debian packagers already use pg_createcluster for their script which 
wraps initdb, and while pg_initdb is a bit misleading (it creates a 
cluster rather than a database) it is not that bad.


Andreas




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Chris Travers
On Fri, Apr 12, 2019 at 3:20 PM Alvaro Herrera  wrote:

> On 2019-Apr-12, Daniel Gustafsson wrote:
>
> > There are many good reasons for the changes proposed in this thread, but
> I'm
> > not sure if discoverability is one.  Relying on autocompleting a
> filename to
> > figure out existing tooling for database maintenance and DBA type
> operations
> > seems like a fragile usecase.
> >
> > If commandline discoverability is of importance, providing a summary of
> the
> > tools in "man postgresql" seems like a better option.
>
> The first comment in the LWN article:
>  "It's broken and obviously a bad idea but we've been doing it for so long
> we
>   shouldn't attempt to fix it"
>
> IMO the future is longer than the past, and has more users, so let's do
> it right instead of perpetuating the mistakes.
>

I agree we should look at fixing these.  However I have two concerns.

1. naming things is surprisingly hard.  How sure are we that we can do this
right?  Can we come up with a correct name for initdb?  Maybe
pg_createcluster?
2. How long would our deprecation cycle be?  5 years?  10 years?  Given
that people may need to support multiple versions I would propose no
warnings until both formats are supported, then warnings for 2 years, then
drop the old ones.

>
>
> ... unless you think PostgreSQL is going to become irrelevant before
> 2050.
>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: PostgreSQL pollutes the file system

2019-04-12 Thread Fred .Flintstone
I think of discoverability as how easy it is to discover and
rediscover things.
Like rediscovering commands you forgot the name of. Like "what was the
command to create a database?": just type pg_ and press tab and see
what's there.

The LWN article is now unlocked to all readers, not just paying
subscribers. It has many comments which might bring value to this
discussion.
https://lwn.net/Articles/784508/

On Fri, Apr 12, 2019 at 3:37 PM Daniel Gustafsson  wrote:
>
> On Friday, April 12, 2019 3:20 PM, Alvaro Herrera  
> wrote:
>
> > On 2019-Apr-12, Daniel Gustafsson wrote:
> >
> > > There are many good reasons for the changes proposed in this thread, but 
> > > I'm
> > > not sure if discoverability is one. Relying on autocompleting a filename 
> > > to
> > > figure out existing tooling for database maintenance and DBA type 
> > > operations
> > > seems like a fragile usecase.
> > > If commandline discoverability is of importance, providing a summary of 
> > > the
> > > tools in "man postgresql" seems like a better option.
> >
> > The first comment in the LWN article:
> > "It's broken and obviously a bad idea but we've been doing it for so long we
> > shouldn't attempt to fix it"
> >
> > IMO the future is longer than the past, and has more users, so let's do
> > it right instead of perpetuating the mistakes.
> >
> > ... unless you think PostgreSQL is going to become irrelevant before
> > 2050.
>
> Not at all, and as I said there are many good reasons for doing this.  I just
> don't think "discoverability" is the driver, since I consider that a different
> thing from ease of use and avoiding confusion with system tools etc (my reading
> of that word is "finding something new", not "how did I spell that tool again").
>
> cheers ./daniel




Re: Adding Unix domain socket path and port to pg_stat_get_wal_senders()

2019-04-12 Thread Euler Taveira
On Fri, Apr 12, 2019 at 01:39, Michael Paquier  wrote:
>
> On Thu, Apr 11, 2019 at 10:19:01PM -0300, Euler Taveira wrote:
> > application_name. I'm not sure if it solves your complain but Peter
> > committed a patch [1] for v12 that distinguishes replicas in the same
> > host via cluster_name.
>
> Let's be honest, this is just a workaround.
>
The question is: what is the problem we want to solve? Ishii-san asked
for a socket path. If we have already figured out the replica (via
application_name), use the replica PID to find the socket path. A new
column as suggested by Tom could show the desired info. Is it *really*
useful? I mean, how many setups have master and replica on the same
server? For a socket connection, the directory is what matters, and that
information I can get from the unix_socket_directories parameter (I've
never seen a setup with multiple socket directories).


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training




Re: Adding Unix domain socket path and port to pg_stat_get_wal_senders()

2019-04-12 Thread Tatsuo Ishii
>> client_addr does not seem like the right place to store this information,
>> and it has already been documented for years that NULL is used when using a
>> Unix socket.  But I think that we could change *client_hostname* so that
>> the path name is reported instead of NULL when connecting through a
>> Unix domain socket, and there is no need to switch the field type for
>> that.
> 
> That seems like a hack, and I think it could still break apps that
> are expecting particular semantics for that field.  Why not add a
> new column instead?

Actually, I already proposed to add a new column to pg_stat_get_wal_senders():

> Changing this behavior would affect existing pg_stat_get_activity view
> users and I hesitate to do so. I wonder if we could add the receiver's
> UNIX domain socket path to pg_stat_get_wal_senders() (which is
> called from the pg_stat_replication view) so that the poor UNIX domain
> socket users could make their own view or access
> pg_stat_get_wal_senders() to get the UNIX domain socket path.

If we are OK with adding a new column to the pg_stat_activity view or the
pg_stat_replication view as well, that would be great.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-12 Thread Tom Lane
Masahiko Sawada  writes:
> There is code like the following in many places in the PostgreSQL code.
> ...
> Since we eventually call
> pgstat_report_wait_end() in AbortTransaction(), I think that we don't
> need to call pgstat_report_wait_end() if we're going to raise an error
> just after that. Is that right?

Yes ... and those CloseTransientFile calls are unnecessary as well.

To a first approximation, *any* cleanup-type call occurring just before
an ereport(ERROR) is probably unnecessary, or if it is necessary then
the code is broken in other ways.  One should not assume that there is
no other way for an error to be thrown while the resource is held, and
therefore it's generally better design to have enough infrastructure
so that the error cleanup mechanisms can handle whatever cleanup is
needed.  We certainly have such infrastructure for OpenTransientFile/
CloseTransientFile, and according to what you say above (I didn't
check it) pgstat wait reporting is handled similarly.  So these
call sites could all be simplified substantially.

There are exceptions to this rule of thumb.  In some places, for
instance, it's worth releasing a lock before ereport simply to shorten
the length of time that the lock might stay held.  And there are places
where a very low-level resource (such as a spinlock) is only held in
straight-line code so there's not really need for error cleanup
infrastructure for it.  Perhaps there's an argument to be made that
pgstat wait reporting could be put in this second category, but
I doubt it.
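
For illustration, a minimal before/after sketch of such a call site
(assuming, per the above, that AbortTransaction() ends the wait event and
that the transient-file machinery closes the fd during error cleanup):

/* Before: explicit cleanup duplicated on the error path. */
pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
if (write(fd, content, len) != len)
{
    pgstat_report_wait_end();   /* redundant: AbortTransaction() does this */
    CloseTransientFile(fd);     /* redundant: transient-file cleanup does this */
    ereport(ERROR, ...);
}
pgstat_report_wait_end();

/* After: rely on the error cleanup infrastructure. */
pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
if (write(fd, content, len) != len)
{
    /* if write didn't set errno, assume problem is no disk space */
    if (errno == 0)
        errno = ENOSPC;
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write file \"%s\": %m", path)));
}
pgstat_report_wait_end();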

regards, tom lane




Re: Checksum errors in pg_stat_database

2019-04-12 Thread Julien Rouhaud
On Fri, Apr 12, 2019 at 2:18 PM Magnus Hagander  wrote:
>
> On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud  wrote:
>>
>> v5 attached.
>
> Thanks. Pushed!

Thanks!




Re: Adding Unix domain socket path and port to pg_stat_get_wal_senders()

2019-04-12 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Apr 11, 2019 at 10:19:01PM -0300, Euler Taveira wrote:
>> Sockets have different semantics from TCP/UDP. We can't add socket
>> information into client_addr unless we are prepared to break this view
>> (client_addr has type inet and it would be necessary to change it to
>> text). It could break a lot of applications.

Agreed.

> client_addr does not seem like the right place to store this information,
> and it has already been documented for years that NULL is used when using a
> Unix socket.  But I think that we could change *client_hostname* so that
> the path name is reported instead of NULL when connecting through a
> Unix domain socket, and there is no need to switch the field type for
> that.

That seems like a hack, and I think it could still break apps that
are expecting particular semantics for that field.  Why not add a
new column instead?

regards, tom lane




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Daniel Gustafsson
On Friday, April 12, 2019 3:20 PM, Alvaro Herrera  wrote:

> On 2019-Apr-12, Daniel Gustafsson wrote:
>
> > There are many good reasons for the changes proposed in this thread, but I'm
> > not sure if discoverability is one. Relying on autocompleting a filename to
> > figure out existing tooling for database maintenance and DBA type operations
> > seems like a fragile usecase.
> > If commandline discoverability is of importance, providing a summary of the
> > tools in "man postgresql" seems like a better option.
>
> The first comment in the LWN article:
> "It's broken and obviously a bad idea but we've been doing it for so long we
> shouldn't attempt to fix it"
>
> IMO the future is longer than the past, and has more users, so let's do
> it right instead of perpetuating the mistakes.
>
> ... unless you think PostgreSQL is going to become irrelevant before
> 2050.

Not at all, and as I said there are many good reasons for doing this.  I just
don't think "discoverability" is the driver, since I consider that a different
thing from ease of use and avoiding confusion with system tools, etc. (my
reading of that word is "finding something new", not "how did I spell that
tool again").

cheers ./daniel




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Alvaro Herrera
On 2019-Apr-12, Daniel Gustafsson wrote:

> There are many good reasons for the changes proposed in this thread, but I'm
> not sure if discoverability is one.  Relying on autocompleting a filename to
> figure out existing tooling for database maintenance and DBA type operations
> seems like a fragile usecase.
> 
> If commandline discoverability is of importance, providing a summary of the
> tools in "man postgresql" seems like a better option.

The first comment in the LWN article:
 "It's broken and obviously a bad idea but we've been doing it for so long we
  shouldn't attempt to fix it"

IMO the future is longer than the past, and has more users, so let's do
it right instead of perpetuating the mistakes.


... unless you think PostgreSQL is going to become irrelevant before
2050.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: improve PQexec documentation

2019-04-12 Thread Alvaro Herrera
On 2019-Apr-12, Fabien COELHO wrote:

> I'm looking at psql's use of PQexec for implementing some feature.
> 
> When running with multiple SQL commands, the doc is not very helpful.
> 
> From the source code I gathered that PQexec returns the first COPY results
> if any, and if not the last non-empty results, unless all is empty in which
> case an empty result is returned.

I'm not sure we necessarily want to document this behavior.  If it was
super helpful for some reason, or if we thought we would never change
it, then it would make sense to document it in minute detail.  But
otherwise I think documenting it sets a promise that we would (try to)
never change it in the future, which I don't necessarily agree with --
particularly since it's somewhat awkward to use.

I'm inclined to reject this patch.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-12 Thread Masahiko Sawada
On Fri, Apr 12, 2019 at 9:07 PM Michael Paquier  wrote:
>
> On Fri, Apr 12, 2019 at 07:27:44PM +0900, Masahiko Sawada wrote:
> > As far as I know there are three places that call
> > pgstat_report_wait_end() before ereport(ERROR): two in twophase.c
> > and another in copydir.c (at L199). Since we eventually call
> > pgstat_report_wait_end() in AbortTransaction(), I think that we don't
> > need to call pgstat_report_wait_end() if we're going to raise an error
> > just after that. Is that right?
>
> RecreateTwoPhaseFile() gets called in the checkpointer or the startup
> process, which do not have a transaction context

Yes.

>  so the wait event would not get cleaned up

But I don't think that's right; I've checked the code. If the startup
process fails in that function it raises a FATAL and recovery fails,
and if the checkpointer process does, it calls
pgstat_report_wait_end() in CheckpointerMain().

>  It looks like 249cf070 has been rather
> inconsistent in its way of handling things.

Yeah, I think that at least the handling of pgstat_report_wait_end() in
RecreateTwoPhaseFile() is inconsistent in any case.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Fred .Flintstone
I would disagree.

Discoverability is important, as is having a user space that is
intuitive and predictable.
With the discoverability exposed by pg_, you immediately find
out what is available.

One shouldn't have to delve down into manuals and books, then forget
what that darn command was the next time it's needed, just to have to
return to the documentation again.

Preferably a wrapper around the tools could provide a summary for all
the tools, just like git --help.
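
As a rough illustration of that idea, such a wrapper could be tiny; here is
a hypothetical sketch in C (the "pg" command name and the dispatch-by-prefix
behavior are invented for illustration, not an actual proposal):

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    char        cmd[64];

    if (argc < 2)
    {
        fprintf(stderr, "usage: pg <command> [args...]\n");
        return 1;
    }
    /* "pg dump ..." becomes "pg_dump ..." */
    snprintf(cmd, sizeof(cmd), "pg_%s", argv[1]);
    argv[1] = cmd;              /* becomes argv[0] of the exec'd tool */
    execvp(cmd, argv + 1);
    perror(cmd);                /* only reached if exec fails */
    return 1;
}

A real wrapper would presumably also print a command summary when run
without arguments, which is what gives the git-style discoverability.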

On Fri, Apr 12, 2019 at 2:56 PM Daniel Gustafsson  wrote:
>
> On Friday, April 12, 2019 2:25 PM, Fred .Flintstone  wrote:
>
> > It would make the old commands more easily discoverable. Just type pg_
> > and press the tab key for auto-completion.
>
> There are many good reasons for the changes proposed in this thread, but I'm
> not sure if discoverability is one.  Relying on autocompleting a filename to
> figure out existing tooling for database maintenance and DBA type operations
> seems like a fragile usecase.
>
> If commandline discoverability is of importance, providing a summary of the
> tools in "man postgresql" seems like a better option.
>
> cheers ./daniel




Re: [PATCH v20] GSSAPI encryption support

2019-04-12 Thread Michael Paquier
On Fri, Apr 12, 2019 at 10:22:03AM +0200, Peter Eisentraut wrote:
> Another problem is that the two test files cannot be run in parallel
> because they use the same hardcoded data directories.  That would have
> to be replaced by temporary directories.

Please note that I have added an open item about the instability of
these tests.

If one's PG_TEST_EXTRA uses kerberos, just using PROVE_FLAGS="-j4" or
such to run multiple scripts in parallel makes the test failure very
easy to reproduce.  That's my case, and I had to disable the test for
now...
--
Michael




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Daniel Gustafsson
On Friday, April 12, 2019 2:25 PM, Fred .Flintstone  wrote:

> It would make the old commands more easily discoverable. Just type pg_
> and press the tab key for auto-completion.

There are many good reasons for the changes proposed in this thread, but I'm
not sure if discoverability is one.  Relying on autocompleting a filename to
figure out existing tooling for database maintenance and DBA type operations
seems like a fragile usecase.

If commandline discoverability is of importance, providing a summary of the
tools in "man postgresql" seems like a better option.

cheers ./daniel




Re: PostgreSQL pollutes the file system

2019-04-12 Thread Fred .Flintstone
It would make the old commands more easily discoverable. Just type pg_
and press the tab key for auto-completion.

On Wed, Apr 10, 2019 at 9:46 PM Peter Eisentraut  wrote:
>
> On 2019-04-10 15:01, Tatsuo Ishii wrote:
> >> On 2019-03-29 20:32, Joe Conway wrote:
> >>>   pg_util  
> >>
> >> How is that better than just renaming to pg_$oldname?
> >
> > As I already said in up thread:
> >
> >> This way, we would be free from the command name conflict problem
>
> Well, whatever we do -- if anything -- we would certainly need to keep
> the old names around for a while somehow.  So this doesn't really make
> that issue go away.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Checksum errors in pg_stat_database

2019-04-12 Thread Magnus Hagander
On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud  wrote:

> Thanks for looking it it!
>
> On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander  wrote:
> >
> > I'm not sure I like the idea of using "" as the database
> name. It's not very likely that somebody would be using that as a name for
> their database, but it's not impossible. But it also just looks strange.
> Wouldn't NULL be a more appropriate choice?
> >
> > Likewise, shouldn't we return NULL as the number of backends for the
> shared counters, rather than 0?
> I wanted to make things more POLA-compliant, but maybe it was a bad
> idea.  I changed it to NULL here and for numbackends.
>
> > Micro-nit:
> > + Time at which the last data page checksum failures was
> detected in
> > s/failures/failure/
>
> Oops.
>
> v5 attached.
>

Thanks. Pushed!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-12 Thread Michael Paquier
On Fri, Apr 12, 2019 at 07:27:44PM +0900, Masahiko Sawada wrote:
> As far as I know there are three places where call
> pgstat_report_wait_end before ereport(ERROR): two in twophase.c
> andanother in copydir.c(at L199). Since we eventually call
> pgstat_report_wait_end() in AbortTransaction(). I think that we don't
> need to call pgstat_report_wait_end() if we're going to raise an error
> just after that. Is that right?

RecreateTwoPhaseFile() gets called in the checkpointer or the startup
process, which do not have a transaction context, so the wait event
would not get cleaned up, and we should call pgstat_report_wait_end()
in the third elog(ERROR), no?  It looks like 249cf070 has been rather
inconsistent in its way of handling things.
--
Michael




Re: REINDEX CONCURRENTLY 2.0

2019-04-12 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> So...  I am coming up with the patch attached.  I have introduced some
> tests using a trick with CIC to have an invalid index to work on.

I don't have any comments on the code (but the test looks sensible, it's
the same trick I used to discover the issue in the first place).

However, the doc patch lost the trailing paren:

>  The recommended recovery
>  method in such cases is to drop the index and try again to perform
> -CREATE INDEX CONCURRENTLY.  (Another possibility is to rebuild
> -the index with REINDEX.  However, since REINDEX
> -does not support concurrent builds, this option is unlikely to seem
> -attractive.)
> +CREATE INDEX CONCURRENTLY.  (Another possibility is
> +to rebuild the index with REINDEX INDEX CONCURRENTLY.
> 


- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl




Calling pgstat_report_wait_end() before ereport(ERROR)

2019-04-12 Thread Masahiko Sawada
Hi,

There is code like the following in many places in the PostgreSQL code.

pgstat_report_wait_start(WAIT_EVENT_xxx);
if (write(...) != len)
{
    ereport(ERROR, ...);
}
pgstat_report_wait_end();

Most of these places don't call pgstat_report_wait_end() before
ereport(ERROR), but some do. In particular, in RecreateTwoPhaseFile()
we have:

    /* Write content and CRC */
    errno = 0;
    pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
    if (write(fd, content, len) != len)
    {
        int save_errno = errno;

        pgstat_report_wait_end();
        CloseTransientFile(fd);

        /* if write didn't set errno, assume problem is no disk space */
        errno = save_errno ? save_errno : ENOSPC;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write file \"%s\": %m", path)));
    }
    if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
    {
        int save_errno = errno;

        pgstat_report_wait_end();
        CloseTransientFile(fd);

        /* if write didn't set errno, assume problem is no disk space */
        errno = save_errno ? save_errno : ENOSPC;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write file \"%s\": %m", path)));
    }
    pgstat_report_wait_end();

    /*
     * We must fsync the file because the end-of-replay checkpoint will not do
     * so, there being no GXACT in shared memory yet to tell it to.
     */
    pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
    if (pg_fsync(fd) != 0)
    {
        int save_errno = errno;

        CloseTransientFile(fd);
        errno = save_errno;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", path)));
    }
    pgstat_report_wait_end();

The first two call pgstat_report_wait_end(), but the third one doesn't.

As far as I know there are three places that call
pgstat_report_wait_end() before ereport(ERROR): two in twophase.c
and another in copydir.c (at L199). Since we eventually call
pgstat_report_wait_end() in AbortTransaction(), I think that we don't
need to call pgstat_report_wait_end() if we're going to raise an error
just after that. Is that right?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Transparent data encryption support as an extension

2019-04-12 Thread Masahiko Sawada
On Fri, Apr 12, 2019 at 6:34 PM Haribabu Kommi  wrote:
>
> Hi Hackers,
>
> I have read many mailing list discussions about supporting data-at-rest
> encryption in PostgreSQL.
>
> I checked the discussions around full instance encryption or tablespace or
> table level encryption. In my observation, all the proposals are trying to
> modify the core code to support encryption.
>
> I am thinking of an approach of providing tablespace level encryption
> support, including WAL, using an extension instead of changing the core
> code, by adding hooks in the xlogwrite and xlogread flows and the
> reorderbuffer flows, and also by adding smgr plugin routines to support
> encryption and decryption of other pages.
>
> Definitely this approach doesn't work for full instance encryption.
>
> Any opinions/comments/problems in evaluating the encryption with an
> extension approach?
>

The discussion [1] of a similar proposal might be worth reading. That
proposal added a hook in BufferSync(), although for differential
backup purposes.

[1] 
https://www.postgresql.org/message-id/20051502087...@webcorp01e.yandex-team.ru

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Transparent data encryption support as an extension

2019-04-12 Thread Haribabu Kommi
Hi Hackers,

I have read many mailing list discussions about supporting data-at-rest
encryption in PostgreSQL.

I checked the discussions around full instance encryption or tablespace or
table level encryption. In my observation, all the proposals are trying to
modify the core code to support encryption.

I am thinking of an approach of providing tablespace level encryption
support, including WAL, using an extension instead of changing the core
code, by adding hooks in the xlogwrite and xlogread flows and the
reorderbuffer flows, and also by adding smgr plugin routines to support
encryption and decryption of other pages.

Definitely this approach doesn't work for full instance encryption.

Any opinions/comments/problems in evaluating the encryption with an
extension approach?
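
To make the shape of that idea concrete, the WAL part of such an extension
might hang off a hook roughly like the following. This is purely a
hypothetical sketch: no xlog_write_hook exists in core today, so the type,
variable, and signature here are invented for illustration.

/* Hypothetical hook point that would have to be added to the WAL-write path. */
typedef void (*xlog_write_hook_type) (char *pages, XLogRecPtr startptr, Size nbytes);
extern PGDLLIMPORT xlog_write_hook_type xlog_write_hook;

/* Extension side: encrypt WAL pages in place before they reach disk. */
static void
tde_encrypt_xlog(char *pages, XLogRecPtr startptr, Size nbytes)
{
    /* key lookup and block-cipher application omitted in this sketch */
}

void
_PG_init(void)
{
    xlog_write_hook = tde_encrypt_xlog;
}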

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Attempt to consolidate reading of XLOG page

2019-04-12 Thread Haribabu Kommi
On Fri, Apr 12, 2019 at 2:06 AM Antonin Houska  wrote:

> While working on the instance encryption I found it annoying to apply
> decryption of XLOG pages in three different functions. Attached is a patch
> that tries to merge them all into one function, XLogRead(). The existing
> implementations differ in the way a new segment is opened, so I added a
> pointer to a callback function as a new argument. This callback handles the
> specific ways to determine the segment file name and to open the file.
>
> I can split the patch into multiple diffs to make detailed review easier,
> but first I'd like to hear whether anything is seriously wrong with this
> design. Thanks.
>

I didn't check the code, but it is good to combine all three page-read
functions into one instead of spreading the logic around.
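
For readers following along, the consolidated interface might look roughly
like this (the names and signature below are guesses for illustration, not
the actual interface from the patch):

/* Hypothetical callback: how a given caller opens a WAL segment file. */
typedef int (*XLogSegmentOpen) (XLogSegNo segno, TimeLineID tli);

/*
 * One shared routine reads "count" bytes starting at "startptr" into "buf",
 * invoking openfn whenever the read crosses into a new segment; decryption
 * would then have a single place to hook into.
 */
extern void XLogRead(char *buf, XLogRecPtr startptr, Size count,
                     TimeLineID tli, XLogSegmentOpen openfn);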

Regards,
Haribabu Kommi
Fujitsu Australia


improve PQexec documentation

2019-04-12 Thread Fabien COELHO


Hello devs,

I'm looking at psql's use of PQexec for implementing some feature.

When running with multiple SQL commands, the doc is not very helpful.

From the source code I gathered that PQexec returns the first COPY result
if any, and if not the last non-empty result, unless all are empty in
which case an empty result is returned. So * marks the returned result
in the following examples:

  INSERT ... \; * COPY ... \; SELECT ... \; \;
  SELECT ... \; UPDATE ... \; * SELECT ... \; \;
  \; \; * ;

The attached patch tries to improve the documentation based on my 
understanding.


IMVHO, psql's code is kind of a mess working around this strange behavior,
as there is a loop over results within PQexec, then another one after
PQexec if there was a COPY.
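
As a concrete illustration of the behavior described above, a minimal libpq
sketch (the connection string and table name are placeholders):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");      /* placeholder DSN */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Three commands in one string, but only one PGresult comes back. */
    res = PQexec(conn, "INSERT INTO t VALUES (1); SELECT 1; SELECT 2;");

    /* Per the description above, this should be the result of "SELECT 2". */
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("returned result: %d row(s), first value %s\n",
               PQntuples(res), PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}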


--
Fabien.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 7f01fcc148..de9a0652a1 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2488,9 +2488,14 @@ PGresult *PQexec(PGconn *conn, const char *command);
 for more details about how the server handles multi-query strings.)
 Note however that the returned
 PGresult structure describes only the result
-of the last command executed from the string.  Should one of the
-commands fail, processing of the string stops with it and the returned
-PGresult describes the error condition.
+of the last command executed from the string.
+If the string contains COPY commands, the result of the
+first of those is returned and further results should be fetched with
+PQgetResult.
+If the string contains only empty commands, an empty result is returned.
+Otherwise, the result of the last non-empty command is returned.
+Should one of the commands fail, processing of the string stops with it and
+the returned PGresult describes the error condition.

 



Re: [PATCH v20] GSSAPI encryption support

2019-04-12 Thread Peter Eisentraut
On 2019-04-09 09:32, Peter Eisentraut wrote:
> On 2019-04-09 04:51, Stephen Frost wrote:
>>> Running just 002_enc.pl by itself passes the tests!
>> Great!  I think what I'll do is work to incorporate the two tests back
>> into one script, to avoid whatever the race condition or other confusion
>> is happening on macOS here.
> 
> That seems reasonable.  It also avoids the large amount of duplicate
> setup code.

Another problem is that the two test files cannot be run in parallel
because they use the same hardcoded data directories.  That would have
to be replaced by temporary directories.

The race condition alluded to above appears to be simply that at the end
of the test the kdc is shut down by kill -INT but nothing waits for that
to finish before a new one is started by the next test.  So there is
potential for all kinds of confusion.

Putting it all in one test file seems to be the easiest way out.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cache lookup failed for collation 0

2019-04-12 Thread Peter Eisentraut
On 2019-04-11 17:04, Jeevan Chalke wrote:
> The error is coming from get_collation_isdeterministic() when the colloid
> passed is 0. I think, like we do in get_collation_name(), we should
> return false here when such a collation oid does not exist.

I'm not in favor of doing that.  It would risk papering over errors of
omission at other call sites.

The root cause is that the same code, match_pattern_prefix(), is being
used for text and bytea, but bytea does not use collations, so having
collation 0 is expected there, and we shouldn't call
get_collation_isdeterministic() in that case.

Proposed patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From acb1542a1f8ebee9c9d6d9322c64c849b2f23e15 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 12 Apr 2019 09:49:10 +0200
Subject: [PATCH] Unbreak index optimization for LIKE on bytea

The same code is used to handle both text and bytea, but bytea is not
collation-aware, so we shouldn't call get_collation_isdeterministic()
in that case.
---
 src/backend/utils/adt/like_support.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index a65e63736c..7528c80f7c 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -267,8 +267,10 @@ match_pattern_prefix(Node *leftop,
 * precise error messages.)  (It should be possible to support at least
 * Pattern_Prefix_Exact, but no point as along as the actual
 * pattern-matching implementations don't support it.)
+*
+* expr_coll is not set for a non-collation-aware data type such as bytea.
 */
-   if (!get_collation_isdeterministic(expr_coll))
+   if (expr_coll && !get_collation_isdeterministic(expr_coll))
return NIL;
 
/*
-- 
2.21.0



Re: REINDEX CONCURRENTLY 2.0

2019-04-12 Thread Michael Paquier
On Fri, Apr 12, 2019 at 08:44:18AM +0200, Peter Eisentraut wrote:
> Looks good, committed.

Thanks for committing!
--
Michael




Re: REINDEX CONCURRENTLY 2.0

2019-04-12 Thread Peter Eisentraut
On 2019-04-11 05:59, Michael Paquier wrote:
> On Tue, Apr 09, 2019 at 03:50:27PM +0900, Michael Paquier wrote:
>> And here is the patch to address this issue.  It happens that a bit
>> more than the dependency switch was lacking here:
>> - At swap time, we need to have the new index definition track
>> relispartition from the old index.
>> - Again at swap time, the inheritance link needs to be updated between
>> the old/new index and its parent when reindexing a partition index.
> 
> Peter, this is an open item, and I think as the committer of the
> feature you are its owner.  Well, in this case, I don't mind taking
> the ownership as need be as I know this stuff.  Anyway, could you have
> a look at the patch proposed and see if you have any issues with it?

Looks good, committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: cache lookup failed for collation 0

2019-04-12 Thread Jeevan Chalke
On Thu, Apr 11, 2019 at 10:50 PM Tom Lane  wrote:

> Jeevan Chalke  writes:
> > Do you mean, the code in get_collation_isdeterministic() should look like
> > something like below?
>
> > If colloid = InvalidOid then
> >   return TRUE
> > ELSE IF tuple is valid then
> >   return collisdeterministic from the tuple
> > ELSE
> >  return FALSE
>
> I think it's appropriate to fail if we don't find a tuple, for any
> collation oid other than zero.  Again, if you trace through the
> behavior of the longstanding collation check functions like
> lc_ctype_is_c(), you'll see that that's what happens (except for
> some hardwired OIDs that they have fast paths for).
>

OK.

Attached is a patch that treats "collation 0" as deterministic in
get_collation_isdeterministic() and returns true, keeping the rest of the
code as is.


> regards, tom lane
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index b4f2d0f..79e6484 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -945,6 +945,10 @@ get_collation_isdeterministic(Oid colloid)
 	Form_pg_collation colltup;
 	bool		result;
 
+	/* Treat "collation 0" as deterministic. */
+	if (!OidIsValid(colloid))
+		return true;
+
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", colloid);
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0dee7d7..a8e1f6e 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -676,13 +676,19 @@ SELECT collation for ((SELECT b FROM collate_test1 LIMIT 1));
  "C"
 (1 row)
 
+CREATE TABLE byteatable(a bytea primary key, b int);
+SELECT * FROM byteatable WHERE a LIKE '%1%';
+ a | b 
+---+---
+(0 rows)
+
 --
 -- Clean up.  Many of these table names will be re-used if the user is
 -- trying to run any platform-specific collation tests later, so we
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 17 other objects
+NOTICE:  drop cascades to 18 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -700,3 +706,4 @@ drop cascades to table collate_test21
 drop cascades to table collate_test22
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
+drop cascades to table byteatable
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index 89de26a..124daf6 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -259,6 +259,9 @@ SELECT collation for ((SELECT a FROM collate_test1 LIMIT 1)); -- non-collatable
 SELECT collation for ((SELECT b FROM collate_test1 LIMIT 1));
 
 
+CREATE TABLE byteatable(a bytea primary key, b int);
+SELECT * FROM byteatable WHERE a LIKE '%1%';
+
 --
 -- Clean up.  Many of these table names will be re-used if the user is
 -- trying to run any platform-specific collation tests later, so we