Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-11 Thread Andreas Karlsson

Review

- Applies and passes the test suite.

- I think this is a good change since it increases the consistency of 
the operators. I also like the choice of <<@ and @>> since they feel 
intuitive to me.


- I tested it, and both the old and new operators use the BRIN and GiST indexes.
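  As a minimal illustration (the first two queries work today; the <<@ and @>>
  spellings are the new ones proposed by the patch):

    SELECT '192.168.1.5'::inet << '192.168.1.0/24'::inet;    -- is contained by
    SELECT '192.168.1.0/24'::inet >> '192.168.1.5'::inet;    -- contains
    -- with the patch applied the same tests should be expressible as
    -- SELECT '192.168.1.5'::inet <<@ '192.168.1.0/24'::inet;
    -- SELECT '192.168.1.0/24'::inet @>> '192.168.1.5'::inet;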

- The new SP-GiST index support does not cover the old, deprecated operators, 
which is intentional. I do not have a strong opinion here either way, but 
some people may find this surprising.


- I am not convinced that your changes to the descriptions of the 
operators necessarily make things clearer. For example "is contained by 
and smaller network (subnet)" only mentions subnets and not IP addresses.


- Maybe change "deprecated and will eventually be removed." to 
"deprecated and may be removed in a future release.". I prefer the 
latter wording, but I am fine with either.


- Won't renaming the functions which implement the operators risk breaking 
people's applications? While the new names are a bit nicer, I am not sure 
it is worth doing.


- The changes to the code look generally good.

Andreas




Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-11 Thread Amit Kapila
On Tue, Nov 8, 2016 at 5:18 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
>> on something else than LW_EXCLUSIVE compared to now. To keep things
>> more simple I' would still favor using WALInsertLocks for this patch,
>> that looks more consistent, and also because there is no noticeable
>> difference.
>
> Ok, the patch looks fine. So there's nothing for me but to accept
> the current shape since the discussion about performance seems
> not to be settled without performance measurement with machines
> with many cores.
>

I think it is good to check the performance impact of this patch on a
many-core machine.  Is it possible for you to check with Alexander
Korotkov to see if he can provide you access to his powerful machine,
which has 70 cores (if I remember correctly)?


@@ -1035,6 +1038,7 @@ LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks)
  XLogBeginInsert();
  XLogRegisterData((char *) &xlrec, offsetof(xl_standby_locks, locks));
  XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));
+ XLogSetFlags(XLOG_NO_PROGRESS);


Is it right to set the XLOG_NO_PROGRESS flag in LogAccessExclusiveLocks?
This function is called not only in LogStandbySnapshot(), but also during
DDL operations where lockmode >= AccessExclusiveLock.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-11 Thread Michael Paquier
On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz  wrote:
> - In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
>   the return code is ignored, but not anywhere else.  Is that by design?

Right. The patch is lacking consistency in this area. The main thought
behind this design is to not consider an fsync failure as critical,
and just issue a warning on stderr. I have updated the two other call
sites with a (void) cast.

> - For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
>   pg_dump (because -N is already taken there).
>   I'd opt for either using the same short option for both or (IMO better)
>   only offering a long option for both.

Okay. For consistency's sake let's do that. I was a bit hesitant
regarding that to be honest.

>   This would avoid confusion, and we expect that few people will want to use
>   this option anyway, right?

Definitely a good point.
-- 
Michael
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 371a614..dcad095 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -815,6 +815,20 @@ PostgreSQL documentation
  
 
  
+  --no-sync
+  
+   
+By default, pg_dump will wait for all files
+to be written safely to disk.  This option causes
+pg_dump to return without waiting, which is
+faster, but means that a subsequent operating system crash can leave
+the dump corrupt.  Generally, this option is useful for testing
+but should not be used when dumping data from production installation.
+   
+  
+ 
+
+ 
   --quote-all-identifiers
   

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 97168a0..4e6839b 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -365,6 +365,21 @@ PostgreSQL documentation
  
 
  
+  -N
+  --no-sync
+  
+   
+By default, pg_dumpall will wait for all files
+to be written safely to disk.  This option causes
+pg_dumpall to return without waiting, which is
+faster, but means that a subsequent operating system crash can leave
+the dump corrupt.  Generally, this option is useful for testing
+but should not be used when dumping data from production installation.
+   
+  
+ 
+
+ 
   --quote-all-identifiers
   

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0a28124..becbc56 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -268,7 +268,7 @@ extern Archive *OpenArchive(const char *FileSpec, const 
ArchiveFormat fmt);
 
 /* Create a new archive */
 extern Archive *CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
- const int compression, ArchiveMode mode,
+ const int compression, bool dosync, ArchiveMode mode,
  SetupWorkerPtr setupDumpWorker);
 
 /* The --list option */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c 
b/src/bin/pg_dump/pg_backup_archiver.c
index 0e20985..5b60da6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -56,7 +56,8 @@ static const char *modulename = gettext_noop("archiver");
 
 
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
-const int compression, ArchiveMode mode, SetupWorkerPtr 
setupWorkerPtr);
+const int compression, bool dosync, ArchiveMode mode,
+SetupWorkerPtr setupWorkerPtr);
 static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
  ArchiveHandle *AH);
 static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData, bool 
acl_pass);
@@ -202,10 +203,12 @@ setupRestoreWorker(Archive *AHX)
 /* Public */
 Archive *
 CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
-const int compression, ArchiveMode mode, SetupWorkerPtr 
setupDumpWorker)
+ const int compression, bool dosync, ArchiveMode mode,
+ SetupWorkerPtr setupDumpWorker)
 
 {
-   ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, mode, 
setupDumpWorker);
+   ArchiveHandle *AH = _allocAH(FileSpec, fmt, compression, dosync,
+mode, 
setupDumpWorker);
 
return (Archive *) AH;
 }
@@ -215,7 +218,7 @@ CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
 Archive *
 OpenArchive(const char *FileSpec, const ArchiveFormat fmt)
 {
-   ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, archModeRead, 
setupRestoreWorker);
+   ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, 
setupRestoreWorker);
 
return (Archive *) AH;
 }
@@ -2223,7 +2226,8 @@ _discoverArchiveFormat(ArchiveHandle *AH)
  */
 static ArchiveHandle *
 

Re: [HACKERS] Why PostgreSQL doesn't implement a semi sync replication?

2016-11-11 Thread Craig Ringer
On 12 November 2016 at 02:12, Petr Jelinek  wrote:
> On 11/11/16 16:03, Francisco Olarte wrote:
>> On Fri, Nov 11, 2016 at 4:40 AM, 余森彬  wrote:
>>> As we know, the synchronous commit process is blocked while receives
>>> from acknowledgement from standby in
>>> PostgreSQL.This is good for data consistence in master and standby, and
>>> application can get important data from standby.But
>>> when the standby crash or network goes wrong, the master could be hang.Is
>>> there a feature plan for a semi sync like MySQL
>>> InnoDB(set a timer, and become asynchronous when timeout)?
>>
>> JMO, but it seems this basically means any process should be dessigned
>> to cope with the posibility of not having replicated data after
>> commit, so, why bother with synchronous replication in the first
>> place?
>
> It's often more acceptable to say "we lose data when 2 servers die (or
> are in problems)" than "we lose data when 1 server dies" and it's also
> more acceptable to say "we stop answering when we lose 2 servers" but
> not "we stop answering when we lose 1 server", and semisync replication
> works for combination of these two.

Yep. Also, monitoring: sync with a short timeout means you can usually
rely on sync rep, and if it times out and falls back to async, your
monitoring system can start screaming at you.

I think k= replication will help quite a bit with this though.
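If "k" here refers to requiring k synchronous standbys out of a list of
candidates, the 9.6-style configuration for that looks roughly like this
(standby names are made up):

    ALTER SYSTEM SET synchronous_standby_names = '2 (standby1, standby2, standby3)';
    SELECT pg_reload_conf();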

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_sequence catalog

2016-11-11 Thread Andreas Karlsson

Review for pg_sequence catalog

I like this change since it moves all the parts which should be 
transactional to the system catalogs, while keeping only the 
non-transactional stuff in the sequence relations.
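
As a sketch of what the new catalog exposes (column names taken from the
pg_dump query in the patch attached below; the exact shape is of course up
to the final patch):

    SELECT c.relname, s.seqstart, s.seqincrement, s.seqmin, s.seqmax,
           s.seqcache, s.seqcycle
    FROM pg_sequence s
         JOIN pg_class c ON c.oid = s.seqrelid;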


There was some discussion upthread about more compact representations 
for the sequences, but I feel that is a separate issue mostly unrelated 
to this patch.


I might play around more with it but it seems to work well so far.

As pointed out by Peter, this patch also requires changes to 
pg_upgrade. I have not looked at those patches.


= Functional review

- The patch applies and compiles and seems to work fine after some quick 
manual testing.


- The pg_dump tests fail due to the pg_dump code not being updated. I 
have attached a patch which fixes this.


= Benchmarks

I was a bit worried that the extra syscache lookups might slow down 
nextval(), but I got a measurable speedup on a very simple workload 
which consisted only of calls to nextval() on one sequence. The speedup 
was about 10% on my machine.
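
The workload was essentially just repeated nextval() calls, for example as a
pgbench custom script (a sketch; the sequence name is arbitrary):

    CREATE SEQUENCE seq_bench;
    -- script body, run with e.g. pgbench -f nextval.sql -c 1 -T 60
    SELECT nextval('seq_bench');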


= Code

The changes to the code look generally good.

> @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
> else
> heap_drop_with_catalog(object->objectId);
> }
> +   if (relKind == RELKIND_SEQUENCE)
> +   DeleteSequenceTuple(object->objectId);
> break;
> }

I think it might be cleaner here to put this as an "else if", just like 
the "relKind == RELKIND_INDEX" case.


= Documentation

The patch does not update catalogs.sgml, which it should.

Andreas
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ee1f673..a272ad3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15115,7 +15115,27 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 	snprintf(bufm, sizeof(bufm), INT64_FORMAT, SEQ_MINVALUE);
 	snprintf(bufx, sizeof(bufx), INT64_FORMAT, SEQ_MAXVALUE);
 
-	if (fout->remoteVersion >= 80400)
+	if (fout->remoteVersion >= 10)
+	{
+		appendPQExpBuffer(query,
+		  "SELECT relname, "
+		  "seqstart, seqincrement, "
+		  "CASE WHEN seqincrement > 0 AND seqmax = %s THEN NULL "
+		  " WHEN seqincrement < 0 AND seqmax = -1 THEN NULL "
+		  " ELSE seqmax "
+		  "END AS seqmax, "
+		  "CASE WHEN seqincrement > 0 AND seqmin = 1 THEN NULL "
+		  " WHEN seqincrement < 0 AND seqmin = %s THEN NULL "
+		  " ELSE seqmin "
+		  "END AS seqmin, "
+		  "seqcache, seqcycle "
+		  "FROM pg_class c "
+		  "JOIN pg_sequence s ON (s.seqrelid = c.oid) "
+		  "WHERE relname = ",
+		  bufx, bufm);
+		appendStringLiteralAH(query, tbinfo->dobj.name, fout);
+	}
+	else if (fout->remoteVersion >= 80400)
 	{
 		appendPQExpBuffer(query,
 		  "SELECT sequence_name, "



Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-11 Thread Amit Kapila
On Fri, Nov 11, 2016 at 3:01 PM, Magnus Hagander  wrote:
>> > Based on this optimization we might want to keep the text that says
>> > large shared buffers on Windows aren't as effective perhaps,

Sounds sensible, or maybe add a line to say why it isn't as effective as on Linux.

>> > and just remove the sentence that explicitly says don't go over 512MB?
>>

Have we done any Windows-specific optimization since the 512MB limit was
originally mentioned that indicates we can remove it?  Are you saying this
based on the results in this thread?  If so, I think it is better to do a
few more tests before changing it.

>> Just removing the reference to the size would make users ask a question
>> "What size is the effective upper limit?"
>
>
> True, but that's a question for other platforms as well, isn't it?
>

Right, but for other platforms the recommendation seems to be 25% of
RAM; can we safely say that for Windows as well?  As per the test results
in this thread, it seems the read-write performance degrades when
shared buffers are increased from 12.5% to 25%.  Since the test was done
for a short duration, that degradation could be just run-to-run
variation, which is why I suggested doing a few more tests.
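
For reference, the setting under discussion is adjusted like this (the value
is purely illustrative; a server restart is needed for it to take effect):

    SHOW shared_buffers;                          -- current value
    ALTER SYSTEM SET shared_buffers = '512MB';    -- takes effect after a restart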

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Something is broken about connection startup

2016-11-11 Thread Tom Lane
I wrote:
> So it's happening when RelationCacheInitializePhase3 is trying to replace
> a fake pg_class row for pg_proc (made by formrdesc) with the real one.
> That's even odder, because that's late enough that this should be a pretty
> ordinary catalog lookup.  Now I wonder if it's possible that this can be
> seen during ordinary relation opens after connection startup.  If so, it
> would almost surely be a recently-introduced bug, else we'd have heard
> about this from the field.

Okay, I've been tracing through this, and the reason that the catalog
search for the real pg_proc row is failing is that all it finds are
versions too new for its snapshot to see.  Thus, the failure happens
when the process running the GRANTs has pruned away a committed-dead
tuple that is the version the incoming process would have needed to see.
This is not the fault of the process running the GRANTs, because *the
incoming process is advertising MyPgXact->xmin = zero* which should mean
that it has no active snapshot.  It has no right to complain that
somebody truncated away data.

The sequence of events inside the incoming process seems to go
approximately like this:

* RelationCacheInitializePhase3 goes to load up the pg_class row for some
other catalog (maybe pg_class itself; whatever hash_seq_search comes to
first).  This results in systable_beginscan calling GetCatalogSnapshot
which calls GetSnapshotData which sets MyPgXact->xmin nonzero as a side
effect.  All well so far.

* Once we've collected that row, systable_endscan unregisters the
snapshot, which ends with SnapshotResetXmin resetting MyPgXact->xmin
to zero because the RegisteredSnapshots list is now empty.

* RelationCacheInitializePhase3 now tries to read the pg_class row for
pg_proc.  This results in systable_beginscan calling GetCatalogSnapshot,
which doesn't call GetSnapshotData this time, it just returns the
snapshot it's already got.  This is a fatal mistake, because there is
nothing preventing data visible to that snapshot from being removed.


So this has pretty much been broken since we put in MVCC snapshots for
catalog searches.  The problem would be masked when inside a transaction
that has already got a transaction snapshot, but whenever we don't have
one already, our catalog lookups are basically unprotected against
premature row removal.  I don't see any reason to think that this is
specific to connection startup.

The basic problem here, therefore, is that SnapshotResetXmin isn't aware
that GetCatalogSnapshot is keeping a possibly-unregistered snapshot in
its hip pocket.  That has to change.  We could either treat the saved
CatalogSnapshot as always being registered, or we could add some logic
to force invalidating the CatalogSnapshot whenever we clear MyPgXact->xmin
or advance it past that snapshot's xmin.

Neither of those is very nice from a performance standpoint.  With the
first option we risk delaying global cleanup by holding back
MyPgXact->xmin to protect a CatalogSnapshot we might not actually use
anytime soon.  With the second option we will likely end up doing many
more GetSnapshotData calls than we do now, because in a transaction
that hasn't yet set a regular snapshot, we will need to get a new
CatalogSnapshot for every catalog access (since we'll go back to
MyPgXact->xmin = 0 after every access).  And the parser, in particular,
tends to do a lot of catalog accesses before we ever get a regular
transaction snapshot.

Ideally, perhaps, we'd treat the saved CatalogSnapshot as registered
but automatically kill it "every so often".  I'm not sure how to drive
that though.

Thoughts?

regards, tom lane




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2016-11-11 Thread Andreas Karlsson

On 11/11/2016 07:40 PM, Andreas Karlsson wrote:

> Hi,
>
> Here is a new version of the patch, with the only differences being:
>
> 1) The SSL tests have been changed to use reload rather than restart
>
> 2) Rebased on master


And here with a fix to a comment.

Andreas

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 787cfce..5e78d81 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -2288,8 +2288,8 @@ pg_dumpall -p 5432 | psql -d postgres -p 5433
 The files server.key, server.crt,
 root.crt, and root.crl
 (or their configured alternative names)
-are only examined during server start; so you must restart
-the server for changes in them to take effect.
+are examined when reloading the configuration, or when spawning the backend
+process on Windows systems.

   
 
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 668f217..8449a53 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -77,12 +77,14 @@ static DH  *generate_dh_parameters(int prime_len, int generator);
 static DH  *tmp_dh_cb(SSL *s, int is_export, int keylength);
 static int	verify_cb(int, X509_STORE_CTX *);
 static void info_cb(const SSL *ssl, int type, int args);
-static void initialize_ecdh(void);
+static SSL_CTX *initialize_context(void);
+static bool initialize_ecdh(SSL_CTX *context);
 static const char *SSLerrmessage(unsigned long ecode);
 
 static char *X509_NAME_to_cstring(X509_NAME *name);
 
 static SSL_CTX *SSL_context = NULL;
+static bool SSL_initialized = false;
 
 /*  */
 /*		 Hardcoded values		*/
@@ -157,14 +159,12 @@ KWbuHn491xNO25CQWMtem80uKw+pTnisBRF/454n1Jnhub144YRBoN8CAQI=\n\
 /*
  *	Initialize global SSL context.
  */
-void
+int
 be_tls_init(void)
 {
-	struct stat buf;
+	SSL_CTX *context;
 
-	STACK_OF(X509_NAME) *root_cert_list = NULL;
-
-	if (!SSL_context)
+	if (!SSL_initialized)
 	{
 #ifdef HAVE_OPENSSL_INIT_SSL
 		OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
@@ -173,177 +173,29 @@ be_tls_init(void)
 		SSL_library_init();
 		SSL_load_error_strings();
 #endif
-
-		/*
-		 * We use SSLv23_method() because it can negotiate use of the highest
-		 * mutually supported protocol version, while alternatives like
-		 * TLSv1_2_method() permit only one specific version.  Note that we
-		 * don't actually allow SSL v2 or v3, only TLS protocols (see below).
-		 */
-		SSL_context = SSL_CTX_new(SSLv23_method());
-		if (!SSL_context)
-			ereport(FATAL,
-	(errmsg("could not create SSL context: %s",
-			SSLerrmessage(ERR_get_error();
-
-		/*
-		 * Disable OpenSSL's moving-write-buffer sanity check, because it
-		 * causes unnecessary failures in nonblocking send cases.
-		 */
-		SSL_CTX_set_mode(SSL_context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
-
-		/*
-		 * Load and verify server's certificate and private key
-		 */
-		if (SSL_CTX_use_certificate_chain_file(SSL_context,
-			   ssl_cert_file) != 1)
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("could not load server certificate file \"%s\": %s",
-		 ssl_cert_file, SSLerrmessage(ERR_get_error();
-
-		if (stat(ssl_key_file, ) != 0)
-			ereport(FATAL,
-	(errcode_for_file_access(),
-	 errmsg("could not access private key file \"%s\": %m",
-			ssl_key_file)));
-
-		if (!S_ISREG(buf.st_mode))
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("private key file \"%s\" is not a regular file",
-			ssl_key_file)));
-
-		/*
-		 * Refuse to load files owned by users other than us or root.
-		 *
-		 * XXX surely we can check this on Windows somehow, too.
-		 */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-		if (buf.st_uid != geteuid() && buf.st_uid != 0)
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("private key file \"%s\" must be owned by the database user or root",
-			ssl_key_file)));
-#endif
-
-		/*
-		 * Require no public access to key file. If the file is owned by us,
-		 * require mode 0600 or less. If owned by root, require 0640 or less
-		 * to allow read access through our gid, or a supplementary gid that
-		 * allows to read system-wide certificates.
-		 *
-		 * XXX temporarily suppress check when on Windows, because there may
-		 * not be proper support for Unix-y file permissions.  Need to think
-		 * of a reasonable check to apply on Windows.  (See also the data
-		 * directory permission check in postmaster.c)
-		 */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-		if ((buf.st_uid == geteuid() && buf.st_mode & (S_IRWXG | S_IRWXO)) ||
-			(buf.st_uid == 0 && buf.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)))
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("private key file \"%s\" has group or world access",
-		 ssl_key_file),
-	 errdetail("File must have permissions u=rw (0600) or less if owned by the 

Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2016-11-11 Thread Andreas Karlsson

Hi,

Here is a new version of the patch, with the only differences being:

1) The SSL tests have been changed to use reload rather than restart

2) Rebased on master

Please take a look.

Andreas
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 787cfce..5e78d81 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -2288,8 +2288,8 @@ pg_dumpall -p 5432 | psql -d postgres -p 5433
 The files server.key, server.crt,
 root.crt, and root.crl
 (or their configured alternative names)
-are only examined during server start; so you must restart
-the server for changes in them to take effect.
+are examined when reloading the configuration, or when spawning the backend
+process on Windows systems.

   
 
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 668f217..8449a53 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -77,12 +77,14 @@ static DH  *generate_dh_parameters(int prime_len, int generator);
 static DH  *tmp_dh_cb(SSL *s, int is_export, int keylength);
 static int	verify_cb(int, X509_STORE_CTX *);
 static void info_cb(const SSL *ssl, int type, int args);
-static void initialize_ecdh(void);
+static SSL_CTX *initialize_context(void);
+static bool initialize_ecdh(SSL_CTX *context);
 static const char *SSLerrmessage(unsigned long ecode);
 
 static char *X509_NAME_to_cstring(X509_NAME *name);
 
 static SSL_CTX *SSL_context = NULL;
+static bool SSL_initialized = false;
 
 /*  */
 /*		 Hardcoded values		*/
@@ -157,14 +159,12 @@ KWbuHn491xNO25CQWMtem80uKw+pTnisBRF/454n1Jnhub144YRBoN8CAQI=\n\
 /*
  *	Initialize global SSL context.
  */
-void
+int
 be_tls_init(void)
 {
-	struct stat buf;
+	SSL_CTX *context;
 
-	STACK_OF(X509_NAME) *root_cert_list = NULL;
-
-	if (!SSL_context)
+	if (!SSL_initialized)
 	{
 #ifdef HAVE_OPENSSL_INIT_SSL
 		OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
@@ -173,177 +173,29 @@ be_tls_init(void)
 		SSL_library_init();
 		SSL_load_error_strings();
 #endif
-
-		/*
-		 * We use SSLv23_method() because it can negotiate use of the highest
-		 * mutually supported protocol version, while alternatives like
-		 * TLSv1_2_method() permit only one specific version.  Note that we
-		 * don't actually allow SSL v2 or v3, only TLS protocols (see below).
-		 */
-		SSL_context = SSL_CTX_new(SSLv23_method());
-		if (!SSL_context)
-			ereport(FATAL,
-	(errmsg("could not create SSL context: %s",
-			SSLerrmessage(ERR_get_error();
-
-		/*
-		 * Disable OpenSSL's moving-write-buffer sanity check, because it
-		 * causes unnecessary failures in nonblocking send cases.
-		 */
-		SSL_CTX_set_mode(SSL_context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
-
-		/*
-		 * Load and verify server's certificate and private key
-		 */
-		if (SSL_CTX_use_certificate_chain_file(SSL_context,
-			   ssl_cert_file) != 1)
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("could not load server certificate file \"%s\": %s",
-		 ssl_cert_file, SSLerrmessage(ERR_get_error();
-
-		if (stat(ssl_key_file, ) != 0)
-			ereport(FATAL,
-	(errcode_for_file_access(),
-	 errmsg("could not access private key file \"%s\": %m",
-			ssl_key_file)));
-
-		if (!S_ISREG(buf.st_mode))
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("private key file \"%s\" is not a regular file",
-			ssl_key_file)));
-
-		/*
-		 * Refuse to load files owned by users other than us or root.
-		 *
-		 * XXX surely we can check this on Windows somehow, too.
-		 */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-		if (buf.st_uid != geteuid() && buf.st_uid != 0)
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("private key file \"%s\" must be owned by the database user or root",
-			ssl_key_file)));
-#endif
-
-		/*
-		 * Require no public access to key file. If the file is owned by us,
-		 * require mode 0600 or less. If owned by root, require 0640 or less
-		 * to allow read access through our gid, or a supplementary gid that
-		 * allows to read system-wide certificates.
-		 *
-		 * XXX temporarily suppress check when on Windows, because there may
-		 * not be proper support for Unix-y file permissions.  Need to think
-		 * of a reasonable check to apply on Windows.  (See also the data
-		 * directory permission check in postmaster.c)
-		 */
-#if !defined(WIN32) && !defined(__CYGWIN__)
-		if ((buf.st_uid == geteuid() && buf.st_mode & (S_IRWXG | S_IRWXO)) ||
-			(buf.st_uid == 0 && buf.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)))
-			ereport(FATAL,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("private key file \"%s\" has group or world access",
-		 ssl_key_file),
-	 errdetail("File must have permissions u=rw (0600) or less if owned by the database user, or permissions u=rw,g=r (0640) or less if owned by 

Re: [HACKERS] Why PostgreSQL doesn't implement a semi sync replication?

2016-11-11 Thread Petr Jelinek
On 11/11/16 16:03, Francisco Olarte wrote:
> On Fri, Nov 11, 2016 at 4:40 AM, 余森彬  wrote:
>> As we know, the synchronous commit process is blocked while receives
>> from acknowledgement from standby in
>> PostgreSQL.This is good for data consistence in master and standby, and
>> application can get important data from standby.But
>> when the standby crash or network goes wrong, the master could be hang.Is
>> there a feature plan for a semi sync like MySQL
>> InnoDB(set a timer, and become asynchronous when timeout)?
> 
> JMO, but it seems this basically means any process should be dessigned
> to cope with the posibility of not having replicated data after
> commit, so, why bother with synchronous replication in the first
> place?

It's often more acceptable to say "we lose data when 2 servers die (or
are in problems)" than "we lose data when 1 server dies" and it's also
more acceptable to say "we stop answering when we lose 2 servers" but
not "we stop answering when we lose 1 server", and semisync replication
works for combination of these two.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_sequence catalog

2016-11-11 Thread Andreas Karlsson

On 11/10/2016 06:27 AM, Andreas Karlsson wrote:
> On 11/10/2016 05:29 AM, Peter Eisentraut wrote:
>> On 11/8/16 6:43 PM, Andreas Karlsson wrote:
>>> - Shouldn't last_value be NULL directly after we have created the
>>> sequence but nobody has called nextval() yet?
>>>
>>> - I noticed that last_value includes the cached values, but that also
>>> seems to me like the correct thing to do.
>>
>> The documentation now emphasizes that this is the value stored on disk.
>> This matches what Oracle does.
>
> We also store is_called on disk, I think that if is_called is false then
> last_value should be NULL. Either that or we should add is_called.
>
> I will give the patch another review some time this week when I can find
> the time.

Other than my comment above about is_called and last_value I think the 
patch looks great.
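
To make the case concrete, this is the pre-patch behaviour under discussion (a
freshly created sequence reports its start value together with is_called = false):

    CREATE SEQUENCE s;
    SELECT last_value, is_called FROM s;   -- currently returns 1, false
    -- the suggestion above is that anything exposing last_value should show
    -- NULL here until nextval('s') has actually been called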


Andreas




Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-11 Thread Tom Lane
Andrew Dunstan  writes:
> +1. If we come up with an agreed format I will undertake to produce the 
> requisite perl script. So let's reopen the debate on the data format. I 
> want something that doesn't consume large numbers of lines per entry. If 
> we remove defaults in most cases we should be able to fit a set of 
> key/value pairs on just a handful of lines.

The other reason for keeping the entries short is to prevent patch
misapplications: you want three or less lines of context to be enough
to uniquely identify which line you're changing.  So something with,
say, a bunch of XML overhead, with that markup split onto
separate lines, would be a disaster.  This may mean that we can't
get too far away from the DATA-line approach :-(.

Or maybe what we need to do is ensure that there's identification info on
every line, something like (from the first entry in pg_proc.h)

boolin: OID=1242 proname=boolin proargtypes="cstring" prorettype=bool
boolin: prosrc=boolin provolatile=i proparallel=s

(I'm imagining the prefix as having no particular semantic significance,
except that identical values on successive lines denote fields for a
single catalog row.)
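
For reference, the existing catalog row that example describes can be inspected
with something like this (oidvectortypes() is only used to make proargtypes
readable):

    SELECT oid, proname, oidvectortypes(proargtypes) AS argtypes,
           prorettype::regtype, prosrc, provolatile, proparallel
    FROM pg_proc WHERE proname = 'boolin';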

With this approach, even if you had blocks of boilerplate-y lines
that were the same for many successive functions, the prefixes would
keep them looking unique to "patch".

On the other hand, Andrew might be right that with reasonable defaults
available, the entries would mostly be short enough that there wouldn't
be much of a problem anyway.  This example certainly looks that way.

regards, tom lane




Re: [HACKERS] Why PostgreSQL doesn't implement a semi sync replication?

2016-11-11 Thread Francisco Olarte
On Fri, Nov 11, 2016 at 4:40 AM, 余森彬  wrote:
> As we know, the synchronous commit process is blocked while receives
> from acknowledgement from standby in
> PostgreSQL.This is good for data consistence in master and standby, and
> application can get important data from standby.But
> when the standby crash or network goes wrong, the master could be hang.Is
> there a feature plan for a semi sync like MySQL
> InnoDB(set a timer, and become asynchronous when timeout)?

JMO, but it seems this basically means any process should be designed
to cope with the possibility of not having replicated data after
commit, so why bother with synchronous replication in the first
place?

Francisco Olarte.




Re: [HACKERS] Improving RLS planning

2016-11-11 Thread Stephen Frost
* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 10 November 2016 at 17:12, Tom Lane  wrote:
> > Yeah, I think we'd be best off to avoid the bare term "security".
> > It's probably too late to change the RTE field name "securityQuals",
> > but maybe we could uniformly call those "security barrier quals" in
> > the comments.  Then the basic terminology is that we have security
> > barrier views and row-level security both implemented on top of
> > security barrier quals, and we should be careful to use the right
> > one of those three terms in comments/documentation.
> 
> +1 for that terminology and no renaming of fields.

Agreed.

Thanks!

Stephen




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-11 Thread Peter Eisentraut
On 11/9/16 10:05 AM, Mithun Cy wrote:
> As in Victor's patch we have a new connection parameter
> "target_server_type". It can take 2 values, 1. "any" and 2. "master", with
> the DEFAULT being "any". If it has the value "any" we can connect to any of
> the host servers (both master (primary) and slave (standby)). If the value
> is "master" then we try to connect to the master (primary) only.

We tend to use the term "primary" instead of "master".

Will this work with logical replication?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-11 Thread Albe Laurenz
Michael Paquier wrote:
> A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
> I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.
> 
> v2 is attached, fixing those issues.

The patch applies and compiles fine.

I have tested it on Linux and MinGW and could see the fsync(2) and
FlushFileBuffers calls I expected.
This adds crash safety for a reasonable price, and I think we should have that.

The documentation additions are sufficient.

Looking through the patch, I had two questions that are more about
style and consistency than anything else:

- In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
  the return code is ignored, but not anywhere else.  Is that by design?

- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
  pg_dump (because -N is already taken there).
  I'd opt for either using the same short option for both or (IMO better)
  only offering a long option for both.
  This would avoid confusion, and we expect that few people will want to use
  this option anyway, right?

Yours,
Laurenz Albe



Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-11 Thread Andrew Dunstan



On 11/11/2016 03:03 AM, Magnus Hagander wrote:

On Nov 11, 2016 00:53, "Jan de Visser" wrote:
> On 2016-11-09 10:47 AM, Tom Lane wrote:
>> Amit Langote writes:
>>> On Wed, Nov 9, 2016 at 11:47 PM, Tom Lane wrote:
>>>> Hmm, that's from 2009.  I thought I remembered something much more
>>>> recent, like last year or so.
>>>
>>> This perhaps:
>>> * Re: Bootstrap DATA is a pita *
>>> https://www.postgresql.org/message-id/flat/CAOjayEfKBL-_Q9m3Jsv6V-mK1q8h%3Dca5Hm0fecXGxZUhPDN9BA%40mail.gmail.com
>>
>> Yeah, that's the thread I remembered.  I think the basic conclusion was
>> that we needed a Perl script that would suck up a bunch of data from some
>> representation that's more edit-friendly than the DATA lines, expand
>> symbolic representations (regprocedure etc) into numeric OIDs, and write
>> out the .bki script from that.  I thought some people had volunteered to
>> work on that, but we've seen no results ...
>>
>> regards, tom lane
>
> Would a python script converting something like json or yaml be
> acceptable? I think right now only perl is used, so it would be a new
> build chain tool, albeit one that's in my (very humble) opinion much
> better suited to the task.

Python or perl is not what matters here really. For something as
simple as this (for the script) it doesn't make a real difference. I
personally prefer python over perl in most cases, but our standard is
perl so we should stick to that.

The issue is coming up with a format that people like and think is an
improvement.

If we have that and a python script for it, someone would surely
volunteer to convert that part. But we need to start by solving the
actual problem.

+1. If we come up with an agreed format I will undertake to produce the 
requisite perl script. So let's reopen the debate on the data format. I 
want something that doesn't consume large numbers of lines per entry. If 
we remove defaults in most cases we should be able to fit a set of 
key/value pairs on just a handful of lines.


cheers

andrew





[HACKERS] Why PostgreSQL doesn't implement a semi sync replication?

2016-11-11 Thread 余森彬


Hi all:


As we know, a synchronous commit is blocked until an acknowledgement is
received from the standby in PostgreSQL. This is good for data consistency
between master and standby, and the application can get important data from
the standby. But when the standby crashes or the network goes wrong, the
master can hang. Is there a feature plan for a semi-sync mode like MySQL
InnoDB's (set a timer, and become asynchronous on timeout)?





Re: [HACKERS] Adding in docs the meaning of pg_stat_replication.sync_state

2016-11-11 Thread Michael Paquier
On Thu, Nov 10, 2016 at 10:35 PM, Fujii Masao  wrote:
> On Wed, Nov 9, 2016 at 2:33 PM, Michael Paquier
>  wrote:
>> Hi all,
>>
>> The documentation does not explain at all what "sync" or "async" mean
>> in pg_stat_replication.
>
> "potential" state also should be explained?

Of course.

>> The paragraph "Planning for high availability"
>> mentions "catchup" and "streaming", but it does not say that users can
>> refer to it directly in pg_stat_replication.
>>
>> Thoughts about the patch attached to add a couple of sentences in
>> "Planning for high availability"? We could as well mention it directly
>> in the page of pg_stat_replication but this information seems more
>> suited if located in the HA section.
>
> Yeah, I think it's better to mention them even in pg_stat_replication page.

OK, so I have done things this way. What do you think? At the same
time it would be good to mention all the possible field values of
"state", i.e. streaming, catchup, etc., so I added that as well.
-- 
Michael





Re: [HACKERS] Gather Merge

2016-11-11 Thread Rushabh Lathia
Oops, forgot to attach the latest patch in the earlier mail.



On Fri, Nov 11, 2016 at 6:26 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
>
>> On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
>>  wrote:
>> > Please find attached latest patch which fix the review point as well as
>> > additional clean-up.
>>
>> I've signed up to review this patch and I'm planning to do some
>> testing.  Here's some initial feedback after a quick read-through:
>>
>>
> Thanks Thomas.
>
>
>> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
>>
>> Clunky ternary operator... how about "!initialize".
>>
>>
> Fixed.
>
>
>> +/*
>> + * Function clear out a slot in the tuple table for each gather merge
>> + * slots and returns the clear clear slot.
>> + */
>>
>> Maybe better like this?  "_Clear_ out a slot in the tuple table for
>> each gather merge _slot_ and _return_ the _cleared_ slot."
>>
>>
> Fixed.
>
>
>> + /* Free tuple array as we no more need it */
>>
>> "... as we don't need it any more"
>>
>>
> Fixed
>
>
>> +/*
>> + * Read the next tuple for gather merge.
>> + *
>> + * Function fetch the sorted tuple out of the heap.
>> + */
>>
>> "_Fetch_ the sorted tuple out of the heap."
>>
>>
> Fixed
>
>
>> + * Otherwise, pull the next tuple from whichever participate we
>> + * returned from last time, and reinsert the index into the heap,
>> + * because it might now compare differently against the existing
>>
>> s/participate/participant/
>>
>>
> Fixed.
>
>
>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
>> + * Portions Copyright (c) 1994, Regents of the University of California
>>
>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> Group"?
>
>
> Fixed.
>
>
>> Are we supposed to be blaming the University of California
>> for new files?
>>
>>
> Not quite sure about this, so keeping this as it is.
>
>
>> +#include "executor/tqueue.h"
>> +#include "miscadmin.h"
>> +#include "utils/memutils.h"
>> +#include "utils/rel.h"
>> +#include "lib/binaryheap.h"
>>
>> Not correctly sorted.
>>
>>
> Copied from nodeGather.c. But Fixed here.
>
>
>> + /*
>> + * store the tuple descriptor into gather merge state, so we can use it
>> + * later while initilizing the gather merge slots.
>> + */
>>
>> s/initilizing/initializing/
>>
>>
> Fixed.
>
>
>> +/* 
>> + * ExecEndGatherMerge
>> + *
>> + * frees any storage allocated through C routines.
>> + * 
>>
>> The convention in Postgres code seems to be to use a form like "Free
>> any storage ..." in function documentation.  Not sure if that's an
>> imperative, an infinitive, or if the word "we" is omitted since
>> English is so fuzzy like that, but it's inconsistent with other
>> documentation to use "frees" here.  Oh, I see that exact wording is in
>> several other files.  I guess I'll just leave this as a complaint
>> about all those files then :-)
>>
>>
> Sure.
>
>
>> + * Pull atleast single tuple from each worker + leader and set up the
>> heap.
>>
>> s/atleast single/at least a single/
>>
>>
> Fixed.
>
>
>> + * Read the tuple for given reader into nowait mode, and form the tuple
>> array.
>>
>> s/ into / in /
>>
>>
> Fixed.
>
>
>> + * Function attempt to read tuple for the given reader and store it into
>> reader
>>
>> s/Function attempt /Attempt /
>>
>>
> Fixed.
>
>
>> + * Function returns true if found tuple for the reader, otherwise returns
>>
>> s/Function returns /Return /
>>
>>
> Fixed.
>
>
>> + * First try to read tuple for each worker (including leader) into nowait
>> + * mode, so that we initialize read from each worker as well as leader.
>>
>> I wonder if it would be good to standardise on the terminology we use
>> when we mean workers AND the leader.  In my Parallel Shared Hash work,
>> I've been saying 'participants' if I mean = workers + leader.  What do
>> you think?
>>
>>
> I am not quite sure about participants. In my opinion, when we explicitly
> say workers + leader it's more clear. I am open to changing it if the
> committer thinks otherwise.
>
>
>
>> + * After this if all active worker unable to produce the tuple, then
>> + * re-read and this time read the tuple into wait mode. For the worker,
>> + * which was able to produced single tuple in the earlier loop and still
>> + * active, just try fill the tuple array if more tuples available.
>> + */
>>
>> How about this?  "After this, if all active workers are unable to
>> produce a tuple, then re-read and this time us wait mode.  For workers
>> that were able to produce a tuple in the earlier loop and are still
>> active, just try to fill the tuple array if more tuples are
>> available."
>>
>>
> Fixed.
>
>
>> + * The heap is never spilled to disk, since we assume N is not very
>> large. So
>> + * this is much simple then cost_sort.
>>

Re: [HACKERS] Gather Merge

2016-11-11 Thread Rushabh Lathia
On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro 
wrote:

> On Thu, Oct 27, 2016 at 10:50 PM, Rushabh Lathia
>  wrote:
> > Please find attached latest patch which fix the review point as well as
> > additional clean-up.
>
> I've signed up to review this patch and I'm planning to do some
> testing.  Here's some initial feedback after a quick read-through:
>
>
Thanks Thomas.


> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
>
> Clunky ternary operator... how about "!initialize".
>
>
Fixed.


> +/*
> + * Function clear out a slot in the tuple table for each gather merge
> + * slots and returns the clear clear slot.
> + */
>
> Maybe better like this?  "_Clear_ out a slot in the tuple table for
> each gather merge _slot_ and _return_ the _cleared_ slot."
>
>
Fixed.


> + /* Free tuple array as we no more need it */
>
> "... as we don't need it any more"
>
>
Fixed


> +/*
> + * Read the next tuple for gather merge.
> + *
> + * Function fetch the sorted tuple out of the heap.
> + */
>
> "_Fetch_ the sorted tuple out of the heap."
>
>
Fixed


> + * Otherwise, pull the next tuple from whichever participate we
> + * returned from last time, and reinsert the index into the heap,
> + * because it might now compare differently against the existing
>
> s/participate/participant/
>
>
Fixed.


> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
>
> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
> Group"?


Fixed.


> Are we supposed to be blaming the University of California
> for new files?
>
>
Not quite sure about this, so keeping this as it is.


> +#include "executor/tqueue.h"
> +#include "miscadmin.h"
> +#include "utils/memutils.h"
> +#include "utils/rel.h"
> +#include "lib/binaryheap.h"
>
> Not correctly sorted.
>
>
Copied from nodeGather.c. But Fixed here.


> + /*
> + * store the tuple descriptor into gather merge state, so we can use it
> + * later while initilizing the gather merge slots.
> + */
>
> s/initilizing/initializing/
>
>
Fixed.


> +/* 
> + * ExecEndGatherMerge
> + *
> + * frees any storage allocated through C routines.
> + * 
>
> The convention in Postgres code seems to be to use a form like "Free
> any storage ..." in function documentation.  Not sure if that's an
> imperative, an infinitive, or if the word "we" is omitted since
> English is so fuzzy like that, but it's inconsistent with other
> documentation to use "frees" here.  Oh, I see that exact wording is in
> several other files.  I guess I'll just leave this as a complaint
> about all those files then :-)
>
>
Sure.


> + * Pull atleast single tuple from each worker + leader and set up the
> heap.
>
> s/atleast single/at least a single/
>
>
Fixed.


> + * Read the tuple for given reader into nowait mode, and form the tuple
> array.
>
> s/ into / in /
>
>
Fixed.


> + * Function attempt to read tuple for the given reader and store it into
> reader
>
> s/Function attempt /Attempt /
>
>
Fixed.


> + * Function returns true if found tuple for the reader, otherwise returns
>
> s/Function returns /Return /
>
>
Fixed.


> + * First try to read tuple for each worker (including leader) into nowait
> + * mode, so that we initialize read from each worker as well as leader.
>
> I wonder if it would be good to standardise on the terminology we use
> when we mean workers AND the leader.  In my Parallel Shared Hash work,
> I've been saying 'participants' if I mean = workers + leader.  What do
> you think?
>
>
I am not quite sure about participants. In my opinion, when we explicitly
say workers + leader it's more clear. I am open to changing it if the
committer thinks otherwise.



> + * After this if all active worker unable to produce the tuple, then
> + * re-read and this time read the tuple into wait mode. For the worker,
> + * which was able to produced single tuple in the earlier loop and still
> + * active, just try fill the tuple array if more tuples available.
> + */
>
> How about this?  "After this, if all active workers are unable to
> produce a tuple, then re-read and this time us wait mode.  For workers
> that were able to produce a tuple in the earlier loop and are still
> active, just try to fill the tuple array if more tuples are
> available."
>
>
Fixed.


> + * The heap is never spilled to disk, since we assume N is not very
> large. So
> + * this is much simple then cost_sort.
>
> s/much simple then/much simpler than/
>
>
Fixed.


> + /*
> + * Avoid log(0)...
> + */
> + N = (path->num_workers < 2) ? 2.0 : (double) path->num_workers;
> + logN = LOG2(N);
> ...
> + /* Per-tuple heap maintenance cost */
> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>
> Why multiply by two?  The comment above this code says "about 

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-11-11 Thread Ashutosh Bapat
>
> So, I am thinking about your approach of creating PartitionJoinPaths
> without actually creating child paths and then at a later stage
> actually plan the child joins. Here's rough sketch of how that may be
> done.
>
> At the time of creating regular paths, we identify the join orders
> which can use partition-wise join and save those in the RelOptInfo of
> the parent table. If no such join order exists, we do not create
> PartitionJoinPaths for that relation. Otherwise, once we have
> considered all the join orders i.e. in
> generate_partition_wise_join_paths(), we create one PartitionJoinPath
> for every path that has survived in the parent or at least for every
> path that has distinct properties like pathkeys or parameterisation,
> with those properties.
>
> At the time of creating plans, if PartitionJoinPath is chosen, we
> actually create paths for every partition of that relation
> recursively. The path creation logic is carried out in a different
> memory context. Amongst the paths that survive, we choose the best
> path that has the same properties as PartitionJoinPath. We would
> expect all parameterized paths to be retained and any unparameterized
> path can be sorted to match the pathkeys of reference
> PartitionJoinPath. We then create the plan out of this path and copy
> it into the outer memory context and release the memory context used
> for path creation. This is similar to how prepared statements save
> their plans. Once we have the plan, the memory consumed by paths won't
> be referenced, and hence can not create problems. At the end we create
> an Append/MergeAppend plan with all the child plans and return it.
>
> Costing PartitionJoinPath needs more thought so that we don't end up
> with bad overall plans. Here's an idea. Partition-wise joins are
> better compared to the unpartitioned ones, because of the smaller
> sizes of partitions. If we think of join as O(MN) operation where M
> and N are sizes of unpartitioned tables being joined, partition-wise
> join computes P joins each with average O(M/P * N/P) order where P is
> the number of partitions, which is still O(MN) with constant factor
> reduced by P times. I think, we need to apply similar logic to
> costing. Let's say cost of a join is J(M, N) = S (M, N) + R (M, N)
> where S and R are setup cost and joining cost (for M, N rows) resp.
> Cost of partition-wise join would be P * J(M/P, N/P) = P * S(M/P, N/P)
> + P * R(M/P, N/P). Each of the join methods will have different S and
> R functions and may not be linear on the number of rows. So,
> PartitionJoinPath costs are obtained from corresponding regular path
> costs subjected to above transformation. This way, we will be
> protected from choosing a PartitionJoinPath when it's not optimal.
> Take example of a join where the joining relations are very small in
> size, thus hash join on full relation is optimal compared to hash join
> of each partition because of setup cost. In such a case, the function
> which calculates the cost of hash table setup, would result in almost
> same cost for full table as well as each of the partitions, thus
> increasing P * S(M/P, N/P) as compared to S(M, N).
>
> Let me know your comments.

I tried to measure the impact of having a memory context reset 1000
times (once for each partition) with the attached patch. Without this
patch, make check in regress/ takes about 24 seconds on my laptop, and
with this patch it takes 26 seconds. This is almost a 10% increase in
time. I hope that's fine.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8c5dd3..abc34aa 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -54,6 +54,7 @@
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 
 
@@ -192,6 +193,9 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	Plan	   *top_plan;
 	ListCell   *lp,
 			   *lr;
+	MemoryContext	temp_context;
+	int			i;
+	MemoryContext	default_context;
 
 	/* Cursor options may come from caller or from DECLARE CURSOR stmt */
 	if (parse->utilityStmt &&
@@ -432,6 +436,24 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->invalItems = glob->invalItems;
 	result->nParamExec = glob->nParamExec;
 
+	temp_context = AllocSetContextCreate(CurrentMemoryContext,
+		   "TemporaryContext",
+		   ALLOCSET_DEFAULT_SIZES);
+
+	/* Test the time impact of creating and destroying 1000 memory contexts. */
+	for (i = 0; i < 1000; i++)
+	{
+		RelOptInfo *rel;
+		default_context = MemoryContextSwitchTo(temp_context);
+		rel = makeNode(RelOptInfo);
+		pfree(rel);
+		MemoryContextSwitchTo(default_context);
+		MemoryContextResetAndDeleteChildren(temp_context);
+	}
+
+	MemoryContextSwitchTo(default_context);
+	

Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-11 Thread Ashutosh Bapat
On Mon, Nov 7, 2016 at 10:14 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ashutosh Bapat
>> I am not sure if following condition is a good idea in ServerLoop()
>> 1650 if (pmState == PM_WAIT_DEAD_END || ClosedSockets)
>>
>> There are no sockets to listen on, so select in the else condition is going
>> to sleep for timeout determined based on the sequence expected.
>> Just before we close sockets in pmdie() it sets AbortStartTime, which
>> determines the timeout for the sleep here. So, it doesn't make sense to
>> ignore it. Instead may be we should change the default 60s sleep to 100ms
>> sleep in DetermineSleepTime().
>
> That sounds better.  I modified cleaned ServerLoop() by pushing the existing 
> 100ms logic into DetermineSleepTime().

I have changed some comments around this block. See attached patch.
Let me know if that looks good.

>
>
>> While the postmaster is terminating children, a new connection request may
>> arrive. We should probably close listening sockets before terminating
>> children in pmdie().
>
> I moved ClosePostmasterSocket() call before terminating children in the 
> immediate shutdown case.  I didn't change the behavior of smart and fast 
> shutdown modes, because they may take a long time to complete due to 
> checkpointing etc.  Users will want to know what's happening during shutdown 
> or after pg_ctl stop times out, by getting the message "FATAL:  the database 
> system is shutting down" when they try to connect to the database.  The 
> immediate shutdown or crash should better be as fast as possible.

OK.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 24add74..52c0f46 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -265,20 +265,22 @@ static StartupStatusEnum StartupStatus = STARTUP_NOT_RUNNING;
 
 /* Startup/shutdown state */
 #define			NoShutdown		0
 #define			SmartShutdown	1
 #define			FastShutdown	2
 #define			ImmediateShutdown	3
 
 static int	Shutdown = NoShutdown;
 
 static bool FatalError = false; /* T if recovering from backend crash */
+static bool ClosedSockets = false; /* T if postmaster's listening sockets
+	  have been closed. */
 
 /*
  * We use a simple state machine to control startup, shutdown, and
  * crash recovery (which is rather like shutdown followed by startup).
  *
  * After doing all the postmaster initialization work, we enter PM_STARTUP
  * state and the startup process is launched. The startup process begins by
  * reading the control file and other preliminary initialization steps.
  * In a normal startup, or after crash recovery, the startup process exits
  * with exit code 0 and we switch to PM_RUN state.  However, archive recovery
@@ -372,20 +374,21 @@ static DNSServiceRef bonjour_sdref = NULL;
 
 /*
  * postmaster.c - function prototypes
  */
 static void CloseServerPorts(int status, Datum arg);
 static void unlink_external_pid_file(int status, Datum arg);
 static void getInstallationPaths(const char *argv0);
 static void checkDataDir(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
+static void ClosePostmasterSockets(void);
 static void reset_shared(int port);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
 static void sigusr1_handler(SIGNAL_ARGS);
 static void startup_die(SIGNAL_ARGS);
 static void dummy_handler(SIGNAL_ARGS);
 static void StartupPacketTimeoutHandler(void);
 static void CleanupBackend(int pid, int exitstatus);
 static bool CleanupBackgroundWorker(int pid, int exitstatus);
@@ -1513,20 +1516,28 @@ checkDataDir(void)
  * background tasks handled by ServerLoop get done even when no requests are
  * arriving.  However, if there are background workers waiting to be started,
  * we don't actually sleep so that they are quickly serviced.  Other exception
  * cases are as shown in the code.
  */
 static void
 DetermineSleepTime(struct timeval * timeout)
 {
 	TimestampTz next_wakeup = 0;
 
+	/* In PM_WAIT_DEAD_END state, sleeping for 100ms seems reasonable. */
+	if (pmState == PM_WAIT_DEAD_END)
+	{
+		timeout->tv_sec = 0;
+		timeout->tv_usec = 100000L;
+		return;
+	}
+
 	/*
 	 * Normal case: either there are no background workers at all, or we're in
 	 * a shutdown sequence (during which we ignore bgworkers altogether).
 	 */
 	if (Shutdown > NoShutdown ||
 		(!StartWorkerNeeded && !HaveCrashedWorker))
 	{
 		if (AbortStartTime != 0)
 		{
 			/* time left to abort; clamp to 0 in case it already expired */
@@ -1623,57 +1634,52 @@ ServerLoop(void)
 
 	last_lockfile_recheck_time = last_touch_time = time(NULL);
 
 	nSockets = initMasks();
 
 	for (;;)
 	{
 		fd_set		rmask;
 		int			selres;
 		time_t		now;
+		/* must set timeout each 

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-11 Thread Etsuro Fujita

On 2016/11/11 19:22, Ashutosh Bapat wrote:

The patch looks in good shape now.


Thanks for the review!


The patch looks in good shape now. Here are some comments. I have also
made several changes to comments, correcting grammar, typos, style and,
in a few places, logic. Let me know if the patch looks good.


OK, will look into that.


I guess the below code
+   if (!fpinfo->subquery_rels)
+   return false;
can be changed to
if (!bms_is_member(node->varno, fpinfo->subquery_rels))
return false;
Also the return values from the recursive calls to isSubqueryExpr() can be
returned as is. I have included this change in the patch.


Will look into that too.


deparse.c seems to be using capitalized names for functions which
actually deparse something and a non-capitalized form for helper
functions.


That's not true.  There is a function named classifyConditions().  The 
function naming in deparse.c is a bit arbitrary.



From that perspective the attached patch renames isSubqueryExpr
as is_subquery_var() and getSubselectAliasInfo() as
get_alias_id_for_var(). Actually, both these functions accept a Var
node but somehow their names refer to expr.


The reason why I named that function isSubqueryExpr is that I think that
function would soon be extended to handle PHVs.  See another patch
for evaluating PHVs remotely.



This patch is using make_tlist_from_pathtarget() to create the tlist to
be deparsed, but searches in RelOptInfo::reltarget::exprs for a given Var.
As long as the relations deparsed do not carry expressions, this might
work, but it will certainly break once we start deparsing relations
with expressions, since the uppermost query's tlist contains only
Vars. Instead, we should probably create the tlist, save it in fpinfo
and use it later for searching (tlist_member()?). Possibly use
build_tlist_to_deparse() to create the tlist, so that the targetlist
creation logic is the same for all the relations being deparsed. I
haven't included this change in the patch.


Seems like a good idea.  Will revise.

Best regards,
Etsuro Fujita






Re: [HACKERS] Declarative partitioning - another take

2016-11-11 Thread Ashutosh Bapat
I have not looked at the latest set of patches, but in the version
that I have we create one composite type for every partition. This
means that if there are a thousand partitions, there will be a thousand
identical entries in pg_type. Since all the partitions share the same
definition (by syntax), it doesn't make sense to add so many identical
entries. Moreover, in set_append_rel_size(), while translating the
expressions from parent to child, we add a ConvertRowtypeExpr instead
of a whole-row reference if the reltypes of the parent and child do not
match (adjust_appendrel_attrs_mutator()):
			if (appinfo->parent_reltype != appinfo->child_reltype)
			{
				ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr);

I guess we should set the reltype of the child to that of the parent for
declarative partitions.
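
For context, the surrounding block in adjust_appendrel_attrs_mutator()
(prepunion.c) currently reads roughly as follows (abridged); the coercion
to the parent's rowtype is only added because the two reltypes differ:

			if (appinfo->parent_reltype != appinfo->child_reltype)
			{
				ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr);

				r->arg = (Expr *) var;
				r->resulttype = appinfo->parent_reltype;
				r->convertformat = COERCE_IMPLICIT_CAST;
				r->location = -1;
				/* Make sure the Var node has the right type ID, too */
				var->vartype = appinfo->child_reltype;
				return (Node *) r;
			}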

On Fri, Nov 11, 2016 at 4:00 PM, Amit Langote
 wrote:
> On 2016/11/11 6:51, Robert Haas wrote:
>> On Wed, Nov 9, 2016 at 9:58 PM, Amit Langote
>>  wrote:
 With all patches applied, "make check" fails with a bunch of diffs
 that look like this:

   Check constraints:
 - "pt1chk2" CHECK (c2 <> ''::text)
   "pt1chk3" CHECK (c2 <> ''::text)
>>>
>>> Hm, I can't seem to reproduce this one.  Is it perhaps possible that you
>>> applied the patches on top of some other WIP patches or something?
>>
>> Nope.  I just checked and this passes with only 0001 and 0002 applied,
>> but when I add 0003 and 0004 then it starts failing.
>
> Sorry, it definitely wasn't an error on your part.
>
>> It appears that
>> the problem starts at this point in the foreign_data test:
>>
>> ALTER TABLE pt1 DROP CONSTRAINT pt1chk2 CASCADE;
>>
>> After that command, in the expected output, pt1chk2 stops showing up
>> in the output of \d+ pt1, but continues to appear in the output of \d+
>> ft2.  With your patch, however, it stops showing up for ft2 also.  If
>> that's not also happening for you, it might be due to an uninitialized
>> variable someplace.
>
> Thanks for the context.  I think I found the culprit variable in
> MergeConstraintsIntoExisting() and fixed it.  As you correctly guessed,
> the uninitialized variable caused (in your environment) even non-partition
> child relations to be treated as partitions and hence forced any merged
> constraints to be non-local in all cases, not just in the case of partitions.
> Which meant the command you quoted would even drop the ft2's (a child)
> constraint because its conislocal is wrongly false.
>
>>
>> +/* Force inheritance recursion, if partitioned table. */
>>
>> Doesn't match code (any more).
>
> Fixed.
>
 I think "partitioning key" is a bit awkward and actually prefer
 "partiton key".  But "partition method" sounds funny so I would go
 with "partitioning method".
>>>
>>> OK, "partition key" and "partitioning method" it is then.  Source code
>>> comments, error messages, variables call the latter (partitioning)
>>> "strategy" though which hopefully is fine.
>>
>> Oh, I like "partitioning strategy".  Can we standardize on that?
>
> OK, done.
>
 I would be in favor of committing the initial patch set without that,
 and then considering the possibility of adding it later.  If we
 include it in the initial patch set we are stuck with it.
>>>
>>> OK, I have removed the syntactic ability to specify INCLUSIVE/EXCLUSIVE
>>> with each of the range bounds.
>>>
>>> I haven't changed any code (such as comparison functions) that manipulates
>>> instances of PartitionRangeBound which has a flag called inclusive.  I
>>> didn't remove the flag, but it is instead just set to (is_lower ? true :
>>> false) when initializing from the parse node. Perhaps there is some scope
>>> for further simplifying that code, which you probably alluded to when you
>>> proposed that we do this.
>>
>> Yes, you need to rip out all of the logic that supports it.  Having
>> the logic to support it but not the syntax is bad because then that
>> code can't be properly tested.
>
> Agreed, done.
>
>
> Attached updated patches.
>
> Thanks,
> Amit



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Shared memory estimation for postgres

2016-11-11 Thread Craig Ringer
On 11 Nov. 2016 13:00, "leoaaryan"  wrote:
>
> Hi Michael,
>
> Thanks for all the help and time. I have already developed code where I
> can exactly calculate the shared memory value to be allocated, based on
> the Postgres 9.5.4 code (I went through the code, found out the sizes and
> offsets of all the structures used in the memory calculation process and
> then use the values from the postgres.conf file to calculate the required
> value).
>
> But the problem is if there is any change in the structures or anything is
> newly added in the next major version, I need to look at the code again
> and see what changed and then modify the hardcoded values of the structure
> sizes. I'm trying to avoid that.

Earlier I suggested adding a command line flag to the backend that, like
--version, prints the desired info and exits.

It's still not clear to me what underlying problem you're trying to
solve here: why you want this info, and why you're so keen to avoid
starting a backend to find it.
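
For illustration only, such a flag could be dispatched next to the existing
--describe-config check in src/backend/main/main.c. Both the option name and
EstimateSharedMemorySize() below are hypothetical; the latter stands for the
size arithmetic that currently lives inside CreateSharedMemoryAndSemaphores():

	/* hypothetical sketch: neither the flag nor the helper exists today */
	else if (argc > 1 && strcmp(argv[1], "--shmem-estimate") == 0)
	{
		/*
		 * GUCs would have to be loaded from the data directory first so
		 * that the number reflects postgresql.conf, which is the part
		 * that really requires backend machinery.
		 */
		printf("%zu\n", EstimateSharedMemorySize());
		exit(0);
	}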


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-11 Thread Etsuro Fujita

On 2016/09/08 19:55, Etsuro Fujita wrote:

On 2016/09/07 13:21, Ashutosh Bapat wrote:

* with the patch:
postgres=# explain verbose delete from ft1 using ft2 where ft1.a =
ft2.a;
 QUERY PLAN

-

 Delete on public.ft1  (cost=100.00..102.04 rows=1 width=38)
   ->  Foreign Delete  (cost=100.00..102.04 rows=1 width=38)
 Remote SQL: DELETE FROM public.t1 r1 USING (SELECT ROW(a,
b), a FROM public.t2) ss1(c1, c2) WHERE ((r1.a = ss1.c2))
(3 rows)



The underlying scan on t2 requires ROW(a,b) for locking the row for
update/share. But clearly it's not required if the full query is being
pushed down.



Is there a way we can detect that ROW(a,b) is a useless
column (not used anywhere in the other parts of the query like
RETURNING, the DELETE clause, etc.) and eliminate it?



I don't have a clear solution for that yet, but I'll try to remove that
in the next version.



Similarly for a, it's
part of the targetlist of the underlying scan so that the WHERE clause
can be applied on it. But it's not needed if we are pushing down the
query. If we eliminate the targetlist of the query, we could construct a
remote query without having a subquery in it, making it more readable.



Will try to do so also.


I addressed this by improving the deparse logic so that, where possible,
a remote query performing the UPDATE/DELETE on a join directly on the
remote server can be created as proposed.  Attached is an updated version
of the patch, which is created on top of the patch set [1].  The patch is
still WIP (ie, it needs more comments and regression tests, at least), but
any comments would be greatly appreciated.


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/11eafd10-d3f8-ac8a-b642-b0e65037c76b%40lab.ntt.co.jp
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 133,145  static void deparseTargetList(StringInfo buf,
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
  	 List *returningList,
  	 List **retrieved_attrs);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
   PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
--- 133,164 
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(bool is_returning,
! 		  List *tlist,
! 		  List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
  	 List *returningList,
  	 List **retrieved_attrs);
+ static void deparseExplicitReturningList(List *rlist,
+ 			 List **retrieved_attrs,
+ 			 deparse_expr_cxt *context);
+ static List *makeExplicitReturningList(Index rtindex, Relation rel,
+ 		  List *returningList,
+ 		  List **fdw_scan_tlist,
+ 		  Relids *wholerows);
+ static void rewriteFromExprForRel(PlannerInfo *root, RelOptInfo *foreignrel,
+ 	  Index target_rel, List **target_conds, Relids wholerows);
+ static List *pullUpTargetConds(RelOptInfo *foreignrel, List *joinclauses,
+   Index target_rel, List **target_conds);
+ static bool pullUpSubquery(PlannerInfo *root, RelOptInfo *foreignrel, JoinType jointype,
+ 			   Index target_rel, List **target_conds, Relids wholerows);
+ static bool isSimpleSubquery(PlannerInfo *root, RelOptInfo *baserel, JoinType jointype,
+  Relids wholerows);
+ static List *getCleanSubqueryExprs(PlannerInfo *root, RelOptInfo *foreignrel,
+ 	  List *subquery_exprs, Relids wholerows);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
   PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
***
*** 168,178  static void deparseSelectSql(List *tlist, List **retrieved_attrs,
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
! static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
!    bool make_subquery, List **params_list);
  static void appendSubselectAlias(StringInfo 

Re: [HACKERS] Logical Replication WIP

2016-11-11 Thread Petr Jelinek
On 08/11/16 19:51, Peter Eisentraut wrote:
> Review of v7 0003-Add-PUBLICATION-catalogs-and-DDL.patch:
> 
> This appears to address previous reviews and is looking pretty solid.  I
> have some comments that are easily addressed:
> 
> [still from previous review] The code for OCLASS_PUBLICATION_REL in
> getObjectIdentityParts() does not fill in objname and objargs, as it is
> supposed to.
> 
> catalog.sgml: pg_publication_rel column names must be updated after renaming
> 
> alter_publication.sgml and elsewhere: typos PuBLISH_INSERT etc.
> 
> create_publication.sgml: FOR TABLE ALL IN SCHEMA does not exist anymore
> 
> create_publication.sgml: talks about not-yet-existing SUBSCRIPTION role
> 
> DropPublicationById maybe name RemovePublicationById for consistency
> 
> system_views.sql: C.relkind = 'r' unnecessary
> 
> CheckCmdReplicaIdentity: error message says "cannot update", should
> distinguish between update and delete
> 
> relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=
> 
> RelationData.rd_pubactions could be a bitmap, simplifying some memcpy
> and context management.  But RelationData appears to favor rich data
> structures, so maybe that is fine.
> 

Thanks for these; some of it is the result of various rebases that I did
(the sync patch makes rebasing a bit complicated as it touches everything)
and it's easy for me to overlook things at this point.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical Replication WIP

2016-11-11 Thread Petr Jelinek
On 04/11/16 14:24, Andres Freund wrote:
> Hi,
> 
> (btw, I vote against tarballing patches)
>

Well, I vote against the CF app not correctly handling emails with
multiple attachments :)


> +   
> +
> + 
> +  Name
> +  Type
> +  References
> +  Description
> + 
> +
> +
> +
> + 
> +  oid
> +  oid
> +  
> +  Row identifier (hidden attribute; must be explicitly 
> selected)
> + 
> +
> 
> + 
> +  subpublications
> +  name[]
> +  
> +  Array of subscribed publication names. These reference the
> +   publications on the publisher server.
> +  
> 
> Why is this names and not oids? So you can see it across databases?
> 

Because they only exist on the remote server.

> 
> 
>  include $(top_srcdir)/src/backend/common.mk
> diff --git a/src/backend/commands/event_trigger.c 
> b/src/backend/commands/event_trigger.c
> index 68d7e46..523008d 100644
> --- a/src/backend/commands/event_trigger.c
> +++ b/src/backend/commands/event_trigger.c
> @@ -112,6 +112,7 @@ static event_trigger_support_data event_trigger_support[] 
> = {
>   {"SCHEMA", true},
>   {"SEQUENCE", true},
>   {"SERVER", true},
> + {"SUBSCRIPTION", true},
> 
> Hm, is that ok? Subscriptions are shared, so ...?
> 

Good point, forgot event triggers don't handle shared objects.

> 
> + /*
> +  * If requested, create the replication slot on remote side for 
> our
> +  * newly created subscription.
> +  *
> +  * Note, we can't cleanup slot in case of failure as reason for
> +  * failure might be already existing slot of the same name and 
> we
> +  * don't want to drop somebody else's slot by mistake.
> +  */
> + if (create_slot)
> + {
> + XLogRecPtr  lsn;
> +
> + /*
> +  * Create the replication slot on remote side for our 
> newly created
> +  * subscription.
> +  *
> +  * Note, we can't cleanup slot in case of failure as 
> reason for
> +  * failure might be already existing slot of the same 
> name and we
> +  * don't want to drop somebody else's slot by mistake.
> +  */
> 
> We should really be able to recognize that based on the error code...
> 

We could, provided that the slot is active, but that would leave a nasty
race condition: if you do the drop while the other subscription of the
same name is not running (restarting, temporarily disabled, etc.), we'll
remove its slot. Maybe we should not care about that and say the slot
represents the subscription, and if you use the same slot name for two
different subscriptions then that's your problem.

> +/*
> + * Drop subscription by OID
> + */
> +void
> +DropSubscriptionById(Oid subid)
> +{
> 
> + /*
> +  * We must ignore errors here as that would make it impossible to drop
> +  * subscription when publisher is down.
> +  */
> 
> I'm not convinced.  Leaving a slot around without a "record" of it on
> the creating side isn't nice either. Maybe a FORCE flag or something?
> 

I would like to have this as an option, yes. I am not sure FORCE is the
best naming, but I have trouble coming up with a good name. We have
CREATE_SLOT and NOCREATE_SLOT for CREATE SUBSCRIPTION, so maybe we could
have DROP_SLOT (default) and NODROP_SLOT for DROP SUBSCRIPTION.


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical Replication WIP

2016-11-11 Thread Petr Jelinek
On 04/11/16 14:00, Andres Freund wrote:
> Hi,
> 
> + 
> +  pg_publication_rel
> +
> +  
> +   pg_publication_rel
> +  
> +
> +  
> +   The pg_publication_rel catalog contains
> +   mapping between tables and publications in the database. This is many to
> +   many mapping.
> +  
> 
> I wonder if we shouldn't abstract this a bit away from relations to
> allow other objects to be exported to. Could structure it a bit more
> like pg_depend.
> 

Honestly, let's not overdesign this. A change like that can be made in
the future if we need it, and I am quite unconvinced we do, given that
anything we might want to replicate will be a relation. I understand that
it might be useful to know what's on the downstream in terms of objects at
some point for some future functionality, but I don't have an idea of how
that functionality will look, so it's premature to guess what catalog
structure it will need.

> 
> +ALTER PUBLICATION name [ [ WITH 
> ] option [ ... ] ]
> +
> +where option can 
> be:
> +
> +  PuBLISH_INSERT | NOPuBLISH_INSERT
> +| PuBLISH_UPDATE | NOPuBLISH_UPDATE
> +| PuBLISH_DELETE | NOPuBLISH_DELETE
> 
> That's odd casing.
> 
> 
> +   
> +PuBLISH_INSERT
> +NOPuBLISH_INSERT
> +PuBLISH_UPDATE
> +NOPuBLISH_UPDATE
> +PuBLISH_DELETE
> +NOPuBLISH_DELETE
> 

Ah typo in my sed script, fun.

> More odd casing.
> 
> +   
> +FOR TABLE
> +
> + 
> +  Specifies optional list of tables to add to the publication.
> + 
> +
> +   
> +
> +   
> +FOR TABLE ALL IN SCHEMA
> +
> + 
> +  Specifies optional schema for which all logged tables will be added to
> +  publication.
> + 
> +
> +   
> 
> "FOR TABLE ALL IN SCHEMA" sounds weird.
> 

I actually removed support for this at some point, forgot to remove
docs. I might add this feature again in the future but I reckon we can
live without it in v1.


> +  
> +   This operation does not reserve any resources on the server. It only
> +   defines grouping and filtering logic for future subscribers.
> +  
> 
> That's strictly speaking not true, maybe rephrase a bit?
> 

Sure, this basically is supposed to mean that it does not really start
replication or keep WAL or anything like that, as opposed to what, for
example, slots do.

> +/*
> + * Check if relation can be in given publication and throws appropriate
> + * error if not.
> + */
> +static void
> +check_publication_add_relation(Relation targetrel)
> +{
> + /* Must be table */
> + if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("only tables can be added to 
> publication"),
> +  errdetail("%s is not a table",
> +
> RelationGetRelationName(targetrel;
> +
> + /* Can't be system table */
> + if (IsCatalogRelation(targetrel))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("only user tables can be added to 
> publication"),
> +  errdetail("%s is a system table",
> +
> RelationGetRelationName(targetrel;
> +
> + /* UNLOGGED and TEMP relations cannot be part of publication. */
> + if (!RelationNeedsWAL(targetrel))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("UNLOGGED and TEMP relations cannot be 
> replicated")));
> +}
> 
> This probably means we need a check in the ALTER TABLE ... SET UNLOGGED
> path.
> 

Good point.
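
Something along these lines in the ALTER TABLE ... SET UNLOGGED path (e.g.
in ATPrepChangePersistence() in tablecmds.c) would probably do. Note that
GetRelationPublications() below is an assumption about what the publication
catalog code ends up providing:

	/* sketch only: refuse SET UNLOGGED for a published table */
	if (!toLogged &&
		GetRelationPublications(RelationGetRelid(rel)) != NIL)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("cannot change table \"%s\" to unlogged because it is part of a publication",
						RelationGetRelationName(rel))));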

> 
> +/*
> + * Returns if relation represented by oid and Form_pg_class entry
> + * is publishable.
> + *
> + * Does same checks as the above, but does not need relation to be opened
> + * and also does not throw errors.
> + */
> +static bool
> +is_publishable_class(Oid relid, Form_pg_class reltuple)
> +{
> + return reltuple->relkind == RELKIND_RELATION &&
> + !IsCatalogClass(relid, reltuple) &&
> + reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
> + /* XXX needed to exclude information_schema tables */
> + relid >= FirstNormalObjectId;
> +}
> 
> Shouldn't that be IsCatalogRelation() instead?
> 

Well, IsCatalogRelation just calls IsCatalogClass, and we call
IsCatalogClass here as well. The problem with IsCatalogClass is that it
does not consider tables in information_schema that were created as part
of initdb to be system catalogs, because it first does a negative check
on the pg_catalog and toast schemas and only then considers
FirstNormalObjectId. I was actually wondering if that might be a bug in
IsCatalogClass.
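
For reference, IsCatalogClass() currently looks roughly like this (from
src/backend/catalog/catalog.c); an initdb-created information_schema table
bails out at the namespace check and never reaches the OID-range test:

bool
IsCatalogClass(Oid relid, Form_pg_class reltuple)
{
	Oid			relnamespace = reltuple->relnamespace;

	/* anything outside pg_catalog/pg_toast is deemed non-catalog ... */
	if (!IsSystemNamespace(relnamespace) &&
		!IsToastNamespace(relnamespace))
		return false;

	/* ... and only then is the initdb-assigned OID range considered */
	return relid < FirstNormalObjectId;
}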

> 
> +/*
> + * Create new publication.
> + * TODO ACL check
> + */
> 

That was meant for future enhancements, but I think I won't do
detailed ACLs 

Re: [HACKERS] Logical Replication WIP

2016-11-11 Thread Petr Jelinek
On 04/11/16 13:07, Andres Freund wrote:
> 
> Hm. I think I have to agree a bit with Peter here.  Overloading
> MyReplicationSlot this way seems ugly, and I think there's a bunch of
> bugs around it too.
> 
> 
> Sounds what we really want is a) two different lifetimes for ephemeral
> slots, session and "command" b) have a number of slots that are released
> either after a failed transaction / command or at session end.   The
> easiest way for that appears to have a list of slots to be checked at
> end-of-xact and backend shutdown. 
> 

Ok so how about the attached? It adds temp slots as a new type of
persistence. It does not really touch the behavior of any of the existing
API or persistence settings.

The temp slots are just cleaned up on backend exit or error; other than
that they are not special. I don't use any specific backend-local list to
track them; instead they always have active_pid set, and at the end of the
session we just clean up everything that has it set to our pid. This has
the nice property that it forbids other backends from acquiring them.

It does not do any locking while searching for the slots to clean up (see
ReplicationSlotCleanup), mainly because that complicates the interaction
with ReplicationSlotDropPtr, and it seems to me that locking is not really
needed there, as other backends will never change active_pid to our
backend pid, and ReplicationSlotDropPtr takes an exclusive lock when
resetting it.
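
To make that concrete, the end-of-session cleanup essentially boils down to
the sketch below (simplified: the real patch also takes each slot's spinlock
before inspecting active_pid and restarts the scan after a drop; RS_TEMPORARY
stands for the patch's new persistency value):

void
ReplicationSlotCleanup(void)
{
	int			i;

	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		/* only temporary slots pinned to this backend are dropped */
		if (s->in_use &&
			s->data.persistency == RS_TEMPORARY &&
			s->active_pid == MyProcPid)
			ReplicationSlotDropPtr(s);
	}
}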

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 1cf0aca7f1405f31229ab679c9451b51a8cc18de Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 28 Sep 2016 23:36:58 +0200
Subject: [PATCH 1/7] Add support for TEMPORARY replication slots

This patch allows creating temporary replication slots that are removed
automatically at the end of the session or on error.
---
 contrib/test_decoding/Makefile  |  2 +-
 contrib/test_decoding/expected/ddl.out  |  4 +-
 contrib/test_decoding/expected/slot.out | 35 
 contrib/test_decoding/sql/slot.sql  | 13 ++
 doc/src/sgml/func.sgml  | 16 ++--
 doc/src/sgml/protocol.sgml  | 13 +-
 src/backend/catalog/system_views.sql| 11 +
 src/backend/replication/repl_gram.y | 22 ++
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/slot.c  | 72 ++---
 src/backend/replication/slotfuncs.c | 24 +++
 src/backend/replication/walsender.c | 28 -
 src/backend/storage/lmgr/proc.c |  3 ++
 src/backend/tcop/postgres.c |  3 ++
 src/include/catalog/pg_proc.h   |  6 +--
 src/include/nodes/replnodes.h   |  1 +
 src/include/replication/slot.h  |  4 +-
 src/test/regress/expected/rules.out |  3 +-
 18 files changed, 209 insertions(+), 52 deletions(-)
 create mode 100644 contrib/test_decoding/expected/slot.out
 create mode 100644 contrib/test_decoding/sql/slot.sql

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index a6641f5..d2bc8b8 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -39,7 +39,7 @@ submake-test_decoding:
 
 REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
-	spill
+	spill slot
 
 regresscheck: | submake-regress submake-test_decoding temp-install
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 7fbeafd..84ab7d3 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -702,7 +702,7 @@ SELECT pg_drop_replication_slot('regression_slot');
 
 /* check that the slot is gone */
 SELECT * FROM pg_replication_slots;
- slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
++---++--+++--+--+-+-
+ slot_name | plugin | slot_type | datoid | database | persistent | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
+---++---++--++++--+--+-+-
 (0 rows)
 
diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
new file mode 100644
index 000..28b2f89
--- /dev/null
+++ b/contrib/test_decoding/expected/slot.out
@@ -0,0 +1,35 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slotp', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slote', 'test_decoding', true);
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT pg_drop_replication_slot('regression_slotp');
+ 

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-11 Thread Ashutosh Bapat
On Mon, Nov 7, 2016 at 5:50 PM, Etsuro Fujita
 wrote:
> On 2016/11/07 11:24, Etsuro Fujita wrote:
>>
>> On 2016/11/04 19:55, Etsuro Fujita wrote:
>>>
>>> Attached is an updated version of the patch.
>
>
>> I noticed that I have included an unrelated regression test in the
>> patch.  Attached is a patch with the test removed.
>
>
> I noticed that I inadvertently removed some changes from that patch, so I
> fixed that.  Please find attached an updated version of the patch. I'm also
> attaching an updated version of another patch for evaluating PHVs remotely,
> which has been created on top of that patch.  Changes to the latter: I
> revised comments and added a bit more regression tests.

The patch looks in good shape now. Here are some comments. I have also
made several changes to comments, correcting grammar, typos, style and,
in a few places, logic. Let me know if the patch looks good.

I guess the below code
+   if (!fpinfo->subquery_rels)
+   return false;
can be changed to
if (!bms_is_member(node->varno, fpinfo->subquery_rels))
return false;
Also the return values from the recursive calls to isSubqueryExpr() can be
returned as is. I have included this change in the patch.

deparse.c seems to be using capitalized names for functions which
actually deparse something and a non-capitalized form for helper
functions. From that perspective the attached patch renames isSubqueryExpr
as is_subquery_var() and getSubselectAliasInfo() as
get_alias_id_for_var(). Actually, both these functions accept a Var
node but somehow their names refer to expr.

This patch is using make_tlist_from_pathtarget() to create the tlist to
be deparsed, but searches in RelOptInfo::reltarget::exprs for a given Var.
As long as the relations deparsed do not carry expressions, this might
work, but it will certainly break once we start deparsing relations
with expressions, since the uppermost query's tlist contains only
Vars. Instead, we should probably create the tlist, save it in fpinfo
and use it later for searching (tlist_member()?). Possibly use
build_tlist_to_deparse() to create the tlist, so that the targetlist
creation logic is the same for all the relations being deparsed. I
haven't included this change in the patch.
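
A minimal sketch of the lookup being suggested, assuming the deparse tlist
(e.g. built with build_tlist_to_deparse()) has been saved in fpinfo at path
creation time; the helper name here is made up:

static int
get_subselect_colno(Var *node, List *sub_tlist)
{
	ListCell   *lc;
	int			i = 1;

	foreach(lc, sub_tlist)
	{
		TargetEntry *tle = (TargetEntry *) lfirst(lc);

		/* tlist_member() would do the same search */
		if (equal(tle->expr, node))
			return i;			/* position becomes the cN column alias */
		i++;
	}

	elog(ERROR, "variable not found in subquery targetlist");
	return 0;					/* keep compiler quiet */
}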

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 66b059a..9946f8b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -102,20 +102,22 @@ typedef struct deparse_expr_cxt
  * foreignrel, when that represents a join or
  * a base relation. */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX	"r"
 /* Handy macro to add relation name qualification */
 #define ADD_REL_QUALIFIER(buf, varno)	\
 		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+#define SS_TAB_ALIAS_PREFIX	"s"
+#define SS_COL_ALIAS_PREFIX	"c"
 
 /*
  * Functions to determine whether an expression can be evaluated safely on
  * remote server.
  */
 static bool foreign_expr_walker(Node *node,
 	foreign_glob_cxt *glob_cxt,
 	foreign_loc_cxt *outer_cxt);
 static char *deparse_type_name(Oid type_oid, int32 typemod);
 
@@ -160,20 +162,26 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 	   deparse_expr_cxt *context);
 static void deparseSelectSql(List *tlist, List **retrieved_attrs,
  deparse_expr_cxt *context);
 static void deparseLockingClause(deparse_expr_cxt *context);
 static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 	RelOptInfo *joinrel, bool use_alias, List **params_list);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
+   bool make_subquery, List **params_list);
+static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+static void getSubselectAliasInfo(Var *node, RelOptInfo *foreignrel,
+	  int *tabno, int *colno);
+static bool isSubqueryExpr(Var *node, RelOptInfo *foreignrel, int *tabno, int *colno);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendAggOrderBy(List *orderList, List *targetList,
  deparse_expr_cxt *context);
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist,
 	   deparse_expr_cxt *context);
 
 
 /*
@@ -983,26 +991,32 @@ deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt 

Re: [HACKERS] [PATCH] Allow TAP tests to be run individually

2016-11-11 Thread Michael Paquier
On Fri, Nov 11, 2016 at 6:10 PM, Craig Ringer  wrote:
> Please backpatch to at least 9.6 since it's trivial and we seem to be
> doing that for TAP. 9.5 and 9.4 would be nice too :)

Yes please!
-- 
Michael




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-11 Thread Magnus Hagander
On Fri, Nov 11, 2016 at 1:54 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> > Okay and I think partially it might be because we don't have writeback
> > optimization (done in 9.6) for Windows.  However, still the broader
> > question stands whether the above data is sufficient to say that we
> > can recommend the settings of shared_buffers on Windows similar to
> > Linux?
> >
> > Based on this optimization we might want to keep the text that says large
> > shared buffers on Windows aren't as effective perhaps, and just remove the
> > sentence that explicitly says don't go over 512MB?
>
> Just removing the reference to the size would make users ask a question
> "What size is the effective upper limit?"
>

True, but that's a question for other platforms as well, isn't it? We can
certainly find a different phrasing for it, but ISTM that we know that it
might be a problem, but we just don't know where the limit is? Maybe
something that suggests to people that they need to test their way to the
answer?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] pgpassfile connection option

2016-11-11 Thread Julian Markwort
I've updated my patch to work with the changes introduced to libpq by
allowing multiple hosts.
On Fabien's recommendation, I've kept the variable dot_pgpassfile_used;
however, I renamed it to pgpassfile_used, to keep a consistent naming scheme.
I'm still not sure about the number of error messages produced by libpq. I
think it would be ideal to report an error when accessing a file provided in
the connection string fails; however, I would not report that same type of
error when the .pgpass file has been tried in order to retrieve a password.
(Otherwise, you'd get an error message on every connection string that
doesn't specify a pgpassfile or password, since .pgpass will be checked
every time, before prompting the user to input the password.)

regards, 
Julian Markwort

pgpassfile_v4.patch
Description: Binary data



[HACKERS] [PATCH] Allow TAP tests to be run individually

2016-11-11 Thread Craig Ringer
Per prior discussion, this small Makefile change allows anything that
uses $(prove_check) or $(prove_installcheck) to have the list of tests
overridden by passing PROVE_TESTS as a list of space-separated test
file paths.

Particularly handy for src/test/recovery when you want to run
t/009_something_new.pl without running 001 through 008 each and every
time. The current workaround is deleting the tests you don't want,
which dirties up your git tree and is really annoying when checking
out and rebasing etc.

Please backpatch to at least 9.6 since it's trivial and we seem to be
doing that for TAP. 9.5 and 9.4 would be nice too :)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 1180cfaca9b92c401095f9766cbb46e2ca4371f3 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 11 Nov 2016 16:21:41 +0800
Subject: [PATCH 1/2] Allow individual TAP tests to be run via PROVE_TESTS

Add a new optional Makefile variable PROVE_TESTS that, if passed
as a space-separated list of paths relative to the Makefile
invoking $(prove_check) or $(prove_installcheck), runs just
those tests instead of t/*.pl .
---
 src/Makefile.global.in | 4 ++--
 src/test/perl/README   | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index ea61eb5..c5036d8 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -354,12 +354,12 @@ ifeq ($(enable_tap_tests),yes)
 
 define prove_installcheck
 rm -rf $(CURDIR)/tmp_check/log
-cd $(srcdir) && TESTDIR='$(CURDIR)' PATH="$(bindir):$$PATH" PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
+cd $(srcdir) && TESTDIR='$(CURDIR)' PATH="$(bindir):$$PATH" PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(or $(PROVE_TESTS),t/*.pl)
 endef
 
 define prove_check
 rm -rf $(CURDIR)/tmp_check/log
-cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
+cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(or $(PROVE_TESTS),t/*.pl)
 endef
 
 else
diff --git a/src/test/perl/README b/src/test/perl/README
index 710a0d8..cfb45a1 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -6,6 +6,12 @@ across the source tree, particularly tests in src/bin and src/test. It's used
 to drive tests for backup and restore, replication, etc - anything that can't
 really be expressed using pg_regress or the isolation test framework.
 
+The tests are invoked via perl's 'prove' command, wrapped in PostgreSQL
+makefiles to handle instance setup etc. See the $(prove_check) and
+$(prove_installcheck) targets in Makefile.global. By default every test in the
+t/ subdirectory is run. Individual test(s) can be run instead by passing
+something like PROVE_TESTS="t/001_testname.pl t/002_othertestname.pl" to make.
+
 You should prefer to write tests using pg_regress in src/test/regress, or
 isolation tester specs in src/test/isolation, if possible. If not, check to
 see if your new tests make sense under an existing tree in src/test, like
-- 
2.5.5




Re: [HACKERS] WAL consistency check facility

2016-11-11 Thread Kuntal Ghosh
On Fri, Nov 11, 2016 at 3:36 AM, Michael Paquier
 wrote:
> On Fri, Nov 11, 2016 at 1:36 AM, Robert Haas  wrote:
>> So, who are all of the people involved in the effort to produce this
>> patch, and what's the right way to attribute credit?
>
> The original idea was from Heikki, as he introduced the idea of doing
> such checks if you look at the original thread. I added on top of it a
> couple of things, like the concept of page masking, and hacked an early
> version of the version we have now
> (https://www.postgresql.org/message-id/cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com).
> So it seems to me that marking Heikki as an author would be fair, as
> the original idea is from him. I don't mind being marked only as a
> reviewer of the feature, or as someone who has written an earlier
> version of the patch, but I leave that up to your judgement.
> Kuntal is definitely an author.
Although a lot of changes have been made later, I started with the patch
attached in the above thread. Hence, I feel the author of that patch should
also get the credit.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Do we need use more meaningful variables to replace 0 in catalog head files?

2016-11-11 Thread Magnus Hagander
On Nov 11, 2016 00:53, "Jan de Visser"  wrote:
>
> On 2016-11-09 10:47 AM, Tom Lane wrote:
>
>> Amit Langote  writes:
>>>
>>> On Wed, Nov 9, 2016 at 11:47 PM, Tom Lane  wrote:

 Hmm, that's from 2009.  I thought I remembered something much more
 recent, like last year or so.
>>>
>>> This perhaps:
>>> * Re: Bootstrap DATA is a pita *
>>>
https://www.postgresql.org/message-id/flat/CAOjayEfKBL-_Q9m3Jsv6V-mK1q8h%3Dca5Hm0fecXGxZUhPDN9BA%40mail.gmail.com
>>
>> Yeah, that's the thread I remembered.  I think the basic conclusion was
>> that we needed a Perl script that would suck up a bunch of data from some
>> representation that's more edit-friendly than the DATA lines, expand
>> symbolic representations (regprocedure etc) into numeric OIDs, and write
>> out the .bki script from that.  I thought some people had volunteered to
>> work on that, but we've seen no results ...
>>
>> regards, tom lane
>>
>>
>
> Would a python script converting something like json or yaml be
> acceptable? I think right now only perl is used, so it would be a new
> build chain tool, albeit one that's in my (very humble) opinion much
> better suited to the task.
>

Python or perl is not what matters here really. For something as simple as
this (for the script) it doesn't make a real difference. I personally
prefer python over perl in most cases, but our standard is perl so we
should stick to that.

The issue is coming up with a format that people like and think is an
improvement.

If we have that and a python script for it, someone would surely volunteer
to convert that part. But we need to start by solving the actual problem.

/Magnus