Re: Locking a row with KEY SHARE NOWAIT blocks

2019-09-04 Thread Amit Kapila
On Tue, Sep 3, 2019 at 6:58 PM Heikki Linnakangas  wrote:
>
> When you lock a row with FOR KEY SHARE, and the row's non-key columns
> have been updated, heap_lock_tuple() walks the update chain to mark all
> the in-progress tuple versions also as locked. But it doesn't pay
> attention to the NOWAIT or SKIP LOCKED flags when doing so. The
> heap_lock_updated_tuple() function walks the update chain, but the
> 'wait_policy' argument is not passed to it. As a result, a SELECT FOR KEY
> SHARE NOWAIT query can block waiting for another updating transaction,
> despite the NOWAIT modifier.
>
> This can be reproduced with the attached isolation test script.
>
> I'm not sure how to fix this. The logic to walk the update chain and
> propagate the tuple lock is already breathtakingly complicated :-(.
>

Can't we pass the wait_policy parameter to heap_lock_updated_tuple and
do the same handling as we do in the caller (heap_lock_tuple)?
Basically, whenever we need to wait on any tuple version do the same
as we do in heap_lock_tuple for 'require_sleep' case.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-09-04 Thread amul sul
On Wed, Sep 4, 2019 at 2:40 AM Alvaro Herrera 
wrote:

> Fujita-san, amul,
>
> CFbot complains that Fujita-san submitted a patch that doesn't apply,
> which makes sense since the necessary previous patch was only referred
> to without being resubmitted.  I suggest to always post all patches
> together with each resubmission so that it can be checked automatically
> by the cf bot: http://commitfest.cputube.org/etsuro-fujita.html
>
>
Understood and sorry about that.

> I'm not clear on who the author of this patch is, now.  Also, I'm not
> sure what the *status* is.  Are we waiting for Fujita-san to review this
> patch?
>

Yes, we are waiting for Fujita-san to review.  Fujita-san has started a
review and proposed some enhancements which I reviewed in the last update.

I think soon Fujita-san might post the complete patch including his changes.

Regards,
Amul


Re: ERROR: multixact X from before cutoff Y found to be still running

2019-09-04 Thread Thomas Munro
On Thu, Sep 5, 2019 at 1:01 PM Jeremy Schneider  wrote:
> On 9/4/19 17:37, Nathan Bossart wrote:
> Currently, if you hold a multixact open long enough to generate an
> "oldest multixact is far in the past" message during VACUUM, you may
> see the following ERROR:
>
> WARNING:  oldest multixact is far in the past
> HINT:  Close open transactions with multixacts soon to avoid 
> wraparound problems.
> ERROR:  multixact X from before cutoff Y found to be still running
>
> Upon further inspection, I found that this is because the multixact
> limit used in this case is the threshold for which we emit the "oldest
> multixact" message.  Instead, I think the multixact limit should be
> set to the result of GetOldestMultiXactId(), effectively forcing a
> minimum freeze age of zero.  The ERROR itself is emitted by
> FreezeMultiXactId() and appears to be a safeguard against problems
> like this.
>
> I've attached a patch to set the limit to the oldest multixact instead
> of the "safeMxactLimit" in this case.  I'd like to credit Jeremy
> Schneider as the original reporter.
>
> This was fun (sort of) - and a good part of the afternoon for Nathan, Nasby 
> and myself today.  A rather large PostgreSQL database with default autovacuum 
> settings had a large table that started getting behind on Sunday.  The server 
> has a fairly large number of CPUs and a respectable workload.  We realized 
> today that with their XID generation they would go read-only to prevent 
> wraparound tomorrow.  (And perfectly healthy XID age on Sunday - that's 
> wraparound in four days! Did I mention that I'm excited for the default limit 
> GUC change in pg12?)  To make matters more interesting, whenever we attempted 
> to run a VACUUM command we encountered the ERROR message that Nate quoted on 
> every single attempt!  There was a momentary mild panic based on the 
> "ERRCODE_DATA_CORRUPTED" message parameter in heapam.c FreezeMultiXactId() 
> ... but as we looked closer we're now thinking there might just be an obscure 
> bug in the code that sets vacuum limits.
>
> Nathan and Nasby and myself have been chatting about this for quite awhile 
> but the vacuum code isn't exactly the simplest thing in the world to reason 
> about.  :)  Anyway, it looks to me like MultiXactMemberFreezeThreshold() is 
> intended to progressively reduce the vacuum multixact limits across multiple 
> vacuum runs on the same table, as pressure on the members space increases.  
> I'm thinking there was just a small oversight in writing the formula where 
> under the most aggressive circumstances, vacuum could actually be instructed 
> to delete multixacts that are still in use by active transactions and trigger 
> the failure we observed.
>
> Nate put together an initial patch (attached to the previous email, which was 
> sent only to the bugs list). We couldn't quite come to a consensus on the 
> best approach, but we decided that he'd kick off the thread and I'd throw out 
> an alternative version of the patch that might be worth discussion.  
> [Attached to this email.]  Curious what others think!

Hi Jeremy, Nathan, Jim,

Ok, so to recap... since commit 801c2dc7 in 2014, if the limit was
before the 'safe' limit, then it would log the warning and start using
the safe limit, even if that was newer than a multixact that is *still
running*.  It's not immediately clear to me if the limits on the
relevant GUCs or anything else ever prevented that.

Then commit 53bb309d2d5 came along in 2015 (to fix a bug: member's
head could overwrite its tail) and created a way for the safe limit to
be more aggressive.  When member space is low, we start lowering the
effective max freeze age, and as we do so the likelihood of crossing
into still-running-multixact territory increases.

I suppose this requires you to run out of member space (for example
many backends key sharing the same FK) or maybe just set
autovacuum_multixact_freeze_max_age quite low, and then prolong the
life of a multixact for longer.  Does the problem fix itself once you
close the transaction that's in the oldest multixact, ie holding back
GetOldestMultiXact() from advancing?  Since VACUUM errors out, we
don't corrupt data, right?  Everyone else is still going to see the
multixact as running and do the right thing because vacuum never
manages to (bogusly) freeze the tuple.

Both patches prevent mxactLimit from being newer than the oldest
running multixact.  The v1 patch uses the most aggressive setting
possible: the oldest running multi; the v2 uses the least aggressive
of the 'safe' and oldest running multi.  At first glance it seems like
the second one is better: it only does something different if we're in
the dangerous scenario you identified, but otherwise it sticks to the
safe limit, which generates less IO.
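
As a practical aside (not part of either patch): when stuck in this state, it
can help to check how far back the oldest multixact is being held and which
open transactions may be holding back GetOldestMultiXactId().  A hedged
sketch, using only standard catalog columns and functions:

-- Per-database age of the oldest multixact ID (larger = closer to wraparound).
SELECT datname, mxid_age(datminmxid) AS multixact_age
  FROM pg_database
 ORDER BY multixact_age DESC;

-- Long-running transactions that may be holding the oldest multixact open.
SELECT pid, state, xact_start, query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start;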

-- 
Thomas Munro
https://enterprisedb.com




Re: Add "password_protocol" connection parameter to libpq

2019-09-04 Thread Jeff Davis
On Wed, 2019-08-21 at 16:12 +0900, Michael Paquier wrote:
> This counts for other auth requests as well like krb5, no?  I think
> that we should also add the "disable" flavor for symmetry, and
> also...

Added "disable" option, and I refactored so that it would look
explicitly for an expected auth req based on the connection parameters.

I had to also make the client tell the server that it does not support
channel binding if channel_binding=disable, otherwise the server
complains.

> +#define DefaultChannelBinding  "prefer"
> If the build of Postgres does not support SSL (USE_SSL is not
> defined), I think that DefaultChannelBinding should be "disable".
> That would make things consistent with sslmode.

Done.

> Doing the filtering at the time of AUTH_REQ_OK is necessary for
> "trust", but shouldn't this be done as well earlier for other request
> types?  This could save round-trips to the server if we know that an
> exchange begins with a protocol which will never satisfy this
> request, saving efforts for a client doomed to fail after the first
> AUTH_REQ received.  That would be the case for all AUTH_REQ, except
> the SASL ones and of course AUTH_REQ_OK.

Done.

> 
> Could you please add negative tests in
> src/test/authentication/?  What
> could be covered there is that the case where "prefer" (and
> "disable"?) is defined then the authentication is able to go through,
> and that with "require" we get a proper failure as SSL is not used.
> Tests in src/test/ssl/ could include:
> - Make sure that "require" works properly.
> - Test after "disable".

Done.

> +   if (conn->channel_binding[0] == 'r')
> Maybe this should comment that this means "require", in a fashion
> similar to what is done when checking conn->sslmode.

Done.

New patch attached.

Regards,
Jeff Davis

From f3743a9a7c59b628249986d8b7252dfb13e1952d Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 20 Aug 2019 18:59:23 -0700
Subject: [PATCH] Add libpq parameter 'channel_binding'.

Allow clients to require channel binding to enhance security against
untrusted servers.

Author: Jeff Davis
Discussion: https://postgr.es/m/227015d8417f2b4fef03f8966dbfa5cbcc4f44da.camel%40j-davis.com
---
 doc/src/sgml/libpq.sgml   | 22 +++
 src/interfaces/libpq/fe-auth-scram.c  |  6 +-
 src/interfaces/libpq/fe-auth.c| 71 +--
 src/interfaces/libpq/fe-connect.c | 33 +++
 src/interfaces/libpq/libpq-int.h  |  2 +
 src/test/authentication/t/002_saslprep.pl | 12 +++-
 src/test/ssl/t/002_scram.pl   |  8 ++-
 7 files changed, 144 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index b7c3d96b01f..bc03a171f53 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1118,6 +1118,28 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  channel_binding
+  
+  
+This option controls the client's use of channel binding. A setting
+of require means that the connection must employ
+channel binding, prefer means that the client will
+choose channel binding if available, and disable
+prevents the use of channel binding. The default
+is prefer if
+PostgreSQL is compiled with SSL support;
+otherwise the default is disable.
+  
+  
+Channel binding is a method for the server to authenticate itself to
+the client. It is only supported over SSL connections
+with PostgreSQL 11 or later servers using
+the scram-sha-256 authentication method.
+  
+  
+ 
+
  
   connect_timeout
   
diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c
index 7a8335bf9f8..1232aa8c6d2 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -358,7 +358,7 @@ build_client_first_message(fe_scram_state *state)
 		appendPQExpBufferStr(&buf, "p=tls-server-end-point");
 	}
 #ifdef HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH
-	else if (conn->ssl_in_use)
+	else if (conn->ssl_in_use && conn->channel_binding[0] != 'd')
 	{
 		/*
 		 * Client supports channel binding, but thinks the server does not.
@@ -369,7 +369,7 @@ build_client_first_message(fe_scram_state *state)
 	else
 	{
 		/*
-		 * Client does not support channel binding.
+		 * Client does not support channel binding, or has disabled it.
 		 */
 		appendPQExpBufferChar(&buf, 'n');
 	}
@@ -498,7 +498,7 @@ build_client_final_message(fe_scram_state *state)
 #endif			/* HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH */
 	}
 #ifdef HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH
-	else if (conn->ssl_in_use)
+	else if (conn->ssl_in_use && conn->channel_binding[0] != 'd')
 		appendPQExpBufferStr(&buf, "c=eSws");	/* base64 of "y,," */
 #endif
 	else
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index ab227421b3b..dd883b6d8e9 100644
--- a/src/interfaces/libpq/fe-auth.c

Re: block-level incremental backup

2019-09-04 Thread Robert Haas
On Wed, Sep 4, 2019 at 10:08 PM Michael Paquier  wrote:
> > For generating a
> > file, you can always emit the newest and "best" tar format, but for
> > reading a file, you probably want to be prepared for older or cruftier
> > variants.  Maybe not -- I'm not super-familiar with the tar on-disk
> > format.  But I think there must be a reason why tar libraries exist,
> > and I don't want to write a new one.
>
> We need to be sure as well that the library chosen does not block
> access to a feature in all the various platforms we have.

Well, again, my preference is to just not make this particular feature
work natively with tar files.  Then I don't need to choose a library,
so the question is moot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_promote() can cause busy loop

2019-09-04 Thread Fujii Masao
On Thu, Sep 5, 2019 at 10:26 AM Michael Paquier  wrote:
>
> On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> > I found a small issue in pg_promote(). If the postmaster dies
> > while pg_promote() is waiting for the standby promotion to finish,
> > pg_promote() can cause a busy loop. This happens because
> > pg_promote() does nothing when WaitLatch() detects
> > the postmaster death event. I think that pg_promote()
> > should bail out of the loop immediately in that case.
> >
> > Attached is the patch for the fix.
>
> Indeed, this is not correct.
>
> -   ereport(WARNING,
> -   (errmsg("server did not promote within %d seconds",
> -   wait_seconds)));
> +   if (i >= WAITS_PER_SECOND * wait_seconds)
> +   ereport(WARNING,
> +   (errmsg("server did not promote within %d seconds", 
> wait_seconds)));
>
> Would it make more sense to issue a warning mentioning the postmaster
> death and then return PG_RETURN_BOOL(false) instead of breaking out of
> the loop?  It could be confusing to warn about a timeout if the
> postmaster died in parallel, and we know the actual reason why the
> promotion did not happen in this case.

It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
in that case, which would make the code simpler.

But I don't think it's worth warning about postmaster death here
because a backend will emit a FATAL message like "terminating connection
due to unexpected postmaster exit" in secure_read() after
pg_promote() returns false.

Regards,

-- 
Fujii Masao




Re: pg_promote() can cause busy loop

2019-09-04 Thread Michael Paquier
On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote:
> It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
> in that case, which would make the code simpler.

Okay.  I would have done so FWIW.

> But I don't think it's worth warning about postmaster death here
> because a backend will emit a FATAL message like "terminating connection
> due to unexpected postmaster exit" in secure_read() after
> pg_promote() returns false.

Good point, that could be equally confusing.
--
Michael


signature.asc
Description: PGP signature


Re: block-level incremental backup

2019-09-04 Thread Michael Paquier
On Tue, Sep 03, 2019 at 08:59:53AM -0400, Robert Haas wrote:
> I think pg_basebackup is using homebrew code to generate tar files,
> but I'm reluctant to do that for reading tar files.

Yes.  This code has not actually changed since its introduction.
Please note that we also have code which reads data directly from a
tarball in pg_basebackup.c when appending the recovery parameters to
postgresql.auto.conf for -R.  There could be some consolidation here
with what you are doing.

> For generating a
> file, you can always emit the newest and "best" tar format, but for
> reading a file, you probably want to be prepared for older or cruftier
> variants.  Maybe not -- I'm not super-familiar with the tar on-disk
> format.  But I think there must be a reason why tar libraries exist,
> and I don't want to write a new one.

We need to be sure as well that the library chosen does not block
access to a feature in all the various platforms we have.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-04 Thread Michael Paquier
On Wed, Sep 04, 2019 at 04:50:45PM -0400, Alvaro Herrera wrote:
> According to CFbot, the Windows build fails with this patch.  Please
> fix.

To save a couple of clicks:
"C:\projects\postgresql\pageinspect.vcxproj" (default target) (56) ->
(Link target) ->
  heapfuncs.obj : error LNK2001: unresolved external symbol
  pg_popcount32 [C:\projects\postgresql\pageinspect.vcxproj]
.\Release\pageinspect\pageinspect.dll : fatal error LNK1120: 1
unresolved externals [C:\projects\postgresql\pageinspect.vcxproj]

I think that it would be simpler to just use pg_popcount().
That's what other contrib modules do (for example ltree or intarray).
--
Michael


signature.asc
Description: PGP signature


Re: pg_promote() can cause busy loop

2019-09-04 Thread Michael Paquier
On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> I found a small issue in pg_promote(). If the postmaster dies
> while pg_promote() is waiting for the standby promotion to finish,
> pg_promote() can cause a busy loop. This happens because
> pg_promote() does nothing when WaitLatch() detects
> the postmaster death event. I think that pg_promote()
> should bail out of the loop immediately in that case.
> 
> Attached is the patch for the fix.

Indeed, this is not correct.

-   ereport(WARNING,
-   (errmsg("server did not promote within %d seconds",
-   wait_seconds)));
+   if (i >= WAITS_PER_SECOND * wait_seconds)
+   ereport(WARNING,
+   (errmsg("server did not promote within %d seconds", 
wait_seconds)));

Would it make more sense to issue a warning mentioning the postmaster
death and then return PG_RETURN_BOOL(false) instead of breaking out of
the loop?  It could be confusing to warn about a timeout if the
postmaster died in parallel, and we know the actual reason why the
promotion did not happen in this case.
--
Michael


signature.asc
Description: PGP signature


Re: Tid scan improvements

2019-09-04 Thread Edmund Horner
On Wed, 4 Sep 2019 at 10:34, Alvaro Herrera  wrote:
>
> On 2019-Aug-01, Edmund Horner wrote:
>
> > Hi everyone,
> >
> > Sadly it is the end of the CF and I have not had much time to work on
> > this.  I'll probably get some time in the next month to look at the
> > heapam changes.
> >
> > Should we move to CF?  We have been in the CF cycle for almost a year now...
>
> Hello, do we have an update on this patch?  Last version that was posted
> was v9 from David on July 17th; you said you had made some changes on
> July 22nd but didn't attach any patch.  v9 doesn't apply anymore.

Hi pgsql-hackers,

I have had a lot of difficulty making the changes to heapam.c and I
think it's becoming obvious I won't get them done by myself.

The last *working* patch pushed the work into heapam.c, but it was
just a naive loop over the whole table.  I haven't found how to
rewrite heapgettup_pagemode in the way Andres suggests.

So, I think we need to either get some help from someone familiar with
heapam.c, or maybe shelve the patch.  It has been work in progress for
over a year now.
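
For context, the kind of query this patch is about (a hedged example with an
illustrative table name; today this is typically planned as a filter over a
full scan, and the goal is to turn the ctid quals into a restricted TID range
scan):

SELECT *
  FROM some_large_table
 WHERE ctid >= '(1000,0)' AND ctid < '(2000,0)';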

Edmund




Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Michael Paquier
On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote:
> I think this is all going in the wrong direction.  It's the
> responsibility of the people who are calling the pgstat_progress_*
> functions to only do so when it's appropriate.  Having the
> pgstat_progress_* functions try to untangle whether or not they ought
> to ignore calls made to them is backwards.

Check.

> It seems to me that the problem here can be summarized in this way:
> there's a bunch of code reuse in the relevant paths here, and somebody
> wasn't careful enough when adding progress reporting for one of the
> new commands, and so now things are broken, because e.g. CLUSTER
> progress reporting gets ended by a pgstat_progress_end_command() call
> that was intended for some other utility command but is reached in the
> CLUSTER case anyway.  The solution to that problem in my book is to
> figure out which commit broke it, and then the person who made that
> commit either needs to fix it or revert it.

I am also not sure that it is right to say that the first committed
patch is right and that the follow-up ones are wrong.  CLUSTER
progress was committed first (6f97457), followed a couple of days
after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c).  So let's
have a look at them..

For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
uses its own code paths at the beginning, but then things get more
complicated, particularly with finish_heap_swap(), which directly calls
reindex_relation().  6f97457 includes one progress update,
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, at a point in reindex_relation()
which can be a problem because that code path is shared.  This last
part is wrong IMO: why should CLUSTER reporting take priority in this
code path over anything else?

For CREATE INDEX, the progress reporting starts and ends once in
DefineIndex().  However, we have updates of progress within each index
AM build routine, which could be taken by many code paths.  Is it
actually fine to give priority to CREATE INDEX in those cases?  Those
paths can also be taken by REINDEX or CLUSTER (right?), so having a
counter for CREATE INDEX looks logically wrong to me.  The part where
we wait for snapshots actually looks good from the perspective of
REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

For REINDEX, we have a problematic start progress call in
reindex_index() which is for example called by reindex_relation for
each relation's index for a non-concurrent case (also in
ReindexRelationConcurrently()).  I think that these are incorrect
locations, and I would have placed them in ReindexIndex(),
ReindexTable() and ReindexMultipleTables() so that we avoid anything
low-level.  This has added two calls to pgstat_progress_update_param()
in reindex_index(), which is shared between all of these.  Why would it
be fine to give priority to a CREATE INDEX marker here if CLUSTER can
also come through this way?

On top of those issues, I see some problems with the current state of
affairs, and I am repeating myself:
- It is possible that pgstat_progress_update_param() is called for a
given command for a code path taken by a completely different
command, and that we have a real risk of showing a completely bogus
status, as the progress phases share the same numbers.
- We don't handle the start and end of progress reporting wisely for
cascading calls, leading to a risk where we start command A, but in
shared code paths where we assume that only command B can run, the
processing abruptly ends for command A.
- Is it actually fine to report information about a command completely
different from the one issued by the client?  It is for example
possible to call CLUSTER, but show the user a progress report
about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
actually make sense if we are able to handle cascading progress
reports.

These are, at least it seems to me, fundamental problems we need to
ponder more about if we begin to include more progress reporting.  We
don't have that thought through now, and that worries me.
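
As a concrete way to observe the mix-ups described above, one can run CLUSTER
on a table with indexes in one session and watch the progress views from
another.  This is only an illustration (both views exist as of v12); which
view the backend actually shows up in is exactly what is being questioned
here:

-- Session 2, while CLUSTER is running in session 1:
SELECT pid, relid::regclass, phase, index_rebuild_count
  FROM pg_stat_progress_cluster;

-- If a shared code path reports under the CREATE INDEX command instead,
-- the same backend may appear here rather than in the view above:
SELECT pid, relid::regclass, phase
  FROM pg_stat_progress_create_index;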

> It's quite possible here that we need a bigger redesign to make adding
> progress reporting for new command easier and less prone to bugs, but
> I don't think it's at all clear what such a redesign should look like
> or even that we definitely need one, and September is not the right
> time to be redesigning features for the pending release.

Definitely.
--
Michael


signature.asc
Description: PGP signature


pg_promote() can cause busy loop

2019-09-04 Thread Fujii Masao
Hi,

I found a small issue in pg_promote(). If the postmaster dies
while pg_promote() is waiting for the standby promotion to finish,
pg_promote() can cause a busy loop. This happens because
pg_promote() does nothing when WaitLatch() detects
the postmaster death event. I think that pg_promote()
should bail out of the loop immediately in that case.

Attached is the patch for the fix.
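
For context, the function is typically used on a standby along these lines
(wait and wait_seconds shown with their documented defaults); the busy loop
described above happens inside that wait if the postmaster exits:

-- On a standby: request promotion and wait up to 60 seconds for it to finish.
SELECT pg_promote(wait => true, wait_seconds => 60);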

Regards,

-- 
Fujii Masao


pg_promote_and_postmaster_death.patch
Description: Binary data


Re: Unexpected "shared memory block is still in use"

2019-09-04 Thread Tom Lane
I wrote:
> This still isn't committable as-is, since the test will just curl up
> and die on machines lacking IPC::SharedMem.

After a bit of research, here's a version that takes a stab at fixing
that.  There may be cleaner ways to do it, but this successfully skips
the test if it can't import the needed IPC modules.

This also fixes a problem that the previous script had with leaking
a shmem segment.  That's due to something that could be considered
a pre-existing bug, which is that if we use shmem key X+1, and the
postmaster crashes, and the next start is able to get shmem key X,
we don't clean up the shmem segment at X+1.  In principle, we could
note from the contents of postmaster.pid that X+1 was used before and
try to remove it.  In practice, I doubt this is worth worrying about
given how small the shmem segments are now, and the very low probability
of key collisions in the new regime.  Anyway it would be material for
a different patch.

I think this could be considered committable, but if anyone wants to
improve the test script, step right up.

regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 3370adf..bdd6552 100644
--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include <sys/stat.h>
 
 #include "miscadmin.h"
 #include "storage/ipc.h"
@@ -181,10 +182,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for Posix, we use
- * it to generate the starting semaphore name).  In a standalone backend,
- * zero will be passed.
- *
  * In the Posix implementation, we acquire semaphores on-demand; the
  * maxSemas parameter is just used to size the arrays.  For unnamed
  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
@@ -196,8 +193,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * we don't have to expose the counters to other processes.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+	struct stat statbuf;
+
+	/*
+	 * We use the data directory's inode number to seed the search for free
+	 * semaphore keys.  This minimizes the odds of collision with other
+	 * postmasters, while maximizing the odds that we will detect and clean up
+	 * semaphores left over from a crashed postmaster in our own directory.
+	 */
	if (stat(DataDir, &statbuf) < 0)
+		ereport(FATAL,
+(errcode_for_file_access(),
+ errmsg("could not stat data directory \"%s\": %m",
+		DataDir)));
+
 #ifdef USE_NAMED_POSIX_SEMAPHORES
 	mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
 	if (mySemPointers == NULL)
@@ -215,7 +226,7 @@ PGReserveSemaphores(int maxSemas, int port)
 
 	numSems = 0;
 	maxSems = maxSemas;
-	nextSemKey = port * 1000;
+	nextSemKey = statbuf.st_ino;
 
 	on_shmem_exit(ReleaseSemaphores, 0);
 }
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index ac5106f..a1652cc 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <sys/stat.h>
 #ifdef HAVE_SYS_IPC_H
 #include 
 #endif
@@ -301,10 +302,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting semaphore key).  In a standalone backend,
- * zero will be passed.
- *
  * In the SysV implementation, we acquire semaphore sets on-demand; the
  * maxSemas parameter is just used to size the arrays.  There is an array
  * of PGSemaphoreData structs in shared memory, and a postmaster-local array
@@ -314,8 +311,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * have clobbered.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+	struct stat statbuf;
+
+	/*
+	 * We use the data directory's inode number to seed the search for free
+	 * semaphore keys.  This minimizes the odds of collision with other
+	 * postmasters, while maximizing the odds that we will detect and clean up
+	 * semaphores left over from a crashed postmaster in our own directory.
+	 */
	if (stat(DataDir, &statbuf) < 0)
+		ereport(FATAL,
+(errcode_for_file_access(),
+ errmsg("could not stat data directory \"%s\": %m",
+		DataDir)));
+
 	/*
 	 * We must use ShmemAllocUnlocked(), since the spinlock protecting
 	 * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
@@ -332,7 +343,7 @@ PGReserveSemaphores(int maxSemas, int port)
 	if (mySemaSets == NULL)
 		elog(PANIC, "out of memory");
 	numSemaSets = 0;
-	nextSemaKey = port * 1000;
+	nextSemaKey = statbuf.st_ino;
 	nextSemaNumber = SEMAS_PER_SET; /* force sema set alloc on 1st call */
 
 	on_shmem_exit(ReleaseSemaphores, 0);
diff --git 

Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-04 Thread Michael Paquier
On Wed, Sep 04, 2019 at 02:37:12PM +0200, Peter Eisentraut wrote:
>>> Do we need set_pglocale_pgservice() calls here if we're not doing NLS?
>>> Does the logging stuff require it?  I'm not sure.
>> 
>> The logging part does not require it, but this can be used for
>> PGSYSCONFDIR, no?
> 
> How does PGSYSCONFDIR come into play here?

There is an argument to allow libpq to find out a service file for
a connection from the executable path.  Note that oid2name can use a
connection string for connection, but not vacuumlo, so I somewhat
missed that.
--
Michael


signature.asc
Description: PGP signature


Re: Client Certificate Authentication Using Custom Fields (i.e. other than CN)

2019-09-04 Thread George Hafiz
Hi David,

Glad you are open to the idea!

My proposal would be an additional authentication setting for cert auth
(alongside the current map option) which lets you specify which subject
field to match on.

I'll take a look at what the patch would look like, but this is incredibly
tangential to what I'm supposed to be doing, so I can't promise anything!
Would be good if anyone else would like to look at it as well. Hopefully
it's a relatively straightforward change.

Best regards,
George

On Wed, 4 Sep 2019, 21:40 David Fetter,  wrote:

> On Wed, Sep 04, 2019 at 05:24:15PM +0100, George Hafiz wrote:
> > Hello,
> >
> > It is currently only possible to authenticate clients using certificates
> > with the CN.
> >
> > I would like to propose that the field used to identify the client is
> > configurable, e.g. being able to specify DN as the appropriate field. The
> > reason is that in some organisations you might want to use the
> > corporate PKI, but the CN of such certificates is not controlled.
> >
> > In my case, the DN of our corporate issued client certificates is
> > controlled and derived from AD groups we are members of. Only users in
> > those groups can request client certificates with a DN that is equal to the
> > AD group ID. This would make DN a perfectly suitable drop-in replacement
> > for Postgres client certificate authentication, but as it stands it is not
> > possible to change the field used.
>
> This all sounds interesting.  Do you have a concrete proposal as to
> how such a new interface would look in operation?  Better yet, a PoC
> patch implementing same?
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: unexpected rowlock mode when trigger is on the table

2019-09-04 Thread Alvaro Herrera
On 2019-Sep-03, Tomáš Záluský wrote:

> postgres=# begin;
> BEGIN
> postgres=# update master set detail_id=null, name='y' where id=1000;
> UPDATE 1
> 
> In another psql console, I run:
> 
> postgres=# select * from pgrowlocks('master');
>  locked_row | locker | multi | xids  |  modes   | pids
> ++---+---+--+---
>  (0,3)  |564 | f | {564} | {Update} | {138}
> (1 row)

Hmm, so I'm guessing that this tuple lock comes from GetTupleForTrigger
called from ExecBRUpdateTriggers, which uses ExecUpdateLockMode() in
order to figure out the lockmode to use, depending on whether the
columns modified by the update overlap columns indexed by any unique
index.  I think there should be no overlap (PK is column "id", not modified)
but I may be missing something.

(gdb) bt
#0  heap_lock_tuple (relation=relation@entry=0x7eff2157b4d8, 
tuple=tuple@entry=0x7ffe570db3e0, cid=0, 
mode=mode@entry=LockTupleExclusive, 
wait_policy=wait_policy@entry=LockWaitBlock, 
follow_updates=follow_updates@entry=0 '\000', buffer=0x7ffe570db3cc, 
hufd=0x7ffe570db3d0)
at /pgsql/source/REL9_6_STABLE/src/backend/access/heap/heapam.c:4577
#1  0x5648b1d52f15 in GetTupleForTrigger (
estate=estate@entry=0x5648b3894110, 
epqstate=epqstate@entry=0x5648b3894750, tid=tid@entry=0x7ffe570db674, 
lockmode=LockTupleExclusive, newSlot=0x7ffe570db498, 
relinfo=, relinfo=)
at /pgsql/source/REL9_6_STABLE/src/backend/commands/trigger.c:2709
#2  0x5648b1d579a0 in ExecBRUpdateTriggers (
estate=estate@entry=0x5648b3894110, 
epqstate=epqstate@entry=0x5648b3894750, 
relinfo=relinfo@entry=0x5648b3894260, 
tupleid=tupleid@entry=0x7ffe570db674, 
fdw_trigtuple=fdw_trigtuple@entry=0x0, slot=slot@entry=0x5648b3896670)
at /pgsql/source/REL9_6_STABLE/src/backend/commands/trigger.c:2432
#3  0x5648b1d8ddc2 in ExecUpdate (tupleid=tupleid@entry=0x7ffe570db674, 
oldtuple=oldtuple@entry=0x0, slot=slot@entry=0x5648b3896670, 
planSlot=planSlot@entry=0x5648b3895998, 
epqstate=epqstate@entry=0x5648b3894750, 
estate=estate@entry=0x5648b3894110, canSetTag=1 '\001')
at /pgsql/source/REL9_6_STABLE/src/backend/executor/nodeModifyTable.c:850

Maybe we're passing an argument wrong somewhere.  Unclear ...
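
For comparison, here is roughly what one would expect to see for the same
non-key update on a table without the trigger (a hedged sketch; actual xids
and pids will differ):

-- Session 1:
BEGIN;
UPDATE master SET detail_id = NULL, name = 'y' WHERE id = 1000;

-- Session 2: with no BEFORE UPDATE trigger involved, modes is expected to
-- show the weaker no-key-update lock rather than {Update}.
SELECT locked_row, multi, modes FROM pgrowlocks('master');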

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unexpected "shared memory block is still in use"

2019-09-04 Thread Tom Lane
I wrote:
> Attached is a draft patch to change both shmem and sema key selection
> to be based on data directory inode rather than port.
> ...
> I'm not quite sure what's going on in src/test/recovery/t/017_shm.pl.
> As expected, the test for port number non-collision no longer sees
> a failure.  After fixing that, the test passes, but it takes a
> ridiculously long time (minutes); apparently each postmaster start/stop
> cycle takes much longer than it ought to.  I suppose this patch is
> breaking its assumptions, but I've not studied it.

After looking closer, the problem is pretty obvious: the initial
loop is trying to create a cluster whose shmem key matches its
port-number-based expectation.  With this code, that will never
happen except by unlikely accident, so it wastes time with repeated
initdb/start/stop attempts.  After 100 tries it gives up and presses
on with the test, resulting in the apparent pass with long runtime.

I now understand the point you made upthread that this test could
only be preserved if we invent some way to force the choice of shmem
key.  While it wouldn't be hard to do that (say, invent a magic
environment variable), I really don't want to do so.  In the field,
such a behavior would have no positive use, and it could destroy our
newly-improved guarantees about detecting conflicting old processes.

However, there's another way to skin this cat.  We can have the
Perl test script create a conflicting shmem segment directly,
as in the attached second-draft patch.  I simplified the test
script quite a bit, since I don't see any particular value in
creating more than one test postmaster with this approach.

This still isn't committable as-is, since the test will just curl up
and die on machines lacking IPC::SharedMem.  (It could be rewritten
to rely only on the lower-level IPC::SysV module, but I doubt that's
worth the trouble, since either way it'd fail on Windows.)  I'm not
sure whether we should just not bother to run the test at all, or
if we should run it but skip the IPC-related parts; and my Perl-fu
isn't really up to implementing either behavior.

Another thing that might be interesting is to do more than just create
the conflicting segment, ie, try to put some data into it that would
fool the postmaster.  I'm not excited about that at all, but maybe
someone else is?

The attached patch is identical to the previous one except for the
changes in src/test/recovery/t/017_shm.pl.

regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 3370adf..bdd6552 100644
--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include <sys/stat.h>
 
 #include "miscadmin.h"
 #include "storage/ipc.h"
@@ -181,10 +182,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for Posix, we use
- * it to generate the starting semaphore name).  In a standalone backend,
- * zero will be passed.
- *
  * In the Posix implementation, we acquire semaphores on-demand; the
  * maxSemas parameter is just used to size the arrays.  For unnamed
  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
@@ -196,8 +193,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * we don't have to expose the counters to other processes.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+	struct stat statbuf;
+
+	/*
+	 * We use the data directory's inode number to seed the search for free
+	 * semaphore keys.  This minimizes the odds of collision with other
+	 * postmasters, while maximizing the odds that we will detect and clean up
+	 * semaphores left over from a crashed postmaster in our own directory.
+	 */
	if (stat(DataDir, &statbuf) < 0)
+		ereport(FATAL,
+(errcode_for_file_access(),
+ errmsg("could not stat data directory \"%s\": %m",
+		DataDir)));
+
 #ifdef USE_NAMED_POSIX_SEMAPHORES
 	mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
 	if (mySemPointers == NULL)
@@ -215,7 +226,7 @@ PGReserveSemaphores(int maxSemas, int port)
 
 	numSems = 0;
 	maxSems = maxSemas;
-	nextSemKey = port * 1000;
+	nextSemKey = statbuf.st_ino;
 
 	on_shmem_exit(ReleaseSemaphores, 0);
 }
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index ac5106f..a1652cc 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <sys/stat.h>
 #ifdef HAVE_SYS_IPC_H
 #include 
 #endif
@@ -301,10 +302,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting semaphore key).  In a standalone backend,
- * zero will be passed.
- *
 

Re: BUG #15858: could not stat file - over 4GB

2019-09-04 Thread Juan José Santamaría Flecha
Thanks for looking into this.

On Fri, Aug 23, 2019 at 11:49 PM Tom Lane  wrote:
>
> Directly editing the configure script is Not Done ... or at least,
> such changes wouldn't survive the next correctly-done configure
> update.  You have to edit configure.in (or one of the sub-files in
> config/) and then regenerate configure using autoconf.
>
> It seems likely that we *don't* need or want this for Cygwin;
> that should be providing a reasonable stat() emulation already.
> So probably you just want to add "AC_LIBOBJ(win32_stat)" to
> the stanza beginning
>
> I'd also recommend that stat() fill all the fields in struct stat,
> even if you don't have anything better to put there than zeroes.
> Otherwise you're just opening things up for random misbehavior.
>

Fixed.

> I'm not in a position to comment on the details of the conversion from
> GetFileAttributesEx results to struct stat, but in general this
> seems like a reasonable way to proceed.
>

Actually, due to the behaviour of GetFileAttributesEx with symbolic
links I think that using GetFileInformationByHandle instead can give a
more resilient solution. Also, by using a handle we get a good test
for ERROR_DELETE_PENDING. This is the approach for the attached patch.

Regards,

Juan José Santamaría Flecha


0001-WIP-support-for-large-files-on-Win32-v4.patch
Description: Binary data


Re: Index Skip Scan

2019-09-04 Thread Alvaro Herrera
Surely it isn't right to add members prefixed with "ioss_" to
struct IndexScanState.

I'm surprised about this "FirstTupleEmitted" business.  Wouldn't it make
more sense to implement index_skip() to return the first tuple if the
scan is just starting?  (I know little about the executor, apologies if this
is a stupid question.)
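
(For context, the shape of query this aims to speed up, assuming a btree
index on the column, is something like the following; table and column names
are illustrative.  The idea is that the executor can skip ahead to each new
distinct value instead of reading every index entry:)

SELECT DISTINCT order_status FROM orders ORDER BY order_status;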

It would be good to get more knowledgeable people to review this patch.
It's clearly something we want, yet it's been there for a very long
time.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: progress report for ANALYZE

2019-09-04 Thread Alvaro Herrera
There were some minor problems in v5 -- bogus Docbook, outdated
rules.out, and a small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.
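
With the patch applied, progress can be watched with a query along these
lines (column names as defined by the view below):

SELECT pid, datname, relid::regclass, phase,
       sample_blks_total, heap_blks_scanned
  FROM pg_stat_progress_analyze;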

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..630f9821a9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3927,6 +3935,132 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether scanning through child tables.
+ 
+ 
+  current_relid
+  oid
+  OID of the table currently being scanned.
+It might be different from relid when analyzing tables that have child tables.
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ sample_blks_total
+ bigint
+ 
+   Total number of heap blocks that will be sampled.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ acquiring sample rows
+ 
+   The command is currently scanning the current_relid
+   to obtain samples.
+ 
+
+
+ computing stats
+ 
+   The command is computing stats from the samples obtained during the table scan.
+ 
+
+
+ computing extended stats
+ 
+   The command is computing extended stats from the samples obtained in the previous phase.
+ 
+
+
+ finalizing analyze
+ 
+   The command is updating pg_class. When this phase is completed, 
+   ANALYZE will end.
+ 
+
+   
+   
+  
+ 
+
  
   CLUSTER Progress Reporting
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..cc0c6291d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+SELECT
+S.pid AS pid, S.datid AS datid, D.datname AS datname,
+CAST(S.relid AS oid) AS relid,
+CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+CAST(S.param3 AS oid) AS current_relid,
+CASE S.param1 WHEN 0 THEN 'initializing'
+  WHEN 1 THEN 'acquiring sample rows'
+  WHEN 2 THEN 'computing stats'
+  WHEN 3 THEN 'computing extended stats'
+  WHEN 4 THEN 'finalizing analyze'
+  END AS phase,
+S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned
+FROM pg_stat_get_progress_info('ANALYZE') AS S
+LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
 SELECT
 S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2fdf447b6..6380e4cbbb 

Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-04 Thread Alvaro Herrera
On 2019-Sep-04, Alvaro Herrera wrote:

> Attached v3 again, for CFbot's benefit.  No changes from last time.

According to CFbot, the Windows build fails with this patch.  Please
fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Client Certificate Authentication Using Custom Fields (i.e. other than CN)

2019-09-04 Thread David Fetter
On Wed, Sep 04, 2019 at 05:24:15PM +0100, George Hafiz wrote:
> Hello,
> 
> It is currently only possible to authenticate clients using certificates
> with the CN.
> 
> I would like to propose that the field used to identify the client is
> configurable, e.g. being able to specify DN as the appropriate field. The
> reason is that in some organisations you might want to use the
> corporate PKI, but the CN of such certificates is not controlled.
> 
> In my case, the DN of our corporate issued client certificates is
> controlled and derived from AD groups we are members of. Only users in
> those groups can request client certificates with a DN that is equal to the
> AD group ID. This would make DN a perfectly suitable drop-in replacement
> for Postgres client certificate authentication, but as it stands it is not
> possible to change the field used.

This all sounds interesting.  Do you have a concrete proposal as to
how such a new interface would look in operation?  Better yet, a PoC
patch implementing same?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: using explicit_bzero

2019-09-04 Thread Alvaro Herrera
On 2019-Aug-24, Peter Eisentraut wrote:

> On 2019-08-14 05:00, Michael Paquier wrote:
> > On Tue, Aug 13, 2019 at 10:30:39AM +0200, Peter Eisentraut wrote:
> >> Another patch, to attempt to fix the Windows build.
> > 
> > I have not been able to test the compilation, but the changes look
> > good on this side.
> 
> Rebased patch, no functionality changes.

Marked RfC.  Can we get on with this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Aggregation push-down

2019-09-04 Thread Alvaro Herrera
This stuff seems very useful.  How come it has sat unreviewed for so long?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_basebackup ignores the existing data directory permissions

2019-09-04 Thread Alvaro Herrera
On 2019-Apr-03, Robert Haas wrote:

> I am not sure what solution is best here, but it is hard to imagine
> that the status quo is the right thing.

This patch has been dormant for months.  There's been a lot of
discussion but it doesn't seem conclusive; it doesn't look like we know
what we actually want to do.  Can I try to restart the discussion and
see if we can get to an agreement, so that somebody can implement it?
Failing that, it seems this patch would be Returned with Little Useful Feedback.

There seem to be multiple fine points here:

1. We want to have initdb and pg_basebackup behave consistently.

   Maybe if we don't like that changing pg_basebackup would make it
   behave differently to initdb, then we ought to change both tools'
   default behavior, and give equivalent new options to both to select
   the other(s?) behavior(s?).  So in the following I use "the tool" to
   refer to both initdb and pg_basebackup.

2. Should the case of creating a new dir behave differently from using
   an existing directory?

   Probably for simplicity we want both cases to behave the same.
   I mean that if an existing dir has group privs and we choose that the
   default behavior is without group privs, then those would get removed
   unless a cmd line arg is given.  Contrariwise if we choose that group
   perms are to be preserved if they exist, then we should create a new
   dir with group privs unless an option is given.

3. Sometimes we want to have the tool keep the permissions of an
   existing directory, but for pg_basebackup the user might sometimes
   want to preserve the permissions of upstream instead.

   It seems to me that we could choose the default to be the most secure
   behavior (which AFAICT is not to have any group perms), and if the
   user wants to preserve group perms in an existing dir (or give group
   perms to a directory created by the tool) they can pass a bespoke
   command line argument.

   I think ultimately this means that upstream privs would go ignored by
   pg_basebackup.  Maybe we can add another cmdline option to enable
   preserving such.

I hope I didn't completely misunderstand the thread -- always a
possibility.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-04 Thread Tomas Vondra

On Wed, Sep 04, 2019 at 11:37:48AM +0200, Rafia Sabih wrote:

On Tue, 30 Jul 2019 at 02:17, Tomas Vondra 
wrote:


On Sun, Jul 21, 2019 at 01:34:22PM +0200, Tomas Vondra wrote:
>
> ...
>
>I wonder if we're approaching this wrong. Maybe we should not reverse
>engineer queries for the various places, but just start with a set of
>queries that we want to optimize, and then identify which places in the
>planner need to be modified.
>

I've decided to do a couple of experiments, trying to make my mind about
which modified places matter to diffrent queries. But instead of trying
to reverse engineer the queries, I've taken a different approach - I've
compiled a list of queries that I think are sensible and relevant, and
then planned them with incremental sort enabled in different places.

I don't have any clear conclusions at this point - it does show some of
the places don't change the plan for any of the queries, although there
may be some additional queries where it'd make a difference.

But I'm posting this mostly because it might be useful. I've initially
planned to move changes that add incremental sort paths to separate
patches, and then apply/skip different subsets of those patches. But
then I realized there's a better way to do this - I've added a bunch of
GUCs, one for each such place. This allows doing this testing without
having to rebuild repeatedly.

I'm not going to post the patch(es) with extra GUCs here, because it'd
just confuse the patch tester, but it's available here:

  https://github.com/tvondra/postgres/tree/incremental-sort-20190730

There are 10 GUCs, one for each place in planner where incremental sort
paths are constructed. By default all those are set to 'false' so no
incremental sort paths are built. If you do

  SET devel_create_ordered_paths = on;

it'll start creating the paths in non-parallel in create_ordered_paths.
Then you may enable devel_create_ordered_paths_parallel to also consider
parallel paths, etc.

The list of queries (synthetic, but hopefully sufficiently realistic)
and a couple of scripts to collect the plans is in this repository:

  https://github.com/tvondra/incremental-sort-tests-2

There's also a spreadsheet with a summary of results, with a visual
representation of which GUCs affect which queries.


Wow, that sounds like an elaborate experiment. But where is this
spreadsheet you mentioned?



It seems I forgot to push the commit containing the spreadsheet with
results. I'll fix that tomorrow.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-04 Thread Alvaro Herrera
Attached v3 again, for CFbot's benefit.  No changes from last time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 018077d786f874cb314b5f61b5ef85f42c62bbe5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Fri, 23 Aug 2019 10:36:13 +0900
Subject: [PATCH v3] Introduce heap_infomask_flags to decode infomask and
 infomask2

---
 contrib/pageinspect/Makefile  |   2 +-
 contrib/pageinspect/expected/page.out |  97 
 contrib/pageinspect/heapfuncs.c   | 104 ++
 contrib/pageinspect/pageinspect--1.7--1.8.sql |  13 
 contrib/pageinspect/pageinspect.control   |   2 +-
 contrib/pageinspect/sql/page.sql  |  26 +++
 doc/src/sgml/pageinspect.sgml |  33 
 7 files changed, 275 insertions(+), 2 deletions(-)
 create mode 100644 contrib/pageinspect/pageinspect--1.7--1.8.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index e5a581f..cfe0129 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -5,7 +5,7 @@ OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
 		  brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.6--1.7.sql \
+DATA =  pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
 	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
 	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
 	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fb..3b47599 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -82,6 +82,103 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
  
 (1 row)
 
+-- If we freeze the only tuple on test1, the infomask should
+-- always be the same in all test runs.
+VACUUM FREEZE test1;
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+ LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
+ t_infomask | t_infomask2 |flags 
++-+--
+   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+(1 row)
+
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+ LATERAL heap_infomask_flags(t_infomask, t_infomask2, false) m(flags);
+ t_infomask | t_infomask2 |   flags   
++-+---
+   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 0); -- show raw flags by default
+heap_infomask_flags
+---
+ {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 0, true);
+ heap_infomask_flags  
+--
+ {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 1, true);
+ heap_infomask_flags  
+--
+ {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 1, false);
+heap_infomask_flags
+---
+ {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+(1 row)
+
+SELECT heap_infomask_flags(2816, x''::integer, true);
+   heap_infomask_flags   
+-
+ {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN,HEAP_KEYS_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}
+(1 row)
+
+SELECT heap_infomask_flags(2816, x''::integer, false);
+ heap_infomask_flags  
+--
+ {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_KEYS_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}
+(1 row)
+
+SELECT heap_infomask_flags(x'1080'::integer, 0, true); -- test for HEAP_LOCKED_UPGRADED
+  heap_infomask_flags   
+
+ {HEAP_LOCKED_UPGRADED}
+(1 row)
+
+SELECT heap_infomask_flags(x'1080'::integer, 0, false);
+   heap_infomask_flags
+--
+ {HEAP_XMAX_LOCK_ONLY,HEAP_XMAX_IS_MULTI}
+(1 row)
+
+SELECT heap_infomask_flags(x''::integer, x''::integer, false);
+ 

Re: [Patch] Invalid permission check in pg_stats for functional indexes

2019-09-04 Thread Alvaro Herrera
On 2019-Sep-03, Pierre Ducroquet wrote:

> > IIUC, the patch introduces an additional privilege check for the
> > underlying objects involved in the expression/functional index. If the
> > user has 'select' privileges on all of the columns/objects included in
> > the expression/functional index, then it should be visible in pg_stats
> > view. I've applied the patch manually and tested the feature. It works
> > as expected.
> 
> Indeed, you understood correctly. I have not dug around to find out the 
> origin of the current situation, but it does not look like intentional 
> behaviour, more like a small oversight.

Hmm.  This seems to create a large performance drop.  I created your
view as pg_stats2 alongside pg_stats, and ran EXPLAIN on both for the
query you posted.  Look at the plan with the original query:

55432 13devel 10881=# explain select tablename, attname from pg_stats where 
tablename like 'tbl1%';
QUERY PLAN  
   
───
 Subquery Scan on pg_stats  (cost=129.79..156.46 rows=1 width=128)
   Filter: (pg_stats.tablename ~~ 'tbl1%'::text)
   ->  Hash Join  (cost=129.79..156.39 rows=5 width=401)
 Hash Cond: ((s.starelid = c.oid) AND (s.staattnum = a.attnum))
 ->  Index Only Scan using pg_statistic_relid_att_inh_index on 
pg_statistic s  (cost=0.27..22.60 rows=422 width=6)
 ->  Hash  (cost=114.88..114.88 rows=976 width=138)
   ->  Hash Join  (cost=22.90..114.88 rows=976 width=138)
 Hash Cond: (a.attrelid = c.oid)
 Join Filter: has_column_privilege(c.oid, a.attnum, 
'select'::text)
 ->  Seq Scan on pg_attribute a  (cost=0.00..84.27 
rows=2927 width=70)
   Filter: (NOT attisdropped)
 ->  Hash  (cost=17.95..17.95 rows=396 width=72)
   ->  Seq Scan on pg_class c  (cost=0.00..17.95 
rows=396 width=72)
 Filter: ((NOT relrowsecurity) OR (NOT 
row_security_active(oid)))
(14 filas)

and here's the plan with your modified view:

55432 13devel 10881=# explain select tablename, attname from pg_stats2 where 
tablename like 'tbl1%';


QUERY PLAN  

  
──
 Subquery Scan on pg_stats2  (cost=128.72..6861.85 rows=1 width=128)
   Filter: (pg_stats2.tablename ~~ 'tbl1%'::text)
   ->  Nested Loop  (cost=128.72..6861.80 rows=4 width=401)
 Join Filter: (s.starelid = c.oid)
 ->  Hash Join  (cost=128.45..152.99 rows=16 width=74)
   Hash Cond: ((s.starelid = a.attrelid) AND (s.staattnum = 
a.attnum))
   ->  Index Only Scan using pg_statistic_relid_att_inh_index on 
pg_statistic s  (cost=0.27..22.60 rows=422 width=6)
   ->  Hash  (cost=84.27..84.27 rows=2927 width=70)
 ->  Seq Scan on pg_attribute a  (cost=0.00..84.27 
rows=2927 width=70)
   Filter: (NOT attisdropped)
 ->  Index Scan using pg_class_oid_index on pg_class c  
(cost=0.27..419.29 rows=1 width=73)
   Index Cond: (oid = a.attrelid)
   Filter: (((NOT relrowsecurity) OR (NOT 
row_security_active(oid))) AND ((relkind = 'r'::"char") OR ((relkind = 
'i'::"char") AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2 AND 
(((relkind = 'r'::"char") AND has_column_privilege(oid, a.attnum, 
'select'::text)) OR ((relkind = 'i'::"char") AND (NOT (alternatives: SubPlan 1 
or hashed SubPlan 2)
   SubPlan 1
 ->  Seq Scan on pg_depend  (cost=0.00..209.48 rows=1 width=0)
   Filter: ((refobjsubid > 0) AND (objid = c.oid) AND (NOT 
has_column_privilege(refobjid, (refobjsubid)::smallint, 'select'::text)))
   SubPlan 2
 ->  Seq Scan on pg_depend pg_depend_1  (cost=0.00..190.42 
rows=176 width=4)
   Filter: ((refobjsubid > 0) AND (NOT 
has_column_privilege(refobjid, (refobjsubid)::smallint, 'select'::text)))
(19 filas)

You forgot to add a condition `pg_depend.classid =
'pg_catalog.pg_class'::pg_catalog.regclass` in your subquery 
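
To illustrate, with that condition added the privilege check in the subquery
would look roughly like this (a sketch only, not the exact view text from the
patch; the classid line is the addition):

    ... AND NOT EXISTS (
            SELECT 1
            FROM pg_catalog.pg_depend d
            WHERE d.classid = 'pg_catalog.pg_class'::pg_catalog.regclass
              AND d.objid = c.oid
              AND d.refobjsubid > 0
              AND NOT has_column_privilege(d.refobjid,
                                           d.refobjsubid::smallint,
                                           'select'))

Without the classid qualification, objid alone is ambiguous across catalogs,
so dependency rows recorded for unrelated catalogs could match.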

Re: [Patch] Invalid permission check in pg_stats for functional indexes

2019-09-04 Thread Alvaro Herrera
BTW you labelled this in the CF app as targetting "stable", but I don't
think this is backpatchable.  I think we should fix it in master and
call it a day.  Changing system view definitions in stable versions is
tough.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Client Certificate Authentication Using Custom Fields (i.e. other than CN)

2019-09-04 Thread George Hafiz
Hello,

It is currently only possible to authenticate clients using certificates
with the CN.

I would like to propose that the field used to identify the client be
configurable, e.g. being able to specify the DN as the appropriate field. The
reason is that in some organisations you might want to use the corporate PKI,
but the CN of such certificates is not under your control.

In my case, the DN of our corporate issued client certificates is
controlled and derived from AD groups we are members of. Only users in
those groups can request client certificates with a DN that is equal to the
AD group ID. This would make DN a perfectly suitable drop-in replacement
for Postgres client certificate authentication, but as it stands it is not
possible to change the field used.

Best regards,
George


Re: Planning counters in pg_stat_statements (using pgss_store)

2019-09-04 Thread Sergei Kornilov
Hello

I think the most important question for this topic is the performance penalty.
It was a long story: the first test on my desktop was too volatile, so I set up
a separate PC with only the DB and tested a few cases.

PC spec: 2-core Intel Core 2 Duo E6550, 4GB ram, mechanical HDD
All tests on top 7dedfd22b79822b7f4210e6255b672ea82db6678 commit, build via 
./configure  --prefix=/home/melkij/tmp/ --enable-tap-tests
DB settings:
  listen_addresses = '*'
  log_line_prefix = '%m %p %u@%d from %h [vxid:%v txid:%x] [%i] '
  lc_messages = 'C'
  shared_buffers = 512MB

pgbench was run from a different host on the same L2 network.
Database was generated by: pgbench -s 10 -i -h hostname postgres
After database start I run:
  create extension if not exists pg_prewarm;
  select count(*), sum(pg_prewarm) from pg_tables join 
pg_prewarm(tablename::regclass) on true where schemaname= 'public';
  select count(*), sum(pg_prewarm) from pg_indexes join 
pg_prewarm(indexname::regclass) on true where schemaname= 'public';
So all data was in buffers.

Load generated by command: pgbench --builtin=select-only --time=300 -n -c 10 -h 
hostname postgres -M (vary)

Tests are:
head_no_pgss - unpatched version, empty shared_preload_libraries
head_track_none - unpatched version with:
  shared_preload_libraries = 'pg_stat_statements'
  pg_stat_statements.max = 5000
  pg_stat_statements.track = none
  pg_stat_statements.save = off
  pg_stat_statements.track_utility = off
head_track_top - the same but with pg_stat_statements.track=top
Run 5 times in each mode (-M simple, extended, prepared)

patch_not_loaded - build with latest published patches, empty 
shared_preload_libraries
patch_track_none - patched build with
  shared_preload_libraries = 'pg_stat_statements'
  pg_stat_statements.max = 5000
  pg_stat_statements.track = none
  pg_stat_statements.save = off
  pg_stat_statements.track_utility = off
  pg_stat_statements.track_planning = off
patch_track_top - the same but with pg_stat_statements.track=top
patch_track_planning - with:
  shared_preload_libraries = 'pg_stat_statements'
  pg_stat_statements.max = 5000
  pg_stat_statements.track = top
  pg_stat_statements.save = off
  pg_stat_statements.track_utility = off
  pg_stat_statements.track_planning = on

Run 10 times in each mode (-M simple, extended, prepared)

Results:

 test |   mode   | average_tps | degradation_perc 
--+--+-+--
 head_no_pgss | extended |   13816 |1.000
 patch_not_loaded | extended |   13755 |0.996
 head_track_none  | extended |   13607 |0.985
 patch_track_none | extended |   13560 |0.981
 head_track_top   | extended |   13277 |0.961
 patch_track_top  | extended |   13189 |0.955
 patch_track_planning | extended |   12983 |0.940
 head_no_pgss | prepared |   29101 |1.000
 head_track_none  | prepared |   28510 |0.980
 patch_track_none | prepared |   28481 |0.979
 patch_not_loaded | prepared |   28382 |0.975
 patch_track_planning | prepared |   28046 |0.964
 head_track_top   | prepared |   28035 |0.963
 patch_track_top  | prepared |   27973 |0.961
 head_no_pgss | simple   |   16733 |1.000
 patch_not_loaded | simple   |   16552 |0.989
 head_track_none  | simple   |   16452 |0.983
 patch_track_none | simple   |   16365 |0.978
 head_track_top   | simple   |   15867 |0.948
 patch_track_top  | simple   |   15820 |0.945
 patch_track_planning | simple   |   15739 |0.941

So I found a slight slowdown with track_planning = off compared to HEAD, possibly 
just at the level of measurement error. I think this is OK.
track_planning = on also has no dramatic impact. In my opinion the proposed design 
with the pgss_store call is acceptable.

regards, Sergei




Re: [PATCH] Connection time for \conninfo

2019-09-04 Thread Alvaro Herrera
On 2019-Sep-04, Michael Paquier wrote:

> On Tue, Sep 03, 2019 at 10:13:57PM -0400, Rodrigo Ramírez Norambuena wrote:
> > I've worked on a little patch to add the connection time against a server
> > backend to psql's \conninfo command.
> > 
> > Maybe we could later add a precheck for whether the connection is alive.
> > I've taken a first look at pqPacketSend; my idea is to send an ECHO
> > packet over the database connection. If someone has a better idea
> > it would be great to know.
> 
> You can do basically the same thing by looking at
> pg_stat_activity.backend_start and compare it with the clock
> timestamp.  Doing it at SQL level the way you want has also the
> advantage to offer you a modular format output.

Hmm, couldn't you say the same about the other details that \conninfo
gives?  You can get them from pg_stat_activity or pg_stat_ssl, yet
they're still shown by \conninfo for convenience.  Surely we don't want to
remove everything from \conninfo and tell users to query the stat views
instead.
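
(For reference, the SQL-level approach Michael describes amounts to roughly
this, run from the session of interest:

    SELECT clock_timestamp() - backend_start AS connected_for
    FROM pg_stat_activity
    WHERE pid = pg_backend_pid();

which is exactly the kind of thing \conninfo could do for the user.)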

The only thing that seems wrong about this proposal is that the time
format is a bit too verbose.  I would have it do "N days 12:23:34".

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




having issues with PG12 debian packaging repository

2019-09-04 Thread Murat Tuncer
Hello hackers

I am getting sporadic errors when I try to use the PG12 bionic debian
repository.

Here is the error message that results from apt-get update.
---
Failed to fetch
http://apt.postgresql.org/pub/repos/apt/dists/bionic-pgdg/main/binary-i386/Packages.gz
File has unexpected size (260865 != 260866). Mirror sync in progress? [IP:
34.96.81.152 80]
Hashes of expected file:
 - Filesize:260866 [weak]
 - SHA256:433bef097d8a54a9899350c182d0074c1a13f62c8e7e9987cc6c63cd11242abc
 - SHA1:1be55e080a1dd277929f095690ae9b9cf01e971f [weak]
 - MD5Sum:08189bf54aa297f53b9656bc3c529c62 [weak]
 Release file created at: Mon, 02 Sep 2019 10:25:33 +
 Failed to fetch
http://apt.postgresql.org/pub/repos/apt/dists/bionic-pgdg/main/binary-amd64/Packages.gz
 Some index files failed to download. They have been ignored, or old ones
used instead.
--

It usually succeeds when I run it again.  Is there a way to avoid this?

Thanks
Murat


Re: Default JIT setting in V12

2019-09-04 Thread Andres Freund
Hi,

On 2019-09-04 07:51:16 -0700, Andres Freund wrote:
> On 2019-09-04 09:56:28 -0400, Jeff Janes wrote:
> > I think it is intuitive, and with empirical evidence, that we do not want
> > to JIT compile at all unless we are going to optimize the compiled code.
> 
> There's pretty clear counter-evidence however as well :(
> 
> I think it's probably more sensible to use some cheap minimal
> optimization for the "unoptimized" mode - because there's some
> non-linear cost algorithms with full optimizations enabled.
> 
> How does your example look with something like:
> 
> diff --git i/src/backend/jit/llvm/llvmjit.c w/src/backend/jit/llvm/llvmjit.c
> index 82c4afb7011..85ddae2ea2b 100644
> --- i/src/backend/jit/llvm/llvmjit.c
> +++ w/src/backend/jit/llvm/llvmjit.c
> @@ -428,7 +428,7 @@ llvm_optimize_module(LLVMJitContext *context, 
> LLVMModuleRef module)
>  if (context->base.flags & PGJIT_OPT3)
>  compile_optlevel = 3;
>  else
> -compile_optlevel = 0;
> +compile_optlevel = 1;
>  
>  /*
>   * Have to create a new pass manager builder every pass through, as the
> 
> which I think - but I'd have to check - doesn't include any of the
> non-linear cost optimizations.

Or better, something slightly more complete, like the attached (which
affects both code-gen time optimizations (which are more like peephole
ones), and both function/global ones that are cheap).

Greetings,

Andres Freund
diff --git i/src/backend/jit/llvm/llvmjit.c w/src/backend/jit/llvm/llvmjit.c
index 82c4afb7011..42cc1b0ab92 100644
--- i/src/backend/jit/llvm/llvmjit.c
+++ w/src/backend/jit/llvm/llvmjit.c
@@ -97,11 +97,11 @@ static const char *llvm_triple = NULL;
 static const char *llvm_layout = NULL;
 
 
-static LLVMTargetMachineRef llvm_opt0_targetmachine;
+static LLVMTargetMachineRef llvm_opt1_targetmachine;
 static LLVMTargetMachineRef llvm_opt3_targetmachine;
 
 static LLVMTargetRef llvm_targetref;
-static LLVMOrcJITStackRef llvm_opt0_orc;
+static LLVMOrcJITStackRef llvm_opt1_orc;
 static LLVMOrcJITStackRef llvm_opt3_orc;
 
 
@@ -282,12 +282,12 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 #else
 
 #if LLVM_VERSION_MAJOR < 5
-	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, funcname)))
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt1_orc, funcname)))
 		return (void *) (uintptr_t) addr;
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, funcname)))
 		return (void *) (uintptr_t) addr;
 #else
-	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, , funcname))
+	if (LLVMOrcGetSymbolAddress(llvm_opt1_orc, , funcname))
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
 		return (void *) (uintptr_t) addr;
@@ -428,7 +428,7 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	if (context->base.flags & PGJIT_OPT3)
 		compile_optlevel = 3;
 	else
-		compile_optlevel = 0;
+		compile_optlevel = 1;
 
 	/*
 	 * Have to create a new pass manager builder every pass through, as the
@@ -499,7 +499,7 @@ llvm_compile_module(LLVMJitContext *context)
 	if (context->base.flags & PGJIT_OPT3)
 		compile_orc = llvm_opt3_orc;
 	else
-		compile_orc = llvm_opt0_orc;
+		compile_orc = llvm_opt1_orc;
 
 	/* perform inlining */
 	if (context->base.flags & PGJIT_INLINE)
@@ -648,9 +648,9 @@ llvm_session_initialize(void)
 	elog(DEBUG2, "LLVMJIT detected CPU \"%s\", with features \"%s\"",
 		 cpu, features);
 
-	llvm_opt0_targetmachine =
+	llvm_opt1_targetmachine =
 		LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features,
-LLVMCodeGenLevelNone,
+LLVMCodeGenLevelLess,
 LLVMRelocDefault,
 LLVMCodeModelJITDefault);
 	llvm_opt3_targetmachine =
@@ -667,7 +667,7 @@ llvm_session_initialize(void)
 	/* force symbols in main binary to be loaded */
 	LLVMLoadLibraryPermanently(NULL);
 
-	llvm_opt0_orc = LLVMOrcCreateInstance(llvm_opt0_targetmachine);
+	llvm_opt1_orc = LLVMOrcCreateInstance(llvm_opt1_targetmachine);
 	llvm_opt3_orc = LLVMOrcCreateInstance(llvm_opt3_targetmachine);
 
 #if defined(HAVE_DECL_LLVMCREATEGDBREGISTRATIONLISTENER) && HAVE_DECL_LLVMCREATEGDBREGISTRATIONLISTENER
@@ -675,7 +675,7 @@ llvm_session_initialize(void)
 	{
 		LLVMJITEventListenerRef l = LLVMCreateGDBRegistrationListener();
 
-		LLVMOrcRegisterJITEventListener(llvm_opt0_orc, l);
+		LLVMOrcRegisterJITEventListener(llvm_opt1_orc, l);
 		LLVMOrcRegisterJITEventListener(llvm_opt3_orc, l);
 	}
 #endif
@@ -684,7 +684,7 @@ llvm_session_initialize(void)
 	{
 		LLVMJITEventListenerRef l = LLVMCreatePerfJITEventListener();
 
-		LLVMOrcRegisterJITEventListener(llvm_opt0_orc, l);
+		LLVMOrcRegisterJITEventListener(llvm_opt1_orc, l);
 		LLVMOrcRegisterJITEventListener(llvm_opt3_orc, l);
 	}
 #endif
@@ -711,14 +711,14 @@ llvm_shutdown(int code, Datum arg)
 		llvm_opt3_orc = NULL;
 	}
 
-	if (llvm_opt0_orc)
+	if (llvm_opt1_orc)
 	{
 #if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
 		if (jit_profiling_support)
-			LLVMOrcUnregisterPerf(llvm_opt0_orc);
+		

auxiliary processes in pg_stat_ssl

2019-09-04 Thread Alvaro Herrera
I just noticed that we list auxiliary processes in pg_stat_ssl:

55432 13devel 28627=# select * from pg_stat_ssl ;
  pid  │ ssl │ version │ cipher │ bits │ compression │ 
client_dn │ client_serial │ issuer_dn 
───┼─┼─┼┼──┼─┼───┼───┼───
 28618 │ f   │ ││  │ │  
 │   │ 
 28620 │ f   │ ││  │ │  
 │   │ 
 28627 │ t   │ TLSv1.3 │ TLS_AES_256_GCM_SHA384 │  256 │ f   │  
 │   │ 
 28616 │ f   │ ││  │ │  
 │   │ 
 28615 │ f   │ ││  │ │  
 │   │ 
 28617 │ f   │ ││  │ │  
 │   │ 
(6 filas)

55432 13devel 28627=# select pid, backend_type from pg_stat_activity ;
  pid  │ backend_type 
───┼──
 28618 │ autovacuum launcher
 28620 │ logical replication launcher
 28627 │ client backend
 28616 │ background writer
 28615 │ checkpointer
 28617 │ walwriter
(6 filas)

But this seems pointless.  Should we not hide those?  It seems this only
happened as an unintended side-effect of fc70a4b0df38.  It appears to me
that we should redefine that view to restrict it to rows whose backend_type is
'client backend' (maybe also include 'wal receiver'/'wal sender', not
sure.)
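
In other words, something roughly like this, modulo which backend types we
decide to keep (sketch only):

    SELECT s.*
    FROM pg_stat_ssl s
         JOIN pg_stat_activity a USING (pid)
    WHERE a.backend_type = 'client backend';  -- maybe also the WAL sender/receiver types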

-- 
Álvaro Herrerahttp://www.twitter.com/alvherre




Re: Unexpected "shared memory block is still in use"

2019-09-04 Thread Tom Lane
Peter Eisentraut  writes:
> I agree with this patch and the reasons for it.

OK, thanks for reviewing.

> A related point, perhaps we should change the key printed into
> postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
> what ipcs prints.

Hmm, that depends on whose ipcs you use :-(.  A quick survey
of my machines says it's

key shmid

Linux:  hex decimal
FreeBSD:decimal decimal
NetBSD: decimal decimal
OpenBSD:decimal decimal
macOS:  hex decimal
HPUX:   hex (not printed)

There's certainly room to argue that hex+decimal is most popular,
but I'm not sure that that outweighs possible compatibility issues
from changing postmaster.pid contents.  (Admittedly, it's not real
clear that anything would be paying attention to the shmem key,
so maybe there's no compatibility issue.)

If we did want to assume that we could change postmaster.pid,
it might be best to print the key both ways?

regards, tom lane




Re: Default JIT setting in V12

2019-09-04 Thread Andres Freund
Hi,

On 2019-09-04 09:56:28 -0400, Jeff Janes wrote:
> I think it is intuitive, and with empirical evidence, that we do not want
> to JIT compile at all unless we are going to optimize the compiled code.

There's pretty clear counter-evidence however as well :(

I think it's probably more sensible to use some cheap minimal
optimization for the "unoptimized" mode - because there's some
non-linear cost algorithms with full optimizations enabled.

How does your example look with something like:

diff --git i/src/backend/jit/llvm/llvmjit.c w/src/backend/jit/llvm/llvmjit.c
index 82c4afb7011..85ddae2ea2b 100644
--- i/src/backend/jit/llvm/llvmjit.c
+++ w/src/backend/jit/llvm/llvmjit.c
@@ -428,7 +428,7 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef 
module)
 if (context->base.flags & PGJIT_OPT3)
 compile_optlevel = 3;
 else
-compile_optlevel = 0;
+compile_optlevel = 1;
 
 /*
  * Have to create a new pass manager builder every pass through, as the

which I think - but I'd have to check - doesn't include any of the
non-linear cost optimizations.


> Is there a rationale for, or other examples to show, that it makes sense
> for the default value of jit_optimize_above_cost to be 5 fold higher than
> the default setting of jit_above_cost?

Yes. IIRC even tpc-h or something shows that with small scale one does
get noticable - but not crazy - speedups with unoptimized code, but that
it's a loss with optimized code.

Greetings,

Andres Freund




Re: BUG #15977: Inconsistent behavior in chained transactions

2019-09-04 Thread fn ln
I made another patch for suggested behavior (COMMIT/ROLLBACK AND CHAIN now
gives us an error when used in an implicit block).

On Wed, Sep 4, 2019 at 16:53 Andres Freund wrote:

> Hi,
>
> On 2019-09-03 11:54:57 +0200, Peter Eisentraut wrote:
> > On 2019-08-29 16:58, Fabien COELHO wrote:
> > >
> > >> Thanks, I got it. I have never made a patch before so I'll keep it in
> my
> > >> mind. Self-contained patch is now attached.
> > >
> > > v3 applies, compiles, "make check" ok.
> > >
> > > I turned it ready on the app.
>
> I don't think is quite sufficient. Note that the same problem also
> exists for commits, one just needs force the commit to be part of a
> multi-statement implicit transaction (i.e. a simple protocol exec /
> PQexec(), with multiple statements).
>
> E.g.:
>
> postgres[32545][1]=# ROLLBACK;
> WARNING:  25P01: there is no transaction in progress
> LOCATION:  UserAbortTransactionBlock, xact.c:3914
> ROLLBACK
> Time: 0.790 ms
> postgres[32545][1]=# SELECT 1\;COMMIT AND CHAIN;
> WARNING:  25P01: there is no transaction in progress
> LOCATION:  EndTransactionBlock, xact.c:3728
> COMMIT
> Time: 0.945 ms
> postgres[32545][1]*=# COMMIT ;
> COMMIT
> Time: 0.539 ms
>
> the \; bit forces psql to not split command into two separate protocol
> level commands, but to submit them together.
>
>
> > Should we make it an error instead of a warning?
>
> Yea, I think for AND CHAIN we have to either error, or always start a
> new transaction. I can see arguments for both, as long as it's
> consistent.
>
> The historical behaviour of issuing only WARNINGS when issuing COMMIT or
> ROLLBACK outside of an explicit transaction seems to weigh in favor of
> not erroring. Given that the result of such a transaction command is not
> an error, the AND CHAIN portion should work.
>
> Additionally, we actually have COMMIT; ROLLBACK; PREPARE TRANSACTION all
> work meaningfully for implicit transactions. E.g.:
>
> postgres[32545][1]=# CREATE TABLE test()\; PREPARE TRANSACTION 'frak';
> WARNING:  25P01: there is no transaction in progress
> LOCATION:  EndTransactionBlock, xact.c:3728
> PREPARE TRANSACTION
> Time: 15.094 ms
> postgres[32545][1]=# \d test
> Did not find any relation named "test".
> postgres[32545][1]=# COMMIT PREPARED 'frak';
> COMMIT PREPARED
> Time: 4.727 ms
> postgres[32545][1]=# \d test
>Table "public.test"
> ┌┬──┬───┬──┬─┐
> │ Column │ Type │ Collation │ Nullable │ Default │
> ├┼──┼───┼──┼─┤
> └┴──┴───┴──┴─┘
>
>
> The argument in the other direction is that not erroring out hides bugs,
> like an accidentally already committed transaction (which is
> particularly bad for ROLLBACK). We can't easily change that for plain
> COMMIT/ROLLBACK due to backward compat concerns, but for COMMIT|ROLLBACK
> AND CHAIN there's no such such concern.
>
> I think there's an argument that we ought to behave differently for
> COMMIT/ROLLBACK/PREPARE in implicit transactions where multiple commands
> exist, and ones where that's not the case. Given that they all actually
> have an effect if there's a preceding statement in the implicit
> transaction, the WARNING doesn't actually seem that appropriate?
>
> There's some arguments that it's sometimes useful to be able to force
> committing an implicit transaction. Consider e.g. executing batch schema
> modifications with some sensitivity to latency (and thus wanting to use
> reduce latency by executing multiple statements via one protocol
> message). There's a few cases where committing earlier is quite useful
> in that scenario, e.g.:
>
> CREATE TYPE test_enum AS ENUM ('a', 'b', 'c');
> CREATE TABLE uses_test_enum(v test_enum);
>
> without explicit commit:
>
> postgres[32545][1]=# ALTER TYPE test_enum ADD VALUE 'd'\;INSERT INTO
> uses_test_enum VALUES('d');
> ERROR:  55P04: unsafe use of new value "d" of enum type test_enum
> LINE 1: ...t_enum ADD VALUE 'd';INSERT INTO uses_test_enum VALUES('d');
>   ^
> HINT:  New enum values must be committed before they can be used.
> LOCATION:  check_safe_enum_use, enum.c:98
>
>
> with explicit commit:
>
> postgres[32545][1]=# ALTER TYPE test_enum ADD VALUE 'd'\;COMMIT\;INSERT
> INTO uses_test_enum VALUES('d');
> WARNING:  25P01: there is no transaction in progress
> LOCATION:  EndTransactionBlock, xact.c:3728
> INSERT 0 1
>
> There's also the case that one might want to batch execute statements,
> but not have to redo them if a later command fails.
>
> Greetings,
>
> Andres Freund
>


disable-implicit-transaction-chaining-v4.patch
Description: Binary data


Re: block-level incremental backup

2019-09-04 Thread Robert Haas
On Tue, Sep 3, 2019 at 12:46 PM Ibrar Ahmed  wrote:
> I did that and have experience working on the TAR format.  I was curious 
> about what
> "best/newest" you are talking about.

Well, why not go look it up?

On my MacBook, tar is documented to understand three different tar
formats: gnutar, ustar, and v7, and two sets of extensions to the tar
format: numeric extensions required by POSIX, and Solaris extensions.
It also understands the pax and restricted-pax formats which are
derived from the ustar format.  I don't know what your system
supports, but it's probably not hugely different; the fact that there
are multiple tar formats has been documented in the tar man page on
every machine I've checked for the past 20 years.  Here, 'man tar'
refers the reader to 'man libarchive-formats', which contains the
details mentioned above.

A quick Google search for 'multiple tar formats' also finds
https://en.wikipedia.org/wiki/Tar_(computing)#File_format and
https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html each
of which explains a good deal of the complexity in this area.

I don't really understand why I have to explain to you what I mean
when I say there are multiple tar formats when you can look it up on
Google and find that there are multiple tar formats.  Again, the point
is that the current code only generates tar archives and therefore
only needs to generate one format, but if we add code that reads a tar
archive, it probably needs to read several formats, because there are
several formats that are popular enough to be widely-supported.

It's possible that somebody else here knows more about this topic and
could make better judgements than I can, but my view at present is
that if we want to read tar archives, we probably would want to do it
by depending on libarchive.  And I don't think we should do that for
this project because I don't think it would provide much value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Default JIT setting in V12

2019-09-04 Thread Jeff Janes
Since JIT is on by default in v12, I wanted to revisit the issue raised in
https://www.postgresql.org/message-id/CAMkU=1zVhQ5k5d=YyHNyrigLUNTkOj4=YB17s9--3ts8H-SO=q...@mail.gmail.com

When the total estimated cost is between jit_above_cost and
jit_optimize_above_cost, I get a substantial regression in the attached.
Note that I did not devise this case specifically to cause this problem, I
just stumbled upon it.

JIT, no optimization: 10.5s
JIT, optimization: 3.8s
no JIT:  4.1s

It seems like the unoptimized JIT code is much worse than the general
purpose code.

This is on AWS c4.large, Ubuntu 18.04, installed from PGDG apt repository.
No config changes were made, other than the local ones included in the
script. (Previously there were questions about how LLVM was configured
that I couldn't really answer well, but here there should be no question, as I
didn't compile or configure it at all.)

There were some proposed mitigations in sister threads, but none have been
adopted in v12.

I think it is intuitive, and with empirical evidence, that we do not want
to JIT compile at all unless we are going to optimize the compiled code.

Is there a rationale for, or other examples to show, that it makes sense
for the default value of jit_optimize_above_cost to be 5 fold higher than
the default setting of jit_above_cost?

I think these defaults are setting a trap for our users who aren't really
interested in JIT, and are just upgrading to stay on the most-current
version.  I would propose lowering the default jit_optimize_above_cost to
be the same as jit_above_cost, or setting it to 0 so that jit_above_cost is
always in control and always optimizes.
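
Concretely, with the stock v12 defaults (jit_above_cost = 100000,
jit_optimize_above_cost = 500000), that would amount to something like:

    -- make optimization kick in whenever JIT itself does
    ALTER SYSTEM SET jit_optimize_above_cost = 100000;
    -- or, alternatively, let jit_above_cost alone decide and always optimize:
    -- ALTER SYSTEM SET jit_optimize_above_cost = 0;
    SELECT pg_reload_conf();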

Cheers,

Jeff
drop table if exists i200c200;
create table i200c200 ( pk bigint primary key, 
int1 bigint,
int2 bigint,
int3 bigint,
int4 bigint,
int5 bigint,
int6 bigint,
int7 bigint,
int8 bigint,
int9 bigint,
int10 bigint,
int11 bigint,
int12 bigint,
int13 bigint,
int14 bigint,
int15 bigint,
int16 bigint,
int17 bigint,
int18 bigint,
int19 bigint,
int20 bigint,
int21 bigint,
int22 bigint,
int23 bigint,
int24 bigint,
int25 bigint,
int26 bigint,
int27 bigint,
int28 bigint,
int29 bigint,
int30 bigint,
int31 bigint,
int32 bigint,
int33 bigint,
int34 bigint,
int35 bigint,
int36 bigint,
int37 bigint,
int38 bigint,
int39 bigint,
int40 bigint,
int41 bigint,
int42 bigint,
int43 bigint,
int44 bigint,
int45 bigint,
int46 bigint,
int47 bigint,
int48 bigint,
int49 bigint,
int50 bigint,
int51 bigint,
int52 bigint,
int53 bigint,
int54 bigint,
int55 bigint,
int56 bigint,
int57 bigint,
int58 bigint,
int59 bigint,
int60 bigint,
int61 bigint,
int62 bigint,
int63 bigint,
int64 bigint,
int65 bigint,
int66 bigint,
int67 bigint,
int68 bigint,
int69 bigint,
int70 bigint,
int71 bigint,
int72 bigint,
int73 bigint,
int74 bigint,
int75 bigint,
int76 bigint,
int77 bigint,
int78 bigint,
int79 bigint,
int80 bigint,
int81 bigint,
int82 bigint,
int83 bigint,
int84 bigint,
int85 bigint,
int86 bigint,
int87 bigint,
int88 bigint,
int89 bigint,
int90 bigint,
int91 bigint,
int92 bigint,
int93 bigint,
int94 bigint,
int95 bigint,
int96 bigint,
int97 bigint,
int98 bigint,
int99 bigint,
int100 bigint,
int101 bigint,
int102 bigint,
int103 bigint,
int104 bigint,
int105 bigint,
int106 bigint,
int107 bigint,
int108 bigint,
int109 bigint,
int110 bigint,
int111 bigint,
int112 bigint,
int113 bigint,
int114 bigint,
int115 bigint,
int116 bigint,
int117 bigint,
int118 bigint,
int119 bigint,
int120 bigint,
int121 bigint,
int122 bigint,
int123 bigint,
int124 bigint,
int125 bigint,
int126 bigint,
int127 bigint,
int128 bigint,
int129 bigint,
int130 bigint,
int131 bigint,
int132 bigint,
int133 bigint,
int134 bigint,
int135 bigint,
int136 bigint,
int137 bigint,
int138 bigint,
int139 bigint,
int140 bigint,
int141 bigint,
int142 bigint,
int143 bigint,
int144 bigint,
int145 bigint,
int146 bigint,
int147 bigint,
int148 bigint,
int149 bigint,
int150 bigint,
int151 bigint,
int152 bigint,
int153 bigint,
int154 bigint,
int155 bigint,
int156 bigint,
int157 bigint,
int158 bigint,
int159 bigint,
int160 bigint,
int161 bigint,
int162 bigint,
int163 bigint,
int164 bigint,
int165 bigint,
int166 bigint,
int167 bigint,
int168 bigint,
int169 bigint,
int170 bigint,
int171 bigint,
int172 bigint,
int173 bigint,
int174 bigint,
int175 bigint,
int176 bigint,
int177 bigint,
int178 bigint,
int179 bigint,
int180 bigint,
int181 bigint,
int182 bigint,
int183 bigint,
int184 bigint,
int185 bigint,
int186 bigint,
int187 bigint,
int188 bigint,
int189 bigint,
int190 bigint,
int191 bigint,
int192 bigint,
int193 bigint,
int194 bigint,
int195 bigint,
int196 bigint,
int197 bigint,
int198 bigint,
int199 bigint,
int200 bigint,
char1 varchar(255),
char2 varchar(255),
char3 varchar(255),
char4 varchar(255),
char5 varchar(255),
char6 varchar(255),
char7 varchar(255),
char8 varchar(255),
char9 varchar(255),
char10 varchar(255),
char11 varchar(255),
char12 varchar(255),
char13 varchar(255),
char14 varchar(255),
char15 varchar(255),
char16 varchar(255),
char17 varchar(255),

Re: Replication & recovery_min_apply_delay

2019-09-04 Thread Konstantin Knizhnik




On 04.09.2019 1:22, Alvaro Herrera wrote:

On 2019-Aug-02, Michael Paquier wrote:


On Wed, Jul 31, 2019 at 04:43:26PM -0400, Alvaro Herrera wrote:

As for the test module, the one I submitted takes a lot of time to run
(well, 60s) and I don't think it's a good idea to include it as
something to be run all the time by every buildfarm member.  I'm not
sure that there's a leaner way to test for this bug, though, but
certainly it'd be a good idea to ensure that this continues to work.

Hmmm.  Instead of that, wouldn't it be cleaner to maintain in the
context of the startup process a marker similar to receivedUpto for
the last LSN?  The issue with this one is that it gets reset easily so
we would lose track of it easily, and we need also to count with the
case where a WAL receiver is not started.  So I think that we should
do that as a last replayed or received LSN if a WAL receiver is up and
running, whichever is newer.  Splitting the WAL receiver restart logic
into a separate routine is a good idea in itself, the patch attempting
to switch primary_conninfo to be reloadable could make use of that.

Konstantin, any interest in trying this?


Sorry, I do not understand this proposal.

> and we need also to count with the case where a WAL receiver is not 
started.


Maybe I missed something, but what this patch is trying to achieve is 
to launch the WAL receiver before already-received transactions are applied.

So the WAL receiver is definitely not started at this moment.

receivedUpto is just a static variable in xlog.c, maintained by the WAL receiver.
But as I mentioned above, the WAL receiver is not started at the moment when 
we need to know the LSN of the last record.


Certainly it should be possible to somehow persist receivedUpto, so we 
do not need to scan the WAL to determine the last LSN at the next start.

But persisting the last LSN introduces a lot of questions and problems.
For example, when does it need to be flushed to disk? If it is done 
after each received transaction, then performance can suffer 
significantly.
If it is done more or less asynchronously, then there is a risk that we 
request streaming from the wrong position.
In any case it will significantly complicate the patch and make it more 
sensitive to various errors.


I wonder what is wrong with determining the LSN of the last record by just 
scanning the WAL?
Certainly it is not the most efficient way. But I do not expect that 
somebody will have hundreds or thousands of megabytes of WAL.
Michael, do you see any other problems with the GetLastLSN() function 
besides its execution time?


IMHO one of the main advantages of this patch is that it is very simple.
We need to scan the WAL to locate the last LSN only if recovery_min_apply_delay 
is set.

So this patch should not affect the performance of any other case.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Robert Haas
On Tue, Sep 3, 2019 at 1:02 AM Masahiko Sawada  wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.
>
> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

I think this is all going in the wrong direction.  It's the
responsibility of the people who are calling the pgstat_progress_*
functions to only do so when it's appropriate.  Having the
pgstat_progress_* functions try to untangle whether or not they ought
to ignore calls made to them is backwards.

It seems to me that the problem here can be summarized in this way:
there's a bunch of code reuse in the relevant paths here, and somebody
wasn't careful enough when adding progress reporting for one of the
new commands, and so now things are broken, because e.g. CLUSTER
progress reporting gets ended by a pgstat_progress_end_command() call
that was intended for some other utility command but is reached in the
CLUSTER case anyway.  The solution to that problem in my book is to
figure out which commit broke it, and then the person who made that
commit either needs to fix it or revert it.

It's quite possible here that we need a bigger redesign to make adding
progress reporting for new command easier and less prone to bugs, but
I don't think it's at all clear what such a redesign should look like
or even that we definitely need one, and September is not the right
time to be redesigning features for the pending release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2019-09-04 Thread Euler Taveira
On Tue, Sep 3, 2019 at 00:32, Euler Taveira wrote:
>
> Oops... exactly. That was an oversight while poking around with different types of 
> slots.
>
Here is a rebased version including this small fix.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From 92474dd8e15d58e253d5b5aa76348d8973bf6d04 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 9 Mar 2018 17:37:36 +
Subject: [PATCH 2/8] Store number of tuples in WalRcvExecResult

It seems to be a useful information while allocating memory for queries
that returns more than one row. It reduces memory allocation
for initial table synchronization.
---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 5 +++--
 src/backend/replication/logical/tablesync.c | 5 ++---
 src/include/replication/walreceiver.h   | 1 +
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6eba08a920..343550a335 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -878,6 +878,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
  errdetail("Expected %d fields, got %d fields.",
 		   nRetTypes, nfields)));
 
+	walres->ntuples = PQntuples(pgres);
 	walres->tuplestore = tuplestore_begin_heap(true, false, work_mem);
 
 	/* Create tuple descriptor corresponding to expected result. */
@@ -888,7 +889,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	attinmeta = TupleDescGetAttInMetadata(walres->tupledesc);
 
 	/* No point in doing more here if there were no tuples returned. */
-	if (PQntuples(pgres) == 0)
+	if (walres->ntuples == 0)
 		return;
 
 	/* Create temporary context for local allocations. */
@@ -897,7 +898,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	   ALLOCSET_DEFAULT_SIZES);
 
 	/* Process returned rows. */
-	for (tupn = 0; tupn < PQntuples(pgres); tupn++)
+	for (tupn = 0; tupn < walres->ntuples; tupn++)
 	{
 		char	   *cstrs[MaxTupleAttributeNumber];
 
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 0a565dd837..42db4ada9e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -709,9 +709,8 @@ fetch_remote_table_info(char *nspname, char *relname,
 (errmsg("could not fetch table info for table \"%s.%s\": %s",
 		nspname, relname, res->err)));
 
-	/* We don't know the number of rows coming, so allocate enough space. */
-	lrel->attnames = palloc0(MaxTupleAttributeNumber * sizeof(char *));
-	lrel->atttyps = palloc0(MaxTupleAttributeNumber * sizeof(Oid));
+	lrel->attnames = palloc0(res->ntuples * sizeof(char *));
+	lrel->atttyps = palloc0(res->ntuples * sizeof(Oid));
 	lrel->attkeys = NULL;
 
 	natt = 0;
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e12a934966..0d32d598d8 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -196,6 +196,7 @@ typedef struct WalRcvExecResult
 	char	   *err;
 	Tuplestorestate *tuplestore;
 	TupleDesc	tupledesc;
+	int			ntuples;
 } WalRcvExecResult;
 
 /* libpqwalreceiver hooks */
-- 
2.11.0

From b8a8d98368ba032670788ac4f4b4ef666a4dedd0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 27 Feb 2018 02:21:03 +
Subject: [PATCH 3/8] Refactor function create_estate_for_relation

Relation localrel is the only LogicalRepRelMapEntry structure member
that is useful for create_estate_for_relation.
---
 src/backend/replication/logical/worker.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 11e6331f49..d9952c8b7e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -173,7 +173,7 @@ ensure_transaction(void)
  * This is based on similar code in copy.c
  */
 static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(Relation rel)
 {
 	EState	   *estate;
 	ResultRelInfo *resultRelInfo;
@@ -183,13 +183,13 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	rte = makeNode(RangeTblEntry);
 	rte->rtekind = RTE_RELATION;
-	rte->relid = RelationGetRelid(rel->localrel);
-	rte->relkind = rel->localrel->rd_rel->relkind;
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = rel->rd_rel->relkind;
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel, 1, NULL, 0);
 
 	

Re: pglz performance

2019-09-04 Thread Andrey Borodin



> On 4 Sep 2019, at 17:40, Peter Eisentraut wrote:
> 
> On 2019-09-04 11:22, Andrey Borodin wrote:
>>> What about the two patches?  Which one is better?
>> On our observations pglz_decompress_hacked.patch is best for most of tested 
>> platforms.
>> Difference is that pglz_decompress_hacked8.patch will not appply 
>> optimization if decompressed match is not greater than 8 bytes. This 
>> optimization was suggested by Tom, that's why we benchmarked it specifically.
> 
> The patches attached to the message I was replying to are named
> 
> 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
> 0001-Use-memcpy-in-pglz-decompression.patch
> 
> Are those the same ones?

Yes. Sorry for this confusion.

The only difference in 
0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch is that it 
falls back to the byte loop if len is <= 8.

Best regards, Andrey Borodin.



Re: pglz performance

2019-09-04 Thread Peter Eisentraut
On 2019-09-04 11:22, Andrey Borodin wrote:
>> What about the two patches?  Which one is better?
> On our observations pglz_decompress_hacked.patch is best for most of tested 
> platforms.
> Difference is that pglz_decompress_hacked8.patch will not appply optimization 
> if decompressed match is not greater than 8 bytes. This optimization was 
> suggested by Tom, that's why we benchmarked it specifically.

The patches attached to the message I was replying to are named

0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
0001-Use-memcpy-in-pglz-decompression.patch

Are those the same ones?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-04 Thread Peter Eisentraut
On 2019-09-04 14:17, Michael Paquier wrote:
>> progname can probably be made into a local variable now.
> 
> Can it?  vacuumlo() still uses the progname from _param for the
> connection string.

Yeah, probably best to leave it as is for now.

>> Do we need set_pglocale_pgservice() calls here if we're not doing NLS?
>> Does the logging stuff require it?  I'm not sure.
> 
> The logging part does not require it, but this can be used for
> PGSYSCONFDIR, no?

How does PGSYSCONFDIR come into play here?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-04 Thread Michael Paquier
On Wed, Sep 04, 2019 at 10:17:57AM +0200, Peter Eisentraut wrote:
> On 2019-08-20 03:28, Michael Paquier wrote:
> > +   pg_log_error("\nfailed to remove lo %u: 
> > %s", lo,
> > +
> > PQerrorMessage(conn));
> 
> This newline should be removed.

Thanks, missed that.

> progname can probably be made into a local variable now.

Can it?  vacuumlo() still uses the progname from _param for the
connection string.

> Do we need set_pglocale_pgservice() calls here if we're not doing NLS?
> Does the logging stuff require it?  I'm not sure.

The logging part does not require it, but this can be used for
PGSYSCONFDIR, no?
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Masahiko Sawada
On Wed, Sep 4, 2019 at 3:48 PM Michael Paquier  wrote:
>
> On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> > Indeed, good catch.  This is wrong since b6fb647 which has introduced
> > the progress reports.  I'll fix that one and back-patch if there are
> > no objections.
>
> OK, applied this part down to 9.6.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: REINDEX filtering in the backend

2019-09-04 Thread Peter Eisentraut
On 2019-08-30 02:10, Michael Paquier wrote:
> On Thu, Aug 29, 2019 at 10:52:55AM +0200, Julien Rouhaud wrote:
>> That was already suggested by Thomas and seconded by Peter E., see
>> https://www.postgresql.org/message-id/2b1504ac-3d6c-11ec-e1ce-3daf132b3d37%402ndquadrant.com.
>>
>> I personally think that it's a sensible approach, and I'll be happy to
>> propose a patch for that too if no one worked on this yet.
> 
> That may be interesting to sort out first then because we'd likely
> want to know what is first in the catalogs before designing the
> filtering processing looking at it, no?

Right.  We should aim to get per-object collation version tracking done.
 And then we might want to have a REINDEX variant that processes exactly
those indexes with an out-of-date version number -- and then updates
that version number once the reindexing is done.  I think that project
is achievable for PG13.

I think we can keep this present patch in our back pocket for, say, the
last commit fest if we don't make sufficient progress on those other
things.  Right now, it's hard to review because the target is moving,
and it's unclear what guidance to give users.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: block-level incremental backup

2019-09-04 Thread Dilip Kumar
On Tue, Sep 3, 2019 at 12:11 PM Dilip Kumar  wrote:
>
> On Fri, Aug 16, 2019 at 3:54 PM Jeevan Chalke
>  wrote:
> >
> 0003:
> +/*
> + * When to send the whole file, % blocks modified (90%)
> + */
> +#define WHOLE_FILE_THRESHOLD 0.9
>
> How was this threshold selected?  Is it based on some test?
>
>
> - magic number, currently 0 (4 bytes)
> I think in the patch we are using  (#define INCREMENTAL_BACKUP_MAGIC
> 0x494E4352) as a magic number, not 0
>
>
> + Assert(statbuf->st_size <= (RELSEG_SIZE * BLCKSZ));
> +
> + buf = (char *) malloc(statbuf->st_size);
> + if (buf == NULL)
> + ereport(ERROR,
> + (errcode(ERRCODE_OUT_OF_MEMORY),
> + errmsg("out of memory")));
> +
> + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
> + {
> + Bitmapset  *mod_blocks = NULL;
> + int nmodblocks = 0;
> +
> + if (cnt % BLCKSZ != 0)
> + {
>
> It would be good to add some comments for the if block and also for the
> assert. Actually, it's not very clear from the code.
>
> 0004:
> +#include 
> +#include 
> +#include 
> Header file include order (sys/state.h should be before time.h)
>
>
>
> + printf(_("%s combines full backup with incremental backup.\n\n"), progname);
> /backup/backups
>
>
> + * scan_file
> + *
> + * Checks whether given file is partial file or not.  If partial, then 
> combines
> + * it into a full backup file, else copies as is to the output directory.
> + */
>
> /If partial, then combines/ If partial, then combine
>
>
>
> +static void
> +combine_partial_files(const char *fn, char **IncrDirs, int nIncrDir,
> +   const char *subdirpath, const char *outfn)
> + /*
> + * Open all files from all incremental backup directories and create a file
> + * map.
> + */
> + basefilefound = false;
> + for (i = (nIncrDir - 1), fmindex = 0; i >= 0; i--, fmindex++)
> + {
> + fm = [fmindex];
> +
> .
> + }
> +
> +
> + /* Process all opened files. */
> + lastblkno = 0;
> + modifiedblockfound = false;
> + for (i = 0; i < fmindex; i++)
> + {
> + char*buf;
> + int hsize;
> + int k;
> + int blkstartoffset;
> ..
> + }
> +
> + for (i = 0; i <= lastblkno; i++)
> + {
> + char blkdata[BLCKSZ];
> + FILE*infp;
> + int offset;
> ...
> + }
> }
>
> Can we break this function down into 2-3 functions?  At least creating the
> file map could go directly into a separate function.
>
> I have read 0003 and 0004 patch and there are few cosmetic comments.
>
I have not yet completed the review of 0004, but I have a few more
comments.  Tomorrow I will try to complete the review and do some testing
as well.

1. It seems that the output full backup generated with
pg_combinebackup also contains the "INCREMENTAL BACKUP REFERENCE WAL
LOCATION".  This seems confusing,
because now this is a full backup, not an incremental backup.

2.
+ FILE*outfp;
+ FileOffset outblocks[RELSEG_SIZE];
+ int i;
+ FileMap*filemaps;
+ int fmindex;
+ bool basefilefound;
+ bool modifiedblockfound;
+ uint32 lastblkno;
+ FileMap*fm;
+ struct stat statbuf;
+ uint32 nblocks;
+
+ memset(outblocks, 0, sizeof(FileOffset) * RELSEG_SIZE);

I don't think you need to memset this explicitly, as you can initialize
the array itself, no?
FileOffset outblocks[RELSEG_SIZE] = {{0}}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Unexpected "shared memory block is still in use"

2019-09-04 Thread Peter Eisentraut
I agree with this patch and the reasons for it.

A related point, perhaps we should change the key printed into
postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
what ipcs prints.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fix for BUG #3720: wrong results at using ltree

2019-09-04 Thread Ibrar Ahmed
On Tue, Jul 16, 2019 at 8:52 PM Nikita Glukhov 
wrote:

>
> On 09.07.2019 17:57, Oleg Bartunov wrote:
>
> On Mon, Jul 8, 2019 at 7:22 AM Thomas Munro  
>  wrote:
>
> On Sun, Apr 7, 2019 at 3:46 AM Tom Lane  
>  wrote:
>
> =?UTF-8?Q?Filip_Rembia=C5=82kowski?=  
>  writes:
>
> Here is my attempt to fix a 12-years old ltree bug (which is a todo item).
> I see it's not backward-compatible, but in my understanding that's
> what is documented. Previous behavior was inconsistent with
> documentation (where single asterisk should match zero or more
> labels).http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php
>
> [...]
>
>
> In short, I'm wondering if we should treat this as a documentation
> bug not a code bug.  But to do that, we'd need a more accurate
> description of what the code is supposed to do, because the statement
> quoted above is certainly not a match to the actual behavior.
>
> This patch doesn't apply.  More importantly, it seems like we don't
> have a consensus on whether we want it.
>
> Teodor, Oleg, would you like to offer an opinion here?  If I
> understand correctly, the choices are doc change, code/comment change
> or WONT_FIX.  This seems to be an entry that we can bring to a
> conclusion in this CF with some input from the ltree experts.
>
> We are currently very busy and will look at the problem (and dig into
> our memory) later.  There is also another ltree patch
> (https://commitfest.postgresql.org/23/1977/), it would be nice if
> Filip try it.
>
> I looked at "ltree syntax improvement" patch and found two more very
> old bugs in ltree/lquery (fixes are attached):
>
> 1.  ltree/lquery level counter overflow is wrongly checked:
>
> SELECT nlevel((repeat('a.', 65534) || 'a')::ltree);
>  nlevel
> 
>   65535
> (1 row)
>
> -- expected 65536 or error
> SELECT nlevel((repeat('a.', 65535) || 'a')::ltree);
>  nlevel
> 
>   0
> (1 row)
>
> -- expected 65537 or error
> SELECT nlevel((repeat('a.', 65536) || 'a')::ltree);
>  nlevel
> 
>   1
> (1 row)
>
> -- expected 'a...' or error
> SELECT (repeat('a.', 65535) || 'a')::ltree;
>  ltree
> ---
>
> (1 row)
>
> -- expected 'a...' or error
> SELECT (repeat('a.', 65536) || 'a')::ltree;
>  ltree
> ---
>  a
> (1 row)
>
>
> 2.  '*{a}.*{b}.*{c}' is not equivalent to '*{a+b+c}' (as I expect):
>
> SELECT ltree '1.2' ~ '*{2}';
>  ?column?
> --
>  t
> (1 row)
>
> -- expected true
> SELECT ltree '1.2' ~ '*{1}.*{1}';
>  ?column?
> --
>  f
> (1 row)
>
>
> Maybe these two bugs need a separate thread?
>
>
> Please create separate commitfest entry.



> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>

-- 
Ibrar Ahmed


Re: fix for BUG #3720: wrong results at using ltree

2019-09-04 Thread Ibrar Ahmed
Please create a separate commitfest entry.

Re: refactoring - share str2*int64 functions

2019-09-04 Thread Fabien COELHO


Bonjour Michaël,


Attached a rebased version which implements the int64/uint64 stuff. It is
basically the previous patch without the overflow inlined functions.


- if (!strtoint64(yytext, true, >ival))
+ if (unlikely(pg_strtoint64(yytext, >ival) != PG_STRTOINT_OK))



It seems to me that we should have unlikely() only within
pg_strtoint64(), pg_strtouint64(), etc.


From a compiler perspective, the (un)likely hint is potentially useful on 
any test. We know, when parsing, that it is very unlikely that the string 
conversion will fail, so we can say so, so that the compiler knows 
which branch it should optimize first.


You can argue against that if the functions are inlined, because maybe the 
compiler would propagate the information, but for distinct functions 
compiled separately the information is useful at each level.



-   /* skip leading spaces; cast is consistent with strtoint64 */
-   while (*ptr && isspace((unsigned char) *ptr))



+   /* skip leading spaces; cast is consistent with pg_strtoint64 */
+   while (isspace((unsigned char) *ptr))
   ptr++;



What do you think about splitting this part in two?  I would suggest
to do the refactoring in a first patch, and the consider all the
optimizations for the routines you have in mind afterwards.


I would not bother.


I think that we don't actually need is_an_int() and str2int64() at all
in pgbench.c as we could just check for the return code of
pg_strtoint64() and switch to the double parsing only if we don't have
PG_STRTOINT_OK.


Yep, you are right: now that the conversion function does not error out with a 
message, its failure can be used as a test.


The attached version changes the semantics slightly, because on int 
overflow a double conversion will be attempted instead of erroring out. I do 
not think it is worth the effort to preserve the previous erroring 
semantics.



scanint8() only has one caller in the backend with your patch, and we
don't check after its return result in int8.c, so I would suggest to
move the error handling directly in int8in() and to remove scanint8().


Ok.


I think that we should also introduce the [u]int[16|32] flavors and
expand them in the code in a single patch, in a way consistent with
what you have done for int64/uint64 as the state that we reach with
the patch is kind of weird as there are existing versions numutils.c.


Before dealing with the 16/32 versions, which involve quite a significant 
amount of changes, I would want a clear message that "the 64 bit approach" 
is the model to follow.


Moreover, I'm unsure how to rename the existing pg_strtoint32 and others
which call ereport, if the name is used for the common error returning 
version.



Have you done some performance testing of the patch?  The COPY
bulkload is a good example of that:
https://www.postgresql.org/message-id/20190717181428.krqpmduejbqr4m6g%40alap3.anarazel.de


I have done no such thing for now.

I would not expect any significant performance difference when loading 
int8 things because basically scanint8 has just been renamed pg_strtoint64 
and made global, and that is more or less all. It might be a little bit 
slower because possibly the compiler cannot inline the conversion, but on 
the other hand, the *likely hints and removed tests may marginally help 
performance. I think that the only way to test performance significantly 
would be to write a specific program that loops over a conversion.
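
Something as small as the following would probably do (a sketch; strtoll() 
is used as a stand-in so it compiles on its own -- when built against the 
backend one would call pg_strtoint64() instead):

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	int
	main(void)
	{
		const char *input = "123456789012";
		volatile long long sink = 0;
		clock_t		start = clock();

		/* loop over the same conversion many times and report elapsed time */
		for (long i = 0; i < 100000000L; i++)
			sink += strtoll(input, NULL, 10);

		printf("%.3f s (sink=%lld)\n",
			   (double) (clock() - start) / CLOCKS_PER_SEC, sink);
		return 0;
	}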


--
Fabien.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 221b47298c..28891ba337 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -79,6 +79,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/memutils.h"
+#include "common/string.h"
 
 PG_MODULE_MAGIC;
 
@@ -1022,7 +1023,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		/* parse command tag to retrieve the number of affected rows. */
 		if (completionTag &&
 			strncmp(completionTag, "COPY ", 5) == 0)
-			rows = pg_strtouint64(completionTag + 5, NULL, 10);
+			(void) pg_strtouint64(completionTag + 5, &rows);
 		else
 			rows = 0;
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 2c0ae395ba..8e75d52b06 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -21,6 +21,7 @@
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/string.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
@@ -2338,8 +2339,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	CreateTableAsStmt *ctastmt = (CreateTableAsStmt *) stmt->utilityStmt;
 
 	if (strncmp(completionTag, "SELECT ", 7) == 0)
-		_SPI_current->processed =
-			pg_strtouint64(completionTag + 7, NULL, 10);
+		(void) pg_strtouint64(completionTag + 7, &_SPI_current->processed);
 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-04 Thread Rafia Sabih
On Tue, 30 Jul 2019 at 02:17, Tomas Vondra 
wrote:

> On Sun, Jul 21, 2019 at 01:34:22PM +0200, Tomas Vondra wrote:
> >
> > ...
> >
> >I wonder if we're approaching this wrong. Maybe we should not reverse
> >engineer queries for the various places, but just start with a set of
> >queries that we want to optimize, and then identify which places in the
> >planner need to be modified.
> >
>
> I've decided to do a couple of experiments, trying to make my mind about
> which modified places matter to diffrent queries. But instead of trying
> to reverse engineer the queries, I've taken a different approach - I've
> compiled a list of queries that I think are sensible and relevant, and
> then planned them with incremental sort enabled in different places.
>
> I don't have any clear conclusions at this point - it does show some of
> the places don't change plan for any of the queries, although there may
> be some additional query where it'd make a difference.
>
> But I'm posting this mostly because it might be useful. I've initially
> planned to move changes that add incremental sort paths to separate
> patches, and then apply/skip different subsets of those patches. But
> then I realized there's a better way to do this - I've added a bunch of
> GUCs, one for each such place. This allows doing this testing without
> having to rebuild repeatedly.
>
> I'm not going to post the patch(es) with extra GUCs here, because it'd
> just confuse the patch tester, but it's available here:
>
>   https://github.com/tvondra/postgres/tree/incremental-sort-20190730
>
> There are 10 GUCs, one for each place in planner where incremental sort
> paths are constructed. By default all those are set to 'false' so no
> incremental sort paths are built. If you do
>
>   SET devel_create_ordered_paths = on;
>
> it'll start creating the paths in non-parallel in create_ordered_paths.
> Then you may enable devel_create_ordered_paths_parallel to also consider
> parallel paths, etc.
>
> The list of queries (synthetic, but hopefully sufficiently realistic)
> and a couple of scripts to collect the plans is in this repository:
>
>   https://github.com/tvondra/incremental-sort-tests-2
>
> There's also a spreadsheet with a summary of results, with a visual
> representation of which GUCs affect which queries.
>
Wow, that sounds like an elaborate experiment. But where is this
spreadsheet you mentioned?

-- 
Regards,
Rafia Sabih


Re: Proposal: roll pg_stat_statements into core

2019-09-04 Thread Andres Freund
Hi,

On 2019-09-02 12:07:17 -0400, Tom Lane wrote:
> Euler Taveira  writes:
> > At least if pg_stat_statements
> > was in core you could load it by default and have a GUC to turn it
> > on/off without restarting the server (that was Magnus proposal and
> > Andres agreed).
> 
> That assertion is 100% bogus.  To turn it on or off on-the-fly,
> you'd need some way to acquire or release its shared memory
> on-the-fly.

Not necessarily - in practice the issue of pg_stat_statements having
overhead isn't about the memory overhead, but the CPU overhead. And that
we can nearly entirely disable/enable without acquiring / releasing
shared memory already, by setting pg_stat_statements.track to NONE.


There's still the overhead of being indirected through the pgss_* hooks,
which then have to figure out that pgss is disabled. But that's
obviously much smaller than the overhead of pgss itself.  Although I do
wonder if it's time that we make hook registration/unregistration a bit
more flexible - it's basically impossible to unregister hooks right now
because the chain of hooks is distributed over multiple files. If we
instead provided a few helper functions to register hooks we could allow
deregistering as well.
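
Something along these lines, purely as a sketch (none of these helper names 
exist today):

	#include <stdlib.h>

	/* sketch of a central hook registry that also supports deregistration */
	typedef void (*pg_hook_function) (void *arg);

	typedef struct HookEntry
	{
		pg_hook_function fn;
		struct HookEntry *next;
	} HookEntry;

	static HookEntry *executor_end_hooks = NULL;

	void
	RegisterExecutorEndHook(pg_hook_function fn)
	{
		HookEntry  *entry = malloc(sizeof(HookEntry));

		entry->fn = fn;
		entry->next = executor_end_hooks;
		executor_end_hooks = entry;
	}

	void
	UnregisterExecutorEndHook(pg_hook_function fn)
	{
		for (HookEntry **prev = &executor_end_hooks; *prev; prev = &(*prev)->next)
		{
			if ((*prev)->fn == fn)
			{
				HookEntry  *dead = *prev;

				*prev = dead->next;
				free(dead);
				return;
			}
		}
	}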


> It's probably now possible to do something like that, using the
> DSM mechanisms, but the code for it hasn't been written (AFAIK).
> And it wouldn't have much to do with whether the module was
> in core or stayed where it is.

I think it'd be good to add DSM support for pg_stat_statements - but as
you say that's largely independent of it being mainlined.


> Another concern that I have about moving pg_stat_statements
> into core is that doing so would effectively nail down
> Query.queryId as belonging to pg_stat_statements, whereas
> currently it's possible for other plugins to commandeer that
> if they wish.  This isn't academic, because of the fact that
> not everybody is satisfied with the way pg_stat_statements
> defines queryId [1].

I'm not sure I see this as that big an issue - we could have a few
different query ids for each query. If it's a problem that there's
potentially different desirable semantics for queryId, then it's also an
issue that we can have only one.

Greetings,

Andres Freund




Re: pg_get_databasebyid(oid)

2019-09-04 Thread Sergei Kornilov
Hello

Thank you for attention! I marked CF entry as returned with feedback.

regards, Sergei




Re: WIP: Data at rest encryption

2019-09-04 Thread Shawn Wang
 On Wed, 04 Sep 2019 00:56:15 +0800 Alvaro Herrera wrote:

> On 2019-Aug-02, Shawn Wang wrote:
>
>> Hi Antonin,
>> It is very glad to see the new patch. I used the public patches a long time
>> ago.
>> I did some tests like the stream replication, much data running, temporary
>> files encryption.
>> I found that there is an issue in the src/backend/storage/file/encryption.c.
>> You should put block_size = EVP_CIPHER_CTX_block_size(ctx); under the #ifdef
>> USE_ASSERT_CHECKING.
>> There is some problem to merge your patches to the latest kernel in the
>> pg_ctl.c.
>
> Is a new, fixed version going to be posted soon?  It's been a while.
>
> Also, apologies if this has been asked before, but: how does this patch
> relate to the stuff being discussed in
> https://postgr.es/m/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp ?


Hi Álvaro,

Thank you for the reply.

I was mainly pointing out the issues in src/backend/storage/file/encryption.c. If 
somebody wants to use these patches, I think Antonin needs to fix them.

It does not relate to the stuff being discussed in the TDE thread. As far as I know, 
some companies already use these patches to encrypt data, even though these issues 
remain.



Regards, 

--

Shawn Wang

Re: pglz performance

2019-09-04 Thread Andrey Borodin
Hi, Peter! Thanks for looking into this.

> On 4 Sep 2019, at 14:09, Peter Eisentraut wrote:
> 
> On 2019-06-24 10:44, Andrey Borodin wrote:
>>> On 18 May 2019, at 11:44, Andrey Borodin wrote:
>>> 
>> Hi! 
>> Here's rebased version of patches.
>> 
>> Best regards, Andrey Borodin.
> 
> I think this is the most recent patch for the CF entry
> .
> 
> What about the two patches?  Which one is better?
In our observations pglz_decompress_hacked.patch is best on most of the tested 
platforms.
The difference is that pglz_decompress_hacked8.patch will not apply the optimization 
if the decompressed match is not longer than 8 bytes. This optimization was 
suggested by Tom, which is why we benchmarked it specifically.

> Have you also considered using memmove() to deal with the overlap issue?
Yes, memmove() resolves the ambiguity of copying overlapping regions in a way that 
is not compatible with pglz. In the proposed patch we never copy overlapping 
regions.
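
As a minimal illustration of why memmove() semantics differ from what pglz 
decompression needs (not code from the patch):

	/*
	 * pglz-style copy: byte by byte, so with offset 1 and length 4 the input
	 * "a" expands to "aaaaa" -- each byte just written becomes the source of
	 * the next one.  memmove() instead behaves as if the source were first
	 * copied to a temporary buffer, which gives a different result for such
	 * overlapping regions.
	 */
	static void
	overlapping_copy(unsigned char *dst, const unsigned char *src, int len)
	{
		while (len-- > 0)
			*dst++ = *src++;
	}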

> Benchmarks have been posted in this thread.  Where is the benchmarking
> tool?  Should we include that in the source somehow?

The benchmarking tool is here [0]. Well, the code of the benchmarking tool does not 
adhere to our standards in some places, so we did not consider including it in 
core.
However, the most questionable part of the benchmarking is the choice of test data. 
It's about 100MB of useless WAL, a data file, and valuable Shakespeare writings.

Best regards, Andrey Borodin.


[0] https://github.com/x4m/test_pglz





Re: pglz performance

2019-09-04 Thread Peter Eisentraut
On 2019-06-24 10:44, Andrey Borodin wrote:
>> On 18 May 2019, at 11:44, Andrey Borodin wrote:
>>
> Hi! 
> Here's rebased version of patches.
> 
> Best regards, Andrey Borodin.

I think this is the most recent patch for the CF entry
.

What about the two patches?  Which one is better?

Have you also considered using memmove() to deal with the overlap issue?

Benchmarks have been posted in this thread.  Where is the benchmarking
tool?  Should we include that in the source somehow?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: refactoring - share str2*int64 functions

2019-09-04 Thread Andres Freund
Hi,

On 2019-09-03 20:10:37 +0200, Fabien COELHO wrote:
> @@ -113,7 +113,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
>  							 errmsg("conflicting or redundant options")));
>  			protocol_version_given = true;
>  
> -			if (!scanint8(strVal(defel->arg), true, &parsed))
> +			if (unlikely(pg_strtoint64(strVal(defel->arg), &parsed) != PG_STRTOINT_OK))
>  				ereport(ERROR,
>  						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>  						 errmsg("invalid proto_version")));

Unexcited about adding unlikely() to any place that's not a hot path -
which this certainly is not.

But I also wonder if we shouldn't just put the branch likelihood of
pg_strtoint64 not failing into one central location. E.g. something like

static inline pg_strtoint_status
pg_strtoint64(const char *str, int64 *result)
{
pg_strtoint_status status;

status = pg_strtoint64_impl(str, result);

likely(status == PG_STRTOINT_OK);

return status;
}

I've not tested this, but IIRC that should be sufficient to propagate
that knowledge up to callers.


> + if (likely(stat == PG_STRTOINT_OK))
> + return true;
> + else if (stat == PG_STRTOINT_RANGE_ERROR)
>   {
> - /* could fail if input is most negative number */
> - if (unlikely(tmp == PG_INT64_MIN))
> - goto out_of_range;
> - tmp = -tmp;
> - }
> -
> - *result = tmp;
> - return true;
> -
> -out_of_range:
> - if (!errorOK)
>   ereport(ERROR,
>   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>    errmsg("value \"%s\" is out of range for type %s",
>   str, "bigint")));
> - return false;
> -
> -invalid_syntax:
> - if (!errorOK)
> + return false;
> + }
> + else if (stat == PG_STRTOINT_SYNTAX_ERROR)
> + {
>   ereport(ERROR,
>   (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>    errmsg("invalid input syntax for type %s: \"%s\"",
>   "bigint", str)));
> - return false;
> + return false;
> + }
> + else
> + /* cannot get here */
> + Assert(0);

I'd write this as a switch over the enum - that way we'll get a
compile-time warning if we're not handling a value.
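
For instance, the tail of that error handling could be shaped like this (a 
sketch reusing the messages from the patch above):

	switch (stat)
	{
		case PG_STRTOINT_OK:
			return true;
		case PG_STRTOINT_RANGE_ERROR:
			ereport(ERROR,
					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
					 errmsg("value \"%s\" is out of range for type %s",
							str, "bigint")));
			return false;
		case PG_STRTOINT_SYNTAX_ERROR:
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
					 errmsg("invalid input syntax for type %s: \"%s\"",
							"bigint", str)));
			return false;
	}
	/* no default: the compiler can then warn if a new status goes unhandled */
	return false;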


> +static bool
> +str2int64(const char *str, int64 *val)
> +{
> + pg_strtoint_status  stat = pg_strtoint64(str, val);
> +

I find it weird to have a wrapper that's named 'str2...' that then calls
'strto' to implement itself.


> +/*
> + * pg_strtoint64 -- convert a string to 64-bit integer
> + *
> + * The function returns if the conversion failed, or
> + * "*result" is set to the result.
> + */
> +pg_strtoint_status
> +pg_strtoint64(const char *str, int64 *result)
> +{
> + const char *ptr = str;
> + int64   tmp = 0;
> + boolneg = false;
> +
> + /*
> +  * Do our own scan, rather than relying on sscanf which might be broken
> +  * for long long.
> +  *
> +  * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
> +  * value as a negative number.
> +  */
> +
> + /* skip leading spaces */
> + while (isspace((unsigned char) *ptr))
> + ptr++;
> +
> + /* handle sign */
> + if (*ptr == '-')
> + {
> + ptr++;
> + neg = true;
> + }
> + else if (*ptr == '+')
> + ptr++;
> +
> + /* require at least one digit */
> + if (unlikely(!isdigit((unsigned char) *ptr)))
> + return PG_STRTOINT_SYNTAX_ERROR;
> +
> + /* process digits, we know that we have one ahead */
> + do
> + {
> + int64   digit = (*ptr++ - '0');
> +
> + if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> + unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> + return PG_STRTOINT_RANGE_ERROR;
> + }
> + while (isdigit((unsigned char) *ptr));

Hm. If we're concerned about the cost of isdigit etc - and I think
that's reasonable, after looking at their implementation in glibc (it
performs a lookup in a global table to handle potential locale changes)
- I think we ought to just not use the provided isdigit, and probably
not isspace either.  This code effectively relies on the input being
ascii anyway, so we don't need any locale specific behaviour afaict
(we'd e.g. get wrong results if isdigit() returned true for something
other than the ascii chars).

To me the generated code looks considerably better if I use something
like

static inline bool
pg_isdigit(char c)
{
return c >= '0' && c <= '9';
}
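
An ASCII-only replacement for isspace() along the same lines could be, as a 
rough sketch (assuming we only need to accept the usual ASCII whitespace 
characters):

	static inline bool
	pg_isspace(char c)
	{
		/* the six ASCII whitespace characters; no locale table lookup */
		return c == ' ' || c == '\t' || c == '\n' ||
			   c == '\v' || c == '\f' || c == '\r';
	}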

Re: d25ea01275 and partitionwise join

2019-09-04 Thread Richard Guo
Hi Amit,

On Wed, Sep 4, 2019 at 3:30 PM Richard Guo  wrote:

> Hi Amit,
>
> On Wed, Sep 4, 2019 at 10:01 AM Amit Langote 
> wrote:
>
>> Fujita-san,
>>
>> To avoid losing track of this, I've added this to November CF.
>>
>> https://commitfest.postgresql.org/25/2278/
>>
>> I know there is one more patch beside the partitionwise join fix, but
>> I've set the title to suggest that this is related mainly to
>> partitionwise joins.
>>
>
>  Thank you for working on this. Currently partitionwise join does not
>  take COALESCE expr into consideration when matching to partition keys.
>  This is a problem.
>
>  BTW, a rebase is needed for the patch set.
>


I'm reviewing v2-0002 and I have a concern about how COALESCE expr is
processed in match_join_arg_to_partition_keys().

If there is a COALESCE expr whose first arg is a non-partition-key expr
and whose second arg is the partition key, the patch would match it to the
partition key, which may produce wrong results in some cases.

For instance, consider the partition table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

So with patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
  full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
                          QUERY PLAN
------------------------------------------------------
 Append
   ->  Hash Full Join
 Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
 ->  Hash Full Join
   Hash Cond: (t1.k = t2.k)
   ->  Seq Scan on p_1 t1
   ->  Hash
 ->  Seq Scan on p_1 t2
 ->  Hash
   ->  Seq Scan on p_1 t3
   ->  Hash Full Join
 Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
 ->  Hash Full Join
   Hash Cond: (t1_1.k = t2_1.k)
   ->  Seq Scan on p_2 t1_1
   ->  Hash
 ->  Seq Scan on p_2 t2_1
 ->  Hash
   ->  Seq Scan on p_2 t3_1
(19 rows)

But since t1.val is not a partition key, we actually cannot use a
partitionwise join here.

If we insert the data below into the table, we will get wrong results for
the query above: the row (5, 15) is stored in p_1, but COALESCE(val, k) = 15
would have to match t3's row (15, 5), which is stored in p_2 and which the
partitionwise plan never joins against p_1.

insert into p select 5,15;
insert into p select 15,5;

Thanks
Richard


Re: Plug-in common/logging.h with vacuumlo and oid2name

2019-09-04 Thread Peter Eisentraut
On 2019-08-20 03:28, Michael Paquier wrote:
> +			pg_log_error("\nfailed to remove lo %u: %s", lo,
> +						 PQerrorMessage(conn));

This newline should be removed.

progname can probably be made into a local variable now.

The rest looks good to me.

Do we need set_pglocale_pgservice() calls here if we're not doing NLS?
Does the logging stuff require it?  I'm not sure.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: refactoring - share str2*int64 functions

2019-09-04 Thread Michael Paquier
On Tue, Sep 03, 2019 at 08:10:37PM +0200, Fabien COELHO wrote:
> Attached a rebased version which implements the int64/uint64 stuff. It is
> basically the previous patch without the overflow inlined functions.

- if (!strtoint64(yytext, true, &yylval->ival))
+ if (unlikely(pg_strtoint64(yytext, &yylval->ival) != PG_STRTOINT_OK))
It seems to me that we should have unlikely() only within
pg_strtoint64(), pg_strtouint64(), etc.

-   /* skip leading spaces; cast is consistent with strtoint64 */
-   while (*ptr && isspace((unsigned char) *ptr))
+   /* skip leading spaces; cast is consistent with pg_strtoint64 */
+   while (isspace((unsigned char) *ptr))
ptr++;
What do you think about splitting this part in two?  I would suggest
to do the refactoring in a first patch, and then consider all the
optimizations for the routines you have in mind afterwards.

I think that we don't actually need is_an_int() and str2int64() at all
in pgbench.c as we could just check for the return code of
pg_strtoint64() and switch to the double parsing only if we don't have
PG_STRTOINT_OK.

scanint8() only has one caller in the backend with your patch, and we
don't check after its return result in int8.c, so I would suggest to
move the error handling directly in int8in() and to remove scanint8().

I think that we should also introduce the [u]int[16|32] flavors and
expand them in the code in a single patch, in a way consistent with
what you have done for int64/uint64 as the state that we reach with
the patch is kind of weird as there are existing versions in numutils.c.

Have you done some performance testing of the patch?  The COPY
bulkload is a good example of that:
https://www.postgresql.org/message-id/20190717181428.krqpmduejbqr4m6g%40alap3.anarazel.de
--
Michael




Re: BUG #15977: Inconsistent behavior in chained transactions

2019-09-04 Thread Andres Freund
Hi,

On 2019-09-03 11:54:57 +0200, Peter Eisentraut wrote:
> On 2019-08-29 16:58, Fabien COELHO wrote:
> > 
> >> Thanks, I got it. I have never made a patch before so I'll keep it in my 
> >> mind. Self-contained patch is now attached.
> > 
> > v3 applies, compiles, "make check" ok.
> > 
> > I turned it ready on the app.

I don't think this is quite sufficient. Note that the same problem also
exists for commits, one just needs force the commit to be part of a
multi-statement implicit transaction (i.e. a simple protocol exec /
PQexec(), with multiple statements).

E.g.:

postgres[32545][1]=# ROLLBACK;
WARNING:  25P01: there is no transaction in progress
LOCATION:  UserAbortTransactionBlock, xact.c:3914
ROLLBACK
Time: 0.790 ms
postgres[32545][1]=# SELECT 1\;COMMIT AND CHAIN;
WARNING:  25P01: there is no transaction in progress
LOCATION:  EndTransactionBlock, xact.c:3728
COMMIT
Time: 0.945 ms
postgres[32545][1]*=# COMMIT ;
COMMIT
Time: 0.539 ms

the \; bit forces psql to not split command into two separate protocol
level commands, but to submit them together.


> Should we make it an error instead of a warning?

Yea, I think for AND CHAIN we have to either error, or always start a
new transaction. I can see arguments for both, as long as it's
consistent.

The historical behaviour of issuing only WARNINGS when issuing COMMIT or
ROLLBACK outside of an explicit transaction seems to weigh in favor of
not erroring. Given that the result of such a transaction command is not
an error, the AND CHAIN portion should work.

Additionally, we actually have COMMIT; ROLLBACK; PREPARE TRANSACTION all
work meaningfully for implicit transactions. E.g.:

postgres[32545][1]=# CREATE TABLE test()\; PREPARE TRANSACTION 'frak';
WARNING:  25P01: there is no transaction in progress
LOCATION:  EndTransactionBlock, xact.c:3728
PREPARE TRANSACTION
Time: 15.094 ms
postgres[32545][1]=# \d test
Did not find any relation named "test".
postgres[32545][1]=# COMMIT PREPARED 'frak';
COMMIT PREPARED
Time: 4.727 ms
postgres[32545][1]=# \d test
   Table "public.test"
┌┬──┬───┬──┬─┐
│ Column │ Type │ Collation │ Nullable │ Default │
├┼──┼───┼──┼─┤
└┴──┴───┴──┴─┘


The argument in the other direction is that not erroring out hides bugs,
like an accidentally already committed transaction (which is
particularly bad for ROLLBACK). We can't easily change that for plain
COMMIT/ROLLBACK due to backward compat concerns, but for COMMIT|ROLLBACK
AND CHAIN there's no such concern.

I think there's an argument that we ought to behave differently for
COMMIT/ROLLBACK/PREPARE in implicit transactions where multiple commands
exist, and ones where that's not the case. Given that they all actually
have an effect if there's a preceding statement in the implicit
transaction, the WARNING doesn't actually seem that appropriate?

There are some arguments that it's sometimes useful to be able to force
committing an implicit transaction. Consider e.g. executing batch schema
modifications with some sensitivity to latency (and thus wanting to
reduce latency by executing multiple statements via one protocol
message). There are a few cases where committing earlier is quite useful
in that scenario, e.g.:

CREATE TYPE test_enum AS ENUM ('a', 'b', 'c');
CREATE TABLE uses_test_enum(v test_enum);

without explicit commit:

postgres[32545][1]=# ALTER TYPE test_enum ADD VALUE 'd'\;INSERT INTO 
uses_test_enum VALUES('d');
ERROR:  55P04: unsafe use of new value "d" of enum type test_enum
LINE 1: ...t_enum ADD VALUE 'd';INSERT INTO uses_test_enum VALUES('d');
  ^
HINT:  New enum values must be committed before they can be used.
LOCATION:  check_safe_enum_use, enum.c:98


with explicit commit:

postgres[32545][1]=# ALTER TYPE test_enum ADD VALUE 'd'\;COMMIT\;INSERT INTO 
uses_test_enum VALUES('d');
WARNING:  25P01: there is no transaction in progress
LOCATION:  EndTransactionBlock, xact.c:3728
INSERT 0 1

There's also the case that one might want to batch execute statements,
but not have to redo them if a later command fails.

Greetings,

Andres Freund




Re: d25ea01275 and partitionwise join

2019-09-04 Thread Richard Guo
Hi Amit,

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote 
wrote:

> Fujita-san,
>
> To avoid losing track of this, I've added this to November CF.
>
> https://commitfest.postgresql.org/25/2278/
>
> I know there is one more patch beside the partitionwise join fix, but
> I've set the title to suggest that this is related mainly to
> partitionwise joins.
>

 Thank you for working on this. Currently partitionwise join does not
 take COALESCE expr into consideration when matching to partition keys.
 This is a problem.

 BTW, a rebase is needed for the patch set.

Thanks
Richard


Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Michael Paquier
On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> Indeed, good catch.  This is wrong since b6fb647 which has introduced
> the progress reports.  I'll fix that one and back-patch if there are
> no objections.

OK, applied this part down to 9.6.
--
Michael




Re: [Patch] Mingw: Fix import library extension, build actual static libraries

2019-09-04 Thread Peter Eisentraut
On 2019-03-07 15:23, Sandro Mani wrote:
> Currently building postgresql for Win32 with a mingw toolchain produces
> import libraries with *.a extension, whereas the extension should be
> *.dll.a.

I have read the MinGW documentation starting from
 and have found no information supporting
this claim.  All their examples match our existing naming.  What is your
source for this information -- other than ...

> There are various downstream workarounds for this, see i.e. [1]
> and [2]. The attached patch 0001-Fix-import-library-extension.patch
> addresses this.

I'm confused what Fedora and Arch Linux have to do with this.  Are they
distributing Windows binaries of third-party software?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] Invalid permission check in pg_stats for functional indexes

2019-09-04 Thread Kuntal Ghosh
On Wed, Sep 4, 2019 at 12:23 AM Pierre Ducroquet  wrote:
>
> All my apologies for that. I submitted this patch some time ago but forgot to
> add it to the commit fest. Attached to this mail is a rebased version.
>
Thank you for the new version of the patch. It looks good to me.
Moving the status to ready for committer.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] psql: add tab completion for \df slash command suffixes

2019-09-04 Thread Ian Barwick

On 2019/09/04 15:14, Kuntal Ghosh wrote:

> Hello Ian,
>
> On Fri, Aug 30, 2019 at 10:31 AM Ian Barwick wrote:
>
>> I just noticed "\df[TAB]" fails to offer any tab-completion for
>> the possible suffixes ("\dfa", "\dfn", "\dfp", "\dft" and "\dfw").
>> Trivial patch attached, which applies back to Pg96, and separate
>> patches for Pg95 and Pg94.
>>
>> I'll add this to the next commitfest.
>
> I've reviewed the patch. It works as expected in master. But, the
> syntax "\dfp" doesn't exist before Pg11. So, you'll have to remove the same
> from the patches for Pg95 and Pg94. I guess the same patch should be
> applied on Pg10 and Pg96.


Ah, good catch, I will update and resubmit.


Thanks!

Ian Barwick


--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Append with naive multiplexing of FDWs

2019-09-04 Thread Thomas Munro
Hello,

A few years back[1] I experimented with a simple readiness API that
would allow Append to start emitting tuples from whichever Foreign
Scan has data available, when working with FDW-based sharding.  I used
that primarily as a way to test Andres's new WaitEventSet stuff and my
kqueue implementation of that, but I didn't pursue it seriously
because I knew we wanted a more ambitious async executor rewrite and
many people had ideas about that, with schedulers capable of jumping
all over the tree etc.

Anyway, Stephen Frost pinged me off-list to ask about that patch, and
asked why we don't just do this naive thing until we have something
better.  It's a very localised feature that works only between Append
and its immediate children.  The patch makes it work for postgres_fdw,
but it should work for any FDW that can get its hands on a socket.

Here's a quick rebase of that old POC patch, along with a demo.  Since
2016, Parallel Append landed, but I didn't have time to think about
how to integrate with that so I did a quick "sledgehammer" rebase that
disables itself if parallelism is in the picture.

=== demo ===

create table t (a text, b text);

create or replace function slow_data(name text) returns setof t as
$$
begin
  perform pg_sleep(random());
  return query select name, generate_series(1, 100)::text as i;
end;
$$
language plpgsql;

create view t1 as select * from slow_data('t1');
create view t2 as select * from slow_data('t2');
create view t3 as select * from slow_data('t3');

create extension postgres_fdw;
create server server1 foreign data wrapper postgres_fdw options
(dbname 'postgres');
create server server2 foreign data wrapper postgres_fdw options
(dbname 'postgres');
create server server3 foreign data wrapper postgres_fdw options
(dbname 'postgres');
create user mapping for current_user server server1;
create user mapping for current_user server server2;
create user mapping for current_user server server3;
create foreign table ft1 (a text, b text) server server1 options
(table_name 't1');
create foreign table ft2 (a text, b text) server server2 options
(table_name 't2');
create foreign table ft3 (a text, b text) server server3 options
(table_name 't3');

-- create three remote shards
create table pt (a text, b text) partition by list (a);
alter table pt attach partition ft1 for values in ('ft1');
alter table pt attach partition ft2 for values in ('ft2');
alter table pt attach partition ft3 for values in ('ft3');

-- see that tuples come back in the order that they're ready

select * from pt where b like '42';

[1] 
https://www.postgresql.org/message-id/CAEepm%3D1CuAWfxDk%3D%3DjZ7pgCDCv52fiUnDSpUvmznmVmRKU5zpA%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com


0001-Multiplexing-Append-POC.patch
Description: Binary data


Re: Proposal: roll pg_stat_statements into core

2019-09-04 Thread David Fetter
On Wed, Sep 04, 2019 at 12:14:50AM -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> >> I think there is a false dichotomy here.  Migrating an extension out of
> >> contrib doesn't have to equate making it no longer an extension.  We
> >> could, instead, keep it being an extension, but move it out of contrib
> >> and into (say) src/extensions/ so that it becomes installed and
> >> available by default, but still an extension.  Users won't have to
> >> install a separate contrib package, but they would still have to run
> >> CREATE EXTENSION.
> 
> > I don't agree that it's a false dichotomy- from a user experience, you
> > aren't really changing anything with this approach and the entire point
> > is that most of these things should just be available in core.
> 
> I don't think that auto-installing such things requires changing much
> of anything.  See plpgsql, which is auto-installed now but still sits
> in src/pl/ alongside other PLs that are not auto-installed.  Similarly,
> there's nothing really stopping us from choosing to install some
> module of contrib by default; rearranging the source tree is not a
> prerequisite to that.
> 
> The situation with pg_stat_statements is more than that, since what
> David is requesting is not merely that we auto-install that extension
> but that we automatically push it into shared_preload_libraries.

What I am actually suggesting is that it not be a separate library at
all, i.e. that all its parts live in PostgreSQL proper.

> > ...maybe we need a way to turn on/off things like pg_stat_statements but
> > that should have been a runtime "track_stat_statements" or some such,
> > similar to other things like "track_io_timing", not an "independent"
> > extension that is actually developed, managed, and released just as core
> > is.
> 
> A key point that hasn't been highlighted in this discussion is that having
> pg_stat_statements as an extension is proof-of-concept that such features
> *can* be implemented outside of core.  Don't you think that there are
> probably people maintaining private variants of that extension, who would
> be really sad if we removed or broke APIs they need for it once
> pg_stat_statements is part of core?

Nowhere near the number of people who are inconvenienced, at a
minimum, by the high barriers needed in order to install and use it in
its current form.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [PATCH] psql: add tab completion for \df slash command suffixes

2019-09-04 Thread Kuntal Ghosh
Hello Ian,

On Fri, Aug 30, 2019 at 10:31 AM Ian Barwick
 wrote:
> I just noticed "\df[TAB]" fails to offer any tab-completion for
> the possible suffixes ("\dfa", "\dfn", "\dfp", "\dft" and "\dfw").
> Trivial patch attached, which applies back to Pg96, and separate
> patches for Pg95 and Pg94.
>
> I'll add this to the next commitfest.
I've reviewed the patch. It works as expected in master. But, the
syntax "\dfp" doesn't exist before Pg11. So, you'll have to remove the same
from the patches for Pg95 and Pg94. I guess the same patch should be
applied on Pg10 and Pg96.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com