Re: Log message for GSS connection is missing once connection authorization is successful.

2020-11-06 Thread Bharath Rupireddy
On Sat, Nov 7, 2020 at 9:27 AM vignesh C  wrote:
>
> Yes the test will fail if it takes more than the max_attempts as there
> is a like statement immediately after the loop:
> like($first_logfile, qr/\Q$expect_log_msg\E/,
>  'found expected log file content');
> I have also increased the attempts to 180 seconds just like other
> tests to avoid failure in very slow systems.
>

+1 for this.

>
> > 2. I intentionally altered(for testing purpose only) the expected log 
> > message input given to test_access(), expecting the tests to fail, but the 
> > test succeeded. Am I missing something here? Is it that the syslogger 
> > process not logging the message at all or within the 10sec waiting? Do we 
> > need to increase the wait duration? Do we need to do something to fail the 
> > test when we don't see the expected log message in test_access()?
> >
> > "cXNnnection authorized: user=..
> > "connecTEion authorized: user=
> > "connection auTThorized:.
> >
>
> Thanks for testing this, I had missed testing this. The expression
> matching was not correct. Attached v6 patch which includes the fix for
> this.
>

This use case works as expected i.e. test fails if the log message is
altered intentionally.

>
> Attached v6 patch which includes the fix for this.
>

Thanks. I have no further comments on the V6 patch, it looks good to
me. make check of 001_auth.pl, regression tests make check and make
check world passes. It can be passed to committer for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench stopped supporting large number of client connections on Windows

2020-11-06 Thread Tom Lane
Fabien COELHO  writes:
>> Any suggestions are welcome!

> Use ppoll, and start more threads but not too many?

Does ppoll exist on Windows?

There was a prior thread on this topic, which seems to have drifted off
into the sunset:

https://www.postgresql.org/message-id/flat/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0%40BL0PR1901MB1985.namprd19.prod.outlook.com

regards, tom lane




Re: PG13: message style changes

2020-11-06 Thread Michael Paquier
On Sat, Nov 07, 2020 at 12:49:43AM -0300, Alvaro Herrera wrote:
> In 0001, I propose changing messages that were introduced as different 
> for parallel vacuum workers.  Frankly I don't understand why we are
> bragging about the vacuum being done in a parallel worker; does the user
> care?  It seems to me that users are just satisfied to know that the
> indexes were scanned; the fact that this was done in a parallel worker
> is not of much interest, so why call attention to that?  Therefore, we
> can reduce the message to what's emitted in the normal case.

Indeed.  Worth noting also that one can get the same level of
information with %P in log_line_prefix.

> In 0002, I propose to remove the word "concurrently" in an error
> message when an invalid index cannot be reindexed.  In fact, the problem
> is generic: we just cannot reindex the index at all, regardless of
> concurrently or not.  So we can reduce this message to be identical to
> the one we throw in the non-concurrent case.

No issues from me here.

> Patch 0004 just adds a comment to clarify a message that I found
> confusing when doing the translation.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: extension patch of CREATE OR REPLACE TRIGGER

2020-11-06 Thread Dilip Kumar
On Sat, Nov 7, 2020 at 10:00 AM Peter Smith  wrote:
>
> Hello Osumi-san.
>
> I have checked the latest v17 patch w.r.t. to my previous comments.
>
> The v17 patch applies cleanly.
>
> make check is successful.
>
> The regenerated docs look OK.
>
> I have no further review comments, so have flagged this v17 as "ready
> for committer" - https://commitfest.postgresql.org/30/2307/
>

The patch looks fine to me however I feel that in the test case there
are a lot of duplicate statement which can be reduced
e.g.
+-- 1. Overwrite existing regular trigger with regular trigger
(without OR REPLACE)
+create trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcA();
+create trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcB(); -- should fail
+drop trigger my_trig on my_table;
+
+-- 2. Overwrite existing regular trigger with regular trigger (with OR REPLACE)
+create trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcA();
+insert into my_table values (1);
+create or replace trigger my_trig
+  after insert on my_table
+  for each row execute procedure funcB(); -- OK
+insert into my_table values (1);
+drop trigger my_trig on my_table;

In this test, test 1 failed because it tried to change the trigger
function without OR REPLACE, which is fine but now test 2 can continue
from there, I mean we don't need to drop the trigger at end of the
test1 and then test2 can try it with OR REPLACE syntax.  This way we
can reduce the extra statement execution which is not necessary.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: extension patch of CREATE OR REPLACE TRIGGER

2020-11-06 Thread Peter Smith
Hello Osumi-san.

I have checked the latest v17 patch w.r.t. to my previous comments.

The v17 patch applies cleanly.

make check is successful.

The regenerated docs look OK.

I have no further review comments, so have flagged this v17 as "ready
for committer" - https://commitfest.postgresql.org/30/2307/

---

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-11-06 Thread vignesh C
On Thu, Nov 5, 2020 at 9:50 AM Bharath Rupireddy
 wrote:
>
> On Thu, Nov 5, 2020 at 7:55 AM Euler Taveira  
> wrote:
> >
> > No. Don't worry with translations during the development. Make sure to 
> > follow
> > the instructions provided here [1]. Translations are coordinated in a 
> > different
> > mailing list: pgsql-translators [2]. There is a different repository [3] for
> > handling PO files and the updated files are merged by Peter Eisentraut just
> > before each minor/major release. We usually start to update translations 
> > after
> > feature freeze.
> >
>
> Thanks.
>
> On Tue, Nov 3, 2020 at 12:49 PM vignesh C  wrote:
> >
> > Thanks for the explanation, I have attached a v5 patch with the
> > changes where the translation should not have any problem.
> >
>
> I have taken a further look at the V5 patch:
>
> 1. We wait 10sec until the syslogger process logs the expected message, what 
> happens if someone intentionally made the syslogger process to wait for a 
> longer duration?  Will the new tests fail?
> # might need to retry if logging collector process is slow...
> my $max_attempts = 10 * 10;
> my $first_logfile;
> for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
> {
> $first_logfile = slurp_file($node->data_dir . '/' . $lfname);
> last if $first_logfile =~ m/$expect_log_msg /;
> usleep(100_000);
> }
>

Yes the test will fail if it takes more than the max_attempts as there
is a like statement immediately after the loop:
like($first_logfile, qr/\Q$expect_log_msg\E/,
 'found expected log file content');
I have also increased the attempts to 180 seconds just like other
tests to avoid failure in very slow systems.

> 2. I intentionally altered(for testing purpose only) the expected log message 
> input given to test_access(), expecting the tests to fail, but the test 
> succeeded. Am I missing something here? Is it that the syslogger process not 
> logging the message at all or within the 10sec waiting? Do we need to 
> increase the wait duration? Do we need to do something to fail the test when 
> we don't see the expected log message in test_access()?
>
> "cXNnnection authorized: user=..
> "connecTEion authorized: user=
> "connection auTThorized:.
>

Thanks for testing this, I had missed testing this. The expression
matching was not correct. Attached v6 patch which includes the fix for
this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From 0d190bbe888127aed0c8db2e061cb8cdc8b99ee5 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 30 Oct 2020 17:58:45 +0530
Subject: [PATCH v6] Improving the connection authorization message for GSS
 authenticated/encrypted connections.

Added log message to include GSS authentication, encryption & principal
information. This message will help the user to know if GSS authentication or
encryption was used and which GSS principal was used.
---
 src/backend/utils/init/postinit.c |  81 ++
 src/test/kerberos/t/001_auth.pl   | 117 +++---
 2 files changed, 112 insertions(+), 86 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7..0e73598 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -246,62 +246,39 @@ PerformAuthentication(Port *port)
 
 	if (Log_connections)
 	{
+		StringInfoData logmsg;
+		initStringInfo();
 		if (am_walsender)
-		{
-#ifdef USE_SSL
-			if (port->ssl_in_use)
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  port->application_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))
-		 : errmsg("replication connection authorized: user=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off";
-			else
-#endif
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s",
-  port->user_name,
-  port->application_name)
-		 : errmsg("replication connection authorized: user=%s",
-  port->user_name)));
-		}
+			appendStringInfo(, _("replication connection authorized: user=%s"),
+			 port->user_name);
 		else
-		{
+			appendStringInfo(, _("connection authorized: user=%s"),
+			 port->user_name);
+		if (!am_walsender)
+			appendStringInfo(, _(" database=%s"), port->database_name);
+
+		if 

PG13: message style changes

2020-11-06 Thread Alvaro Herrera
I propose to apply the following changes to messages in pg13.

In 0001, I propose changing messages that were introduced as different 
for parallel vacuum workers.  Frankly I don't understand why we are
bragging about the vacuum being done in a parallel worker; does the user
care?  It seems to me that users are just satisfied to know that the
indexes were scanned; the fact that this was done in a parallel worker
is not of much interest, so why call attention to that?  Therefore, we
can reduce the message to what's emitted in the normal case.

In 0002, I propose to remove the word "concurrently" in an error
message when an invalid index cannot be reindexed.  In fact, the problem
is generic: we just cannot reindex the index at all, regardless of
concurrently or not.  So we can reduce this message to be identical to
the one we throw in the non-concurrent case.

(Dropped 0003 while composing this email.)

Patch 0004 just adds a comment to clarify a message that I found
confusing when doing the translation.

-- 
Álvaro Herrera39°49'30"S 73°17'W
>From 3ea8f3ebd421e3c9b5633fffb7f38d999f21895f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 22 Sep 2020 13:04:25 -0300
Subject: [PATCH 1/4] Simplify message

The user doesn't really care that a particular index vacuum was executed
by a parallel worker or not; remove that bit from the progress message.
---
 src/backend/access/heap/vacuumlazy.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2174fccb1e..25f2d5df1b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2432,7 +2432,6 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
   LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats)
 {
 	IndexVacuumInfo ivinfo;
-	const char *msg;
 	PGRUsage	ru0;
 	LVSavedErrInfo saved_err_info;
 
@@ -2462,13 +2461,8 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 	*stats = index_bulk_delete(, *stats,
 			   lazy_tid_reaped, (void *) dead_tuples);
 
-	if (IsParallelWorker())
-		msg = gettext_noop("scanned index \"%s\" to remove %d row versions by parallel vacuum worker");
-	else
-		msg = gettext_noop("scanned index \"%s\" to remove %d row versions");
-
 	ereport(elevel,
-			(errmsg(msg,
+			(errmsg("scanned index \"%s\" to remove %d row versions",
 	vacrelstats->indname,
 	dead_tuples->num_tuples),
 			 errdetail_internal("%s", pg_rusage_show(;
@@ -2491,7 +2485,6 @@ lazy_cleanup_index(Relation indrel,
    double reltuples, bool estimated_count, LVRelStats *vacrelstats)
 {
 	IndexVacuumInfo ivinfo;
-	const char *msg;
 	PGRUsage	ru0;
 	LVSavedErrInfo saved_err_info;
 
@@ -2522,13 +2515,8 @@ lazy_cleanup_index(Relation indrel,
 
 	if (*stats)
 	{
-		if (IsParallelWorker())
-			msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages as reported by parallel vacuum worker");
-		else
-			msg = gettext_noop("index \"%s\" now contains %.0f row versions in %u pages");
-
 		ereport(elevel,
-(errmsg(msg,
+(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
 		RelationGetRelationName(indrel),
 		(*stats)->num_index_tuples,
 		(*stats)->num_pages),
-- 
2.20.1

>From 0c86613625a66ca81ed3f57f28f63567f5e690fc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 22 Sep 2020 13:07:11 -0300
Subject: [PATCH 2/4] Remove "concurrently" from error message

Invalid indexes on toast tables cannot be reindexed regardless of mode
of operation.
---
 src/backend/commands/indexcmds.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75552c64ed..3522f4b6a6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3187,13 +3187,14 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 /*
  * Don't allow reindex for an invalid index on TOAST table, as
- * if rebuilt it would not be possible to drop it.
+ * if rebuilt it would not be possible to drop it.  Match
+ * error message in reindex_index().
  */
 if (IsToastNamespace(get_rel_namespace(relationOid)) &&
 	!get_index_isvalid(relationOid))
 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("cannot reindex invalid index on TOAST table concurrently")));
+			 errmsg("cannot reindex invalid index on TOAST table")));
 
 /*
  * Check if parent relation can be locked and if it exists,
-- 
2.20.1

>From e78fd46092807bbef23c8941aa4f43fc78a1f7f4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 22 Sep 2020 13:09:07 -0300
Subject: [PATCH 4/4] Make message less obscure

This message is pretty misterious, so add a 'translator:' comment to explain
---
 src/backend/libpq/be-secure-openssl.c | 2 ++
 1 file changed, 2 

Re: First-draft release notes for back branches are up

2020-11-06 Thread Tom Lane
Tomas Vondra  writes:
> We should probably include instructions what to do about the BRIN
> data corruption fixed by 7577dd8480 - a query to list might be
> affected by the bug and should be rebuilt.

Do you have some suggested text?

regards, tom lane




Re: pg_dump, ATTACH, and independently restorable child partitions

2020-11-06 Thread Alvaro Herrera
On 2020-Oct-24, Justin Pryzby wrote:

> On Fri, Oct 23, 2020 at 12:29:40AM -0500, Justin Pryzby wrote:

> > Now that I look, it seems like this is calling PQexec(), which sends a 
> > single,
> > "simple" libpq message with:
> > |CREATE TABLE ..; ALTER TABLE .. ATTACH PARTITION;
> > ..which is transactional, so when the 2nd command fails, the CREATE is 
> > rolled back.
> > https://www.postgresql.org/docs/9.5/libpq-exec.html#LIBPQ-EXEC-MAIN
> 
> The easy fix is to add an explicit begin/commit.

Hmm, I think this throws a warning when used with "pg_restore -1",
right?  I don't think that's sufficient reason to discard the idea, but
it be better to find some other way.

I have no ideas ATM :-(





Re: "unix_socket_directories" should be GUC_LIST_INPUT?

2020-11-06 Thread Michael Paquier
On Thu, Nov 05, 2020 at 09:16:10AM +0900, Michael Paquier wrote:
> No arguments against this point either.  If you consider all that, the
> switch can be done with the attached, with the change for pg_dump
> included.  I have reorganized the list in variable_is_guc_list_quote()
> alphabetically while on it.

Hearing nothing, applied on HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: speed up unicode decomposition and recomposition

2020-11-06 Thread Michael Paquier
On Sat, Nov 07, 2020 at 09:29:30AM +0900, Michael Paquier wrote:
> Thanks John.  Both look right to me.  I'll apply both in a bit.

Done that now.  Just for the note: you forgot to run pgperltidy.
--
Michael


signature.asc
Description: PGP signature


Re: speed up unicode decomposition and recomposition

2020-11-06 Thread Michael Paquier
On Fri, Nov 06, 2020 at 06:20:00PM -0400, John Naylor wrote:
> There is a latent bug in the way code pairs for recomposition are sorted
> due to a copy-pasto on my part. Makes no difference now, but it could in
> the future.  While looking, it seems pg_bswap.h should actually be
> backend-only. Both fixed in the attached.

Thanks John.  Both look right to me.  I'll apply both in a bit.
--
Michael


signature.asc
Description: PGP signature


Re: First-draft release notes for back branches are up

2020-11-06 Thread Tomas Vondra
On Fri, 06 Nov 2020 17:07:12 -0500
Tom Lane  wrote:

> See
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c66a3225e07b5098a796f24588a6b81bfdedd2fd
> 
> Please send any corrections by Sunday.
> 

We should probably include instructions what to do about the BRIN
data corruption fixed by 7577dd8480 - a query to list might be
affected by the bug and should be rebuilt.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fix brin_form_tuple to properly detoast data

2020-11-06 Thread Tomas Vondra
On Thu, 5 Nov 2020 18:16:04 -0300
Alvaro Herrera  wrote:

> On 2020-Nov-05, Tomas Vondra wrote:
> 
> > On 11/5/20 6:17 PM, Alvaro Herrera wrote:  
> 
> > > There are recent changes in vacuum for temp tables (commit
> > > 94bc27b57680?) that would maybe make this stable enough, without
> > > having to have the CIC there.  At least, I tried it locally a few
> > > times and it appears to work well. This won't work for older
> > > releases though, just master.  This is patch 0001 attached here.  
> > 
> > IIUC you're suggesting to use a temporary table in the test?
> > Unfortunately, this does not work on older releases, and IMHO the
> > test should be backpatched too. IMHO the CIC "hack" is acceptable,
> > unless there's a better solution that I'm not aware of.  
> 
> Oh, sure, the CIC hack is acceptable for the older branches.  I'm just
> saying that you can use a temp table everywhere, and keep CIC in the
> old branches and no CIC in master.
> 
> > This got me thinking though - wouldn't it be better to handle too
> > large values by treating the range as "degenerate" (i.e. store NULL
> > and consider it as matching all queries), instead of failing the
> > CREATE INDEX or DML? I find the current behavior rather annoying,
> > because it depends on the other rows in the page range, not just on
> > the one row the user deals with. Perhaps this might even be
> > considered an information leak about the other data. Of course, not
> > something this patch should deal with.  
> 
> Hmm.  Regarding text I remember thinking we could just truncate values
> (as we do for LIKE, as I recall).  I suppose that strategy would work
> even for bytea.
> 
> 
> > > > +/*
> > > > + * This enables de-toasting of index entries.  Needed until
> > > > VACUUM is
> > > > + * smart enough to rebuild indexes from scratch.
> > > > + */  
> > > 
> > > ... because, surely, we're now never working on having VACUUM
> > > rebuild indexes from scratch.   In fact, I wonder if we need the
> > > #define at all.  I propose to remove all those #ifdef lines in
> > > your patch.  
> > 
> > That's a verbatim copy of a comment from indextuple.c. IMHO we
> > should keep it the same in both places.  
> 
> Sure, if you want to.
> 
> > > The fix looks good to me.  I just added a comment in 0003.  
> > 
> > Thanks. Any opinions on fixing this in existing clusters? Any
> > better ideas than just giving users the SQL query to list
> > possibly-affected indexes?  
> 
> None here.
> 

OK, pushed and backpatched all the way back to 9.5. I decided not to
use the temporary table - I'd still need to use the CIC trick on older
releases, and there were enough differences already.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: list_free() in index_get_partition()

2020-11-06 Thread Alvaro Herrera
On 2020-Nov-06, Michael Paquier wrote:

> On Thu, Nov 05, 2020 at 02:36:06PM -0600, Justin Pryzby wrote:
> > Seems to be missing.
> 
> Any code paths calling index_get_partition() is in tablecmds.c,
> involving commands that are not that hot with such lookups, but that
> could be an issue if this gets called by some out-of-core code if
> memory context cleanup is sloppy.  So I agree to fix this one and
> backpatch down to 12.  I'd like to apply your fix, except if Alvaro
> thinks differently as that's his commit originally.  So let's wait a
> bit first.

Agreed; I'll get this pushed now.  Thanks for reporting.

> > The 2nd patch does some more cleanup - Before, a failed syscache lookup 
> > would
> > ERROR, but I don't think that's supposed to happen.  
> > get_rel_relispartition()
> > would instead return false, and we won't call get_partition_parent().
> 
> The cache lookup error is here as a safeguard to remind that any code
> path calling index_get_partition() needs a proper lock on the
> relations involved before doing such lookups, so that's a defensive
> move.  By switching the code as you do in 0002, you could mask such
> logical problems as different errors could get raised.  So I don't
> think that it is a good idea.

Yeah, I'm not so sure either.  I have memories of purposefully not using
get_rel_relispartition there, but I don't remember exactly why and
it's not in the archives.




errors when there is a bit literal in ecpg

2020-11-06 Thread Wang, Shenhao
Hi, hacker

I met an error as below when I use ecpg

a.pgc:5: ERROR: invalid bit string literal
a.pgc:5: ERROR: internal error: unreachable state; please report this to 


the test source is attached.

After investigating the code, I think the process of pgc.l is:
Step 1:  {xbstart},  addlitchar('b') is called, literalbuf contains a char 
'b';
Step 2:  {xbinside},   the rest of char is added in literalbuf
Step 3:   {other},thecondition literalbuf[strspn(literalbuf, "01") 
+ 1] != '\0'   will always be true;
error is occurred here

I try to fix this bug by deleting 'addlitchar('b');' from source I also add a 
test case to test all const str in ecpg.

The patch is also attached.

Best regards, Shenhao Wang




0001-Fix-error-when-handle-bit-string.patch
Description: 0001-Fix-error-when-handle-bit-string.patch


a.pgc
Description: a.pgc


Re: speed up unicode decomposition and recomposition

2020-11-06 Thread John Naylor
There is a latent bug in the way code pairs for recomposition are sorted
due to a copy-pasto on my part. Makes no difference now, but it could in
the future. While looking, it seems pg_bswap.h should actually be
backend-only. Both fixed in the attached.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


minor-fixes-in-unicode-norm.patch
Description: Binary data


First-draft release notes for back branches are up

2020-11-06 Thread Tom Lane
See

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c66a3225e07b5098a796f24588a6b81bfdedd2fd

Please send any corrections by Sunday.

regards, tom lane




Re: pgbench stopped supporting large number of client connections on Windows

2020-11-06 Thread Fabien COELHO



Hello Marina,

While trying to test a patch that adds a synchronization barrier in pgbench 
[1] on Windows,


Thanks for trying that, I do not have a windows setup for testing, and the 
sync code I wrote for Windows is basically blind coding:-(


I found that since the commit "Use ppoll(2), if available, to 
wait for input in pgbench." [2] I cannot use a large number of client 
connections in pgbench on my Windows virtual machines (Windows Server 2008 R2 
and Windows 2019), for example:



bin\pgbench.exe -c 90 -S -T 3 postgres

starting vacuum...end.


ISTM that 1 thread with 90 clients is a bad idea, see below.

The almost same thing happens with reindexdb and vacuumdb (build on 
commit [3]):


Windows fd implementation is somehow buggy because it does not return the 
smallest number available, and then with the assumption that select uses a 
dense array indexed with them (true on linux, less so on Windows which 
probably uses a sparse array), so that the number gets over the limit, 
even if less are actually used, hence the catch, as you noted.


Another point is windows has a hardcoded number of objects one thread can 
really wait for, typically 64, so that waiting for more requires actually 
forking threads to do the waiting. But if you are ready to fork threads 
just to wait, then probaly you could have started pgbench with more 
threads in the first place. Now it would probably not make the problem go 
away because fd numbers would be per process, not per thread, but it 
really suggests that one should not load a thread is more than 64 clients.


IIUC the checks below are not correct on Windows, since on this system 
sockets can have values equal to or greater than FD_SETSIZE (see Windows 
documentation [4] and pgbench debug output in attached pgbench_debug.txt).


Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions 
about the internal implementation of windows (there is a counter somewhere 
in windows fd_set struct), which was rejected because if was breaking the 
interface. Now your patch is basically resurrecting that. Why not if there 
is no other solution, but this is quite depressing, and because it breaks 
the interface it would be broken if windows changed its internals for some 
reason:-(


Doesn't windows has "ppoll"? Should we implement the stuff above windows 
polling capabilities and coldly skip its failed posix portability 
attempts? This raises again the issue that you should not have more that 
64 clients per thread anyway, because it is an intrinsic limit on windows.


I think that at one point it was suggested to error or warn if 
nclients/nthreads is too great, but that was not kept in the end.


I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based 
on commit [3]). I checked it for reindexdb and vacuumdb, and it works for 
simple databases (1025 jobs are not allowed and 1024 jobs is ok). 
Unfortunately, pgbench was getting connection errors when it tried to use 
1000 jobs on my virtual machines, although there were no errors for fewer 
jobs (500) and the same number of clients (1000)...


It seems that the max number of threads you can start depends on available 
memory, because each thread is given its own stack, so it would depend on 
your vm settings?



Any suggestions are welcome!


Use ppoll, and start more threads but not too many?

--
Fabien.




Re: Allow some recovery parameters to be changed with reload

2020-11-06 Thread Sergei Kornilov
Hello

> I'm wondering if it's safe to allow restore_command to be emptied during
> archive recovery. Even when it's emptied, archive recovery can proceed
> by reading WAL files from pg_wal directory. This is the same behavior as
> when restore_command is set to, e.g., /bin/false.

I am always confused by this implementation detail. restore_command fails? 
Fine, let's just read file from pg_wal. But this is different topic...

I do not know the history of this fatal ereport. It looks like "must specify 
restore_command when standby mode is not enabled" check is only intended to 
protect the user from misconfiguration and the rest code will treat empty 
restore_command correctly, just like /bin/false. Did not notice anything around 
StandbyMode conditions.

regards, Sergei




Re: Support for NSS as a libpq TLS backend

2020-11-06 Thread Jacob Champion
On Nov 4, 2020, at 5:09 AM, Daniel Gustafsson  wrote:

> (sorry for slow response).  You are absolutely right, the has_password flag
> must be tracked per connection in PGconn.  The attached v17 implements this as
> well a frontend bugfix which caused dropped connections and some smaller 
> fixups
> to make strings more translateable.

Some initial notes from building and testing on macOS Mojave. I'm working with
both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a
hand-built NSS/NSPR (which does not).

1. In configure.ac:

> +  LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
> +  CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
> +
> +  AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' 
> is required for NSS])])

Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails
with the hand-built stack, where there is no nss-config to populate LDFLAGS. I
changed the function to NSS_InitContext and that seems to work nicely.

2. Among the things to eventually think about when it comes to configuring, it
looks like some platforms [1] install the headers under  and
 instead of  and . It's unfortunate that the NSS
maintainers never chose an official installation layout.

3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both

  src/include/common/pg_nss.h
  src/port/pg_strong_random.c

before the tree will compile for me. Both of those files include NSS headers.

4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR
assertion that looks like

  sslMutex_Init not implemented for multi-process applications !

With assertions disabled, this ends up showing a somewhat unhelpful

  FATAL:  unable to set up TLS connection cache: security library failure. 
(SEC_ERROR_LIBRARY_FAILURE)

It looks like cross-process locking isn't actually enabled on macOS, which is a
long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache()
error out.

--Jacob

[1] https://github.com/erthink/ReOpenLDAP/issues/112
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=538680
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1192500





re: pgbench stopped supporting large number of client connections on Windows

2020-11-06 Thread Ranier Vilela
Hi Marina,
Nice catch.

>rc/bin/pgbench/pgbench.c, the function add_socket_to_set:
>if (fd < 0 || fd >= FD_SETSIZE)
>{
>/*
>* Doing a hard exit here is a bit grotty, but it doesn't seem worth
>* complicating the API to make it less grotty.
>*/
>pg_log_fatal("too many client connections for select()");
>exit(1);
>}

It seems to me that the limit is hardcode in,
src/backend/port/win32/socket.c

FD_SETSIZE * 2

that would be 2048?

regards,

Ranier Vilela


pgbench stopped supporting large number of client connections on Windows

2020-11-06 Thread Marina Polyakova

Hello, hackers!

While trying to test a patch that adds a synchronization barrier in 
pgbench [1] on Windows, I found that since the commit "Use ppoll(2), if 
available, to wait for input in pgbench." [2] I cannot use a large 
number of client connections in pgbench on my Windows virtual machines 
(Windows Server 2008 R2 and Windows 2019), for example:



bin\pgbench.exe -c 90 -S -T 3 postgres

starting vacuum...end.
too many client connections for select()

Almost the same thing happens with reindexdb and vacuumdb (built on 
commit [3]):



bin\reindexdb.exe -j 95 postgres

reindexdb: fatal: too many jobs for this platform -- try 90


bin\vacuumdb.exe -j 95 postgres

vacuumdb: vacuuming database "postgres"
vacuumdb: fatal: too many jobs for this platform -- try 90

IIUC the checks below are not correct on Windows, since on this system 
sockets can have values equal to or greater than FD_SETSIZE (see Windows 
documentation [4] and pgbench debug output in attached 
pgbench_debug.txt).


src/bin/pgbench/pgbench.c, the function add_socket_to_set:
if (fd < 0 || fd >= FD_SETSIZE)
{
/*
 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
 * complicating the API to make it less grotty.
 */
pg_log_fatal("too many client connections for select()");
exit(1);
}

src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup:
/*
 * Fail and exit immediately if trying to use a socket in an
 * unsupported range.  POSIX requires open(2) to use the lowest
 * unused file descriptor and the hint given relies on that.
 */
if (PQsocket(conn) >= FD_SETSIZE)
{
pg_log_fatal("too many jobs for this platform -- try %d", i);
exit(1);
}

I tried to fix this, see attached fix_max_client_conn_on_Windows.patch 
(based on commit [3]). I checked it for reindexdb and vacuumdb, and it 
works for simple databases (1025 jobs are not allowed and 1024 jobs is 
ok). Unfortunately, pgbench was getting connection errors when it tried 
to use 1000 jobs on my virtual machines, although there were no errors 
for fewer jobs (500) and the same number of clients (1000)...


Any suggestions are welcome!

[1] 
https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de
[2] 
https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9
[3] 
https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8
[4] From 
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select 
:
Internally, socket handles in an fd_set structure are not represented as 
bit flags as in Berkeley Unix. Their data representation is opaque.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbec567331ad5ea03d31af707f5e91b4c..7a54638db191982d538cabf007d82715fa254b6a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -62,6 +62,7 @@
 #include "common/string.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
+#include "fe_utils/socket_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pgbench.h"
@@ -6698,7 +6699,7 @@ clear_socket_set(socket_set *sa)
 static void
 add_socket_to_set(socket_set *sa, int fd, int idx)
 {
-	if (fd < 0 || fd >= FD_SETSIZE)
+	if (fd < 0 || !check_fd_set_size(fd, >fds))
 	{
 		/*
 		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
diff --git a/src/bin/scripts/scripts_parallel.c b/src/bin/scripts/scripts_parallel.c
index ec264a269a7d7a506c347112af6be183b2aec9ce..61977741c7c5f76b1966ab8e59792a1d2b53b934 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/bin/scripts/scripts_parallel.c
@@ -25,6 +25,7 @@
 #include "common.h"
 #include "common/logging.h"
 #include "fe_utils/cancel.h"
+#include "fe_utils/socket_utils.h"
 #include "scripts_parallel.h"
 
 static void init_slot(ParallelSlot *slot, PGconn *conn);
@@ -144,6 +145,16 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
 			if (sock < 0)
 continue;
 
+			/*
+			 * Fail and exit immediately if trying to use a socket in an
+			 * unsupported range.
+			 */
+			if (!check_fd_set_size(sock, ))
+			{
+pg_log_fatal("too many jobs for this platform -- try %d", i);
+exit(1);
+			}
+
 			FD_SET(sock, );
 			if (sock > maxFd)
 maxFd = sock;
@@ -221,18 +232,6 @@ ParallelSlotsSetup(const ConnParams *cparams,
 		for (i = 1; i < numslots; i++)
 		{
 			conn = connectDatabase(cparams, progname, echo, false, true);
-
-			/*
-			 * Fail and exit immediately if trying to use a socket in an
-			 * unsupported range.  POSIX requires open(2) to use the lowest
-			 * unused file descriptor and the hint given relies on that.
-			 */
-			if (PQsocket(conn) >= FD_SETSIZE)
-			{
-pg_log_fatal("too many jobs for this platform -- try %d", i);
-exit(1);
-			}
-
 			init_slot(slots + i, conn);
 		}
 	}

Re: bitmaps and correlation

2020-11-06 Thread Justin Pryzby
On Fri, Nov 06, 2020 at 01:51:26PM +, Anastasia Lubennikova wrote:
> I wonder why this patch hangs so long without a review. Maybe it will help to 
> move discussion forward, if you provide more examples of queries that can 
> benefit from this improvement?

Thanks for looking.

The explanation is that the planner currently gives index scans a cost
"discount" for correlation.  Jeff Janes has pointed out that there are two
discounts: 1) fewer pages are read; and, 2) lower cost-per-page.  This patch
aims to give bitmap scans the same benefits.  A "dense" bitmap will read fewer
pages, more sequentially.

Tom pointed out that the "correlation" isn't a perfect metric for this, since
the index might be "clumpy" without being well-ordered, which doesn't matter
for bitmap scans, which access in physical order anyway.  In those cases, the
correlation logic would fail to reduce the estimated cost of bitmap scan, even
though the actual cost is less (same as now).  This is true, but my goal is to
give bitmap scans at least the same benefit as index scans, even if there's
additional "discounts" which aren't yet being considered.

This was an issue for me in the past when the planner chose to scan an index,
but it was slower than projected (for reasons unrelated to this patch).  Making
bitmap cost account for high correlation was one step towards addressing that.
Since then, we switched to brin indexes, which force bitmap scans.
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com

Here's an example.

CREATE TABLE t AS SELECT a,b FROM generate_series(1,999) a, 
generate_series(1,999) b ORDER BY a+b/9.9;
CREATE INDEX ON t(a);

postgres=# SELECT attname, correlation FROM pg_stats WHERE tablename ='t';
 a   |   0.9951819
 b   |  0.10415093

postgres=# explain analyze SELECT * FROM t WHERE a BETWEEN 55 AND 77;
 Index Scan using t_a_idx on t  (cost=0.42..810.89 rows=22683 width=8) (actual 
time=0.292..66.657 rows=22977 loops=1)

vs (without my patch, with SET enable_indexscan=off):
 Bitmap Heap Scan on t  (cost=316.93..5073.17 rows=22683 width=8) (actual 
time=10.810..26.633 rows=22977 loops=1)

vs (with my patch, with SET enable_indexscan=off):
postgres=# explain analyze SELECT * FROM t WHERE a BETWEEN 55 AND 77;
 Bitmap Heap Scan on t  (cost=316.93..823.84 rows=22683 width=8) (actual 
time=10.742..33.279 rows=22977 loops=1)

So bitmap scan is cheaper, but the cost estimate is a lot higher.  My patch
improves but doesn't completely "fix" that - bitmap scan is still costed as
more expensive, but happens to be cheaper.  This is probably not even a particularly
good example, as it's a small table cached in RAM.  There's always going to be
cases like this, certainly near the costs where the plan changes "shape".  I
think a cost difference of 10 here is very reasonable (cpu_oper_cost,
probably), but a cost difference of 5x is not.

There's not many regression tests changed.  Probably partially because bitmap
scans have an overhead (the heap scan cannot start until after the index scan
finishes), and we avoid large tests.

If there's no interest in the patch, I guess we should just close it rather
than letting it rot.

> The first patch is simply a refactoring and I don't see any possible objections 
> against it.
> The second patch also looks fine to me. The logic is understandable and the 
> code is neat.
> 
> It wouldn't hurt to add a comment for this computation, though.
> + pages_fetched = pages_fetchedMAX + 
> indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

You're right.  It's like this:
// interpolate between c==0: pages_fetched=max and c==1: pages_fetched=min
pages_fetched = min + (max-min)*(1-c**2) 
pages_fetched = min + max*(1-c**2) - min*(1-c**2)
pages_fetched = max*(1-c**2) + min - min*(1-c**2)
pages_fetched = max*(1-c**2) + min*(c**2)
pages_fetched = max - max*c**2 + min*(c**2)
pages_fetched = max + min*(c**2) - max*c**2
pages_fetched = max + c**2 * (min-max)

I'm not sure if there's a reason why it's written like that, but (min-max)
looks odd, so I wrote it like:
pages_fetched = max - c**2 * (max-min)

> The new status of this patch is: Waiting on Author

From af1e640af2b1a80430191a38b80dde1f2b750757 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Jan 2020 19:23:51 -0600
Subject: [PATCH v6 1/2] Make more clear the computation of min/max IO..

..and specifically the double use and effect of correlation.

Avoid re-use of the "pages_fetched" variable
---
 src/backend/optimizer/path/costsize.c | 47 ++-
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f1dfdc1a4a..083448def7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -503,12 +503,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 csquared;
 	double		spc_seq_page_cost,
 spc_random_page_cost;
-	

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

2020-11-06 Thread Fujii Masao




On 2020/11/05 12:12, Kyotaro Horiguchi wrote:

At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy 
 wrote in

On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao  wrote:


Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?



+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
module, and the patch looks good to me.


Looks good to me, too.


Thanks for the review! I pushed the patch.




I'm not quite sure about replacing all the places in the walreceiver
process; for instance, in WalRcvForceReply() we are using a spinlock to
acquire the latch pointer and set it. Others may have better thoughts on
this.


The SIGTERM part looks good. The only difference between
WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
is set or not.  I don't think it's a problem that config-reload causes
an extra wakeup.  Couldn't we do the same thing for SIGHUP?


I agree that we can use even the standard SIGHUP handler in walreceiver.
I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
handler so far.

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Allow some recovery parameters to be changed with reload

2020-11-06 Thread Fujii Masao




On 2020/11/06 21:36, Sergei Kornilov wrote:

Hello


Currently when restore_command is not set, archive recovery fails
at the beginning. With the patch, how should we treat the case where
restore_command is reset to empty during archive recovery? We should
reject that change of restore_command?


Good point. I think we should reject that change. But (AFAIC) I cannot use GUC 
check callback for this purpose, as only the startup process knows 
StandbyModeRequested. I think it would be appropriate to call 
validateRecoveryParameters from StartupRereadConfig.


I don't think this idea is ok, because emptying restore_command and then
reloading the configuration file could cause a server doing archive
recovery to shut down with a FATAL error.

I'm wondering if it's safe to allow restore_command to be emptied during
archive recovery. Even when it's emptied, archive recovery can proceed
by reading WAL files from pg_wal directory. This is the same behavior as
when restore_command is set to, e.g., /bin/false. So maybe we don't need
to treat the empty restore_command so special??
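
For illustration, the always-failing configuration referred to above would
be something like this (sketch):

```
# postgresql.conf excerpt: an always-failing restore_command; archive
# recovery then proceeds using only the WAL files present in pg_wal
restore_command = '/bin/false'
```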

OTOH, we should not remove the check of restore_command in
validateRecoveryParameters(). Otherwise, when users forget to specify
restore_command when starting archive recovery, recovery could
wrongly proceed and the database could get corrupted.

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-06 Thread Anastasia Lubennikova
Status update for a commitfest entry.

This thread was inactive for a while and from the latest messages, I see that 
the patch needs some further work.
So I move it to "Waiting on Author".

The new status of this patch is: Waiting on Author


Re: document pg_settings view doesn't display custom options

2020-11-06 Thread Fujii Masao




On 2020/10/31 2:06, John Naylor wrote:



On Fri, Oct 30, 2020 at 12:48 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

John Naylor <john.nay...@enterprisedb.com> writes:
 > Okay, along those lines here's a patch using "this view" in a new paragraph
 > for simplicity.

Basically OK with me, but ...


It seems fairly weird to use a nonspecific reference first and then a
specific one.  That is, I'd expect to read "The pg_settings view ..."
and then "This view ...", not the other way around.  So we could
put this para second, or put it first but make this para say
"The pg_settings view ..." while the existing text gets reduced to
"This view ...".

Or just make them both say "This view ..." so we don't have to have
this discussion again the next time somebody wants to add a para here.



Okay, how's this?


Looks good to me. Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: A problem about partitionwise join

2020-11-06 Thread Anastasia Lubennikova
Status update for a commitfest entry.

According to CFbot this patch fails to apply. Richard, can you send an update, 
please?

Also, I see that the thread was inactive for a while.
Are you going to continue this work? I think it would be helpful if you
could write a short recap of the current state of the patch and list open
questions for reviewers.

The new status of this patch is: Waiting on Author


Re: Advance xmin aggressively on Read Commit isolation level

2020-11-06 Thread Andy Fan
On Fri, Nov 6, 2020 at 4:54 PM Thomas Munro  wrote:

> On Fri, Nov 6, 2020 at 9:48 PM Andy Fan  wrote:
> > I have 2 ideas about this.  One is in the Read Committed level,  we can
> advance xmin
> > aggressively. suppose it started at t1, and complete a query at t2.  the
> xmin should
> > be t1 currently.  Can we advance the xmin to t2 since it is read
> committed level,
> > The data older than t2 will never be used?   Another one is can we force
> to clean
> > up the old tuples which are older than xxx?  If users want to access
> that,
> > we can just raise errors.  Oracle uses this strategy and the error code
> is
> > ORA-01555.
>
> Hi Andy,
>
> For the second idea, we have old_snapshot_threshold which does exactly
> that since 9.6.  There have been some questions about whether it works
> correctly, though: see https://commitfest.postgresql.org/30/2682/ if
> you would like to help look into that :-)
>

Hi Thomas:
   This is exactly what I want, and I have great interest in it. Thanks for
the information!

-- 
Best Regards
Andy Fan


Re: bitmaps and correlation

2020-11-06 Thread Anastasia Lubennikova
Status update for a commitfest entry

According to cfbot, the patch fails to apply.  Could you please send a rebased 
version?

I wonder why this patch has been waiting so long without a review. Maybe it
will help to move the discussion forward if you provide more examples of
queries that can benefit from this improvement?

The first patch is simply a refactoring and I don't see any possible objections 
against it.
The second patch also looks fine to me. The logic is understandable and the 
code is neat.

It wouldn't hurt to add a comment for this computation, though.
+   pages_fetched = pages_fetchedMAX + 
indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX);

The new status of this patch is: Waiting on Author


Re: Move OpenSSL random under USE_OPENSSL_RANDOM

2020-11-06 Thread Michael Paquier
On Fri, Nov 06, 2020 at 01:31:44PM +0100, Magnus Hagander wrote:
> I think the defensive-in-code instead of defensive-in-docs is a really
> strong argument, so I have pushed it as such.

Fine by me.  Thanks for the commit.
--
Michael


signature.asc
Description: PGP signature


Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?

2020-11-06 Thread Ashutosh Bapat
On Thu, Nov 5, 2020 at 9:16 AM David Pirotte  wrote:
>
> On Tue, Nov 3, 2020 at 7:19 AM Dave Cramer  wrote:
>>
>> David,
>>
>> On Thu, 24 Sep 2020 at 00:22, Michael Paquier  wrote:
>>>
>>> On Tue, Sep 08, 2020 at 12:18:23PM -0700, Andres Freund wrote:
>>> > A test verifying that a non-transactional message in later aborted
>>> > transaction is handled correctly would be good.
>>>
>>> On top of that, the patch needs a rebase as it visibly fails to apply,
>>> per the CF bot.
>>> --
>>> Michael
>>
>>
>> Where are you with this? Are you able to work on it ?
>> Dave Cramer
>
>
> Apologies for the delay, here.
>
> I've attached v2 of this patch which applies cleanly to master. The patch 
> also now includes a test demonstrating that pg_logical_emit_message correctly 
> sends non-transactional messages when called inside a transaction that is 
> rolled back. (Thank you, Andres, for this suggestion.) The only other change 
> is that I added this new message type into the LogicalRepMsgType enum added 
> earlier this week.
>
> Let me know what you think.

This feature looks useful. Here are some comments.

+/*
+ * Write MESSAGE to stream
+ */
+void
+logicalrep_write_message(StringInfo out, ReorderBufferTXN *txn, XLogRecPtr lsn,
+bool transactional, const char *prefix, Size sz,
+const char *message)
+{
+   uint8   flags = 0;
+
+   pq_sendbyte(out, LOGICAL_REP_MSG_MESSAGE);
+

Similar to the UPDATE/DELETE/INSERT records decoded when streaming is being
used, we need to add a transaction id for transactional messages. Maybe we
should add that even in the non-streaming case and use it to decide whether
it's a transactional message or not. That might save us a byte when we are
adding a transaction id.

+   /* encode and send message flags */
+   if (transactional)
+   flags |= MESSAGE_TRANSACTIONAL;
+
+   pq_sendint8(out, flags);

Are 8 bits enough considering future improvements? What if we need more
than 8 flag bits?

@@ -1936,6 +1936,9 @@ apply_dispatch(StringInfo s)
apply_handle_origin(s);
return;

+   case LOGICAL_REP_MSG_MESSAGE:

Should we add the logical message to the WAL downstream so that it flows
further down to a cascaded logical replica? Should that be controlled
by an option?

-- 
Best Wishes,
Ashutosh Bapat




Re: Move catalog toast table and index declarations

2020-11-06 Thread John Naylor
On Thu, Nov 5, 2020 at 2:20 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-11-05 12:59, John Naylor wrote:
> > I think we're talking past eachother. Here's a concrete example:
> >
> > #define BKI_ROWTYPE_OID(oid,oidmacro)
> > #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable
> >
> > I understand these to be functionally equivalent as far as what the C
> > compiler sees.
>
> The issue is that you can't have a bare semicolon at the top level of a
> C compilation unit, at least on some compilers.  So doing
>
> #define FOO(stuff) /*empty*/
>
> and then
>
> FOO(123);
>
> won't work.  You need to fill the definition of FOO with some stuff to
> make it valid.
>
> BKI_ROWTYPE_OID on the other hand is not used at the top level like
> this, so it can be defined to empty.
>

Thank you for the explanation.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Allow some recovery parameters to be changed with reload

2020-11-06 Thread Sergei Kornilov
Hello

> Currently when restore_command is not set, archive recovery fails
> at the beginning. With the patch, how should we treat the case where
> restore_command is reset to empty during archive recovery? We should
> reject that change of restore_command?

Good point. I think we should reject that change. But (AFAIC) I cannot use GUC 
check callback for this purpose, as only the startup process knows 
StandbyModeRequested. I think it would be appropriate to call 
validateRecoveryParameters from StartupRereadConfig. As side effect this add 
warning/hint "specified neither primary_conninfo nor restore_command" in 
standby mode in appropriate configuration state. Not sure about the rest checks 
in validateRecoveryParameters, maybe it's a wrong idea to recheck them here and 
I need to separate these checks into another function.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433e31..ec74cb43ad 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3542,7 +3542,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows

 

-This parameter can only be set at server start.
+This parameter can only be set in the postgresql.conf
+file or on the server command line.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1078a7cfc..148dd34633 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -883,7 +883,6 @@ static MemoryContext walDebugCxt = NULL;
 #endif
 
 static void readRecoverySignalFile(void);
-static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
@@ -5458,7 +5457,7 @@ readRecoverySignalFile(void)
  errmsg("standby mode is not supported by single-user servers")));
 }
 
-static void
+void
 validateRecoveryParameters(void)
 {
 	if (!ArchiveRecoveryRequested)
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index eab9c8c4ed..bf7dc866db 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -114,6 +114,11 @@ StartupRereadConfig(void)
 
 	if (conninfoChanged || slotnameChanged || tempSlotChanged)
 		StartupRequestWalReceiverRestart();
+
+	/*
+	 * Check the combination of new parameters
+	 */
+	validateRecoveryParameters();
 }
 
 /* Handle various signals that might be sent to the startup process */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..87fd593924 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3699,7 +3699,7 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
-		{"restore_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
+		{"restore_command", PGC_SIGHUP, WAL_ARCHIVE_RECOVERY,
 			gettext_noop("Sets the shell command that will be called to retrieve an archived WAL file."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..9c9091e601 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -253,7 +253,6 @@
 # placeholders: %p = path of file to restore
 #   %f = file name only
 # e.g. 'cp /mnt/server/archivedir/%f %p'
-# (change requires restart)
 #archive_cleanup_command = ''	# command to execute at every restartpoint
 #recovery_end_command = ''	# command to execute at completion of recovery
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..965ed109b3 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -347,6 +347,7 @@ extern void WakeupRecovery(void);
 extern void SetWalWriterSleeping(bool sleeping);
 
 extern void StartupRequestWalReceiverRestart(void);
+extern void validateRecoveryParameters(void);
 extern void XLogRequestWalReceiverReply(void);
 
 extern void assign_max_wal_size(int newval, void *extra);


Re: Move OpenSSL random under USE_OPENSSL_RANDOM

2020-11-06 Thread Magnus Hagander
On Fri, Nov 6, 2020 at 12:08 PM Daniel Gustafsson  wrote:
>
> > On 6 Nov 2020, at 00:36, Michael Paquier  wrote:
>
> > I still don't see the point of this extra complexity, as
> > USE_OPENSSL_RANDOM implies USE_OPENSSL,
>
> As long as we're sure that we'll remember to fix this when that assumption no
> longer holds (intentional or unintentional) then it's fine to skip and instead
> be defensive in documentation rather than code.

I think the defensive-in-code instead of defensive-in-docs is a really
strong argument, so I have pushed it as such.


--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Corruption during WAL replay

2020-11-06 Thread Masahiko Sawada
On Tue, Aug 18, 2020 at 3:22 AM Andres Freund  wrote:
>
> Hi,
>
> On 2020-08-17 14:05:37 +0300, Heikki Linnakangas wrote:
> > On 14/04/2020 22:04, Teja Mupparti wrote:
> > > Thanks Kyotaro and Masahiko for the feedback. I think there is a
> > > consensus on the critical-section around truncate,
> >
> > +1
>
> I'm inclined to think that we should do that independent of the far more
> complicated fix for other related issues.

+1

If we had a critical section in RelationTruncate(), crash recovery
would continue failing until the situation of the underlying file is
recovered if a PANIC happens. The current comment in
RelationTruncate() says it’s worse than the disease. But considering
physical replication, as Andres mentioned, a failure to truncate the
file after logging WAL is no longer a harmless failure. Also, the
critical section would be necessary even if we reversed the order of
truncation and dropping buffers and resolved the issue. So I agree to
proceed with the patch that adds a critical section independent of
fixing other related things discussed in this thread. If Teja doesn't
seem to be working on this, I'll write the patch.

Regards,


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: [Patch] Optimize dropping of relation buffers using dlist

2020-11-06 Thread Amit Kapila
On Fri, Nov 6, 2020 at 11:10 AM Thomas Munro  wrote:
>
> On Fri, Nov 6, 2020 at 5:09 PM Amit Kapila  wrote:
> >
> > It is not very clear to me how this argument applies to the patch
> > in-discussion where we are relying on the cached value of blocks
> > during recovery. I understand your point that we might skip scanning
> > the pages and thus might not show some recently added data but that
> > point is not linked with what we are trying to do with this patch.
>
> It's an argument for giving up the hard-to-name cache trick completely
> and going back to using unmodified smgrnblocks(), both in recovery and
> online.  If the only mechanism for unexpected file shrinkage is
> writeback failure, then your system will be panicking soon enough
> anyway
>

How else (except for writeback failure due to unexpected shrinkage)
will the system panic? Are you saying that if users don't get some
data because lseek lied to us then it will be equivalent to a panic,
or are you indicating the scenario where ReadBuffer_common gives the
error "unexpected data beyond EOF"?

> -- so is it really that bad if there are potentially some other
> weird errors logged some time before that?  Maybe those errors will
> even take the system down sooner, and maybe that's appropriate?
>

Yeah, it might be appropriate to panic in such situations, but
ReadBuffer_common gives an error and asks the user to update the system.


>  If
> there are other mechanisms for random file shrinkage that don't imply
> a panic in your near future, then we have bigger problems that can't
> be solved by any number of bandaids, at least not without
> understanding the details of this hypothetical unknown failure mode.
>

I think one of the problems is returning fewer rows, and that too
without any warning or error; maybe that is the bigger problem. But we
seem to be okay with it, as that is already a known thing, though I
think it is not documented anywhere.

> The main argument I can think of against the idea of using plain old
> smgrnblocks() is that the current error messages on smgrwrite()
> failure for stray blocks would be indistinguishible from cases where
> an external actor unlinked the file.  I don't mind getting an error
> that prevents checkpointing -- your system is in big trouble! -- but
> it'd be nice to be able to detect that *we* unlinked the file,
> implying the filesystem and bufferpool are out of sync, and spit out a
> special diagnostic message.  I suppose if it's the checkpointer doing
> the writing, it could check if the relfilenode is on the
> queued-up-for-delete-after-the-checkpoint list, and if so, it could
> produce a different error message just for this edge case.
> Unfortunately that's not a general solution, because any backend might
> try to write a buffer out and they aren't synchronised with
> checkpoints.
>

Yeah, but I am not sure if we can consider manual (external actor)
tinkering with the files the same as something that happened due to
the database server relying on the wrong information.

> I'm not sure what the best approach is.  It'd certainly be nice to be
> able to drop small tables quickly online too, as a benefit of this
> approach.

Right, that is why I was thinking of doing it only for recovery, where
it is safe from the database server's perspective. OTOH, we could
broadly accept that any time the filesystem lies to us the behavior is
unpredictable: the system can return fewer rows than expected, or it
could cause a panic. I think there is an argument that it might be
better to error out (even with a panic) rather than silently return
fewer rows, but unfortunately detecting that in each and every case
doesn't seem feasible.

One vague idea could be to develop pg_test_seek, which could detect such
problems, but I'm not sure we can rely on such a tool to always give us
the right answer. Were you able to consistently reproduce the lseek
problem on the system where you tried?

-- 
With Regards,
Amit Kapila.




Re: Some doubious code in pgstat.c

2020-11-06 Thread Amit Kapila
On Thu, Nov 5, 2020 at 2:13 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 5 Nov 2020 11:48:24 +0530, Amit Kapila  
> wrote in
> > On Thu, Nov 5, 2020 at 9:44 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Nov 5, 2020 at 11:18 AM Kyotaro Horiguchi
> > >  wrote:
> > > > As another issue, just replacing memcpy with strlcpy makes the compiler
> > > > complain of a type mismatch, as the first parameter to memcpy had a
> > > > needless "&" operator. I removed it in this patch.
> > > >
> > > > (_slotname is a "char (*)[NAMEDATALEN]", not a "char *".)
> > > >
> > >
> > > The patch looks good to me.
> > >
> >
> > LGTM as well but the proposed commit message seems to be a bit
> > unclear. How about something like this:
> > "Use strlcpy instead of memcpy for copying the slot name in pgstat.c.
> >
> > There is no outright bug here but it is better to be consistent with
> > the usage at other places in the same file. In the passing, fix a wrong
> > Assertion in pgstat_recv_replslot."
>
> Looks better, thanks.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Move OpenSSL random under USE_OPENSSL_RANDOM

2020-11-06 Thread Daniel Gustafsson
> On 6 Nov 2020, at 00:36, Michael Paquier  wrote:

> I still don't see the point of this extra complexity, as
> USE_OPENSSL_RANDOM implies USE_OPENSSL,

As long as we're sure that we'll remember to fix this when that
assumption no longer holds (intentionally or unintentionally), then it's
fine to skip it and instead be defensive in documentation rather than in
code.

cheers ./daniel



Re: Advance xmin aggressively on Read Commit isolation level

2020-11-06 Thread Thomas Munro
On Fri, Nov 6, 2020 at 9:48 PM Andy Fan  wrote:
> I have 2 ideas about this.  One is that at the Read Committed level we
> can advance xmin aggressively: suppose a transaction started at t1 and
> completes a query at t2; the xmin is currently t1.  Can we advance the
> xmin to t2, since at the Read Committed level the data older than t2
> will never be used?  The other is: can we force cleanup of old tuples
> which are older than xxx?  If users want to access them, we can just
> raise an error.  Oracle uses this strategy and the error code is
> ORA-01555.

Hi Andy,

For the second idea, we have old_snapshot_threshold which does exactly
that since 9.6.  There have been some questions about whether it works
correctly, though: see https://commitfest.postgresql.org/30/2682/ if
you would like to help look into that :-)




Advance xmin aggressively on Read Commit isolation level

2020-11-06 Thread Andy Fan
The impact of a long transaction is huge in PostgreSQL.  For example,
after an accidental long transaction, tables may become very large and
cannot shrink again even after the transaction completes, unless a
VACUUM FULL is used.

I have 2 ideas about this.  One is that at the Read Committed level we
can advance xmin aggressively: suppose a transaction started at t1 and
completes a query at t2; the xmin is currently t1.  Can we advance the
xmin to t2, since at the Read Committed level the data older than t2
will never be used?  The other is: can we force cleanup of old tuples
which are older than xxx?  If users want to access them, we can just
raise an error.  Oracle uses this strategy and the error code is
ORA-01555.


-- 
Best Regards
Andy Fan


Re: Protect syscache from bloating with negative cache entries

2020-11-06 Thread Heikki Linnakangas

On 06/11/2020 10:24, Kyotaro Horiguchi wrote:

Thank you for the comment!

First off, I thought that I managed to eliminate the degradation
observed on the previous versions, but significant degradation (1.1%
slower) is still seen in one case.


One thing to keep in mind with micro-benchmarks like this is that even 
completely unrelated code changes can change the layout of the code in 
memory, which in turn can affect CPU caching effects in surprising ways. 
If you're lucky, you can see 1-5% differences just by adding a function 
that's never called, for example, if it happens to move other code in 
memory so that some hot codepath or struct gets split across CPU cache 
lines. It can be infuriating when benchmarking.



At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas  wrote in
(A) original test patch

I naively thought that the code path is too short to absorb the
degradation of a few additional instructions.  Actually I measured
performance again with the same patch set on the current master and
had more or less the same result.

master 8195.58ms, patched 8817.40 ms: +10.75%

However, I noticed that the additional call was a recursive call and the
jmp inserted for the recursive call seems to take significant
time. After avoiding the recursive call, the difference was reduced to
+0.96% (master 8268.71ms : patched 8348.30ms)

Just the two instructions below are inserted in this case, which looks
reasonable.

   8720ff <+31>:  cmpl   $0x,0x4ba942(%rip)# 0xd2ca48 

   872106 <+38>:  jl 0x872240  (call to a function)


That's interesting. I think a 1% degradation would be acceptable.

I think we'd like to enable this feature by default though, so the 
performance when it's enabled is also very important.



(C) inserting bare counter-update code without a branch


Do we actually need a branch there? If I understand correctly, the
point is to bump up a usage counter on the catcache entry. You could
increment the counter unconditionally, even if the feature is not
used, and avoid the branch that way.


That change causes 4.9% degradation, which is worse than having a
branch.

master 8364.54ms, patched 8666.86ms (+4.9%)

The additional instructions follow.

+ 8721ab <+203>:  mov0x30(%rbx),%eax  # %eax = ct->naccess
+ 8721ae <+206>:  mov$0x2,%edx
+ 8721b3 <+211>:  add$0x1,%eax# %eax++
+ 8721b6 <+214>:  cmove  %edx,%eax# if %eax == 0 then %eax = 2

+ 8721bf <+223>:  mov%eax,0x30(%rbx)  # ct->naccess = %eax
+ 8721c2 <+226>:  mov0x4cfe9f(%rip),%rax# 0xd42068 
+ 8721c9 <+233>:  mov%rax,0x38(%rbx)  # ct->lastaccess = %rax


Do you need the "ntaccess == 2" test? You could always increment the 
counter, and in the code that uses ntaccess to decide what to evict, 
treat all values >= 2 the same.


Need to handle integer overflow somehow. Or maybe not: integer overflow 
is so infrequent that even if a hot syscache entry gets evicted 
prematurely because its ntaccess count wrapped around to 0, it will 
happen so rarely that it won't make any difference in practice.


- Heikki




Re: Protect syscache from bloating with negative cache entries

2020-11-06 Thread Kyotaro Horiguchi
me> First off, I thought that I managed to eliminate the degradation
me> observed on the previous versions, but significant degradation (1.1%
me> slower) is still seen in one case.

While trying benchmarking with many patterns, I noticed that calling
CatCacheCleanupOldEntries() slows down catcache search significantly,
even if the function does almost nothing.  Oddly enough, the
degradation gets larger if I remove the counter-updating code from
SearchCatCacheInternal. It seems that RehashCatCache is called far more
frequently than I thought and CatCacheCleanupOldEntries was suffering
from the branch penalty.

The degradation vanished after attaching a likely() to the condition. On
the contrary, the patched version is now consistently slightly faster
than master.

For now, I measured the patch with the three access patterns that
catcachebench was designed for.

 master  patched-off patched-on(300s)
test 1   3898.18ms   3896.11ms (-0.1%)   3889.44ms (-  0.2%)
test 2   8013.37ms   8098.51ms (+1.1%)   8640.63ms (+  7.8%)
test 3   6146.95ms   6147.91ms (+0.0%)  15466   ms (+152  %)

master : This patch is not applied.
patched-off: This patch is applied and catalog_cache_prune_min_age = -1
patched-on : This patch is applied and catalog_cache_prune_min_age = 0

test 1: Creates many negative entries in STATRELATTINH
(expiration doesn't happen)
test 2: Repeat fetch several negative entries for many times.
test 3: test 1 with expiration happens.

The result looks far better, but test 2 still shows a small
degradation... I'll continue investigating it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 9516267f0e2943cf955cbbfe5133c13c36288ee6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Nov 2020 17:27:18 +0900
Subject: [PATCH v4] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |   3 +
 src/backend/utils/cache/catcache.c | 125 +
 src/backend/utils/misc/guc.c   |  12 +++
 src/include/utils/catcache.h   |  20 +
 4 files changed, 160 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..a246fcc4c0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..f63224bfd5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 
 
  /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz	catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 			   int nkeys,
 			   Datum v1, Datum v2,
@@ -74,6 +84,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 Index hashIndex,
 Datum v1, Datum v2,
 Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 		   Datum v1, Datum v2, Datum v3, Datum v4);
@@ -99,6 +110,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 			 Datum *srckeys, Datum *dstkeys);
 
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  *	internal support functions
@@ -863,6 +880,10 @@ RehashCatCache(CatCache *cp)
 	int			newnbuckets;
 	int			i;
 
+	/* try removing old entries before expanding hash */
+	if (CatCacheCleanupOldEntries(cp))
+		return;
+
 	elog(DEBUG1, "rehashing catalog cache id %d for %s; %d tups, %d buckets",
 		 cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
 
@@ -1264,6 +1285,20 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
 
+		/*
+		 * Prolong the life of this entry. Since we want to run as few
+		 * instructions as possible and want the branch to be stable for
+		 * performance reasons, we don't put a strict cap on the counter.
+		 * All numbers above 1 will be regarded as 2 in CatCacheCleanupOldEntries().
+		 */
+		if (unlikely(catalog_cache_prune_min_age >= 0))
+		{
+			ct->naccess++;
+			if (unlikely(ct->naccess 

Re: Protect syscache from bloating with negative cache entries

2020-11-06 Thread Kyotaro Horiguchi
Thank you for the comment!

First off, I thought that I managed to eliminate the degradation
observed on the previous versions, but significant degradation (1.1%
slower) is still seen in one case.

Anyway, before sending the new patch, let me just answer the
comments.

At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas  wrote 
in 
> On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> > 1. Inserting a branch in
> > SearchCatCacheInternal. (CatCache_Pattern_1.patch)
> >   This is the most straightforward way to add an alternative feature.
> > pattern 1 | 8459.73 |  28.15  # 9% (>> 1%) slower than 7757.58
> > pattern 1 | 8504.83 |  55.61
> > pattern 1 | 8541.81 |  41.56
> > pattern 1 | 8552.20 |  27.99
> > master| 7757.58 |  22.65
> > master| 7801.32 |  20.64
> > master| 7839.57 |  25.28
> > master| 7925.30 |  38.84
> >   It's so slow that it cannot be used.
> 

> This is very surprising. A branch that's never taken ought to be
> predicted by the CPU's branch-predictor, and be very cheap.

(A) original test patch

I naively thought that the code path is too short to absorb the
degradation of a few additional instructions.  Actually I measured
performance again with the same patch set on the current master and
had more or less the same result.

master 8195.58ms, patched 8817.40 ms: +10.75%

However, I noticed that the additional call was a recursive call and the
jmp inserted for the recursive call seems to take significant
time. After avoiding the recursive call, the difference was reduced to
+0.96% (master 8268.71ms : patched 8348.30ms)

Just the two instructions below are inserted in this case, which looks
reasonable.

  8720ff <+31>: cmpl   $0x,0x4ba942(%rip)# 0xd2ca48 

  872106 <+38>: jl 0x872240  (call to a function)


(C) inserting bare counter-update code without a branch

> Do we actually need a branch there? If I understand correctly, the
> point is to bump up a usage counter on the catcache entry. You could
> increment the counter unconditionally, even if the feature is not
> used, and avoid the branch that way.

That change causes 4.9% degradation, which is worse than having a
branch.

master 8364.54ms, patched 8666.86ms (+4.9%)

The additional instructions follow.

+ 8721ab <+203>:mov0x30(%rbx),%eax  # %eax = ct->naccess
+ 8721ae <+206>:mov$0x2,%edx
+ 8721b3 <+211>:add$0x1,%eax# %eax++
+ 8721b6 <+214>:cmove  %edx,%eax# if %eax == 0 then %eax = 2

+ 8721bf <+223>:mov%eax,0x30(%rbx)  # ct->naccess = %eax
+ 8721c2 <+226>:mov0x4cfe9f(%rip),%rax# 0xd42068 

+ 8721c9 <+233>:mov%rax,0x38(%rbx)  # ct->lastaccess = %rax


(D) naively branching then updating, again.

Come to think of it, I measured the same thing with a branch again,
specifically (it had shown significant degradation before, as I
remember):

  dlist_move_head(bucket, &ct->cache_elem);

+ if (catalog_cache_prune_min_age < -1)  # never be true
+ {
+(counter update)
+ }

And I had effectively the same numbers from both master and patched.

master 8066.93ms, patched 8052.37ms (-0.18%)

The above branching inserts the same two instructions as in (B), in a
different place, but the result differs, for a reason unclear to me.

+  8721bb <+203>:   cmpl   $0x,0x4bb886(%rip)   # 

+  8721c2 <+210>:   jl 0x872208 

I'm not sure why, but the patched version beats master by a small
margin.  Anyway, this new result suggests that compilers might have
gotten smarter than before?


(E) bumping up in ReleaseCatCache() (won't work)

> Another thought is to bump up the usage counter in ReleaseCatCache(),
> and only when the refcount reaches zero. That might be somewhat
> cheaper, if it's a common pattern to acquire additional leases on an
> entry that's already referenced.
> 
> Yet another thought is to replace 'refcount' with an 'acquirecount'
> and 'releasecount'. In SearchCatCacheInternal(), increment
> acquirecount, and in ReleaseCatCache, increment releasecount. When
> they are equal, the entry is not in use. Now you have a counter that
> gets incremented on every access, with the same number of CPU
> instructions in the hot paths as we have today.

These don't work for negative caches, since the corresponding tuples
are never released.


(F) removing less-significant code.

> Or maybe there are some other ways we could micro-optimize
> SearchCatCacheInternal(), to buy back the slowdown that this feature

Yeah, I thought of that in the beginning. (I removed dlist_move_head()
at the time.)  But the most difficult aspect of this approach is that I
cannot tell whether the modification causes degradation or not.

> would add? For example, you could remove the "if (cl->dead) continue;"
> check, if dead entries were kept out of the hash buckets. Or maybe the
> catctup struct could be made slightly smaller somehow, so that it
> would fit more comfortably in a single cache line.

As a trial, I removed that code and added 

RE: extension patch of CREATE OR REPLACE TRIGGER

2020-11-06 Thread osumi.takami...@fujitsu.com
Hello


On Friday, November 6, 2020 2:25 PM Peter Smith  wrote:
> > > Yes, OK. But it might be simpler still just to it like:
> > > "CREATE OR REPLACE TRIGGER works only for replacing a regular (not
> > > constraint) trigger with another regular trigger."
> > Yeah, these kinds of supplementary words help users understand the
> > exact usage of this feature. Thanks.
> 
> Actually, I meant that after making that 1st sentence wording change, I
> thought the 2nd sentence (i.e. "That means it is impossible...") is no longer
> needed at all since it is just re-stating what the 1st sentence already says.
> 
> But if you prefer to leave it worded how it is now that is ok too.
The simpler, the better, for sure. I deleted that 2nd sentence.


> > > > > (9) COMMENT
> (snip)
> > I understand that
> > I need to add 2 syntax error cases and
> > 1 error case to replace constraint trigger at least. It makes sense.
> > At the same time, I thought that the order of the tests in the v15
> > patch was somewhat hard to read.
> > So, I decided to sort them out and adopt your new sets of tests there.
> > What I'd like to test there is not different, though.
> > Please have a look at the new patch.
> 
> Yes, the tests are generally OK, but unfortunately a few new problems are
> introduced with the refactoring of the combination tests.
> 
> 1) It looks like about 40 lines of test code are cut/paste 2 times by accident
This was not a mistake. The 40-line cases used OR REPLACE to define
each regular trigger that would then be overwritten.
But they probably add nothing, so I deleted those cases.
Please forget that part.

> 2) Typo "gramatically" --> "grammatically"
> 3) Your last test described as "create or replace constraint trigger is not
> gramatically correct." is not really doing what it is meant to do. That test 
> was
> supposed to be trying to replace an existing CONSTRAINT trigger.
Sigh. Yeah, those were not right. Fixed.


> 
> IMO if all the combination tests were consistently commented like my 8
> examples below then risk of accidental mistakes is reduced.
> e.g.
> -- 1. Overwrite existing regular trigger with regular trigger (without OR
> REPLACE)
> -- 2. Overwrite existing regular trigger with regular trigger (with OR 
> REPLACE)
> -- 3. Overwrite existing regular trigger with constraint trigger (without OR
> REPLACE)
> -- 4. Overwrite existing regular trigger with constraint trigger (with OR
> REPLACE)
> -- 5. Overwrite existing constraint trigger with regular trigger (without OR
> REPLACE)
> -- 6. Overwrite existing constraint trigger with regular trigger (with OR
> REPLACE)
> -- 7. Overwrite existing constraint trigger with constraint trigger (without 
> OR
> REPLACE)
> -- 8. Overwrite existing constraint trigger with constraint trigger (with OR
> REPLACE)
> 
> To avoid any confusion I have attached triggers.sql updated how I think it
> should be. Please compare it to see what I mean. PSA.
> 
> I hope it helps.
I cannot thank you enough.


Best,
Takamichi Osumi


CREATE_OR_REPLACE_TRIGGER_v17.patch
Description: CREATE_OR_REPLACE_TRIGGER_v17.patch