Re: Connection slots reserved for replication

2019-02-12 Thread Oleksii Kliukin



> On 12. Feb 2019, at 05:12, Michael Paquier  wrote:
> 
> On Mon, Feb 11, 2019 at 08:57:41PM -0700, Kevin Hale Boyes wrote:
>> I think there's a small problem with the commit.
>> The position of xlrec.max_wal_senders (line 117) should be below
>> max_worker_processes.
> 
> Fixed, thanks!

Wonderful! Thank you very much for taking it to commit, and to everyone involved
for getting it through!

Cheers,
Oleksii


Re: Connection slots reserved for replication

2019-02-11 Thread Michael Paquier
On Mon, Feb 11, 2019 at 08:57:41PM -0700, Kevin Hale Boyes wrote:
> I think there's a small problem with the commit.
> The position of xlrec.max_wal_senders (line 117) should be below
> max_worker_processes.

Fixed, thanks!
--
Michael




Re: Connection slots reserved for replication

2019-02-11 Thread Kevin Hale Boyes
On Mon, 11 Feb 2019 at 18:39, Michael Paquier  wrote:

> And done.


Michael,
I think there's a small problem with the commit.
The position of xlrec.max_wal_senders (line 117) should be below
max_worker_processes.

112         appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
113                          "max_wal_senders=%d max_prepared_xacts=%d "
114                          "max_locks_per_xact=%d wal_level=%s "
115                          "wal_log_hints=%s track_commit_timestamp=%s",
116                          xlrec.MaxConnections,
117                          xlrec.max_wal_senders,
118                          xlrec.max_worker_processes,
119                          xlrec.max_prepared_xacts,
120                          xlrec.max_locks_per_xact,
121                          wal_level_str,
122                          xlrec.wal_log_hints ? "on" : "off",
123                          xlrec.track_commit_timestamp ? "on" : "off");

Kevin.


Re: Connection slots reserved for replication

2019-02-11 Thread Michael Paquier
On Fri, Feb 08, 2019 at 09:37:40AM +0900, Michael Paquier wrote:
> Thanks for the confirmation.  I am not planning to finish wrapping
> this one today anyway, and next week should be fine.

And done.  I have done an extra pass on it, reordering a bit
parameters and comments, and removed the extra comments I added
previously in guc.c.
--
Michael




Re: Connection slots reserved for replication

2019-02-07 Thread Michael Paquier
On Thu, Feb 07, 2019 at 04:16:22PM +0100, Oleksii Kliukin wrote:
> On 7. Feb 2019, at 07:51, Michael Paquier  wrote:
> I’ve noticed you returned the 'see max_connections' comment there. As noticed
> previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
> are supposed to see there, perhaps it makes sense to elaborate (besides, the
> comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
> max_connections, which is not true anymore).

After sleeping on it, it seems that I overdid this part.  I think
that I was trying to document the relationship between the multiple
parameters.  Now it is true that it is easy enough to grep for the GUC
strings and find them.  It may be better to avoid spreading the
calculations in the check functions across multiple lines as well.

>> It seems to me that it would be a good idea to have an
>> autovacuum-worker-related message in the context of InitProcess() for
>> a failed backend creation, but we can deal with that later if
>> necessary.
> 
> Hm.. I am wondering how the autovacuum workers can run out of slots
> there?

Normally they cannot, if you look at the way the launcher decides to
start new workers.  Still, if one begins to hack the autovacuum code
for a fork or a new feature, I think that it would be useful to display
a context-related message instead of the confusing "too many clients"
in this case.  This way it is possible to debug things properly.
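
Roughly, I am thinking of something like the following at the bottom of
InitProcess() when no free PGPROC is found, mirroring the
walsender-specific message discussed on this thread (the message
wording here is only a sketch, not a concrete proposal):

    if (IsAutoVacuumWorkerProcess())
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("no free slots available for autovacuum workers "
                        "(autovacuum_max_workers is currently %d)",
                        autovacuum_max_workers)));
    ereport(FATAL,
            (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
             errmsg("sorry, too many clients already")));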

>> With all that reviewed, I finish with the attached that I am
>> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
>> reminder because xl_parameter_change gets an upgrade, and I am most
>> likely going to forget it.
>> 
>> Please let me know if you have comments.  I am still planning to do an
>> extra pass on it.
> 
> Apart from the comment-related notes above your changes look good to
> me, thank you.

Thanks for the confirmation.  I am not planning to finish wrapping
this one today anyway, and next week should be fine.
--
Michael




Re: Connection slots reserved for replication

2019-02-07 Thread Oleksii Kliukin


> On 7. Feb 2019, at 07:51, Michael Paquier  wrote:
> 
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
>> Oh, I have just noticed this patch when doing my vacuum homework.  I
>> have just added my name as committer, just wait a bit until the CF is
>> closed before I begin looking at it..
> 
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.

Thank you for picking it up!

> 
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKENDS, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

Good catch, thank you. 

I’ve noticed you returned the 'see max_connections' comment there. As noticed
previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
are supposed to see there, perhaps it makes sense to elaborate (besides, the
comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
max_connections, which is not true anymore).

> 
> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straightforward to follow so that's not really an issue in my
> opinion.  The potential backward-incompatible issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

+1

> 
> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

Hm.. I am wondering how the autovacuum workers can run out of slots there?

> 
> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.
> 
> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

Apart from the comment-related notes above your changes look good to me, thank
you.

Cheers,
Oleksii


Re: Connection slots reserved for replication

2019-02-07 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 7 Feb 2019 15:51:46 +0900, Michael Paquier  wrote 
in <20190207065146.gn4...@paquier.xyz>
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
> > Oh, I have just noticed this patch when doing my vacuum homework.  I
> > have just added my name as committer, just wait a bit until the CF is
> > closed before I begin looking at it..
> 
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.
> 
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKENDS, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

I don't think that is a good thing, including for the existing checker
functions. But, as you seem to have written below, it can be treated as
a separate issue. So I agree with that.

> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straightforward to follow so that's not really an issue in my

I was hesitant to propose changing it (in InitProcGlobal), but
I like the fixed style.

> opinion.  The potential backward-incompatible issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

I believe so.

> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

(Maybe I'm missing the point, but) the GUC validator functions for
max_connections and friends seem more harmful than useless, since we
only see an error for max_connections while the others are not
reported, and that conceals what is happening. I think we should
remove the validators and have InitializeMaxBackends complain
instead, providing more meaningful information. In another patch?
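
Just to sketch the idea (the message wording and error code below are
made up, not a concrete proposal), InitializeMaxBackends() could report
the offending values itself:

    void
    InitializeMaxBackends(void)
    {
        Assert(MaxBackends == 0);

        /* the extra unit accounts for the autovacuum launcher */
        MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
            max_worker_processes + max_wal_senders;

        if (MaxBackends > MAX_BACKENDS)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("too many backends configured"),
                     errdetail("max_connections (%d) plus autovacuum_max_workers (%d) "
                               "plus max_worker_processes (%d) plus max_wal_senders (%d), "
                               "plus one for the autovacuum launcher, must not exceed %d.",
                               MaxConnections, autovacuum_max_workers,
                               max_worker_processes, max_wal_senders,
                               MAX_BACKENDS)));
    }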

> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.

Some removed comments are revived but I'm fine with it. Adding
tags in documentation seems fine.

> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

In any case, I (not being the author) am fine with it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Connection slots reserved for replication

2019-02-06 Thread Michael Paquier
On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
> Oh, I have just noticed this patch when doing my vacuum homework.  I
> have just added my name as committer, just wait a bit until the CF is
> closed before I begin looking at it..

So, I have looked at this thread and the latest patch, and the thing
looks in good shape.  Nice job by the authors and the reviewers.  It
is a bit of a pain for users to hope that max_connections would be
able to handle replication connections when needed, which can cause
backups to not happen.  Using a separate GUC while we have
max_wal_senders here to do the job is also not a good idea, so the
approach of the patch is sound.  And on top of that, dependencies
between GUCs get reduced.

I have spotted one error though: the patch does not check if changing
max_wal_senders overflows MAX_BACKENDS, and we need to reflect a proper
calculation into check_autovacuum_max_workers,
check_max_worker_processes and check_maxconnections.
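
For example, check_maxconnections would do the equivalent of the
following, and the other two check functions would mirror it (a sketch
of the calculation only; the committed shape may differ slightly):

    static bool
    check_maxconnections(int *newval, void **extra, GucSource source)
    {
        if (*newval + autovacuum_max_workers + 1 +
            max_worker_processes + max_wal_senders > MAX_BACKENDS)
            return false;
        return true;
    }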

One thing is that the slot position calculation gets a bit more
complicated with the new slot set for WAL senders, still the code is
straightforward to follow so that's not really an issue in my
opinion.  The potential backward-incompatible issue of creating
a standby with lower max_wal_senders value than the primary is also
something we can live with as that's simple enough to address, and I'd
like to think that most users prefer inheriting the parameter from the
primary to ease failover flows.

It seems to me that it would be a good idea to have an
autovacuum-worker-related message in the context of InitProcess() for
a failed backend creation, but we can deal with that later if
necessary.

With all that reviewed, I finish with the attached that I am
comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
reminder because xl_parameter_change gets an upgrade, and I am most
likely going to forget it.

Please let me know if you have comments.  I am still planning to do an
extra pass on it.
--
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9b7a7388d5..cc622c2268 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
 

 The default value is three connections. The value must be less
-than max_connections minus
-.
+than max_connections.
 This parameter can only be set at server start.

   
@@ -3484,24 +3483,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows



-Specifies the maximum number of concurrent connections from
-standby servers or streaming base backup clients (i.e., the
-maximum number of simultaneously running WAL sender
-processes). The default is 10. The value 0 means replication is
-disabled. WAL sender processes count towards the total number
-of connections, so this parameter's value must be less than
- minus
-.
-Abrupt streaming client disconnection might leave an orphaned
-connection slot behind until
-a timeout is reached, so this parameter should be set slightly
-higher than the maximum number of expected clients so disconnected
-clients can immediately reconnect.  This parameter can only
-be set at server start.
-Also, wal_level must be set to
+Specifies the maximum number of concurrent connections from standby
+servers or streaming base backup clients (i.e., the maximum number of
+simultaneously running WAL sender processes). The default is
+10.  The value 0 means
+replication is disabled.  Abrupt streaming client disconnection might
+leave an orphaned connection slot behind until a timeout is reached,
+so this parameter should be set slightly higher than the maximum
+number of expected clients so disconnected clients can immediately
+reconnect.  This parameter can only be set at server start.  Also,
+wal_level must be set to
 replica or higher to allow connections from standby
 servers.

+
+   
+ When running a standby server, you must set this parameter to the
+ same or higher value than on the master server. Otherwise, queries
+ will not be allowed in the standby server.
+

   
 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index bbab7395a2..2b4dcd03c8 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2177,6 +2177,11 @@ LOG:  database system is ready to accept read only connections
  max_locks_per_transaction
 

+   
+
+ max_wal_senders
+
+   

 
  max_worker_processes
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 1f78f6c956..7de26e98ad 100644
--- 

Re: Connection slots reserved for replication

2019-02-01 Thread Michael Paquier
On Thu, Jan 31, 2019 at 12:08:18PM +0100, Oleksii Kliukin wrote:
> Thanks, set it to RFC.

Oh, I have just noticed this patch when doing my vacuum homework.  I
have just added my name as committer, just wait a bit until the CF is
closed before I begin looking at it..
--
Michael




Re: Connection slots reserved for replication

2019-01-31 Thread Oleksii Kliukin



> On 31. Jan 2019, at 11:50, Petr Jelinek  wrote:
> 
> On 31/01/2019 11:25, Oleksii Kliukin wrote:
>> 
>> 
>>> On 25. Jan 2019, at 18:37, Oleksii Kliukin wrote:
>>> 
>>> Hello,
>>> 
>>>> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI wrote:
>>> 
>>> Thank you for the review. I took the liberty of addressing it in v9.
>> 
>> 
>> So, given there is no further feedback or suggestions, can we set it to
>> RFC?
> 
> I vote yes.

Thanks, set it to RFC.

Cheers,
Oleksii


Re: Connection slots reserved for replication

2019-01-31 Thread Petr Jelinek
On 31/01/2019 11:25, Oleksii Kliukin wrote:
> 
> 
>> On 25. Jan 2019, at 18:37, Oleksii Kliukin wrote:
>>
>> Hello,
>>
>>> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI wrote:
>>
>> Thank you for the review. I took the liberty of addressing it in v9.
> 
> 
> So, given there is no further feedback or suggestions, can we set it to
> RFC?

I vote yes.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Connection slots reserved for replication

2019-01-31 Thread Oleksii Kliukin


> On 25. Jan 2019, at 18:37, Oleksii Kliukin  wrote:
> 
> Hello,
> 
>> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI wrote:
> 
> Thank you for the review. I took the liberty of addressing it in v9.


So, given there is no further feedback or suggestions, can we set it to RFC?

Cheers,
Oleksii

Re: Connection slots reserved for replication

2019-01-25 Thread Oleksii Kliukin
Hello,

> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI wrote:

Thank you for the review. I took the liberty of addressing it in v9.

> 
> Documentation looks fine to me. By the way, a comment for the
> call site of CheckRequiredParameterValues() in xlog.c looks
> somewhat stale.
> 
>> /* Check to see if any changes to max_connections give problems */
>> CheckRequiredParameterValues();
> 
> maybe something like this would be better:
> 
>> /* Check to see if any parameter change gives a problem on replication */

I changed it to "Check if any parameter change gives a problem on recovery", as 
I think it is independent of [streaming] replication, but I still don’t like 
the wording, as it just duplicates the comment at the definition of 
CheckRequiredParameterValues. Maybe remove the comment altogether?

> 
> 
> In postinit.c:
>>   /*
>>* The last few connection slots are reserved for superusers.
>>*/
>>   if ((!am_superuser && !am_walsender) &&
>>   ReservedBackends > 0 &&
> 
> This is forgetting to explain about walsenders.
> 
>> The last few connection slots are reserved for
>> superusers. Replication connections don't share the same slot
>> pool.
> 
> Or something?

I changed it to

+* The last few connection slots are reserved for superusers.
+* Replication connections are drawn from a separate pool and
+* not limited by max_connections or superuser_reserved_connections.


> 
> And the parentheses enclosing "!am_superuser..walsender" seem no
> longer needed.
> 
> 
> In guc.c:
> - /* see max_connections and max_wal_senders */
> + /* see max_connections */
> 
> I don't understand for what reason we should see max_connections
> just above. (Or even max_wal_senders) Do we need this comment?
> (There are some other instances, but they're not for this patch.)

I don’t understand what it is pointing to either, so I’ve removed it.

> 
> 
> In pg_controldata.c:
> + printf(_("max_wal_senders setting: %d\n"),
> +ControlFile->max_wal_senders);
>   printf(_("max_worker_processes setting: %d\n"),
>  ControlFile->max_worker_processes);
>   printf(_("max_prepared_xacts setting:   %d\n"),
> 
> The added item seems to want some additional spaces.

Good catch, fixed.

Attached is v9. I also bumped up the PG_CONTROL_VERSION to 1200 per the prior 
comment by Robert. 

Cheers,
Oleksii



replication_reserved_connections_v9.patch
Description: Binary data


Re: Connection slots reserved for replication

2019-01-24 Thread Kyotaro HORIGUCHI
Hello.

Documentation looks fine to me. By the way, a comment for the
call site of CheckRequiredParameterValues() in xlog.c looks
somewhat stale.

> /* Check to see if any changes to max_connections give problems */
> CheckRequiredParameterValues();

maybe something like this would be better:

> /* Check to see if any parameter change gives a problem on replication */


In postinit.c:
>/*
> * The last few connection slots are reserved for superusers.
> */
>if ((!am_superuser && !am_walsender) &&
>ReservedBackends > 0 &&

This is forgetting to explain about walsenders.

> The last few connection slots are reserved for
> superusers. Replication connections don't share the same slot
> pool.

Or something?

And the parentheses enclosing "!am_superuser..walsender" seem no
longer needed.


In guc.c:
-   /* see max_connections and max_wal_senders */
+   /* see max_connections */

I don't understand for what reason we should see max_connections
just above. (Or even max_wal_senders) Do we need this comment?
(There are some other instances, but they're not for this patch.)


In pg_controldata.c:
+   printf(_("max_wal_senders setting: %d\n"),
+  ControlFile->max_wal_senders);
printf(_("max_worker_processes setting: %d\n"),
   ControlFile->max_worker_processes);
printf(_("max_prepared_xacts setting:   %d\n"),

The added item seems to want some additional spaces.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Connection slots reserved for replication

2019-01-15 Thread Robert Haas
On Wed, Jan 2, 2019 at 6:36 AM Alexander Kukushkin  wrote:
> I was also thinking about changing the value in PG_CONTROL_VERSION,
> because we added the new field into the control file, but decided to
> leave this change to committer.

We typically omit catversion bumps from submitted patches to avoid
unnecessary conflicts, but I think PG_CONTROL_VERSION doesn't change
enough to cause a problem.  Also, it's not date-dependent the way
catversion is.  So I would include the bump in the patch, if it were
me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Connection slots reserved for replication

2019-01-15 Thread Oleksii Kliukin
On Wed, Jan 2, 2019, at 12:36, Alexander Kukushkin wrote:
> Hi,
> 
> On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin  wrote:
> 
> > Thank you. Attached is the new version (called v8) with the following
> > changes to the documentation:
> 
> Thank you for jumping on it. Your changes look good to me.
> 
> 
> I was also thinking about changing the value in PG_CONTROL_VERSION,
> because we added the new field into the control file, but decided to
> leave this change to committer.

Sounds reasonable, not sure what the 'official policy' is on this.

As there have been no further reviews or issues found for almost 2 weeks since 
the last one, should we move it to RFC?

Cheers,
Oleksii



Re: Connection slots reserved for replication

2019-01-02 Thread Alexander Kukushkin
Hi,

On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin  wrote:

> Thank you. Attached is the new version (called v8) with the following changes
> to the documentation:

Thank you for jumping on it. Your changes look good to me.


I was also thinking about changing the value in PG_CONTROL_VERSION,
because we added the new field into the control file, but decided to
leave this change to committer.

Regards,
--
Alexander Kukushkin



Re: Connection slots reserved for replication

2019-01-02 Thread Oleksii Kliukin


On Wed, Jan 2, 2019, at 11:02, Petr Jelinek wrote:

> The patch generally looks good, but the documentation of max_wal_senders
> needs updating. In config.sgml we say:
> 
> > WAL sender processes count towards the total number
> > of connections, so this parameter's value must be less than
> >  minus
> > .
> 
> This is now misleading.

Thank you. Attached is the new version (called v8) with the following changes to 
the documentation:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..6634ce8cdc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
 

 The default value is three connections. The value must be less
-than max_connections minus
-.
+than max_connections.
 This parameter can only be set at server start.

   
@@ -3453,24 +3452,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows



-Specifies the maximum number of concurrent connections from
-standby servers or streaming base backup clients (i.e., the
-maximum number of simultaneously running WAL sender
-processes). The default is 10. The value 0 means replication is
-disabled. WAL sender processes count towards the total number
-of connections, so this parameter's value must be less than
- minus
-.
-Abrupt streaming client disconnection might leave an orphaned
-connection slot behind until
-a timeout is reached, so this parameter should be set slightly
-higher than the maximum number of expected clients so disconnected
-clients can immediately reconnect.  This parameter can only
-be set at server start.
+Specifies the maximum number of concurrent connections from standby
+servers or streaming base backup clients (i.e., the maximum number of
+simultaneously running WAL sender processes). The default is 10. The
+value 0 means replication is disabled.  Abrupt streaming client
+disconnection might leave an orphaned connection slot behind until a
+timeout is reached, so this parameter should be set slightly higher
+than the maximum number of expected clients so disconnected clients
+can immediately reconnect.  This parameter can only be set at server
+start.
 Also, wal_level must be set to
 replica or higher to allow connections from standby
 servers.

+
+   
+ When running a standby server, you must set this parameter to the
+ same or higher value than on the master server. Otherwise, queries
+ will not be allowed in the standby server.
+

   
 
Cheers,
Oleksii

Re: Connection slots reserved for replication

2019-01-02 Thread Petr Jelinek
Hi,

On 02/01/2019 10:21, Oleksii Kliukin wrote:
> On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
>> Hi,
>>
>> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek  
>> wrote:
>>>> Does excluding WAL senders from the max_connections limit and including
>>>> max_wal_senders in MaxBackends also imply that we need to add
>>>> max_wal_senders to the list at xlog.c: CheckRequiredParameterValues,
>>>> requiring its value on the replica to be not lower than the one on the
>>>> primary?

>>>
>>> I think it does, we need the proc slots for walsenders on the standby
>>> same way we do for normal backends.
>>
>> You are absolutely right. Attaching the new version of the patch.
> 
> Thank you. I've checked that the replica correctly complains when its value 
> of max_wal_senders is lower than the one on the primary at v6. 
> 
> As stated in my previous comment, I think we should retain the specific error 
> message on exceeding max_wal_senders, instead of showing the generic "too many 
> clients already".  Attached is the patch that fixes this small thing. I've also 
> rebased it against the master and took the liberty of naming it v7.  It makes 
> me wonder why we don't apply the same level of detail to the regular 
> out-of-connections message and show the actual value of max_connections in 
> the error text?
> 

+1

The patch generally looks good, but the documentation of max_wal_senders
needs updating. In config.sgml we say:

> WAL sender processes count towards the total number
> of connections, so this parameter's value must be less than
>  minus
> .

This is now misleading.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Connection slots reserved for replication

2019-01-02 Thread Oleksii Kliukin
On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
> Hi,
> 
> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek  
> wrote:
> > > Does excluding WAL senders from the max_connections limit and including 
> > > max_wal_senders in MaxBackends also imply that we need to add 
> > > max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, 
> > > requiring its value on the replica to be not lower than the one on the 
> > > primary?
> > >
> >
> > I think it does, we need the proc slots for walsenders on the standby
> > same way we do for normal backends.
> 
> You are absolutely right. Attaching the new version of the patch.

Thank you. I've checked that the replica correctly complains when its value of 
max_wal_senders is lower than the one on the primary at v6. 

As stated in my previous comment, I think we should retain the specific error 
message on exceeding max_wal_senders, instead of showing the generic "too many 
clients already".  Attached is the patch that fixes this small thing. I've also 
rebased it against the master and took the liberty of naming it v7.  It makes me 
wonder why we don't apply the same level of detail to the regular 
out-of-connections message and show the actual value of max_connections in the 
error text?
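
(For the regular case, that would be something along the lines of the
following in InitProcess(); just an illustration of the idea, not part
of the attached patch.)

    ereport(FATAL,
            (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
             errmsg("sorry, too many clients already (max_connections is currently %d)",
                    MaxConnections)));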

The code diff to the previous version is rather small:


diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index d1a8113cb6..df073673f5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,8 +2273,8 @@ InitWalSenderSlot(void)
Assert(MyWalSnd == NULL);
 
/*
-* Find a free walsender slot and reserve it. If this fails, we must be
-* out of WalSnd structures.
+* Find a free walsender slot and reserve it. This must not fail due
+* to the prior check for free walsenders at InitProcess.
 */
for (i = 0; i < max_wal_senders; i++)
{
@@ -2310,13 +2310,7 @@ InitWalSenderSlot(void)
break;
}
}
-   if (MyWalSnd == NULL)
-   ereport(FATAL,
-   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
-errmsg("number of requested standby connections "
-   "exceeds max_wal_senders (currently %d)",
-   max_wal_senders)));
-
+   Assert(MyWalSnd != NULL);
/* Arrange to clean up at walsender exit */
on_shmem_exit(WalSndKill, 0);
 }

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 33387fb71b..fe33fc42e3 100644
@@ -341,6 +353,12 @@ InitProcess(void)
 * in the autovacuum case?
 */
SpinLockRelease(ProcStructLock);
+   if (am_walsender)
+   ereport(FATAL,
+   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
+errmsg("number of requested standby 
connections "
+   "exceeds 
max_wal_senders (currently %d)",
+   max_wal_senders)));
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 errmsg("sorry, too many clients already")));


Cheers,
Oleksii

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d8fd195da0..36ea513273 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2172,6 +2172,11 @@ LOG:  database system is ready to accept read only connections
  max_locks_per_transaction
 

+   
+
+ max_wal_senders
+
+   

 
  max_worker_processes
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 8d9d40664b..18665895cb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -717,13 +717,13 @@ psql: could not connect to server: No such file or directory

 SEMMNI
 Maximum number of semaphore identifiers (i.e., sets)
-at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) plus room for other applications
+at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) plus room for other applications
 

 SEMMNS
 Maximum number of semaphores system-wide
-ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) * 17 plus room for other applications
+ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for other applications
 

@@ -780,13 +780,13 @@ psql: could not connect to server: No such file or directory
Re: Connection slots reserved for replication

2018-12-17 Thread Alexander Kukushkin
Hi,

On Thu, 6 Dec 2018 at 00:55, Petr Jelinek  wrote:
> > Does excluding WAL senders from the max_connections limit and including 
> > max_wal_senders in MaxBackends also imply that we need to add 
> > max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, 
> > requiring its value on the replica to be not lower than the one on the 
> > primary?
> >
>
> I think it does, we need the proc slots for walsenders on the standby
> same way we do for normal backends.

You are absolutely right. Attaching the new version of the patch.

Regards,
--
Alexander Kukushkin
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index d8fd195da0..36ea513273 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2172,6 +2172,11 @@ LOG:  database system is ready to accept read only connections
  max_locks_per_transaction
 

+   
+
+ max_wal_senders
+
+   

 
  max_worker_processes
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 8d9d40664b..18665895cb 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -717,13 +717,13 @@ psql: could not connect to server: No such file or directory

 SEMMNI
 Maximum number of semaphore identifiers (i.e., sets)
-at least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) plus room for other applications
+at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) plus room for other applications

 

 SEMMNS
 Maximum number of semaphores system-wide
-ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16) * 17 plus room for other applications
+ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for other applications

 

@@ -780,13 +780,13 @@ psql: could not connect to server: No such file or directory
 other applications. The maximum number of semaphores in the system
 is set by SEMMNS, which consequently must be at least
 as high as max_connections plus
-autovacuum_max_workers plus max_worker_processes,
-plus one extra for each 16
+autovacuum_max_workers plus max_wal_senders,
+plus max_worker_processes, plus one extra for each 16
 allowed connections plus workers (see the formula in ).  The parameter SEMMNI
 determines the limit on the number of semaphore sets that can
 exist on the system at one time.  Hence this parameter must be at
-least ceil((max_connections + autovacuum_max_workers + max_worker_processes + 5) / 16).
+least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16).
 Lowering the number
 of allowed connections is a temporary workaround for failures,
 which are usually confusingly worded No space
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..1a304f19b8 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -109,11 +109,12 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 
-		appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
-		 "max_prepared_xacts=%d max_locks_per_xact=%d "
-		 "wal_level=%s wal_log_hints=%s "
-		 "track_commit_timestamp=%s",
+		appendStringInfo(buf, "max_connections=%d max_wal_senders=%d "
+		 "max_worker_processes=%d max_prepared_xacts=%d "
+		 "max_locks_per_xact=%d wal_level=%s "
+		 "wal_log_hints=%s track_commit_timestamp=%s",
 		 xlrec.MaxConnections,
+		 xlrec.max_wal_senders,
 		 xlrec.max_worker_processes,
 		 xlrec.max_prepared_xacts,
 		 xlrec.max_locks_per_xact,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..668c7ffcf2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5257,6 +5257,7 @@ BootStrapXLOG(void)
 
 	/* Set important parameter values for use when replaying WAL */
 	ControlFile->MaxConnections = MaxConnections;
+	ControlFile->max_wal_senders = max_wal_senders;
 	ControlFile->max_worker_processes = max_worker_processes;
 	ControlFile->max_prepared_xacts = max_prepared_xacts;
 	ControlFile->max_locks_per_xact = max_locks_per_xact;
@@ -6161,6 +6162,9 @@ CheckRequiredParameterValues(void)
 		RecoveryRequiresIntParameter("max_connections",
 	 MaxConnections,
 	 ControlFile->MaxConnections);
+		RecoveryRequiresIntParameter("max_wal_senders",
+	 max_wal_senders,
+	 ControlFile->max_wal_senders);
 		RecoveryRequiresIntParameter("max_worker_processes",
 	 max_worker_processes,
 	 ControlFile->max_worker_processes);
@@ -9453,6 +9457,7 @@ 

Re: Connection slots reserved for replication

2018-12-05 Thread Petr Jelinek
On 05/12/2018 15:33, Oleksii Kliukin wrote:
> 
>> On 30. Nov 2018, at 13:58, Alexander Kukushkin  wrote:
>>
>> attaching the new version of the patch.
>> Now it simply reserves max_wal_senders slots in ProcGlobal, which
>> guarantees that only walsender processes can use them.
> 
> With this patch it looks like InitProcess will trigger the generic error 
> about 'too many clients' before the more specific error message in 
> InitWalSenderSlot when exceeding the number of max_wal_senders.
> 
> Does excluding WAL senders from the max_connections limit and including 
> max_wal_senders in MaxBackends also imply that we need to add max_wal_senders 
> to the list at xlog.c: CheckRequiredParameterValues, requiring its value on 
> the replica to be not lower than the one on the primary? 
> 

I think it does, we need the proc slots for walsenders on the standby
same way we do for normal backends.
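
Concretely, that should just mean teaching CheckRequiredParameterValues()
about the new value kept in pg_control, roughly like this (a sketch,
assuming the control file grows a max_wal_senders field as part of the
patch):

    RecoveryRequiresIntParameter("max_wal_senders",
                                 max_wal_senders,
                                 ControlFile->max_wal_senders);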

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Connection slots reserved for replication

2018-12-05 Thread Oleksii Kliukin


> On 30. Nov 2018, at 13:58, Alexander Kukushkin  wrote:
> 
> attaching the new version of the patch.
> Now it simply reserves max_wal_senders slots in ProcGlobal, which
> guarantees that only walsender processes can use them.

With this patch it looks like InitProcess will trigger the generic error about 
'too many clients' before the more specific error message in InitWalSenderSlot 
when exceeding the number of max_wal_senders.

Does excluding WAL senders from the max_connections limit and including 
max_wal_senders in MaxBackends also imply that we need to add max_wal_senders 
to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the 
replica to be not lower than the one on the primary? 

Cheers,
Oleksii


Re: Connection slots reserved for replication

2018-11-30 Thread Alexander Kukushkin
Hi,

attaching the new version of the patch.
Now it simply reserves max_wal_senders slots in ProcGlobal, which
guarantees that only walsender processes can use them.

Regards,
--
Alexander Kukushkin
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a33a131182..9a23699fbc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -895,11 +895,11 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Check for invalid combinations of GUC settings.
 	 */
-	if (ReservedBackends + max_wal_senders >= MaxConnections)
+	if (ReservedBackends >= MaxConnections)
 	{
-		write_stderr("%s: superuser_reserved_connections (%d) plus max_wal_senders (%d) must be less than max_connections (%d)\n",
+		write_stderr("%s: superuser_reserved_connections (%d) must be less than max_connections (%d)\n",
 	 progname,
-	 ReservedBackends, max_wal_senders, MaxConnections);
+	 ReservedBackends, MaxConnections);
 		ExitPostmaster(1);
 	}
 	if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 33387fb71b..dbf307221c 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -43,6 +43,7 @@
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/condition_variable.h"
 #include "storage/standby.h"
 #include "storage/ipc.h"
@@ -180,6 +181,7 @@ InitProcGlobal(void)
 	ProcGlobal->freeProcs = NULL;
 	ProcGlobal->autovacFreeProcs = NULL;
 	ProcGlobal->bgworkerFreeProcs = NULL;
+	ProcGlobal->walsenderFreeProcs = NULL;
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -253,13 +255,20 @@ InitProcGlobal(void)
 			ProcGlobal->autovacFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxBackends)
+		else if (i < MaxBackends - max_wal_senders)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
 			ProcGlobal->bgworkerFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
+		else if (i < MaxBackends)
+		{
+			/* PGPROC for walsender, add to walsenderFreeProcs list */
+			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
+			ProcGlobal->walsenderFreeProcs = &procs[i];
+			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
@@ -311,6 +320,8 @@ InitProcess(void)
 		procgloballist = &ProcGlobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
 		procgloballist = &ProcGlobal->bgworkerFreeProcs;
+	else if (am_walsender)
+		procgloballist = &ProcGlobal->walsenderFreeProcs;
 	else
 		procgloballist = &ProcGlobal->freeProcs;
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b636b1e262..44e27866df 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -527,7 +527,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+		max_worker_processes + max_wal_senders;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
@@ -811,12 +811,9 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	}
 
 	/*
-	 * The last few connection slots are reserved for superusers.  Although
-	 * replication connections currently require superuser privileges, we
-	 * don't allow them to consume the reserved slots, which are intended for
-	 * interactive use.
+	 * The last few connection slots are reserved for superusers.
 	 */
-	if ((!am_superuser || am_walsender) &&
+	if ((!am_superuser && !am_walsender) &&
 		ReservedBackends > 0 &&
 		!HaveNFreeProcs(ReservedBackends))
 		ereport(FATAL,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03594e77fe..9d763b49dc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2054,7 +2054,7 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		/* see max_connections and max_wal_senders */
+		/* see max_connections */
 		{"superuser_reserved_connections", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
 			gettext_noop("Sets the number of connection slots reserved for superusers."),
 			NULL
@@ -2572,7 +2572,6 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		/* see max_connections and superuser_reserved_connections */
 		{"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum number of simultaneously running WAL sender processes."),
 			NULL
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index cb613c8076..e89f7337b4 100644
--- a/src/include/storage/proc.h
+++ 

Re: Connection slots reserved for replication

2018-11-20 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada 
> wrote:
> > On the other hand, if we always reserve max_wal_senders slots, the
> > slots available for normal backends will decrease in the next
> > release, which requires users to re-configure max_connections.
> > But I feel this behavior is more natural than the current one, so I
> > think the re-configuration is acceptable for users.
>
> Maybe what we should do instead is not consider max_wal_senders a part of
> the total number of connections, and instead size the things that need to
> be sized by them by max_connections + max_wal_senders. That seems more
> logical given how the parameters are named as well.

I tend to agree with having max_connections + max_wal_senders.

Thanks!

Stephen




Re: Connection slots reserved for replication

2018-11-20 Thread Magnus Hagander
On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada 
wrote:

> On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
>  wrote:
> >
> > Hello.
> >
> > At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <
> sawada.m...@gmail.com> wrote in  evkgm-56...@mail.gmail.com>
> > > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> > >  wrote:
> > > > InitializeMaxBackends()
> > > > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > > -   max_worker_processes;
> > > > +   max_worker_processes + replication_reserved_connections;
> > > >
> > > > This means walsender doesn't consume a connection, which is
> > > > different from the current behavior. We should reserve a part of
> > > > MaxConnections for walsenders.  (in PostmasterMain,
> > > > max_wal_senders is counted as a part of MaxConnections)
> > >
> > > Yes. We can force replication_reserved_connections <= max_wal_senders
> > > and then reserved connections for replication should be a part of
> > > MaxConnections.
> > >
> > > >
> > > > +   if (am_walsender && replication_reserved_connections < max_wal_senders
> > > > +   && *procgloballist == NULL)
> > > > +   procgloballist = &ProcGlobal->freeProcs;
> > > >
> > > > Currently an excessive number of walsenders are rejected in
> > > > InitWalSenderSlot and emit the following error.
> > > >
> > > > >ereport(FATAL,
> > > > >(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > > > errmsg("number of requested standby connections "
> > > > >"exceeds max_wal_senders (currently %d)",
> > > > >max_wal_senders)));
> > > >
> > > > With this patch, if max_wal_senders =
> > > > replication_reserved_connections = 3 and the fourth walreceiver
> > > > comes, we will get "FATAL: sorry, too many clients already"
> > > > instead. It should be fixed.
> > > >
> > > > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > > > walsenders are active, in an extreme case where a new replication
> > > > connection comes at the same time another is exiting, we could
> > > > end up using two normal slots despite that one slot is vacant in
> > > > reserved slots.
> > >
> > > Doesn't the max_wal_senders prevent the case?
> >
> > Currently the variable doesn't work that way. We first accept the
> > connection request and search for a vacant slot in
> > InitWalSenderSlot, rejecting the connection if no room is
> > available. Even with this patch, we don't count the accurate
> > number of active walsenders (for performance reasons). If the
> > reserved slots are filled, there's no way other than to accept
> > the connection using a non-reserved slot if r_r_conn <
> > max_wal_senders. One of the active walsenders may go away between
> > the time we allocate a non-reserved connection slot and the time
> > InitWalSenderSlot starts searching the walsnds[] array. Then the
> > new walsender on the unreserved slot is activated, and one
> > reserved slot is left empty. So this is "an extreme case". We
> > could ignore the case.
> >
> > I doubt that we should allow the setting where r_r_conn <
> > max_wal_senders, or even r_r_conn != max_wal_senders. We don't
> > have a problem like this if we don't allow those cases.
> >
> > > Wal senders can get connection if we have free procs more than
> > > (MaxConnections - reserved for superusers). So I think for normal
> > > users the connection must be refused if (MaxConnections - (reserved
> > > for superuser and replication) > # of freeprocs) and for wal senders
> > > the connection also must be refused if (MaxConnections - (reserved for
> > > superuser) > # of freeprocs). I'm not sure we need such trick in
> > > InitWalSenderSlot().
> >
> > (For clarity, I don't mean that my previous patch is a good solution.)
> >
> > It works as far as we accept that some reserved slots can be left
> > unused even though some walsenders are using normal slots. (A
> > walsender exiting from a reserved slot causes this, but the slot is
> > usually taken by a walsender that comes later.)
> >
> > Another idea is we acquire a walsnd[] slot before getting a
> > connection slot..
>
> After more thought, I'm inclined to agree to reserve max_wal_senders
> slots and not to have replication_reserved_connections parameter.
>
> For superuser_reserved_connections, it actually works so that we
> certainly reserve slots for superusers in case the slots are almost
> full, regardless of who is using the other slots, including superusers
> themselves. But replication connections require different behaviour
> as they have another limit (max_wal_senders). If we have
> replication_reserved_connections < max_wal_senders, it could end up
> with the same issue as what was originally reported on this thread.
> Therefore many users would set replication_reserved_connections =
> max_wal_senders.
>
> On the other hand, if we always reserve max_wal_senders slots, the
> slots available for normal backends will decrease in the next
> release, which requires users to re-configure max_connections.
> 

Re: Connection slots reserved for replication

2018-11-08 Thread Masahiko Sawada
On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
 wrote:
>
> Hello.
>
> At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada  
> wrote in 
> > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> >  wrote:
> > > InitializeMaxBackends()
> > > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > -   max_worker_processes;
> > > +   max_worker_processes + replication_reserved_connections;
> > >
> > > This means walsender doesn't consume a connection, which is
> > > different from the current behavior. We should reserve a part of
> > > MaxConnections for walsenders.  (in PostmasterMain,
> > > max_wal_senders is counted as a part of MaxConnections)
> >
> > Yes. We can force replication_reserved_connections <= max_wal_senders
> > and then reserved connections for replication should be a part of
> > MaxConnections.
> >
> > >
> > > +   if (am_walsender && replication_reserved_connections < 
> > > max_wal_senders
> > > +   && *procgloballist == NULL)
> > > +   procgloballist = &ProcGlobal->freeProcs;
> > >
> > > Currently an excessive number of walsenders are rejected in
> > > InitWalSenderSlot and emit the following error.
> > >
> > > >ereport(FATAL,
> > > >(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > > errmsg("number of requested standby connections "
> > > >"exceeds max_wal_senders (currently %d)",
> > > >max_wal_senders)));
> > >
> > > With this patch, if max_wal_senders =
> > > replication_reserved_connections = 3 and the fourth walreceiver
> > > comes, we will get "FATAL: sorry, too many clients already"
> > > instead. It should be fixed.
> > >
> > > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > > walsenders are active, in an extreme case where a new replication
> > > connection comes at the same time another is exiting, we could
> > > end up using two normal slots despite that one slot is vacant in
> > > reserved slots.
> >
> > Doesn't the max_wal_senders prevent the case?
>
> Currently the variable doesn't work that way. We first accept the
> connection request and search for a vacant slot in
> InitWalSenderSlot, rejecting the connection if no room is
> available. Even with this patch, we don't count the accurate
> number of active walsenders (for performance reasons). If the
> reserved slots are filled, there's no way other than to accept
> the connection using a non-reserved slot if r_r_conn <
> max_wal_senders. One of the active walsenders may go away between
> the time we allocate a non-reserved connection slot and the time
> InitWalSenderSlot starts searching the walsnds[] array. Then the
> new walsender on the unreserved slot is activated, and one
> reserved slot is left empty. So this is "an extreme case". We
> could ignore the case.
>
> I doubt that we should allow the setting where r_r_conn <
> max_wal_senders, or even r_r_conn != max_wal_senders. We don't
> have a problem like this if we don't allow those cases.
>
> > Wal senders can get connection if we have free procs more than
> > (MaxConnections - reserved for superusers). So I think for normal
> > users the connection must be refused if (MaxConnections - (reserved
> > for superuser and replication) > # of freeprocs) and for wal senders
> > the connection also must be refused if (MaxConnections - (reserved for
> > superuser) > # of freeprocs). I'm not sure we need such trick in
> > InitWalSenderSlot().
>
> (For clarity, I don't mean that my previous patch is a good solution.)
>
> It works as far as we accept that some reserved slots can be left
> unused even though some walsenders are using normal slots. (A
> walsender exiting from a reserved slot causes this, but the slot is
> usually taken by a walsender that comes later.)
>
> Another idea is we acquire a walsnd[] slot before getting a
> connection slot..

After more thought, I'm inclined to agree to reserve max_wal_senders
slots and not to have replication_reserved_connections parameter.

For superuser_reserved_connections, it actually works so that we
certainly reserve slots for superusers in case the slots are almost
full, regardless of who is using the other slots, including superusers
themselves. But replication connections require different behaviour
as they have another limit (max_wal_senders). If we have
replication_reserved_connections < max_wal_senders, it could end up
with the same issue as what was originally reported on this thread.
Therefore many users would set replication_reserved_connections =
max_wal_senders.

On the other hand, if we always reserve max_wal_senders slots, the
slots available for normal backends will decrease in the next
release, which requires users to re-configure max_connections.
But I feel this behavior is more natural than the current one, so I
think the re-configuration is acceptable for users.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATIONNTT Open Source Software Center



Re: Connection slots reserved for replication

2018-11-08 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada  
wrote in 
> On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
>  wrote:
> > InitializeMaxBackends()
> > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > -   max_worker_processes;
> > +   max_worker_processes + replication_reserved_connections;
> >
> > This means walsender doesn't consume a connection, which is
> > different from the current behavior. We should reserve a part of
> > MaxConnections for walsenders.  (in PostmasterMain,
> > max_wal_senders is counted as a part of MaxConnections)
> 
> Yes. We can force replication_reserved_connections <= max_wal_senders
> and then reserved connections for replication should be a part of
> MaxConnections.
> 
> >
> > +   if (am_walsender && replication_reserved_connections < 
> > max_wal_senders
> > +   && *procgloballist == NULL)
> > +   procgloballist = &ProcGlobal->freeProcs;
> >
> > Currently an excessive number of walsenders are rejected in
> > InitWalSenderSlot and emit the following error.
> >
> > >ereport(FATAL,
> > >(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > errmsg("number of requested standby connections "
> > >"exceeds max_wal_senders (currently %d)",
> > >max_wal_senders)));
> >
> > With this patch, if max_wal_senders =
> > replication_reserved_connections = 3 and the fourth walreceiver
> > comes, we will get "FATAL: sorry, too many clients already"
> > instead. It should be fixed.
> >
> > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > walsenders are active, in an extreme case where a new replication
> > connection comes at the same time another is exiting, we could
> > end up using two normal slots even though one of the reserved
> > slots is vacant.
> 
> Doesn't the max_wal_senders prevent the case?

Currently the variable doesn't work that way. We first accept the
connection request and search for a vacant slot in InitWalSenderSlot,
rejecting the connection only if no room is available. Even with this
patch, we don't count the accurate number of active walsenders (for
performance reasons). If the reserved slots are filled, there's no way
other than to accept the connection using a non-reserved slot when
r_r_conn < max_wal_senders. If one of the active walsenders goes away
between the time we allocate a non-reserved connection slot and the
time InitWalSenderSlot starts searching the walsnds[] array, the new
walsender ends up activated on the unreserved slot and one reserved
slot is left empty. So this is "an extreme case". We could ignore the
case.

I doubt that we should allow settings where r_r_conn <
max_wal_senders, or even r_r_conn != max_wal_senders. We don't
have problems like this if we don't allow those cases.

> WAL senders can get a connection if we have more free procs than
> (MaxConnections - reserved for superusers). So I think for normal
> users the connection must be refused if (MaxConnections - (reserved
> for superuser and replication) > # of freeprocs) and for WAL senders
> the connection must also be refused if (MaxConnections - (reserved for
> superuser) > # of freeprocs). I'm not sure we need such a trick in
> InitWalSenderSlot().

(For clarity, I don't mean my previous patch is a good solution.)

It works as long as we accept that some reserved slots can be left
unused even though some walsenders are using normal slots. (A
walsender that exits while using a reserved slot causes this, but the
slot is usually taken again by a walsender that comes later.)

Another idea is to acquire a walsnd[] slot before getting a
connection slot.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Connection slots reserved for replication

2018-11-07 Thread Masahiko Sawada
On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
 wrote:
>
> Hello. Thank you for the new version.
>
> At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin  
> wrote in 
> > Hi,
> >
> > Attached rebased version patch to the current HEAD and created commit fest 
> > entry
> > On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  
> > wrote:
> > >
> > > Hi,
> > >
> > > On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> > >  wrote:
> > >
> > > >
> > > Instead, we can literally "reserve" connection slots for the
> > > > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > > > slots are never stolen for other usage. Allowing additional
> > > > walsenders comes after all the slots are filled to grab an
> > > > available "normal" slot, it works as the same as the current
> > > behavior when walsender_reserved_connections = 0.
> > > >
> > > > What do you think about this?
> > >
> > > Sounds reasonable, please see the updated patch.
>
> InitializeMaxBackends()
> MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> -   max_worker_processes;
> +   max_worker_processes + replication_reserved_connections;
>
> > This means walsender doesn't consume a connection, which is
> different from the current behavior. We should reserve a part of
> MaxConnections for walsenders.  (in PostmasterMain,
> max_wal_senders is counted as a part of MaxConnections)

Yes. We can force replication_reserved_connections <= max_wal_senders,
and then the connections reserved for replication would be part of
MaxConnections.

>
> +   if (am_walsender && replication_reserved_connections < max_wal_senders
> +   && *procgloballist == NULL)
> +   procgloballist = &ProcGlobal->freeProcs;
>
> Currently an excessive number of walsenders is rejected in
> InitWalSenderSlot, which emits the following error.
>
> >ereport(FATAL,
> >(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > errmsg("number of requested standby connections "
> >"exceeds max_wal_senders (currently %d)",
> >max_wal_senders)));
>
> With this patch, if max_wal_senders =
> replication_reserved_connections = 3 and the fourth walreceiver
> comes, we will get "FATAL: sorry, too many clients already"
> instead. It should be fixed.
>
> When r_r_conn = 2 and max_wal_senders = 3 and the three
> walsenders are active, in an extreme case where a new replication
> connection comes at the same time another is exiting, we could
> end up using two normal slots even though one of the reserved
> slots is vacant.

Doesn't the max_wal_senders prevent the case?

WAL senders can get a connection if we have more free procs than
(MaxConnections - reserved for superusers). So I think for normal
users the connection must be refused if (MaxConnections - (reserved
for superuser and replication) > # of freeprocs) and for WAL senders
the connection must also be refused if (MaxConnections - (reserved for
superuser) > # of freeprocs). I'm not sure we need such a trick in
InitWalSenderSlot().
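
In the style of the existing HaveNFreeProcs() check in InitPostgres(),
that would look roughly like the following (just a sketch of the idea,
not a posted patch; superuser_reserved_connections is the
ReservedBackends variable in the C code):

	/* normal backends: keep slots free for superusers and walsenders */
	if (!am_superuser && !am_walsender &&
		!HaveNFreeProcs(ReservedBackends + replication_reserved_connections))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("remaining connection slots are reserved for "
						"replication and superuser connections")));

	/* walsenders: only the superuser reservation has to stay free */
	if (am_walsender && !am_superuser && !HaveNFreeProcs(ReservedBackends))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("remaining connection slots are reserved for "
						"superuser connections")));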

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Connection slots reserved for replication

2018-11-06 Thread Kyotaro HORIGUCHI
Hello. Thank you for the new version.

At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin  
wrote in 
> Hi,
> 
> Attached rebased version patch to the current HEAD and created commit fest 
> entry
> On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  wrote:
> >
> > Hi,
> >
> > On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> >  wrote:
> >
> > >
> > > Instead, we can literally "reserve" connection slots for the
> > > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > > slots are never stolen for other usage. Allowing additional
> > > walsenders comes after all the slots are filled to grab an
> > > available "normal" slot, it works as the same as the current
> > > behavior when walsender_reserved_connections = 0.
> > >
> > > What do you think about this?
> >
> > Sounds reasonable, please see the updated patch.

InitializeMaxBackends()
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-   max_worker_processes;
+   max_worker_processes + replication_reserved_connections;

This means walsender doesn't consume a connection, which is
different from the current behavior. We should reserve a part of
MaxConnections for walsenders.  (in PostmasterMain,
max_wal_senders is counted as a part of MaxConnections)

+   if (am_walsender && replication_reserved_connections < max_wal_senders
+   && *procgloballist == NULL)
+   procgloballist = >freeProcs;

Currently an excessive number of walsenders is rejected in
InitWalSenderSlot, which emits the following error.

>ereport(FATAL,
>(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> errmsg("number of requested standby connections "
>"exceeds max_wal_senders (currently %d)",
>max_wal_senders)));

With this patch, if max_wal_senders =
replication_reserved_connections = 3 and the fourth walreceiver
comes, we will get "FATAL: sorry, too many clients already"
instead. It should be fixed.

When r_r_conn = 2 and max_wal_senders = 3 and the three
walsenders are active, in an extreme case where a new replication
connection comes at the same time another is exiting, we could
end up using two normal slots even though one of the reserved
slots is vacant. If we want to avoid that case, we need to limit
the number of normal slots that can be used. I don't think the
attached is acceptable as is, but basically something like it would
do that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3b6c636077..f86c05e8e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2250,6 +2250,7 @@ static void
 InitWalSenderSlot(void)
 {
 	int			i;
+	bool		reject = false;
 
 	/*
 	 * WalSndCtl should be set up already (we inherit this by fork() or
@@ -2258,41 +2259,63 @@ InitWalSenderSlot(void)
 	Assert(WalSndCtl != NULL);
 	Assert(MyWalSnd == NULL);
 
+	/* limit the maximum number of non-reserved slots to use */
+	if (MyProc->procgloballist == &ProcGlobal->freeProcs)
+	{
+		int	max_normal_slots
+			= max_wal_senders - replication_reserved_connections;
+
+		if (max_normal_slots <= 0)
+		{
+			/* we mustn't use a normal slot */
+			reject = true;
+		}
+		else if (pg_atomic_add_fetch_u32(&WalSndCtl->n_using_normal_slots, 1)
+			> max_normal_slots)
+		{
+			pg_atomic_sub_fetch_u32(&WalSndCtl->n_using_normal_slots, 1);
+			reject = true;
+		}
+	}
+
 	/*
 	 * Find a free walsender slot and reserve it. If this fails, we must be
 	 * out of WalSnd structures.
 	 */
-	for (i = 0; i < max_wal_senders; i++)
+	if (!reject)
 	{
-		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
-
-		SpinLockAcquire(&walsnd->mutex);
-
-		if (walsnd->pid != 0)
+		for (i = 0; i < max_wal_senders; i++)
 		{
-			SpinLockRelease(&walsnd->mutex);
-			continue;
-		}
-		else
-		{
-			/*
-			 * Found a free slot. Reserve it for us.
-			 */
-			walsnd->pid = MyProcPid;
-			walsnd->sentPtr = InvalidXLogRecPtr;
-			walsnd->write = InvalidXLogRecPtr;
-			walsnd->flush = InvalidXLogRecPtr;
-			walsnd->apply = InvalidXLogRecPtr;
-			walsnd->writeLag = -1;
-			walsnd->flushLag = -1;
-			walsnd->applyLag = -1;
-			walsnd->state = WALSNDSTATE_STARTUP;
-			walsnd->latch = &MyProc->procLatch;
-			SpinLockRelease(&walsnd->mutex);
-			/* don't need the lock anymore */
-			MyWalSnd = (WalSnd *) walsnd;
+			WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
 
-			break;
+			SpinLockAcquire(&walsnd->mutex);
+
+			if (walsnd->pid != 0)
+			{
+				SpinLockRelease(&walsnd->mutex);
+				continue;
+			}
+			else
+			{
+				/*
+				 * Found a free slot. Reserve it for us.
+				 */
+				walsnd->pid = MyProcPid;
+				walsnd->sentPtr = InvalidXLogRecPtr;
+				walsnd->write = InvalidXLogRecPtr;
+				walsnd->flush = InvalidXLogRecPtr;
+				walsnd->apply = InvalidXLogRecPtr;
+				walsnd->writeLag = -1;
+				walsnd->flushLag = -1;
+				walsnd->applyLag = -1;
+				walsnd->state = WALSNDSTATE_STARTUP;
+				walsnd->latch = &MyProc->procLatch;
+				SpinLockRelease(&walsnd->mutex);
+				/* don't need the lock anymore */
+				MyWalSnd = (WalSnd 

Re: Connection slots reserved for replication

2018-11-01 Thread Alexander Kukushkin
Hi,

Attached is a rebased version of the patch against the current HEAD; I also created a commitfest entry.
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
>  wrote:
>
> >
> > Instead, we can literally "reserve" connection slots for the
> > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > slots are never stolen for other usage. Allowing additional
> > walsenders comes after all the slots are filled to grab an
> > available "normal" slot, it works as the same as the current
> > behavior when walsender_reserved_connections = 0.
> >
> > What do you think about this?
>
> Sounds reasonable, please see the updated patch.
>
> Regards,
> --
> Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba3f9..d9ddcb22b7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3060,6 +3060,27 @@ include_dir 'conf.d'

   
 
+ 
+  replication_reserved_connections
+  (integer)
+  
+   replication_reserved_connections configuration parameter
+  
+  
+  
+   
+Determines the number of connection slots that
+are reserved for replication connections.
+   
+
+   
+The default value is zero. The value should not exceed max_wal_senders.
+This parameter can only be set at server start.
+   
+  
+ 
+
   
max_replication_slots (integer)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2683385ca6..3b6c636077 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -122,6 +122,8 @@ int			max_wal_senders = 0;	/* the maximum number of concurrent
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
 bool		log_replication_commands = false;
+int			replication_reserved_connections = 0; /* the number of connection slots
+   * reserved for replication connections */
 
 /*
  * State for WalSndWakeupRequest
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa52fa..2d04a8204a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -43,6 +43,7 @@
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/condition_variable.h"
 #include "storage/standby.h"
 #include "storage/ipc.h"
@@ -180,6 +181,7 @@ InitProcGlobal(void)
 	ProcGlobal->freeProcs = NULL;
 	ProcGlobal->autovacFreeProcs = NULL;
 	ProcGlobal->bgworkerFreeProcs = NULL;
+	ProcGlobal->walsenderFreeProcs = NULL;
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -253,13 +255,20 @@ InitProcGlobal(void)
 			ProcGlobal->autovacFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxBackends)
+		else if (i < MaxBackends - replication_reserved_connections)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
 			ProcGlobal->bgworkerFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
+		else if (i < MaxBackends)
+		{
+			/* PGPROC for walsender, add to walsenderFreeProcs list */
+			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
+			ProcGlobal->walsenderFreeProcs = &procs[i];
+			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+			procs[i].procgloballist = >walsenderFreeProcs;
+		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
@@ -304,6 +313,8 @@ InitProcess(void)
 		procgloballist = &ProcGlobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
 		procgloballist = &ProcGlobal->bgworkerFreeProcs;
+	else if (am_walsender)
+		procgloballist = &ProcGlobal->walsenderFreeProcs;
 	else
 		procgloballist = &ProcGlobal->freeProcs;
 
@@ -318,6 +329,14 @@ InitProcess(void)
 
 	set_spins_per_delay(ProcGlobal->spins_per_delay);
 
+	/*
+	* Try to use ProcGlobal->freeProcs as a fallback when
+	* all reserved walsender slots are already busy.
+	*/
+	if (am_walsender && replication_reserved_connections < max_wal_senders
+			&& *procgloballist == NULL)
+		procgloballist = &ProcGlobal->freeProcs;
+
 	MyProc = *procgloballist;
 
 	if (MyProc != NULL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4f1d2a0d28..bb90f53b1e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -527,7 +527,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+		max_worker_processes + replication_reserved_connections;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
@@ -812,8 +812,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 
 	/*
 	 * The last 

Re: Connection slots reserved for replication

2018-09-21 Thread Alexander Kukushkin
Hi,

On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
 wrote:

>
> > Instead, we can literally "reserve" connection slots for the
> specific use by providing ProcGlobal->walsenderFreeProcs. The
> slots are never stolen for other usage. Allowing additional
> walsenders comes after all the slots are filled to grab an
> available "normal" slot, it works as the same as the current
> > behavior when walsender_reserved_connections = 0.
>
> What do you think about this?

Sounds reasonable, please see the updated patch.

Regards,
--
Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..ccdb217735 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3059,6 +3059,27 @@ include_dir 'conf.d'

   
 
+ 
+  replication_reserved_connections
+  (integer)
+  
+   replication_reserved_connections configuration parameter
+  
+  
+  
+   
+Determines the number of connection slots that
+are reserved for replication connections.
+   
+
+   
+The default value is zero. The value should not exceed max_wal_senders.
+This parameter can only be set at server start.
+   
+  
+ 
+
   
max_replication_slots (integer)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 370429d746..40b9deaa0c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -122,6 +122,8 @@ int			max_wal_senders = 0;	/* the maximum number of concurrent
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
 bool		log_replication_commands = false;
+int			replication_reserved_connections = 0; /* the number of connection slots
+   * reserved for replication connections */
 
 /*
  * State for WalSndWakeupRequest
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa52fa..2d04a8204a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -43,6 +43,7 @@
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/condition_variable.h"
 #include "storage/standby.h"
 #include "storage/ipc.h"
@@ -180,6 +181,7 @@ InitProcGlobal(void)
 	ProcGlobal->freeProcs = NULL;
 	ProcGlobal->autovacFreeProcs = NULL;
 	ProcGlobal->bgworkerFreeProcs = NULL;
+	ProcGlobal->walsenderFreeProcs = NULL;
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -253,13 +255,20 @@ InitProcGlobal(void)
 			ProcGlobal->autovacFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxBackends)
+		else if (i < MaxBackends - replication_reserved_connections)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
 			ProcGlobal->bgworkerFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
+		else if (i < MaxBackends)
+		{
+			/* PGPROC for walsender, add to walsenderFreeProcs list */
+			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
+			ProcGlobal->walsenderFreeProcs = &procs[i];
+			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
@@ -304,6 +313,8 @@ InitProcess(void)
 		procgloballist = &ProcGlobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
 		procgloballist = &ProcGlobal->bgworkerFreeProcs;
+	else if (am_walsender)
+		procgloballist = &ProcGlobal->walsenderFreeProcs;
 	else
 		procgloballist = &ProcGlobal->freeProcs;
 
@@ -318,6 +329,14 @@ InitProcess(void)
 
 	set_spins_per_delay(ProcGlobal->spins_per_delay);
 
+	/*
+	* Try to use ProcGlobal->freeProcs as a fallback when
+	* all reserved walsender slots are already busy.
+	*/
+	if (am_walsender && replication_reserved_connections < max_wal_senders
+			&& *procgloballist == NULL)
+		procgloballist = &ProcGlobal->freeProcs;
+
 	MyProc = *procgloballist;
 
 	if (MyProc != NULL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 5ef6315d20..7de2609ef2 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -505,7 +505,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+		max_worker_processes + replication_reserved_connections;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
@@ -790,8 +790,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 
 	/*
 	 * The last few connection slots are reserved for superusers.  Although
-	 * replication connections currently require superuser privileges, we
-	 * don't allow them to consume the reserved slots, 

Re: Connection slots reserved for replication

2018-09-20 Thread Kyotaro HORIGUCHI
Hello.

I agree to the objective.

At Mon, 17 Sep 2018 14:25:56 +0200, Alexander Kukushkin  
wrote in 
> Hi,
> 
> 2018-09-14 12:23 GMT+02:00 Masahiko Sawada :
> 
> >> 2. If we know that this is neither superuser nor replication connection, we
> >> should check that there are at least (superuser_reserved_connections +
> >> NumWalSenders() - max_wal_senders) connection slots are available.
> >
> > You wanted to mean (superuser_reserved_connections + max_wal_senders -
> > NumWalSenders()) in the second point?
> 
> Sure, my bad. I made a mistake when writing the email, but in the attached
> file it looks good.
> 
> >
> > One arguable point about the second option could be that it breaks
> > backward compatibility of the parameter configuration. That is,
> > existing systems would need to re-configure max_connections. So it might
> > be better to take the first option with
> > replication_reserved_connections = 0 by default.
> 
> Please find attached the new version of the patch, which introduces
> the replication_reserved_connections GUC.

With this patch, a non-superuser is rejected if there are fewer than
super_res_conn + (the remainder of repl_res_conn) free slots. For
walsenders it gives the same effect as simply sharing
superuser_reserved_connections between superusers and walsenders.

Instead, we can literally "reserve" connection slots for this
specific use by providing ProcGlobal->walsenderFreeProcs. Those
slots are never stolen for other usage. Additional walsenders are
allowed only after all of those slots are filled, by grabbing an
available "normal" slot, so it works the same as the current
behavior when walsender_reserved_connections = 0.
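
For illustration, the core of it in InitProcess() would be roughly the
following (condensed sketch only, using the parameter name from above;
the real change also has to adjust MaxBackends and InitProcGlobal()):

	/* walsenders normally draw from their own reserved PGPROC list
	 * (autovacuum and bgworker cases omitted here) */
	if (am_walsender)
		procgloballist = &ProcGlobal->walsenderFreeProcs;
	else
		procgloballist = &ProcGlobal->freeProcs;

	/* ... and fall back to the shared freeProcs list only once the
	 * reserved slots are exhausted, which keeps the current behavior
	 * when walsender_reserved_connections = 0 */
	if (am_walsender && *procgloballist == NULL &&
		walsender_reserved_connections < max_wal_senders)
		procgloballist = &ProcGlobal->freeProcs;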

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Connection slots reserved for replication

2018-09-17 Thread Alexander Kukushkin
Hi,

2018-09-14 12:23 GMT+02:00 Masahiko Sawada :

>> 2. If we know that this is neither superuser nor replication connection, we
>> should check that there are at least (superuser_reserved_connections +
>> NumWalSenders() - max_wal_senders) connection slots are available.
>
> You wanted to mean (superuser_reserved_connections + max_wal_senders -
> NumWalSenders()) in the second point?

Sure, my bad. I made a mistake when writing the email, but in the attached
file it looks good.

>
> One arguable point about the second option could be that it breaks
> backward compatibility of the parameter configuration. That is,
> existing systems would need to re-configure max_connections. So it might
> be better to take the first option with
> replication_reserved_connections = 0 by default.

Please find attached the new version of the patch, which introduces
the replication_reserved_connections GUC.

Regards,
--
Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e1073ac6d3..80e6ef9f67 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3059,6 +3059,32 @@ include_dir 'conf.d'

   
 
+ 
+  replication_reserved_connections
+  (integer)
+  
+   replication_reserved_connections configuration parameter
+  
+  
+  
+   
+Determines the number of connection slots that
+are reserved for replication connections. Whenever the number
+of active concurrent connections is at least
+max_connections minus
+replication_reserved_connections plus
+number of active wal senders, new
+non-superuser and non-replication connections will not be accepted.
+   
+
+   
+The default value is zero. The value should not exceed max_wal_senders.
+This parameter can only be set at server start.
+   
+  
+ 
+
   
max_replication_slots (integer)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 305ff36258..a5a95ee92c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,10 @@ PostmasterMain(int argc, char *argv[])
 	if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
 		ereport(ERROR,
 (errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));
+	if (replication_reserved_connections > max_wal_senders)
+		ereport(WARNING,
+(errmsg("Value of replication_reserved_connections (%d) exceeds value of max_wal_senders (%d)",
+		replication_reserved_connections, max_wal_senders)));
 
 	/*
 	 * Other one-time internal sanity checks can go here, if they are fast.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 370429d746..e64d5ed44d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -122,6 +122,8 @@ int			max_wal_senders = 0;	/* the maximum number of concurrent
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
 bool		log_replication_commands = false;
+int			replication_reserved_connections = 0; /* the number of connection slots
+   * reserved for replication connections */
 
 /*
  * State for WalSndWakeupRequest
@@ -2284,6 +2286,10 @@ InitWalSenderSlot(void)
 			walsnd->applyLag = -1;
 			walsnd->state = WALSNDSTATE_STARTUP;
 			walsnd->latch = &MyProc->procLatch;
+
+			/* increment the number of allocated wal sender slots */
+			pg_atomic_fetch_add_u32(&WalSndCtl->num_wal_senders, 1);
+
 			SpinLockRelease(&walsnd->mutex);
 			/* don't need the lock anymore */
 			MyWalSnd = (WalSnd *) walsnd;
@@ -2317,6 +2323,10 @@ WalSndKill(int code, Datum arg)
 	walsnd->latch = NULL;
 	/* Mark WalSnd struct as no longer being in use. */
 	walsnd->pid = 0;
+
+	/* decrement the number of allocated wal sender slots */
+	pg_atomic_fetch_sub_u32(&WalSndCtl->num_wal_senders, 1);
+
 	SpinLockRelease(&walsnd->mutex);
 }
 
@@ -3033,6 +3043,7 @@ WalSndShmemInit(void)
 	{
 		/* First time through, so initialize */
 		MemSet(WalSndCtl, 0, WalSndShmemSize());
+		pg_atomic_init_u32(&WalSndCtl->num_wal_senders, 0);
 
 		for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
 			SHMQueueInit(&(WalSndCtl->SyncRepQueue[i]));
@@ -3587,3 +3598,9 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
 	Assert(time != 0);
 	return now - time;
 }
+
+/* Return the amount of allocated wal_sender slots */
+uint32 NumWalSenders(void)
+{
+	return pg_atomic_read_u32(>num_wal_senders);
+	return pg_atomic_read_u32(&WalSndCtl->num_wal_senders);
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 5ef6315d20..436574e85d 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -789,17 +789,28 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	}
 
 	/*
-	 * The last few connection slots are reserved for superusers.  Although
-	 * replication connections currently require superuser privileges, we
-	 * don't allow them to consume 

Re: Connection slots reserved for replication

2018-09-14 Thread Masahiko Sawada
On Wed, Aug 1, 2018 at 9:30 PM, Alexander Kukushkin  wrote:
> Hello hackers,
>
> at the moment it is possible to reserve some amount of connection slots for
> superusers and this behavior is controlled by superuser_reserved_connections
> configuration parameter with the default value = 3.
>
> If all non-reserved connection slots are busy, a replica fails to open
> a new connection and start streaming from the primary. Such behavior is very
> bad if you want to run PostgreSQL HA clusters

Yeah, that's also bad if we want to use pg_basebackup in that situation.

>
> Initially, replication connections required superuser privileges (in 9.0)
> and therefore they were deliberately excluded from
> superuser_reserved_connections. Basically that means it has never been
> possible to reserve come connection slots for replication connections.
>
> Later (9.1) it became possible to create a user with REPLICATION and
> NOSUPERUSER options, but the comment in postinit.c still says that
> superuser is required.
>
> Now I think it is time to go further, and we should make it possible
> to reserve some connection slots for replication in a manner similar to
> superuser connections.
>

+1

> How should it work:
> 1. If we know that we got a replication connection, we just have to make
> sure that at least superuser_reserved_connections free connection
> slots are available.
> 2. If we know that this is neither a superuser nor a replication connection, we
> should check that at least (superuser_reserved_connections +
> NumWalSenders() - max_wal_senders) connection slots are available.

You wanted to mean (superuser_reserved_connections + max_wal_senders -
NumWalSenders()) in the second point?
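
Spelled out, the check for a normal connection would then be roughly
the following (a sketch of the idea only, using the NumWalSenders()
count mentioned above; superuser_reserved_connections is the
ReservedBackends variable in the C code):

	/*
	 * Refuse a regular (non-superuser, non-replication) connection unless
	 * enough slots stay free for superusers plus the walsender slots that
	 * are not in use yet.
	 */
	if (!am_superuser && !am_walsender &&
		!HaveNFreeProcs(ReservedBackends +
						(max_wal_senders - NumWalSenders())))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("remaining connection slots are reserved for "
						"superuser and replication connections")));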

>
> And the last question is how to control the number of reserved slots for
> replication. There are two options:
> 1. We can introduce a new GUC for that: replication_reserved_connections
> 2. Or we can just use the value of max_wal_senders
>
> Personally, I more like the second option.
>

One arguable point about the second option could be that it breaks
backward compatibility of the parameter configuration. That is,
existing systems would need to re-configure max_connections. So it
might be better to take the first option, with
replication_reserved_connections = 0 by default.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center