Hi,

Yesterday an interesting scenario was diagnosed on IRC. If you're running a synchronous slave and the connection to it is lost momentarily, your backends naturally start waiting for the slave to reconnect. If your application then keeps trying to create new connections, it can use up all non-reserved connection slots, locking out the synchronous slave once the connection problem has resolved itself. This leaves the entire cluster in a state where manual intervention is necessary.

While you could limit the number of connections for non-replication roles, that's not always possible or desirable. I would like to introduce a way to reserve connection slots for replication. However, it's not clear how this should work. I looked at how superuser_reserved_connections is implemented, and with small changes I could see how to implement two ideas:

  1) Reserve a portion of superuser_reserved_connections for replication
     connections.  For example, with max_connections=10,
     superuser_reserved_connections=2 and
     replication_reserved_connections=1, at 8 connections either a
     replication connection or a superuser connection can be created,
     and at 9 connections only a superuser one would be allowed.  This
     is a bit clumsy as there still aren't guaranteed slots for
     replication.
  2) A GUC which says "superuser_reserved_connections can be used up by
     replication connections", and then limiting the number of
     replication connections using per-role limits to make sure
     superusers aren't locked out.
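
To make the two ideas concrete, here is a rough model of the admission check in Python. It is only a sketch of the slot arithmetic, not the backend code: the function names, the "kind" argument and replication_role_limit are made up for illustration, replication_reserved_connections is the hypothetical setting from idea 1, and "in_use" is assumed to count the slots already occupied before the new connection arrives.

```python
def may_connect_idea1(kind, in_use, max_connections=10,
                      superuser_reserved=2, replication_reserved=1):
    """Idea 1: replication may use part of the superuser reserve.

    kind is "normal", "replication" or "superuser" (illustrative labels).
    """
    if kind == "superuser":
        limit = max_connections
    elif kind == "replication":
        # Only the part of the reserve not opened up to replication
        # stays off limits for a replication connection.
        limit = max_connections - (superuser_reserved - replication_reserved)
    else:
        limit = max_connections - superuser_reserved
    return in_use < limit


def may_connect_idea2(kind, in_use, replication_in_use, max_connections=10,
                      superuser_reserved=2, replication_role_limit=1):
    """Idea 2: replication may use the whole superuser reserve, but a
    per-role limit (replication_role_limit, hypothetical) caps how many
    replication connections can exist at once."""
    if kind == "superuser":
        return in_use < max_connections
    if kind == "replication":
        return (in_use < max_connections
                and replication_in_use < replication_role_limit)
    return in_use < max_connections - superuser_reserved
```

With the numbers from idea 1, may_connect_idea1("replication", 8) is true and may_connect_idea1("replication", 9) is false, while a superuser is still admitted at 9. In both models an ordinary connection is refused from 8 onwards, which is the existing superuser_reserved_connections behaviour.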

Does anyone see a better way to do this? I'm not too satisfied with either of these ideas.


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
