On 10/12/2017 05:50 PM, Joshua D. Drake wrote:
Hackers,

Bumping this.


I had a long call today with a firm developing a front-end proxy/cache/HA solution for Postgres. Essentially the software is a replacement for PGPool in its entirety, but it also supports analytics, etc. When I asked them about pain points, they described the issues below, and I was wondering if this is a problem we would like to solve:

  Per your request, here is our failover issue.

1.  In a modern devops environment, the database should be able to scale and morph over time based on need.

2.  Tools that leverage the database should be able to easily discover and potentially control (with permissions) the database. Currently, you can discover the master and which nodes are syncing off of it, but after a failure, a tool can't easily discover what orchestration has done on the back end to make the cluster whole again; i.e., from a slave, you can't reliably and easily discover the master.

The logic that our code now uses is to:

1.  Find the master
2.  Add replication nodes per the master's configuration.
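
As a rough illustration of step 2 (not the firm's actual code), the standbys attached to a known master can be enumerated from its pg_stat_replication view; the host, credentials, and Java/JDBC framing here are assumptions:

    import java.sql.*;

    // Sketch: list standbys attached to a known master via
    // pg_stat_replication. URL and credentials are hypothetical.
    public class ListStandbys {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://master.example.com:5432/postgres";
            try (Connection conn = DriverManager.getConnection(url, "postgres", "secret");
                 Statement st = conn.createStatement();
                 // client_addr is the standby's address as the master sees it;
                 // with NAT it may not be reachable from the application side.
                 ResultSet rs = st.executeQuery(
                     "SELECT client_addr, state, sync_state FROM pg_stat_replication")) {
                while (rs.next()) {
                    System.out.printf("standby=%s state=%s sync=%s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
                }
            }
        }
    }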

To find a master, we start with a list of candidate nodes that MAY be a master at any point, and then (a sketch follows the list):
1. Issue "SELECT pg_is_in_recovery()" to find out whether it is a slave.
a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the master's host.
b. Attempt to connect to that host directly; if that fails...
c. Connect to the master through the slave via dblink, using the hostname, i.e. select * from dblink('" + connInfo + " dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr inet).  This is necessary in the event the hostname used in the recovery.conf file is not resolvable from the outside.
d. Use the dblink connection to identify the master node via select inet_server_addr();
e. Connect to the IP reported by the master.
f. Repeat through the nodes until we find a master.
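
A minimal sketch of that loop in Java/JDBC. The candidate hosts, credentials, and the crude primary_conninfo regex are assumptions for illustration, not the firm's actual implementation:

    import java.sql.*;
    import java.util.List;
    import java.util.regex.*;

    // Sketch of the master-discovery loop described above.
    public class FindMaster {
        static final String USER = "postgres", PASS = "secret"; // hypothetical

        static String findMaster(List<String> candidates) {
            for (String host : candidates) {
                String url = "jdbc:postgresql://" + host + ":5432/postgres";
                try (Connection c = DriverManager.getConnection(url, USER, PASS);
                     Statement st = c.createStatement()) {
                    ResultSet rs = st.executeQuery("SELECT pg_is_in_recovery()");
                    rs.next();
                    if (!rs.getBoolean(1))
                        return host;                  // not in recovery: a master

                    // a. Slave: read recovery.conf and pull the upstream host
                    // out of primary_conninfo (crude regex, an assumption).
                    rs = st.executeQuery("SELECT pg_read_file('recovery.conf')");
                    rs.next();
                    Matcher m = Pattern.compile("host=(\\S+)").matcher(rs.getString(1));
                    if (!m.find())
                        continue;
                    String upstream = m.group(1);

                    // b. Try the upstream host directly...
                    try (Connection mc = DriverManager.getConnection(
                            "jdbc:postgresql://" + upstream + ":5432/postgres",
                            USER, PASS)) {
                        return upstream;
                    } catch (SQLException notResolvable) {
                        // c./d. ...else ask the slave, via dblink, for the
                        // master's address as the slave sees it (may be NAT-mapped).
                        ResultSet dr = st.executeQuery(
                            "SELECT * FROM dblink('host=" + upstream +
                            " dbname=postgres', 'select inet_server_addr()')" +
                            " AS t(inet_server_addr inet)");
                        if (dr.next())
                            return dr.getString(1);   // e. connect here next
                    }
                } catch (SQLException down) {
                    // f. node unreachable or unhelpful; try the next candidate
                }
            }
            return null;                              // no master found
        }
    }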

Issues:
1.  The dblink call doesn't have a way to specify a timeout, so we have to use Java futures to bound how long it may take to a reasonable amount of time (see the sketch after this list).

2.  NAT mapping may result in us detecting IP ranges that are not accessible to the application nodes.

3.  There is no easy way to monitor for state changes as they happen, which would allow faster failovers; everything has to be polled rather than delivered as events.

4.  It doesn't support cascading replication very well, although we could augment the logic to map the relationships between nodes.

5.  There is no way to connect to a db node with something akin to SQL Server's "application intent" flags, to allow a connection to be rejected if we wish it to be a read/write connection.  This helps detect the state of the node directly without having to ask any further questions of it, and makes it easier to "stall" during connection until a proper connection can be made.

6.  The master, on shutdown, will not actually close and disable connections as it shuts down; instead, it issues an error that it is shutting down as it does so.
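
The futures workaround from item 1 might look like this minimal sketch; the five-second budget, executor choice, and method framing are illustrative assumptions:

    import java.sql.*;
    import java.util.concurrent.*;

    // Sketch: bound a dblink call with a client-side timeout, since dblink
    // itself has no timeout parameter.
    public class DblinkWithTimeout {
        static String masterAddr(Connection conn, String connInfo) throws Exception {
            ExecutorService ex = Executors.newSingleThreadExecutor();
            Future<String> f = ex.submit(() -> {
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT * FROM dblink('" + connInfo +
                         " dbname=postgres', 'select inet_server_addr()')" +
                         " AS t(inet_server_addr inet)")) {
                    return rs.next() ? rs.getString(1) : null;
                }
            });
            try {
                return f.get(5, TimeUnit.SECONDS);  // illustrative budget
            } catch (TimeoutException te) {
                f.cancel(true);  // the server-side call may still linger
                return null;
            } finally {
                ex.shutdownNow();
            }
        }
    }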

Fundamentally, the biggest issue is that it is very hard to determine the state of the cluster by asking the nodes themselves, particularly after a failure.  Some of the state information needed to talk to the cluster reliably going forward is simply lost.




--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
*****     Unless otherwise stated, opinions are my own.   *****

