On 10/12/2017 05:50 PM, Joshua D. Drake wrote:
Hackers,

Bumping this.


I had a long call today with a firm developing a front-end proxy/cache/HA solution for Postgres. Essentially the software is a replacement for PGPool in its entirety, but it also supports analytics, etc. When I asked them about pain points, they described the issues below, and I was wondering if this is a problem we would like to solve:

  Per your request, here is our failover issue.

1.  In a modern devops environment, the database should be able to scale and morph over time based on need.

2.  Tools that leverage the database should be able to easily discover and potentially control (with permissions) the database. Currently, you can discover the master and which nodes are syncing off of it, but after a failure, a tool can't easily discover what orchestration has done on the back end to make the cluster whole again; i.e., from a slave, you can't reliably and easily discover the master.

The logic that our code now uses is to:

1.  Find the master
2.  Add replication nodes per the master's configuration.
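
As a rough illustration of step 2 (not the firm's actual code), the standbys attached to a known master can be enumerated from its pg_stat_replication view; the host, credentials, and Java/JDBC framing here are assumptions:

    import java.sql.*;

    // Sketch: list standbys attached to a known master via
    // pg_stat_replication. URL and credentials are hypothetical.
    public class ListStandbys {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://master.example.com:5432/postgres";
            try (Connection conn = DriverManager.getConnection(url, "postgres", "secret");
                 Statement st = conn.createStatement();
                 // client_addr is the standby's address as the master sees it;
                 // with NAT it may not be reachable from the application side.
                 ResultSet rs = st.executeQuery(
                     "SELECT client_addr, state, sync_state FROM pg_stat_replication")) {
                while (rs.next()) {
                    System.out.printf("standby=%s state=%s sync=%s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
                }
            }
        }
    }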

To find a master, we start with a list of candidate nodes that MAY be a master at any point, and then (a sketch follows the list):
1. Issue "SELECT pg_is_in_recovery()" to find out whether it is a slave.
a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the master's host.
b. Attempt to connect to that host directly; if that fails...
c. Connect to the master through the slave via dblink, using the hostname, i.e. select * from dblink('" + connInfo + " dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr inet).  This is necessary in the event the hostname used in the recovery.conf file is not resolvable from the outside.
d. Use the dblink connection to identify the master node via select inet_server_addr();
e. Connect to the IP reported by the master.
f. Repeat through the nodes until we find a master.
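
A minimal sketch of that loop in Java/JDBC. The candidate hosts, credentials, and the crude primary_conninfo regex are assumptions for illustration, not the firm's actual implementation:

    import java.sql.*;
    import java.util.List;
    import java.util.regex.*;

    // Sketch of the master-discovery loop described above.
    public class FindMaster {
        static final String USER = "postgres", PASS = "secret"; // hypothetical

        static String findMaster(List<String> candidates) {
            for (String host : candidates) {
                String url = "jdbc:postgresql://" + host + ":5432/postgres";
                try (Connection c = DriverManager.getConnection(url, USER, PASS);
                     Statement st = c.createStatement()) {
                    ResultSet rs = st.executeQuery("SELECT pg_is_in_recovery()");
                    rs.next();
                    if (!rs.getBoolean(1))
                        return host;                  // not in recovery: a master

                    // a. Slave: read recovery.conf and pull the upstream host
                    // out of primary_conninfo (crude regex, an assumption).
                    rs = st.executeQuery("SELECT pg_read_file('recovery.conf')");
                    rs.next();
                    Matcher m = Pattern.compile("host=(\\S+)").matcher(rs.getString(1));
                    if (!m.find())
                        continue;
                    String upstream = m.group(1);

                    // b. Try the upstream host directly...
                    try (Connection mc = DriverManager.getConnection(
                            "jdbc:postgresql://" + upstream + ":5432/postgres",
                            USER, PASS)) {
                        return upstream;
                    } catch (SQLException notResolvable) {
                        // c./d. ...else ask the slave, via dblink, for the
                        // master's address as the slave sees it (may be NAT-mapped).
                        ResultSet dr = st.executeQuery(
                            "SELECT * FROM dblink('host=" + upstream +
                            " dbname=postgres', 'select inet_server_addr()')" +
                            " AS t(inet_server_addr inet)");
                        if (dr.next())
                            return dr.getString(1);   // e. connect here next
                    }
                } catch (SQLException down) {
                    // f. node unreachable or unhelpful; try the next candidate
                }
            }
            return null;                              // no master found
        }
    }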

Issues:
1.  The dblink call doesn't have a way to specify a timeout, so we have to use Java futures to bound how long it may take to a reasonable amount of time (see the sketch after this list).

2.  NAT mapping may result in us detecting IP ranges that are not accessible to the application nodes.

3.  There is no easy way to monitor for state changes as they happen, which would allow faster failovers; everything has to be polled rather than delivered as events.

4.  It doesn't support cascading replication very well, although we could augment the logic to map the relationships between nodes.

5.  There is no way to connect to a db node with something akin to SQL Server's "application intent" flags, to allow a connection to be rejected if we wish it to be a read/write connection.  This helps detect the state of the node directly without having to ask any further questions of it, and makes it easier to "stall" during connection until a proper connection can be made.

6.  The master, on shutdown, will not actually close and disable connections as it shuts down; instead, it issues an error that it is shutting down as it does so.
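
The futures workaround from item 1 might look like this minimal sketch; the five-second budget, executor choice, and method framing are illustrative assumptions:

    import java.sql.*;
    import java.util.concurrent.*;

    // Sketch: bound a dblink call with a client-side timeout, since dblink
    // itself has no timeout parameter.
    public class DblinkWithTimeout {
        static String masterAddr(Connection conn, String connInfo) throws Exception {
            ExecutorService ex = Executors.newSingleThreadExecutor();
            Future<String> f = ex.submit(() -> {
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT * FROM dblink('" + connInfo +
                         " dbname=postgres', 'select inet_server_addr()')" +
                         " AS t(inet_server_addr inet)")) {
                    return rs.next() ? rs.getString(1) : null;
                }
            });
            try {
                return f.get(5, TimeUnit.SECONDS);  // illustrative budget
            } catch (TimeoutException te) {
                f.cancel(true);  // the server-side call may still linger
                return null;
            } finally {
                ex.shutdownNow();
            }
        }
    }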

Fundamentally, the biggest issue is that it is very hard to determine the state of the cluster by asking the nodes themselves, particularly after a failure.  Some of the state information needed to talk to the cluster reliably going forward is simply lost.




--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
*****     Unless otherwise stated, opinions are my own.   *****

