Hey,

I think I know what the problem is. After the first failover, when I clone the old master as a standby with the 'repmgr standby clone' command, it seems that nothing updates the repl_nodes table with the new standby in my cluster, so on the next failover repmgrd fails to find a standby to promote.
I confirmed this by manually updating the repl_nodes table after the clone so that the old master is now recorded as a standby.

Now my question is: where is it supposed to happen that, after I issue 'repmgr standby clone', repl_nodes is also updated with the new standby server?

Best regards,
Aviel Buskila

2015-08-16 12:11 GMT+03:00 Aviel Buskila <avie...@gmail.com>:

> hey,
>
> I have tried to set up the configuration all over again. Now the status of
> 'repl_nodes' before the failover is:
>
>  id | type    | upstream_node_id | cluster      | name  | conninfo                                       | priority | active
> ----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
>   1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
>   2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
>   3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t
>
> repmgr is started on node2 and node3 (standby and witness). Now when I kill
> the postgres master process I can see the following messages in the
> repmgrd log:
>
> [WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
>
> And then, when it tried to elect node2 to be promoted, it shows the
> following messages:
>
> [DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
> [WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
> [ERROR] unable to determine a valid master node, terminating...
> [INFO] repmgrd terminating..
>
> What am I doing wrong?
>
> On 14/08/15 at 04:14, Aviel Buskila wrote:
> > Hey,
> > yes I did, and still it won't fail back.
>
> Can you send over the output of "repmgr cluster show" before and after
> the failover process?
>
> The output of SELECT * FROM repmgr_schema.repl_nodes; after the failover
> (you need to replace repmgr_schema with what you have configured).
>
> Also, which version of repmgr are you running?
>
> > 2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen <jony.cohe...@gmail.com>:
> >
> >> Hi, did you make the old master follow the new one using repmgr?
> >>
> >> It doesn't update itself automatically...
> >> From the looks of it repmgr thinks you have 2 masters - the old one
> >> offline and the new one online.
>
> Regards,
>
> --
> Martín Marqués http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
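
For anyone who wants to check their own repl_nodes contents for the situation described in this thread, here is a minimal Python sketch. It is not repmgrd's election logic, only a hand-written check that the table holds exactly one active master and at least one active standby attached to it. The function name failover_candidates and the AFTER_CLONE rows are hypothetical: AFTER_CLONE assumes the "two masters, old one offline" state described above, with no row updated after the clone.

```python
def failover_candidates(rows):
    """Return names of active standbys whose upstream is the single active master.

    Hypothetical helper for illustration; this is not how repmgrd decides.
    """
    masters = [r for r in rows if r["type"] == "master" and r["active"]]
    if len(masters) != 1:
        return []  # no master, or ambiguous: nothing can be judged safely
    master_id = masters[0]["id"]
    return [
        r["name"]
        for r in rows
        if r["type"] == "standby"
        and r["active"]
        and r["upstream_node_id"] == master_id
    ]


# Rows as shown in the thread, before the first failover.
BEFORE = [
    {"id": 1, "type": "master", "upstream_node_id": None, "name": "node1", "active": True},
    {"id": 2, "type": "standby", "upstream_node_id": 1, "name": "node2", "active": True},
    {"id": 3, "type": "witness", "upstream_node_id": None, "name": "node3", "active": True},
]

# Assumed state after failover and clone, with repl_nodes left untouched:
# node2 was promoted, node1 is still recorded as an (inactive) master.
AFTER_CLONE = [
    {"id": 1, "type": "master", "upstream_node_id": None, "name": "node1", "active": False},
    {"id": 2, "type": "master", "upstream_node_id": None, "name": "node2", "active": True},
    {"id": 3, "type": "witness", "upstream_node_id": None, "name": "node3", "active": True},
]

if __name__ == "__main__":
    print("before failover:", failover_candidates(BEFORE))
    print("after clone, table not updated:", failover_candidates(AFTER_CLONE))
```

An empty result for the second case matches what was seen here: with no standby row in the table, there is nothing to promote on the next failover. repmgr also has a 'repmgr standby register' command; whether running it after the clone updates the existing row on this version is something I have not verified.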