On 2019-04-02 1:32 p.m., Andrei Borzenkov wrote:
> 02.04.2019 19:32, Dileep V Nair wrote:
>>
>> Hi,
>>
>> I have a two node DB2 Cluster with pacemaker and HADR. When I issue a
>> reboot -f on the node where Primary Database is running, I expect the
>> Standby database to be promoted as Primary. But what is happening is
>> pacemaker waits for 180 seconds (guess that is the SBD timeout) and by the
>> time the second node takes action, the DB is already in
>> STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED state and cannot be promoted
>> anymore. If that is the expected behaviour, I believe in a node crash
>> situation, the cluster does not work. Can someone guide me on what could be
>> wrong here.
>>
>
> Is stonith enabled? Did you configure correct timeouts? Very cursory
> look in db2 agent:
>
> In case of HADR be very deliberate in specifying intervals/timeouts. The
> detection of a failure including promote must complete within
> HADR_PEER_WINDOW.
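
To make the constraint quoted from the db2 agent concrete, here is a rough
sketch of the timing budget. It is not taken from the agent or from
pacemaker; the function name and every number except the 180 seconds
mentioned in the original post are made up for illustration. Plug in the
values from your own cluster.

```python
def failover_fits_peer_window(peer_window_s, detect_s, fence_s, promote_s):
    """Check whether detection + fencing + promote fits in the peer window.

    All arguments are durations in seconds. Returns (fits, slack_s), where
    slack_s is negative when the budget is exceeded.
    """
    total_s = detect_s + fence_s + promote_s
    return total_s <= peer_window_s, peer_window_s - total_s


if __name__ == "__main__":
    # 180 s is the delay reported in the original post; the other values
    # are hypothetical placeholders, not recommendations.
    fits, slack = failover_fits_peer_window(
        peer_window_s=120,  # hypothetical HADR_PEER_WINDOW
        detect_s=10,        # hypothetical monitor interval / detection time
        fence_s=180,        # fencing delay observed in the original post
        promote_s=20,       # hypothetical time for the promote action
    )
    print("fits" if fits else "too slow", "slack:", slack, "s")
```

If the slack comes out negative, the standby will most likely have left
peer state before the promote is attempted, which would match the behaviour
described above.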

It's worth noting that SBD fencing is "better than nothing", but slow.
IPMI and/or PDU fencing completes a lot faster.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
