[ClusterLabs] fence_apc delay?

Dan Swartzendruber Fri, 02 Sep 2016 06:20:51 -0700

So, I was testing my ZFS dual-head JBOD 2-node cluster. Manual failovers worked just fine. I then went to try an acid test by logging in to node A and doing 'systemctl stop network'. Sure enough, Pacemaker told the APC fencing agent to power-cycle node A. The ZFS pool moved to node B as expected. As soon as node A was back up, I migrated the pool/IP back to node A. I *thought* all was okay, until a bit later, I did 'zpool status' and saw checksum errors on both sides of several of the vdevs. After much digging and poking, the only theory I could come up with was that maybe the fencing operation was considered complete too quickly? I googled for examples using this, and the best tutorial I found showed using power-wait=5, whereas the default seems to be power-wait=0? (This is CentOS 7, btw...) I changed it to use 5 instead of 0, and did several fencing operations while a guest VM (vSphere via NFS) was writing to the pool. So far, no evidence of corruption. BTW, the way I was creating and managing the cluster was with the LCMC Java GUI. Possibly the power-wait default of 0 comes from there, I can't really tell. Any thoughts or ideas appreciated :)


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
