On 2016-09-02 10:09, Ken Gaillot wrote:
> On 09/02/2016 08:14 AM, Dan Swartzendruber wrote:
>> So, I was testing my ZFS dual-head JBOD 2-node cluster. Manual
>> failovers worked just fine. I then went to try an acid test by
>> logging in to node A and doing 'systemctl stop network'. Sure
>> enough, pacemaker told the APC fencing agent to power-cycle node A.
>> The ZFS pool moved to node B as expected. As soon as node A was back
>> up, I migrated the pool/IP back to node A. I *thought* all was okay,
>> until a bit later, when I did 'zpool status' and saw checksum errors
>> on both sides of several of the vdevs. After much digging and
>> poking, the only theory I could come up with was that maybe the
>> fencing operation was considered complete too quickly? I googled for
>> examples using this agent, and the best tutorial I found showed
>> power_wait=5, whereas the default seems to be power_wait=0? (this is
>> CentOS 7, btw...) I changed it to use 5 instead.
> That's a reasonable theory -- that's why power_wait is available. It
> would be nice if there were a page collecting users' experience with
> the ideal power_wait for various devices. Even better if fence-agents
> used those values as the defaults.
Ken, thanks. FWIW, this is a Dell PowerEdge R905. I have no idea how
long the power supplies in that thing can keep things going when AC
power goes away. Always wary of small sample sizes, but I got
filesystem corruption after 1 fencing event with power_wait=0, and
none after 3 fencing events with power_wait=5.
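
In case it helps anyone else, this is roughly how I set it with pcs on
CentOS 7. The resource name, IP, credentials, and host list below are
placeholders, not my actual config:

    # create the APC fence device with a 5-second pause after power-off
    # (ipaddr/login/passwd/pcmk_host_list are illustrative values)
    pcs stonith create apc-fencing fence_apc \
        ipaddr=10.0.0.50 login=apc passwd=apc \
        pcmk_host_list="nodea nodeb" power_wait=5

    # or, if the device already exists, just bump power_wait
    pcs stonith update apc-fencing power_wait=5

    # confirm the attribute took effect
    pcs stonith show apc-fencing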
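
And on the ZFS side, the check I've been doing after each failback is
nothing fancy (pool name 'tank' is just an example):

    # look for CKSUM errors on the vdevs once the pool is imported again
    zpool status -v tank

    # run a full verification pass, then re-check the error counters
    zpool scrub tank
    zpool status -v tank

    # once a scrub comes back clean, reset the counters before the next test
    zpool clear tank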
_______________________________________________
Users mailing list: users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org