I hope I'll be able to explain the problem clearly and correctly.

My setup (simplified): I have two cloned resources, a filesystem mount and a process which writes to that filesystem. The filesystem is Gluster, so it's OK to clone it. I also have a mandatory ordering constraint "start gluster-mount-clone then start writer-process-clone". I don't have a STONITH device, so I've disabled STONITH by setting stonith-enabled=false.
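For reference, the configuration looks roughly like this (a sketch using pcs; the device, directory, and the writer's resource agent are placeholders, and my real commands may differ in detail):

# Cloned Gluster mount; the depth-20 monitor is the one that writes a status file
pcs resource create gluster-mount ocf:heartbeat:Filesystem \
    device="gluster-node:/volume" directory="/mnt/data" fstype="glusterfs" \
    op monitor interval=30s timeout=60s OCF_CHECK_LEVEL=20 clone

# Cloned writer process (agent name is a placeholder for my own OCF agent)
pcs resource create writer-process ocf:custom:writer \
    op monitor interval=30s op stop timeout=60s clone

# Mandatory ordering: mount the filesystem first, then start the writer
pcs constraint order start gluster-mount-clone then start writer-process-clone

# No fencing device available
pcs property set stonith-enabled=false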

The problem: Sometimes Gluster freezes for a while, which causes the gluster-mount resource's monitor with OCF_CHECK_LEVEL=20 to time out (it is unable to write the status file). When this happens, the cluster tries to recover by restarting the writer-process resource. But the writer-process is writing to the frozen filesystem, which makes it uninterruptible; not even SIGKILL works. The stop operation then times out, and with STONITH disabled on-fail defaults to block (don't perform any further operations on the resource):

warning: Forcing writer-process-clone away from node1.example.org after 1000000 failures (max=1000000)

After that, the cluster continues with the recovery by restarting the gluster-mount resource on that node, which usually succeeds. As a consequence of that remount, the uninterruptible system call in the writer process fails, signals are finally delivered, and the writer-process is terminated. But the cluster doesn't know about that!
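For what it's worth, while the mount is frozen the writer sits in uninterruptible sleep (state D), which is why even SIGKILL is ignored until the remount wakes it up. Something like this makes that visible (just a generic diagnostic, nothing cluster-specific):

# List processes stuck in uninterruptible sleep (state D) and the kernel symbol they block on
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'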

I thought I could solve this by setting the failure-timeout meta attribute on the writer-process resource, but it only made things worse. The documentation states: "Stop failures are slightly different and crucial. ... If a resource fails to stop and STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout." However, I'm seeing something different. When the policy engine runs after the next cluster-recheck-interval, the following lines are written to syslog:
crmd[11852]: notice: State transition S_IDLE -> S_POLICY_ENGINE
pengine[11851]: notice: Clearing expired failcount for writer-process:1 on node1.example.org
pengine[11851]: notice: Clearing expired failcount for writer-process:1 on node1.example.org
pengine[11851]: notice: Ignoring expired calculated failure writer-process_stop_0 (rc=1, magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on node1.example.org
pengine[11851]: notice: Clearing expired failcount for writer-process:1 on node1.example.org
pengine[11851]: notice: Ignoring expired calculated failure writer-process_stop_0 (rc=1, magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on node1.example.org
pengine[11851]: warning: Processing failed op monitor for gluster-mount:1 on node1.example.org: unknown error (1)
pengine[11851]: notice: Calculated transition 564, saving inputs in /var/lib/pacemaker/pengine/pe-input-362.bz2
crmd[11852]: notice: Transition 564 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-362.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
crmd[11852]: warning: No reason to expect node 3 to be down
crmd[11852]: warning: No reason to expect node 1 to be down
crmd[11852]: warning: No reason to expect node 1 to be down
crmd[11852]: warning: No reason to expect node 3 to be down
pengine[11851]: warning: Processing failed op stop for writer-process:1 on node1.example.org: unknown error (1)
pengine[11851]: warning: Processing failed op monitor for gluster-mount:1 on node1.example.org: unknown error (1)
pengine[11851]: warning: Forcing writer-process-clone away from node1.example.org after 1000000 failures (max=1000000)
pengine[11851]: warning: Forcing writer-process-clone away from node1.example.org after 1000000 failures (max=1000000)
pengine[11851]: warning: Forcing writer-process-clone away from node1.example.org after 1000000 failures (max=1000000)
pengine[11851]: notice: Calculated transition 565, saving inputs in /var/lib/pacemaker/pengine/pe-input-363.bz2
pengine[11851]: notice: Ignoring expired calculated failure writer-process_stop_0 (rc=1, magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on node1.example.org
pengine[11851]: warning: Processing failed op monitor for gluster-mount:1 on node1.example.org: unknown error (1)
crmd[11852]: notice: Transition 566 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
pengine[11851]: notice: Calculated transition 566, saving inputs in /var/lib/pacemaker/pengine/pe-input-364.bz2

Then after each cluster-recheck-interval:
crmd[11852]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
pengine[11851]: notice: Ignoring expired calculated failure writer-process_stop_0 (rc=1, magic=2:1;64:557:0:2169780b-ca1f-483e-ad42-118b7c7c1a7d) on node1.example.org
pengine[11851]: warning: Processing failed op monitor for gluster-mount:1 on node1.example.org: unknown error (1)
crmd[11852]: notice: Transition 567 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-364.bz2): Complete
crmd[11852]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE

And crm_mon happily shows the writer-process as Started, although it is not actually running. This is very confusing. Could anyone please explain what is going on here?
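For reference, failure-timeout was added roughly like this (a sketch; the exact value I used may differ):

# Let failures on the writer clone expire instead of accumulating forever
pcs resource meta writer-process-clone failure-timeout=10min

As far as I can tell, it is this expiry that produces the "Clearing expired failcount" and "Ignoring expired calculated failure" messages in the logs above.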
