Re: [ClusterLabs] heartbeat/pacemaker: stonith-ng does a broadcast that doesnt reach it back resulting stonith_async_timeout_handler() called
Hi,

On Wed, Oct 14, 2015 at 10:31:40PM +0530, Shyam wrote:
> Hi all,
>
> We were previously using Pacemaker 1.1.6 on Ubuntu Precise. We are
> shifting to Ubuntu Trusty, which has Pacemaker 1.1.10.
>
> We have our own STONITH resource, and with the Pacemaker upgrade it is
> broken. The problem I am hitting is not new; it is exactly what was
> reported in
> http://clusterlabs.org/pipermail/users/2015-June/000686.html
>
> Basically the Heartbeat + Pacemaker stack suffers from this problem (even
> with the GitHub mainline code), and since most people are using Corosync,
> I suspect the problem is rarely felt.
>
> The basic problem is this. Between Pacemaker 1.1.6 and 1.1.10, the
> following change was made in fencing/remote.c:
>
>     static void remote_op_done(remote_fencing_op_t * op, xmlNode * data,
>                                int rc, int dup)
>     ...
>         if (dup == FALSE && safe_str_neq(subt, "broadcast")) {
>             /* Defer notification until the bcast message arrives */
>             bcast_result_to_peers(op, rc);
>             goto remote_op_done_cleanup;
>
> remote_op_done() previously sent a direct reply. It was changed to
> broadcast the result, and this broadcast is supposed to come back to the
> same fencing daemon, upon which it finishes the operation. However, the
> broadcast sent by stonithd never reaches itself. This eventually causes
> crmd's stonith_async_timeout_handler() to be triggered and STONITH to be
> retried over and over, so essentially STONITH is broken.

I can vaguely recall a stonith/heartbeat issue which got fixed by Lars:

    commit cc34288a7b2276aa238546f4aa92fa79b8cbcf88
    Author: Lars Ellenberg
    Date:   Tue Dec 9 10:48:33 2014 +0100

Sounds like the same problem.

Thanks,

Dejan

> I tried setting up bcast in /etc/ha.d/ha.cf, but it doesn't help. To
> confirm the issue, I removed the above bcast_result_to_peers() flow, and
> then STONITH works perfectly fine. Any pointers/help appreciated on how
> to resolve this issue in the right way?
> From the logs:
>
> CRMD issues STONITH reboot to stonith-ng:
>
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG: Dumping message with 19 fields
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[0] : [__name__=stonith_command]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[1] : [__name__=stonith_command]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[2] : [t=stonith-ng]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[3] : [st_async_id=617dce02-057f-42b8-9df2-33bc4e64780d]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[4] : [st_op=st_fence]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[5] : [st_callid=2]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[6] : [st_callopt=0]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[7] : [st_remote_op=617dce02-057f-42b8-9df2-33bc4e64780d]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[8] : [st_target=node1]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[9] : [st_device_action=reboot]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[10] : [st_origin=node0]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[11] : [st_clientid=346fea6b-a55d-4873-b630-14287d5bc71e]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[12] : [st_clientname=crmd.14483]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[13] : [st_timeout=90]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[14] : [st_mode=smart]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[15] : [dest=node0]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[16] : [oseq=2]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[17] : [from_id=stonith-ng]
> Oct 14 14:57:01 node0 heartbeat: [14463]: debug: MSG[18] : [to_id=stonith-ng]
>
> stonith-ng successfully finishes the STONITH:
>
> Oct 14 14:57:03 node0 stonith-ng[14481]: notice: log_operation: Operation 'reboot' [14657] (call 2 from crmd.14483) for host 'node1' with device 'Z-FENCE:0' returned: 0 (OK)
> Oct 14 14:57:03 node0 stonith-ng[14481]: debug: stonith_send_async_reply: Directed reply to node0
> Oct 14 14:57:03 node0 stonith-ng[14481]: debug: send_ha_message: outbound st_device_id="Z-FENCE:0" st_remote_op="617dce02-057f-42b8-9df2-33bc4e64780d" st_clientid="346fea6b-a55d-4873-b630-14287d5bc71e" st_clientname="crmd.14483" st_target="node1" st_device_action="reboot" st_callid="2" st_callopt="0" st_rc="0" st_output="Performing: stonith -t external/zstonith -T reset node1\nsuccess: node1 0\n"/>
>
> And stonith-ng sends a self message for st-reply:
>
> Oct 14 14:57:03 node0 heartbeat: [14463]: debug: MSG: Dumping message with 18 fields
> Oct 14 14:57:03 node0 heartbeat: [14463]: debug: MSG[0] : [__name__=st-reply]
> Oct 14 14:57:03 node0 heartbeat: [14463]: debug: MSG[1] : [st_origin=node0]
> Oct 14 14:57:03 node0 heartbeat: [14463]: debug: MSG[2] : [t=stonith-ng]
> Oct 14 14:57:03 node0 heartbeat: [14463]: debug: MSG[3] : [st_op=st_fence]
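[Editorial note: the failure mode described above can be sketched as a toy model. This is illustrative Python only, not Pacemaker source; `Transport`, `loops_back`, and this `remote_op_done` are hypothetical names. The point is that if the client notification is deferred until the daemon's own broadcast echoes back, a transport that does not loop broadcasts back to the sender means the notification never happens.]

```python
# Toy model (hypothetical, not Pacemaker code): deferred notification
# breaks when the transport does not echo broadcasts to the sender.

class Transport:
    def __init__(self, loops_back):
        self.loops_back = loops_back
        self.inbox = []

    def broadcast(self, msg):
        # Heartbeat's transport, as described in the thread, delivers
        # the broadcast to peers but not back to the sending stonithd.
        if self.loops_back:
            self.inbox.append(msg)

def remote_op_done(transport, rc):
    """Mimic the 1.1.10 flow: broadcast the result, then notify the
    client only when our own broadcast arrives back."""
    transport.broadcast(("st-reply", rc))
    for kind, result in transport.inbox:
        if kind == "st-reply":
            return f"client notified, rc={result}"
    return "client never notified -> crmd times out and retries STONITH"

print(remote_op_done(Transport(loops_back=True), 0))   # corosync-like echo
print(remote_op_done(Transport(loops_back=False), 0))  # heartbeat symptom
```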
[ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?
I inserted some debugging into fencing.py and found that stonithd sends
stuff like this to STDIN of the fencing agents it forks:

    action=list
    param1=value1
    param2=value2
    param3=value3
    action=list

where paramX and valueX come from the configuration of the primitive for
the fencing agent.

As a corollary, if the primitive for the fencing agent has 'action' defined
as one of its parameters, there will be three 'action=' lines, and the
middle one could have a different value from the two sandwiching it.

When I first saw this, I had an extended #wtf moment and thought it was a
bug. But on closer inspection, it seems very deliberate, e.g.

https://github.com/ClusterLabs/pacemaker/commit/bfd620645f151b71fafafa279969e9d8bd0fd74f

The "regardless of what the admin configured" comment suggests to me that
there is an underlying assumption that any fencing agent will ensure that
if the same parameter is duplicated on STDIN, the final value overrides any
previous ones. And indeed fencing.py ensures this, but presumably it is
possible to write agents which don't use fencing.py.

Is my understanding correct? If so:

1) Is the first 'action=' line intended to set some kind of default
   action, in the case that the admin didn't configure the primitive
   with an 'action=' parameter *and* the action wasn't one of
   list/status/monitor/metadata? In what circumstances would this happen?

2) Is this assumption of the agents always being order-sensitive
   (i.e. last value always wins) documented anywhere? The best
   documentation on the API I could find was here:

   https://fedorahosted.org/cluster/wiki/FenceAgentAPI

   but it doesn't mention this.

Thanks!

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
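[Editorial note: the "last value wins" behaviour the commit appears to rely on falls out naturally if an agent parses its STDIN into a dict, which is essentially what fencing.py does. A minimal sketch, assuming the key=value-per-line STDIN convention described above; `parse_agent_stdin` is an illustrative name, not the real fencing.py API:]

```python
# Sketch of agent-side STDIN parsing under the assumed convention:
# one key=value pair per line, later occurrences override earlier ones.

def parse_agent_stdin(text):
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Plain dict assignment implements last-value-wins: the final
        # 'action=' line silently overrides any earlier one.
        options[key.strip()] = value.strip()
    return options

stdin_payload = """\
action=list
param1=value1
param2=value2
param3=value3
action=list
"""
print(parse_agent_stdin(stdin_payload)["action"])  # -> list
```

An agent that instead acted on the *first* `action=` line it read, or that executed each line as a command as it arrived, would break this implicit contract, which is exactly the concern raised above.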
Re: [ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?
On 15/10/15 12:25 +0100, Adam Spiers wrote:
> I inserted some debugging into fencing.py and found that stonithd
> sends stuff like this to STDIN of the fencing agents it forks:
>
>     action=list
>     param1=value1
>     param2=value2
>     param3=value3
>     action=list
>
> [...]
>
> The "regardless of what the admin configured" comment suggests to me
> that there is an underlying assumption that any fencing agent will
> ensure that if the same parameter is duplicated on STDIN, the final
> value will override any previous ones. And indeed fencing.py ensures
> this, but presumably it is possible to write agents which don't use
> fencing.py.
>
> [...]
>
> 2) Is this assumption of the agents always being order-sensitive
>    (i.e. last value always wins) documented anywhere? The best
>    documentation on the API I could find was here:
>
>    https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
>    but it doesn't mention this.

Agreed that this implicit and undocumented arrangement smells fishy. In
fact, I raised this point within the RH team 3 months back, but there
wasn't any feedback at that time.
Haven't filed a bug because it's unclear to me what's intentional and
what's not between the two interacting sides. Back then, I was proposing
that fencing.py should at least emit a message about subsequent parameter
overrides.

Note that the "port" parameter (or whatever is the correct way to specify
a victim in the particular case) is affected as well.

-- 
Jan (Poki)
Re: [ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?
On 10/15/2015 06:25 AM, Adam Spiers wrote:
> I inserted some debugging into fencing.py and found that stonithd
> sends stuff like this to STDIN of the fencing agents it forks:
>
>     action=list
>     param1=value1
>     param2=value2
>     param3=value3
>     action=list
>
> [...]
>
> Is my understanding correct? If so:

Yes, good sleuthing.

> 1) Is the first 'action=' line intended in order to set some kind of
>    default action, in the case that the admin didn't configure the
>    primitive with an 'action=' parameter *and* the action wasn't one
>    of list/status/monitor/metadata? In what circumstances would this
>    happen?

The first action line is usually the only one. Ideally, admins don't
configure "action" as a parameter of a fence device. They either specify
nothing (in which case the cluster does what it thinks should be done --
reboot, off, etc.), or they specify pcmk_*_action to override the
cluster's choice. For example, pcmk_reboot_action=off tells the cluster to
actually send the fence agent "action=off" when a reboot is desired.
(Perhaps the admin prefers that flaky nodes stay down until investigated,
or the fence device doesn't handle reboots well.) So the first action line
is the result of that: if the admin configured a pcmk_*_action for the
requested action, the agent gets that; otherwise it gets the requested
action.

Second, any parameters in the device configuration are copied to the
agent. So if the admin did specify "action" there, it will get copied (as
a second instance, possibly different from the first). But that would
override *all* requested actions, which is a bad idea -- no one wants a
recurring monitor action to shoot a node! :)

So the last step is a failsafe: if the admin did supply an "action",
re-send the original line if the requested action was informational
(list/status/monitor/metadata) and not a "real" fencing action
(off/reboot).

> 2) Is this assumption of the agents always being order-sensitive
>    (i.e. last value always wins) documented anywhere? The best
>    documentation on the API I could find was here:
>
>    https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
>    but it doesn't mention this.

Good point. It would be a good idea to add that to the API documentation
since it's established practice, but it would probably also be a good idea
for Pacemaker to send only the final value of any parameter.
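[Editorial note: the three steps described above could be sketched as follows. This is illustrative Python, not the actual stonithd C code; `build_agent_stdin`, the parameter dict, and the skipping of pcmk_*-prefixed options are assumptions made for the sketch.]

```python
# Sketch of the three-step assembly of the agent's STDIN as described
# in the reply above (hypothetical helper, not stonithd source).

INFORMATIONAL = {"list", "status", "monitor", "metadata"}

def build_agent_stdin(requested, device_params):
    # Step 1: honour any pcmk_<action>_action override for the request.
    override = device_params.get(f"pcmk_{requested}_action")
    action = override if override else requested
    lines = [f"action={action}"]

    # Step 2: copy the configured device parameters verbatim -- which
    # may re-add a conflicting 'action=' line.
    for key, value in device_params.items():
        if key.startswith("pcmk_"):
            continue  # assumed: pcmk_* options are for the cluster only
        lines.append(f"{key}={value}")

    # Step 3: failsafe -- if the config carried its own 'action' but the
    # request was only informational, re-send the original action so a
    # last-value-wins agent doesn't fence during a monitor.
    if "action" in device_params and requested in INFORMATIONAL:
        lines.append(f"action={action}")
    return "\n".join(lines)

# A monitor against a device misconfigured with action=off:
print(build_agent_stdin("monitor", {"action": "off", "port": "node1"}))
```

Under the last-value-wins convention, the agent in this example ends up executing `monitor`, not `off`, which is exactly what the failsafe is for.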
Re: [ClusterLabs] heartbeat/pacemaker: stonith-ng does a broadcast that doesnt reach it back resulting stonith_async_timeout_handler() called
Hi Dejan,

Thanks a lot for your input! I cherry-picked this commit and it solves the
problem. I will raise an Ubuntu Launchpad bug for them to pull this
correction into trusty stable.

Thanks!
--Shyam

On Thu, Oct 15, 2015 at 1:21 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Wed, Oct 14, 2015 at 10:31:40PM +0530, Shyam wrote:
> > [...]
>
> I can vaguely recall a stonith/heartbeat issue which got fixed by Lars:
>
>     commit cc34288a7b2276aa238546f4aa92fa79b8cbcf88
>     Author: Lars Ellenberg
>     Date:   Tue Dec 9 10:48:33 2014 +0100
>
> Sounds like the same problem.
>
> Thanks,
>
> Dejan
>
> > [...]