Re: [Pacemaker] Fencing of bare-metal remote nodes
- Original Message -
> 25.11.2014 23:41, David Vossel wrote:
>> - Original Message -
>>> Hi!
>>>
>>> is subj implemented?
>>>
>>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>>
>> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle fencing the remote-node? Fencing a remote-node requires a cluster node to invoke the agent that actually performs the fencing action on the remote-node.
>
> David, a couple of questions.
>
> I see that in your fencing tests you just stop the systemd unit. Shouldn't pacemaker_remoted somehow notify crmd that it is being shut down? And shouldn't crmd stop all resources on that remote node before granting that shutdown?

Yes, this needs to happen at some point. Right now the shutdown method for a remote-node is to disable the connection resource and wait for all the resources to stop before killing pacemaker_remoted on the remote node. That isn't exactly ideal.

> Also, from what I see now it would be natural to hide the current implementation of remote node configuration under <node/> syntax. Remote nodes now have almost all features of normal nodes, including node attributes. What do you think about it?

Ha, well, yes. At this point that might make sense. I had originally never planned on remote-nodes entering the actual nodes section, but eventually that changed. I'd like for usage of remote nodes to mature a bit before I commit to changing something like this, though. I'm still a bit uncertain how people are going to use baremetal remote nodes. The use cases people come up with keep surprising me. Keeping the remote node definition as a resource gives us a bit more flexibility for configuration.
-- Vossel

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
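
The manual shutdown order David describes (disable the connection resource, wait for everything on the remote node to stop, only then stop pacemaker_remoted) can be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical, `crm resource stop` is assumed as the crmsh way to disable the connection resource, `pacemaker_remote` is assumed to be the systemd unit name on the remote node, and `count_running` stands in for whatever check you use to see how many resources are still active there.

```python
import time
from typing import Callable, List


def shutdown_commands(connection_rsc: str) -> List[List[str]]:
    """Commands for the manual shutdown, in the order they must run.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    return [
        # Step 1, on a cluster node: disable the connection resource
        # (crmsh syntax, assumed) so the cluster stops resources on the remote node.
        ["crm", "resource", "stop", connection_rsc],
        # Step 2, on the remote node, only after everything has stopped:
        # stop pacemaker_remoted (unit name is an assumption).
        ["systemctl", "stop", "pacemaker_remote"],
    ]


def wait_for_resources_to_stop(
    count_running: Callable[[], int],
    timeout_s: float = 600.0,
    poll_s: float = 2.0,
) -> bool:
    """Poll until no resources are running; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_running() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The wait sits between the two commands: running step 2 before `wait_for_resources_to_stop` returns `True` would kill pacemaker_remoted while resources are still active, which is the situation the thread is worried about.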
Re: [Pacemaker] Fencing of bare-metal remote nodes
25.11.2014 23:41, David Vossel wrote:
> - Original Message -
>> Hi!
>>
>> is subj implemented?
>>
>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle fencing the remote-node? Fencing a remote-node requires a cluster node to invoke the agent that actually performs the fencing action on the remote-node.

David, a couple of questions.

I see that in your fencing tests you just stop the systemd unit. Shouldn't pacemaker_remoted somehow notify crmd that it is being shut down? And shouldn't crmd stop all resources on that remote node before granting that shutdown?

Also, from what I see now it would be natural to hide the current implementation of remote node configuration under <node/> syntax. Remote nodes now have almost all features of normal nodes, including node attributes. What do you think about it?

Best,
Vladislav
Re: [Pacemaker] Fencing of bare-metal remote nodes
26.11.2014 18:36, David Vossel wrote:
> - Original Message -
>> 25.11.2014 23:41, David Vossel wrote:
>>> - Original Message -
>>>> Hi!
>>>>
>>>> is subj implemented?
>>>>
>>>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>>>
>>> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle fencing the remote-node? Fencing a remote-node requires a cluster node to invoke the agent that actually performs the fencing action on the remote-node.
>>
>> Yes, if I invoke the fencing action manually ('crm node fence rnode' in crmsh syntax), the node is fenced. So the issue seems to be related to detecting the need for fencing.
>>
>> Comments in the related git commits are a little bit terse in this area. So could you please explain what exactly needs to happen on a remote node to initiate fencing?
>>
>> I tried so far:
>> * kill pacemaker_remoted when no resources are running. systemd restarted it and crmd reconnected after some time.
>> * crash kernel when no resources are running
>> * crash kernel during massive start of resources
>
> this last one should definitely cause fencing. What version of pacemaker are you using? I've made changes in this area recently. Can you provide a crm_report.

It's c191bf3. crm_report is ready, but I am still waiting for approval from a customer to send it.

> -- David
>
>> No fencing happened. In the last case the start actions 'hung' and were failed by timeout (it is rather long); the node was not even listed as failed.
>>
>> My customer asked me to stop crashing nodes because one of them does not boot anymore (I like that modern UEFI hardware very much.), so it is hard for me to play more with that.

Best,
Vladislav
Re: [Pacemaker] Fencing of bare-metal remote nodes
- Original Message -
> 26.11.2014 18:36, David Vossel wrote:
>> - Original Message -
>>> 25.11.2014 23:41, David Vossel wrote:
>>>> - Original Message -
>>>>> Hi!
>>>>>
>>>>> is subj implemented?
>>>>>
>>>>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>>>>
>>>> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle fencing the remote-node? Fencing a remote-node requires a cluster node to invoke the agent that actually performs the fencing action on the remote-node.
>>>
>>> Yes, if I invoke the fencing action manually ('crm node fence rnode' in crmsh syntax), the node is fenced. So the issue seems to be related to detecting the need for fencing.
>>>
>>> Comments in the related git commits are a little bit terse in this area. So could you please explain what exactly needs to happen on a remote node to initiate fencing?
>>>
>>> I tried so far:
>>> * kill pacemaker_remoted when no resources are running. systemd restarted it and crmd reconnected after some time.

This should definitely cause the remote-node to be fenced. I tested this earlier today after reading you were having problems and my setup fenced the remote-node correctly.

>>> * crash kernel when no resources are running

If a remote-node connection is lost and pacemaker was able to verify the node is clean before the connection is lost, pacemaker will attempt to reconnect to the remote-node without issuing a fencing request.

I could see why both fencing and not fencing in this situation could be desired. Maybe I should make an option.

>>> * crash kernel during massive start of resources

This should definitely cause the remote node to be fenced.

>> this last one should definitely cause fencing. What version of pacemaker are you using? I've made changes in this area recently. Can you provide a crm_report.
>
> It's c191bf3. crm_report is ready, but I am still waiting for approval from a customer to send it.

Great. I really need to see what you all are doing. Outside of my own setup I have not seen many setups where pacemaker remote is deployed on baremetal nodes. It is possible something in your configuration exposes some edge case I haven't encountered yet.

There's a US holiday Thursday and Friday, so I won't be able to look at this until next week.

-- Vossel

>> -- David
>>
>>> No fencing happened. In the last case the start actions 'hung' and were failed by timeout (it is rather long); the node was not even listed as failed.
>>>
>>> My customer asked me to stop crashing nodes because one of them does not boot anymore (I like that modern UEFI hardware very much.), so it is hard for me to play more with that.
>>>
>>> Best,
>>> Vladislav
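
The rule David states for a lost remote-node connection (reconnect without fencing if the node was verified clean beforehand, otherwise fence) can be written down as a tiny decision function. This is a reading of the message above, not Pacemaker's actual code: the names are hypothetical, and `fence_when_clean` models the option David only says he might add, so it does not correspond to any existing setting.

```python
from enum import Enum


class Action(Enum):
    RECONNECT = "reconnect"  # try to re-establish the remote connection
    FENCE = "fence"          # have a cluster node fence the remote node


def on_connection_lost(verified_clean: bool, fence_when_clean: bool = False) -> Action:
    """Decide what to do when the connection to a remote node is lost.

    verified_clean: the cluster could confirm the node was clean (nothing
        running or in flight) before the connection dropped.
    fence_when_clean: hypothetical switch for "fence anyway", i.e. the
        option that is only floated in the thread.
    """
    if verified_clean and not fence_when_clean:
        return Action.RECONNECT
    return Action.FENCE
```

Under this reading, a kernel crash with no resources running gives `on_connection_lost(True)`, so the cluster reconnects, while a crash in the middle of a mass resource start gives `on_connection_lost(False)` and a fence. The sketch does not explain the first test case: David expects a killed pacemaker_remoted to be fenced even with no resources running, so the real logic evidently considers more than this one flag.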
[Pacemaker] Fencing of bare-metal remote nodes
Hi!

is subj implemented?

Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

Best,
Vladislav