Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-12-19 Thread David Vossel


- Original Message -
 25.11.2014 23:41, David Vossel wrote:
 
 
  - Original Message -
  Hi!
 
  is subj implemented?
 
  Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
 
  Yes, fencing remote-nodes works. Are you certain your fencing devices can
  handle
  fencing the remote-node? Fencing a remote-node requires a cluster node to
  invoke the agent that actually performs the fencing action on the
  remote-node.
 
 David, a couple of questions.
 
 I see that in your fencing tests you just stop the systemd unit.
 Shouldn't pacemaker_remoted somehow notify crmd that it is being
 shut down? And shouldn't crmd stop all resources on that remote node
 before granting that shutdown?

yes, this needs to happen at some point.

Right now the shutdown method for a remote-node is to disable the connection
resource and wait for all the resources to stop before killing pacemaker_remoted
on the remote node. That isn't exactly ideal.
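
As a rough illustration of the ordering described above, here is a minimal
Python sketch. The three callables (disable_connection, running_resources,
stop_remoted) and the timeout values are hypothetical placeholders for
whatever cluster tooling is in use; none of them are Pacemaker APIs.

```python
import time
from typing import Callable


def shutdown_remote_node(
    disable_connection: Callable[[], None],   # hypothetical: disables the connection resource
    running_resources: Callable[[], int],     # hypothetical: resources still active on the node
    stop_remoted: Callable[[], None],         # hypothetical: stops pacemaker_remoted on the node
    timeout_s: float = 60.0,                  # assumed value, not a Pacemaker default
    poll_s: float = 1.0,                      # assumed value, not a Pacemaker default
) -> bool:
    """Disable the connection, wait for resources to stop, then stop the daemon.

    Returns True if the daemon was stopped, False if the wait timed out
    (in which case the daemon is left running).
    """
    disable_connection()
    deadline = time.monotonic() + timeout_s
    while running_resources() > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    stop_remoted()
    return True
```

The daemon is only stopped once nothing is left running on the node, which is
the point of the manual procedure; on timeout the sketch gives up rather than
killing pacemaker_remoted underneath active resources.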


 Also, from what I see now, it would be natural to hide the current
 implementation of remote node configuration under <node/> syntax. Remote
 nodes now have almost all the features of normal nodes, including node
 attributes. What do you think about it?

Ha, well, yes. At this point that might make sense. I had originally never
planned on remote-nodes entering the actual nodes section, but eventually
that changed. I'd like usage of remote nodes to mature a bit before I
commit to changing something like this, though. I'm still a bit uncertain how
people are going to use bare-metal remote nodes. The use cases people come
up with keep surprising me. Keeping the remote node definition as a resource
gives us a bit more flexibility for configuration.

-- Vossel

 
 Best,
 Vladislav
 
 
  -- Vossel
 
 
  Best,
  Vladislav
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 


Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-12-11 Thread Vladislav Bogdanov

25.11.2014 23:41, David Vossel wrote:



- Original Message -

Hi!

is subj implemented?

Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.


Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
fencing the remote-node? Fencing a remote-node requires a cluster node to
invoke the agent that actually performs the fencing action on the remote-node.


David, a couple of questions.

I see that in your fencing tests you just stop the systemd unit.
Shouldn't pacemaker_remoted somehow notify crmd that it is being
shut down? And shouldn't crmd stop all resources on that remote node
before granting that shutdown?


Also, from what I see now, it would be natural to hide the current
implementation of remote node configuration under <node/> syntax. Remote
nodes now have almost all the features of normal nodes, including node
attributes. What do you think about it?


Best,
Vladislav



-- Vossel



Best,
Vladislav



Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-11-26 Thread Vladislav Bogdanov
26.11.2014 18:36, David Vossel wrote:
 
 
 - Original Message -
 25.11.2014 23:41, David Vossel wrote:


 - Original Message -
 Hi!

 is subj implemented?

 Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

 Yes, fencing remote-nodes works. Are you certain your fencing devices can
 handle
 fencing the remote-node? Fencing a remote-node requires a cluster node to
 invoke the agent that actually performs the fencing action on the
 remote-node.

 Yes, if I invoke the fencing action manually ('crm node fence rnode' in
 crmsh syntax), the node is fenced. So the issue seems to be related to
 detecting the need for fencing.

 Comments in related git commits are a little bit terse in this area. So
 could you please explain what exactly needs to happen on a remote node
 to initiate fencing?

 I tried so far:
 * kill pacemaker_remoted when no resources are running. systemd restarted
 it and crmd reconnected after some time.
 * crash kernel when no resources are running
 * crash kernel during massive start of resources
 
 This last one should definitely cause fencing. What version of pacemaker are
 you using? I've made changes in this area recently. Can you provide a
 crm_report?

It's c191bf3.
The crm_report is ready, but I am still waiting for approval from a customer to
send it.


 
 -- David
 

 No fencing happened. In the last case the start actions hung and then
 failed on timeout (which is rather long), and the node was not even listed as
 failed. My customer asked me to stop crashing nodes because one of them
 does not boot anymore (I like that modern UEFI hardware very much),
 so it is hard for me to test this further.

 Best,
 Vladislav



 -- Vossel


 Best,
 Vladislav



Re: [Pacemaker] Fencing of bare-metal remote nodes

2014-11-26 Thread David Vossel


- Original Message -
 26.11.2014 18:36, David Vossel wrote:
  
  
  - Original Message -
  25.11.2014 23:41, David Vossel wrote:
 
 
  - Original Message -
  Hi!
 
  is subj implemented?
 
  Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing
  occurs.
 
  Yes, fencing remote-nodes works. Are you certain your fencing devices can
  handle
  fencing the remote-node? Fencing a remote-node requires a cluster node to
  invoke the agent that actually performs the fencing action on the
  remote-node.
 
  Yes, if I invoke the fencing action manually ('crm node fence rnode' in
  crmsh syntax), the node is fenced. So the issue seems to be related to
  detecting the need for fencing.
 
  Comments in related git commits are a little bit terse in this area. So
  could you please explain what exactly needs to happen on a remote node
  to initiate fencing?
 
  I tried so far:
  * kill pacemaker_remoted when no resources are running. systemd restarted
  it and crmd reconnected after some time.

This should definitely cause the remote-node to be fenced. I tested this
earlier today after reading you were having problems and my setup fenced
the remote-node correctly.

  * crash kernel when no resources are running

If a remote-node connection is lost and pacemaker was able to verify the
node is clean before the connection is lost, pacemaker will attempt to
reconnect to the remote-node without issuing a fencing request.

I could see why both fencing and not fencing could be desired in this situation.
Maybe I should make that an option.
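
The rule above can be sketched as a small decision function. This is only an
illustrative model in Python, not Pacemaker's code: the names Action,
on_connection_lost, and the always_fence flag (standing in for the option
mused about above) are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    RECONNECT = "reconnect"
    FENCE = "fence"


def on_connection_lost(verified_clean_before_loss: bool,
                       always_fence: bool = False) -> Action:
    """Hypothetical model of what happens when a remote-node connection drops.

    verified_clean_before_loss: the cluster had confirmed the node was clean
        before the connection went away.
    always_fence: hypothetical switch for the "make an option" idea; it does
        not correspond to any existing Pacemaker setting.
    """
    if verified_clean_before_loss and not always_fence:
        # Node was known clean: try to reconnect without a fencing request.
        return Action.RECONNECT
    # Node state is unknown (or the hypothetical option is set): fence it.
    return Action.FENCE
```

With the defaults, a node that was verified clean is reconnected rather than
fenced, which matches the "crash kernel when no resources are running" case
reported in this thread.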

  * crash kernel during massive start of resources

This should definitely cause the remote node to be fenced.

  
  This last one should definitely cause fencing. What version of pacemaker
  are you using? I've made changes in this area recently. Can you provide a
  crm_report?
 
 It's c191bf3.
 The crm_report is ready, but I am still waiting for approval from a customer to
 send it.

Great. I really need to see what you all are doing. Outside of my own setup I
have not seen many setups where pacemaker remote is deployed on bare-metal nodes.
It is possible something in your configuration exposes some edge case I haven't
encountered yet.

There's a US holiday Thursday and Friday, so I won't be able to look at this
until next week.

-- Vossel

 
  
  -- David
  
 
  No fencing happened. In the last case the start actions hung and then
  failed on timeout (which is rather long), and the node was not even listed as
  failed. My customer asked me to stop crashing nodes because one of them
  does not boot anymore (I like that modern UEFI hardware very much),
  so it is hard for me to test this further.
 
  Best,
  Vladislav
 
 
 
  -- Vossel
 
 
  Best,
  Vladislav
 


[Pacemaker] Fencing of bare-metal remote nodes

2014-11-25 Thread Vladislav Bogdanov
Hi!

is subj implemented?

Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

Best,
Vladislav
