Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-07 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On 03/07/2016 07:31 AM, Ferenc Wágner wrote: > >> 12:55:13 vhbl07 crmd[8484]: notice: Transition aborted by >> vm-eiffel_monitor_6 'create' on vhbl05: Foreign event >> (magic=0:0;521:0:0:634eef05-39c1-4093-9

[ClusterLabs] Regular pengine warnings after a transient failure

2016-03-07 Thread Ferenc Wágner
Hi, A couple of days ago the nodes of our Pacemaker 1.1.14 cluster (vhbl0[3-7]) experienced a temporary storage outage, leading to processes getting stuck randomly for a couple of minutes and big load spikes. There were 30 monitor operation timeouts altogether on vhbl05, and an internal error on the

[ClusterLabs] fencing by node name or by node ID

2016-02-21 Thread Ferenc Wágner
Hi, Last night a node in our cluster (Corosync 2.3.5, Pacemaker 1.1.14) experienced some failure and fell out of the cluster: Feb 21 22:11:12 vhbl06 corosync[3603]: [TOTEM ] A new membership (10.0.6.9:612) was formed. Members left: 167773709 Feb 21 22:11:12 vhbl06 corosync[3603]: [TOTEM ]

Re: [ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Ferenc Wágner
Vladislav Bogdanov writes: > If pacemaker has got an error on start, it will run stop with the same > set of parameters anyways. And will get error again if that one was > from validation and RA does not differentiate validation for start and > stop. And then circular

Re: [ClusterLabs] crmsh configure delete for constraints

2016-02-10 Thread Ferenc Wágner
Dejan Muhamedagic writes: > If the environment is no good (bad installation, missing configuration > and similar), then the stop operation probably won't do much good. Agreed. It may not even know how to probe it. > In ocf-rarun, validate_all is run, but then the
