Re: [ClusterLabs] resource was disabled automatically

2017-03-07 Thread Ken Gaillot
On 03/06/2017 08:29 PM, cys wrote:
> At 2017-03-07 05:47:19, "Ken Gaillot"  wrote:
>> To figure out why a resource was stopped, you want to check the logs on
>> the DC (which will be the node with the most "pengine:" messages around
>> that time). When the PE decides a resource needs to be stopped, you'll
>> see a message like
>>
>>   notice: LogActions:  Stop()
>>
>> Often, by looking at the messages before that, you can see what led it
>> to decide that. Shortly after that, you'll see something like
>>
> 
> Thanks Ken. It's really helpful.
> Finally I found the pengine debug log (in a separate file). It has this 
> message:
> "All nodes for resource p_vs-scheduler are unavailable, unclean or shutting 
> down..."
> So it seems this caused vs-scheduler to be disabled.
> 
> If all nodes return to a good state, will pengine start the resource 
> automatically?
> I started it manually yesterday.

Yes, whenever a node changes state (such as becoming available), the
pengine will recheck what can be done.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource was disabled automatically

2017-03-06 Thread Ken Gaillot
On 03/06/2017 03:49 AM, cys wrote:
> Hi,
> 
> Today I found that one resource was disabled. I checked, and nobody did it.
> The logs showed that crmd (or pengine?) stopped it. I don't know why.
> So I want to know: will Pacemaker disable a resource automatically?
> If so, when and why?
> 
> Thanks.


Pacemaker will never set the target-role automatically, so if you mean
that something set target-role=Stopped, that happened outside the cluster.
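
If you want to check whether target-role was set explicitly, one option is
to look for a "target-role" nvpair in the resource's meta_attributes in the
CIB. The following is only a minimal sketch: the XML fragment is hand-written
for illustration (the resource class/provider/type and all ids except
p_vs-scheduler are made up), and find_target_roles is a hypothetical helper,
not part of Pacemaker. On a real cluster you would feed it the output of
"cibadmin --query" instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written CIB fragment for illustration only.
# A real CIB (e.g. from "cibadmin --query") contains much more.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="p_vs-scheduler" class="ocf" provider="heartbeat" type="Dummy">
        <meta_attributes id="p_vs-scheduler-meta">
          <nvpair id="p_vs-scheduler-meta-target-role"
                  name="target-role" value="Stopped"/>
        </meta_attributes>
      </primitive>
      <primitive id="p_other" class="ocf" provider="heartbeat" type="Dummy"/>
    </resources>
  </configuration>
</cib>
"""


def find_target_roles(cib_xml):
    """Return {resource id: target-role} for resources that set it explicitly."""
    root = ET.fromstring(cib_xml)
    roles = {}
    # Resource element names assumed here; adjust for your Pacemaker version.
    for tag in ("primitive", "group", "clone", "master"):
        for rsc in root.iter(tag):
            # Only look at meta_attributes directly under this resource.
            query = "./meta_attributes/nvpair[@name='target-role']"
            for nvpair in rsc.findall(query):
                roles[rsc.get("id")] = nvpair.get("value")
    return roles


if __name__ == "__main__":
    print(find_target_roles(SAMPLE_CIB))
```

A resource that does not appear in the result has no explicit target-role,
so if it was stopped, the cluster decided that on its own.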

If you just mean stopping, the cluster can stop a resource in response
to the configuration or conditions.

The pengine decides what needs to be done, the crmd coordinates it, and
the lrmd does it (for actions on resources, anyway). So all are involved
to some extent.

To figure out why a resource was stopped, you want to check the logs on
the DC (which will be the node with the most "pengine:" messages around
that time). When the PE decides a resource needs to be stopped, you'll
see a message like

   notice: LogActions:  Stop()

Often, by looking at the messages before that, you can see what led it
to decide that. Shortly after that, you'll see something like

   Calculated transition , saving inputs in 

That file will contain the state of the cluster at that moment. So you
can grab that for some deep diving. One of the things you can do with
that file is run crm_simulate on it, to get detailed info about why each
action was taken. "crm_simulate -Ssx " will show a somewhat
painful description of everything the cluster would do and the scores
that fed into the decision.
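
The log search described above can be sketched in a few lines of Python.
Treat this as an illustration under assumptions, not a tested tool: the
sample log lines, the node name, the transition number and the input file
path are all made up, and the exact message layout varies between Pacemaker
versions, so the regular expressions will likely need adjusting. scan and
simulate_command are hypothetical helper names.

```python
import re

# Made-up log lines for illustration; real timestamps, PIDs, node names
# and file paths will differ.
SAMPLE_LOG = [
    "Mar 06 03:40:01 node1 pengine[1234]:   notice: LogActions: Stop    p_vs-scheduler (node1)",
    "Mar 06 03:40:01 node1 pengine[1234]:   notice: Calculated transition 42, "
    "saving inputs in /var/lib/pacemaker/pengine/pe-input-42.bz2",
    "Mar 06 03:40:02 node1 crmd[1235]:   notice: some unrelated message",
]

# Assumed message shapes, based on the two messages quoted in this thread.
STOP_RE = re.compile(r"LogActions:\s+Stop\s+(\S+)")
INPUTS_RE = re.compile(
    r"Calculated transition\s+(\d+).*?saving inputs in\s+(\S+)", re.IGNORECASE
)


def scan(lines):
    """Collect stopped resource names and (transition, input file) pairs."""
    stops, inputs = [], []
    for line in lines:
        if "pengine" not in line:
            continue  # only the policy engine logs these decisions
        match = STOP_RE.search(line)
        if match:
            stops.append(match.group(1))
        match = INPUTS_RE.search(line)
        if match:
            inputs.append((int(match.group(1)), match.group(2)))
    return stops, inputs


def simulate_command(path):
    """Build the crm_simulate invocation for a saved input file."""
    return ["crm_simulate", "-Ssx", path]


if __name__ == "__main__":
    stops, inputs = scan(SAMPLE_LOG)
    print("stopped:", stops)
    for number, path in inputs:
        print(number, " ".join(simulate_command(path)))
```

Run it against the log from the DC, since that is where the pengine
messages for the transition end up.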
