Re: [ClusterLabs] [cluster-lab] reboot standby node

2016-12-12 Thread Ken Gaillot
On 12/11/2016 04:19 PM, Omar Jaber wrote:
> Hi all,
> 
> I have a cluster of three nodes with different scores for the location
> constraint, and a resource group (a service that exists in the
> /etc/init.d/ folder) running on the node that has the highest score for
> the location constraint.
> 
> When I reboot one of the standby nodes, I see that when the standby node
> comes back up, the resource is stopped on the master node and restarted.
> When I then check the Pacemaker status, I see the following error:
> 
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery"
> 
> Then I disabled the pcs cluster service at boot time on the standby
> node by running the command "pcs cluster disable". Then I rebooted the
> node and saw that the resource was started on the standby node (because
> the resource is stored in the /etc/init.d folder).
> 
> After that, I started the pcs cluster service on the standby node and
> saw the same error generated:
> 
> "error: resource 'resource_name' is active on 2 nodes attempting
> recovery"
> 
> The problem does not happen without rebooting the standby node. For
> example, if I stop the pcs cluster service on the standby node, run the
> resource on the standby node, and then start the pcs cluster service,
> the error "error: resource 'resource_name' is active on 2 nodes
> attempting recovery" is not generated.

Make sure your resource agent returns exit codes expected by Pacemaker:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes

In particular, if a monitor command returns 0 (OCF_SUCCESS), it means
the service is running.

When any node reboots, Pacemaker will "probe" the existing state of all
resources on it, by running a one-time monitor command. If the service
is not running, the command should return 7 (OCF_NOT_RUNNING).
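As a rough illustration of the contract described above, here is a minimal
sketch of a monitor action in Python. The return values 0 and 7 are the OCF
codes named in this message; everything else (the function names, and the
idea that the service records its PID in a PID file) is an assumption made
for the example, not how any particular resource agent works.

```python
import os

# OCF return codes referred to in this thread.
OCF_SUCCESS = 0       # monitor: the service is running
OCF_NOT_RUNNING = 7   # monitor: the service is cleanly stopped


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def monitor(pid_file: str) -> int:
    """Hypothetical monitor: report running only if the recorded PID is alive."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No PID file, or unreadable contents: treat the service as stopped.
        return OCF_NOT_RUNNING
    return OCF_SUCCESS if pid_is_alive(pid) else OCF_NOT_RUNNING
```

The point of the sketch is that a probe on a freshly booted node, where the
service has not been started, must end up in one of the OCF_NOT_RUNNING
branches. An agent that returns 0 unconditionally would make the cluster
believe the service is active on that node as well.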

So, I'm guessing that either the resource agent is wrongly returning 0
for monitor when the service is not actually running, or the node is
wrongly starting the service at boot, outside cluster control.
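One way to check the second possibility on a SysV-style system is to look for
start links to the init script in the runlevel directories. The sketch below
assumes the conventional layout, where /etc/rcN.d/SNNname links start a
service at boot; the function name and the rc_root parameter are hypothetical,
and systemd-based systems track this differently.

```python
import glob
import os


def boot_start_links(service: str, rc_root: str = "/etc") -> list:
    """List SysV start links (rc?.d/S??<service>) found under rc_root.

    A non-empty result suggests the init system itself starts the service
    at boot, independently of the cluster.
    """
    pattern = os.path.join(
        rc_root,
        "rc[0-9S].d",
        "S[0-9][0-9]" + glob.escape(service),
    )
    return sorted(glob.glob(pattern))
```

For example, boot_start_links("resource_name") returning an empty list would
point back toward the resource agent's monitor result rather than the boot
configuration.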

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [cluster lab] session hang out and fails when shutting off a standby node

2016-09-15 Thread Nurit Vilosny
(sorry, now with the subject :) )

From: Nurit Vilosny
Sent: Thursday, September 15, 2016 2:27 PM
To: users@clusterlabs.org
Subject: [cluster lab]

Hi,
I am working with a 3-node HA cluster with a resource group. I am seeing
weird behavior: whenever I shut down one of the standby nodes (one without
the resources) or start it up again, my application hangs and the UI becomes
unresponsive. I see that requests are either left pending or fail with a
proxy error.

What I don't understand is why another node can affect the resources (apache)
on the active node.

Thanks for the help!
Nurit Vilosny
SW Cloud Solutions Manager

Mellanox Technologies
13 Zarchin St. Raanana, Israel
Office: 972-74-712-9410
Cell: 972-54-4713000
Fax: 972-74-712-9111

