Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 21:20, Ken Gaillot wrote:
> On 10/08/2015 10:16 AM, priyanka wrote:
>> Hi,
>>
>> We are trying to build a HA setup for our servers using DRBD + Corosync
>> + pacemaker stack.
>>
>> Attached is the configuration file for corosync/pacemaker and drbd.
>
> A few things I noticed:
>
> * Don't set become-primary-on in the DRBD configuration in a Pacemaker
>   cluster; Pacemaker should handle all promotions to primary.
>
> * I'm no NFS expert, but why is res_exportfs_root cloned? Can both
>   servers export it at the same time? I would expect it to be in the
>   group before res_exportfs_export1.

We have followed the configuration guide at
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
for our setup, which suggests creating a clone of this resource. This
resource does not export the actual data; the data is exported by the
res_exportfs_export1 resource in our setup. I did try the previous failover
scenario without cloning this resource, but the same error appeared.

> * Your constraints need some adjustment. Partly it depends on the answer
>   to the previous question, but currently res_fs (via the group) is
>   ordered after res_exportfs_root, and I don't see how that could work.
>
> [...]

-- 
Regards,
Priyanka
MTech3 Sysad
IIT Powai
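The split the guide describes looks roughly like this in crm shell syntax (a
sketch only: res_exportfs_root and res_exportfs_export1 are the names used in
this thread, while the clone name cl_exportfs_root, the directory paths, the
client spec and the export options are placeholders, not taken from the
poster's configuration):

    # NFSv4 pseudo-root (fsid=0): exports only the root of the NFS
    # namespace, so it can run on both nodes and is therefore cloned.
    primitive res_exportfs_root ocf:heartbeat:exportfs \
        params fsid=0 directory="/srv/nfs" \
               options="rw,crossmnt" clientspec="10.0.0.0/24"
    clone cl_exportfs_root res_exportfs_root

    # The actual data export (fsid=1): lives on the DRBD-backed filesystem,
    # so it stays un-cloned inside the failover group rg_export.
    primitive res_exportfs_export1 ocf:heartbeat:exportfs \
        params fsid=1 directory="/srv/nfs/export1" \
               options="rw,mountpoint" clientspec="10.0.0.0/24"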
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 21:05, Digimer wrote:
> On 08/10/15 11:16 AM, priyanka wrote:
>> fencing resource-only;
>
> This needs to be 'fencing resource-and-stonith;'.

I did set the suggested parameter, but the error persists. Apparently the
node which comes back after failover is not able to detect res_exportfs_root
on the current master. Following is the log trace:

Jan 14 16:37:18 sher pengine[1383]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 14 16:37:18 sher pengine[1383]: warning: unpack_rsc_op: Processing failed op monitor for res_exportfs_root:0 on sher: not running (7)
Jan 14 16:37:18 sher pengine[1383]: warning: unpack_rsc_op: Processing failed op monitor for fence_lock on sher: unknown error (1)
Jan 14 16:37:18 sher pengine[1383]: error: native_create_actions: Resource res_exportfs_export1 (ocf::exportfs) is active on 2 nodes attempting recovery
Jan 14 16:37:18 sher pengine[1383]: warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start fence_sher#011(lock)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start res_drbd_export:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Restart res_exportfs_export1#011(Started sher)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start res_nfsserver:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]: error: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-error-352.bz2
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 11: start fence_sher_start_0 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 50: stop res_exportfs_export1_stop_0 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 49: stop res_exportfs_export1_stop_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 68: monitor res_exportfs_root_monitor_3 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 76: notify res_drbd_export_pre_notify_start_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 58: start res_nfsserver_start_0 on lock

I have pacemaker 1.1.10 installed in my setup; should I try an upgrade?

-- 
Regards,
Priyanka
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 20:52, emmanuel segura wrote:
> Please check whether your DRBD is configured to call a fence handler:
> https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html

Yes.

> 2015-10-08 17:16 GMT+02:00 priyanka:
>> Hi,
>>
>> We are trying to build a HA setup for our servers using DRBD + Corosync +
>> pacemaker stack.
>>
>> Attached is the configuration file for corosync/pacemaker and drbd.
>> [...]

-- 
Regards,
Priyanka
MTech3 Sysad
IIT Powai
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 10/08/2015 10:16 AM, priyanka wrote:
> Hi,
>
> We are trying to build a HA setup for our servers using DRBD + Corosync
> + pacemaker stack.
>
> Attached is the configuration file for corosync/pacemaker and drbd.

A few things I noticed:

* Don't set become-primary-on in the DRBD configuration in a Pacemaker
  cluster; Pacemaker should handle all promotions to primary.

* I'm no NFS expert, but why is res_exportfs_root cloned? Can both servers
  export it at the same time? I would expect it to be in the group before
  res_exportfs_export1.

* Your constraints need some adjustment. Partly it depends on the answer to
  the previous question, but currently res_fs (via the group) is ordered
  after res_exportfs_root, and I don't see how that could work.

> We are getting errors while testing this setup.
> 1. When we stop corosync on the Master machine, say server1 (lock), it is
>    Stonith'ed. In this case the slave, server2 (sher), is promoted to master.
>    But when server1 (lock) reboots, res_exportfs_export1 is started on
>    both the servers and that resource goes into failed state, followed by
>    the servers going into unclean state.
>    Then server1 (lock) reboots and server2 (sher) is master but in unclean
>    state. After server1 (lock) comes up, server2 (sher) is stonith'ed and
>    server1 (lock) is slave (the only online node).
>    When server2 (sher) comes up, both the servers are slaves and the
>    resource group (rg_export) is stopped. Then server2 (sher) becomes
>    Master and server1 (lock) is slave and the resource group is started.
>    At this point the configuration becomes stable.
>
> PFA logs (syslog) of server2 (sher) after it is promoted to master till it
> is first rebooted, when the exportfs resource goes into failed state.
>
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out the exact reason for the resource failure.
> Your comment on this scenario will be very helpful.
>
> Thanks,
> Priyanka
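Concretely, that arrangement would look something like this in crm shell
syntax (a rough sketch, not the poster's actual configuration:
ms_drbd_export, the master/slave wrapper around res_drbd_export, and res_ip,
the service IP, are assumed names; the other resource names are taken from
this thread):

    # Un-cloned root export placed in the group before the data export,
    # so the whole NFS stack starts and stops together, in order.
    group rg_export res_fs res_exportfs_root res_exportfs_export1 res_ip

    # Run the export group only where DRBD is Primary, and only after
    # the promotion has completed.
    colocation col_export_on_drbd inf: rg_export ms_drbd_export:Master
    order ord_drbd_before_export inf: ms_drbd_export:promote rg_export:start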
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 08/10/15 11:16 AM, priyanka wrote:
> fencing resource-only;

This needs to be 'fencing resource-and-stonith;'.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
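As a rough sketch, the relevant piece of the DRBD resource configuration
would then look something like the following (the resource name "export" is
a placeholder, not taken from this thread; the handler scripts are the ones
shipped with DRBD's Pacemaker integration):

    resource export {
      disk {
        # Freeze I/O and fence the peer through the cluster manager when
        # replication is interrupted, instead of only outdating it locally.
        fencing resource-and-stonith;
      }
      handlers {
        # Standard DRBD/Pacemaker fence handlers: they place and later
        # remove a location constraint on the Master role while the peer's
        # data is suspect.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }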
Re: [ClusterLabs] Corosync+Pacemaker error during failover
Please check whether your DRBD is configured to call a fence handler:
https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html

2015-10-08 17:16 GMT+02:00 priyanka:
> Hi,
>
> We are trying to build a HA setup for our servers using DRBD + Corosync +
> pacemaker stack.
>
> Attached is the configuration file for corosync/pacemaker and drbd.
>
> We are getting errors while testing this setup.
> [...]
>
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out the exact reason for the resource failure.
> Your comment on this scenario will be very helpful.
>
> Thanks,
> Priyanka

-- 
 .~.
 /V\
// \\
/( )\
^`~'^

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org