Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 21:20, Ken Gaillot wrote:
> On 10/08/2015 10:16 AM, priyanka wrote:
>> Hi,
>>
>> We are trying to build a HA setup for our servers using DRBD + Corosync
>> + pacemaker stack.
>>
>> Attached is the configuration file for corosync/pacemaker and drbd.
>
> A few things I noticed:
>
> * Don't set become-primary-on in the DRBD configuration in a Pacemaker
>   cluster; Pacemaker should handle all promotions to primary.
>
> * I'm no NFS expert, but why is res_exportfs_root cloned? Can both
>   servers export it at the same time? I would expect it to be in the
>   group before res_exportfs_export1.

We have followed the configuration guide at
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
for our setup, which suggests creating a clone of this resource. This
resource does not export the actual data; the data is exported by the
res_exportfs_export1 resource in our setup. I did try the previous failover
scenario without cloning this resource, but the same error appeared.

> * Your constraints need some adjustment. Partly it depends on the answer
>   to the previous question, but currently res_fs (via the group) is
>   ordered after res_exportfs_root, and I don't see how that could work.
>
> [...]

-- 
Regards,
Priyanka
MTech3 Sysad
IIT Powai
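The split the guide describes looks roughly like this in crm shell syntax (a
sketch only: res_exportfs_root and res_exportfs_export1 are the names used in
this thread, while the clone name cl_exportfs_root, the directory paths, the
client spec and the export options are placeholders, not taken from the
poster's configuration):

    # NFSv4 pseudo-root (fsid=0): exports only the root of the NFS
    # namespace, so it can run on both nodes and is therefore cloned.
    primitive res_exportfs_root ocf:heartbeat:exportfs \
        params fsid=0 directory="/srv/nfs" \
               options="rw,crossmnt" clientspec="10.0.0.0/24"
    clone cl_exportfs_root res_exportfs_root

    # The actual data export (fsid=1): lives on the DRBD-backed filesystem,
    # so it stays un-cloned inside the failover group rg_export.
    primitive res_exportfs_export1 ocf:heartbeat:exportfs \
        params fsid=1 directory="/srv/nfs/export1" \
               options="rw,mountpoint" clientspec="10.0.0.0/24"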
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 21:05, Digimer wrote:
> On 08/10/15 11:16 AM, priyanka wrote:
>> fencing resource-only;
>
> This needs to be 'fencing resource-and-stonith;'.

I did set the suggested parameter, but the error persists. Apparently the
node which comes back after failover is not able to detect res_exportfs_root
on the current master. Following is the log trace:

Jan 14 16:37:18 sher pengine[1383]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 14 16:37:18 sher pengine[1383]: warning: unpack_rsc_op: Processing failed op monitor for res_exportfs_root:0 on sher: not running (7)
Jan 14 16:37:18 sher pengine[1383]: warning: unpack_rsc_op: Processing failed op monitor for fence_lock on sher: unknown error (1)
Jan 14 16:37:18 sher pengine[1383]: error: native_create_actions: Resource res_exportfs_export1 (ocf::exportfs) is active on 2 nodes attempting recovery
Jan 14 16:37:18 sher pengine[1383]: warning: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start fence_sher#011(lock)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start res_drbd_export:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Restart res_exportfs_export1#011(Started sher)
Jan 14 16:37:18 sher pengine[1383]: notice: LogActions: Start res_nfsserver:1#011(lock)
Jan 14 16:37:18 sher pengine[1383]: error: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-error-352.bz2
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 11: start fence_sher_start_0 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 50: stop res_exportfs_export1_stop_0 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 49: stop res_exportfs_export1_stop_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 68: monitor res_exportfs_root_monitor_3 on lock
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 76: notify res_drbd_export_pre_notify_start_0 on sher (local)
Jan 14 16:37:18 sher crmd[1384]: notice: te_rsc_command: Initiating action 58: start res_nfsserver_start_0 on lock

I have pacemaker 1.1.10 installed in my setup; should I try an upgrade?

-- 
Regards,
Priyanka
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 2015-10-08 20:52, emmanuel segura wrote:
> Please check whether your DRBD is configured to call a fence handler:
> https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html

Yes.

> 2015-10-08 17:16 GMT+02:00 priyanka:
>> Hi,
>>
>> We are trying to build a HA setup for our servers using DRBD + Corosync +
>> pacemaker stack.
>>
>> Attached is the configuration file for corosync/pacemaker and drbd.
>> [...]

-- 
Regards,
Priyanka
MTech3 Sysad
IIT Powai
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 10/08/2015 10:16 AM, priyanka wrote:
> Hi,
>
> We are trying to build a HA setup for our servers using DRBD + Corosync
> + pacemaker stack.
>
> Attached is the configuration file for corosync/pacemaker and drbd.

A few things I noticed:

* Don't set become-primary-on in the DRBD configuration in a Pacemaker
  cluster; Pacemaker should handle all promotions to primary.

* I'm no NFS expert, but why is res_exportfs_root cloned? Can both servers
  export it at the same time? I would expect it to be in the group before
  res_exportfs_export1.

* Your constraints need some adjustment. Partly it depends on the answer to
  the previous question, but currently res_fs (via the group) is ordered
  after res_exportfs_root, and I don't see how that could work.

> We are getting errors while testing this setup.
> 1. When we stop corosync on the Master machine, say server1 (lock), it is
>    Stonith'ed. In this case the slave, server2 (sher), is promoted to master.
>    But when server1 (lock) reboots, res_exportfs_export1 is started on
>    both the servers and that resource goes into failed state, followed by
>    the servers going into unclean state.
>    Then server1 (lock) reboots and server2 (sher) is master but in unclean
>    state. After server1 (lock) comes up, server2 (sher) is stonith'ed and
>    server1 (lock) is slave (the only online node).
>    When server2 (sher) comes up, both the servers are slaves and the
>    resource group (rg_export) is stopped. Then server2 (sher) becomes
>    Master and server1 (lock) is slave and the resource group is started.
>    At this point the configuration becomes stable.
>
> PFA logs (syslog) of server2 (sher) after it is promoted to master till it
> is first rebooted, when the exportfs resource goes into failed state.
>
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out the exact reason for the resource failure.
> Your comment on this scenario will be very helpful.
>
> Thanks,
> Priyanka
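Concretely, that arrangement would look something like this in crm shell
syntax (a rough sketch, not the poster's actual configuration:
ms_drbd_export, the master/slave wrapper around res_drbd_export, and res_ip,
the service IP, are assumed names; the other resource names are taken from
this thread):

    # Un-cloned root export placed in the group before the data export,
    # so the whole NFS stack starts and stops together, in order.
    group rg_export res_fs res_exportfs_root res_exportfs_export1 res_ip

    # Run the export group only where DRBD is Primary, and only after
    # the promotion has completed.
    colocation col_export_on_drbd inf: rg_export ms_drbd_export:Master
    order ord_drbd_before_export inf: ms_drbd_export:promote rg_export:start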
Re: [ClusterLabs] Corosync+Pacemaker error during failover
On 08/10/15 11:16 AM, priyanka wrote:
> fencing resource-only;

This needs to be 'fencing resource-and-stonith;'.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
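As a rough sketch, the relevant piece of the DRBD resource configuration
would then look something like the following (the resource name "export" is
a placeholder, not taken from this thread; the handler scripts are the ones
shipped with DRBD's Pacemaker integration):

    resource export {
      disk {
        # Freeze I/O and fence the peer through the cluster manager when
        # replication is interrupted, instead of only outdating it locally.
        fencing resource-and-stonith;
      }
      handlers {
        # Standard DRBD/Pacemaker fence handlers: they place and later
        # remove a location constraint on the Master role while the peer's
        # data is suspect.
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }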
Re: [ClusterLabs] Corosync+Pacemaker error during failover
Please check whether your DRBD is configured to call a fence handler:
https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html

2015-10-08 17:16 GMT+02:00 priyanka:
> Hi,
>
> We are trying to build a HA setup for our servers using DRBD + Corosync +
> pacemaker stack.
>
> Attached is the configuration file for corosync/pacemaker and drbd.
>
> We are getting errors while testing this setup.
> [...]
>
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out the exact reason for the resource failure.
> Your comment on this scenario will be very helpful.
>
> Thanks,
> Priyanka

-- 
 .~.
 /V\
// \\
/( )\
^`~'^

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org