Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
In your post I didn't see any cluster configuration related to bnx2x, only for the IP address.

On 18/10/16 10:05, Anne Nicolas wrote:
> 2016-10-18 9:56 GMT+02:00 Vlad:
>> Is something wrong with the network interface?
>>
>> [34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
>> [34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
>> [34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
>> [34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> I don't think so. This interface is part of the cluster resource and
> up on master only. So it seems this is due to resource restart rather.
>
>>
>> On 14/10/16 17:54, Anne Nicolas wrote:
>>> Hi!
>>>
>>> I'm having trouble with a 2 nodes cluster used for DRBD / Apache / Samba
>>> and some other services.
>>>
>>> Whatever I do, it always goes to the following state:
>>>
>>> Last updated: Fri Oct 14 17:41:38 2016
>>> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
>>> Stack: corosync
>>> Current DC: bzvairsvr (168430081) - partition with quorum
>>> Version: 1.1.8-9.mga5-394e906
>>> 2 Nodes configured, unknown expected votes
>>> 13 Resources configured.
>>>
>>>
>>> Online: [ bzvairsvr bzvairsvr2 ]
>>>
>>> Master/Slave Set: drbdservClone [drbdserv]
>>> Slaves: [ bzvairsvr bzvairsvr2 ]
>>> Clone Set: fencing [st-ssh]
>>> Started: [ bzvairsvr bzvairsvr2 ]
>>>
>>> When I reboot bzvairsvr2 this one goes primary again. But after a while
>>> becomes secondary also.
>>> I use a very basic fencing system based on ssh. It's not optimal but
>>> enough for the current tests.
Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
2016-10-18 9:56 GMT+02:00 Vlad:
> Is something wrong with the network interface?
>
> [34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
> [34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> [34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
> [34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit

I don't think so. This interface is part of the cluster resource and is up on the master only, so the link changes seem to be caused by the resource restarting instead.

>
>
> On 14/10/16 17:54, Anne Nicolas wrote:
>> Hi!
>>
>> I'm having trouble with a 2 nodes cluster used for DRBD / Apache / Samba
>> and some other services.
>>
>> Whatever I do, it always goes to the following state:
>>
>> Last updated: Fri Oct 14 17:41:38 2016
>> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
>> Stack: corosync
>> Current DC: bzvairsvr (168430081) - partition with quorum
>> Version: 1.1.8-9.mga5-394e906
>> 2 Nodes configured, unknown expected votes
>> 13 Resources configured.
>>
>>
>> Online: [ bzvairsvr bzvairsvr2 ]
>>
>> Master/Slave Set: drbdservClone [drbdserv]
>> Slaves: [ bzvairsvr bzvairsvr2 ]
>> Clone Set: fencing [st-ssh]
>> Started: [ bzvairsvr bzvairsvr2 ]
>>
>> When I reboot bzvairsvr2 this one goes primary again. But after a while
>> becomes secondary also.
>> I use a very basic fencing system based on ssh. It's not optimal but
>> enough for the current tests.
Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
Is something wrong with the network interface?

[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit

On 14/10/16 17:54, Anne Nicolas wrote:
> Hi!
>
> I'm having trouble with a 2 nodes cluster used for DRBD / Apache / Samba
> and some other services.
>
> Whatever I do, it always goes to the following state:
>
> Last updated: Fri Oct 14 17:41:38 2016
> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
> Stack: corosync
> Current DC: bzvairsvr (168430081) - partition with quorum
> Version: 1.1.8-9.mga5-394e906
> 2 Nodes configured, unknown expected votes
> 13 Resources configured.
>
>
> Online: [ bzvairsvr bzvairsvr2 ]
>
> Master/Slave Set: drbdservClone [drbdserv]
> Slaves: [ bzvairsvr bzvairsvr2 ]
> Clone Set: fencing [st-ssh]
> Started: [ bzvairsvr bzvairsvr2 ]
>
> When I reboot bzvairsvr2 this one goes primary again. But after a while
> becomes secondary also.
> I use a very basic fencing system based on ssh. It's not optimal but
> enough for the current tests.
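To judge whether the link changes line up with resource restarts, it can help to pull the Up/Down events out of dmesg and measure how long each outage lasted. The following is only a rough sketch: the function names and the regular expression are illustrative assumptions, written against the four bnx2x lines quoted in this message rather than against any documented kernel log format.

```python
import re

# Matches lines shaped like the ones quoted above, e.g.
# "[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down"
# (assumed layout: timestamp, driver, PCI address, interface name).
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+\S+\s+(?P<iface>\S+):\s+"
    r"NIC Link is (?P<state>Up|Down)"
)

SAMPLE = [
    "[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down",
    "[34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, "
    "Flow control: ON - receive & transmit",
    "[34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down",
    "[34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, "
    "Flow control: ON - receive & transmit",
]


def link_events(lines):
    """Return (timestamp, interface, state) for every link change found."""
    events = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("iface"), match.group("state"))
            )
    return events


def downtime_windows(events):
    """Pair each Down event with the next Up event on the same interface.

    Returns (interface, down_timestamp, seconds_down) tuples.
    """
    down_since = {}
    windows = []
    for ts, iface, state in events:
        if state == "Down":
            # Keep the first Down if several arrive before an Up.
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            windows.append((iface, start, ts - start))
    return windows


if __name__ == "__main__":
    for iface, start, seconds in downtime_windows(link_events(SAMPLE)):
        print(f"{iface} down at {start:.3f} for {seconds:.1f}s")
```

Comparing the start of each window with the DRBD role changes and the Pacemaker resource stop/start times in the logs would show whether the link drops precede the demotion or follow it.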
Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
On 17/10/2016 at 11:42, Kristoffer Grönlund wrote:
> Anne Nicolas writes:
>
>> Oct 14 17:42:59 [3445] bzvairsvr attrd: warning:
>> attrd_cib_callback: Update master-drbdserv=1 failed: Transport
>> endpoint is not connected
>
> Hi Anne,
>
> Wild guess: One or more ports is being blocked on at least one of the
> nodes, probably by a firewall.
>
> Here's the list of basic ports that need to be open:
>
> TCP ports 2224, 3121, and 21064, and UDP port 5405.

Well, to make things easier, this test platform does not have any active firewall :/

>
> Cheers,
> Kristoffer
>

--
Anne Nicolas
http://mageia.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
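Even without a firewall, the TCP ports listed in the reply can be probed from one node towards the other to rule out a connectivity problem. This is a minimal sketch with made-up helper names (`tcp_port_open`, `check_peer`). It only tests whether a TCP connection can be established, so a `False` result may simply mean that no daemon is listening on that port on this particular setup. UDP port 5405 is connectionless and is not checked here.

```python
import socket

# TCP ports from the list above. UDP 5405 cannot be verified with a plain
# connect() and is deliberately left out of this check.
CLUSTER_TCP_PORTS = (2224, 3121, 21064)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_peer(host, ports=CLUSTER_TCP_PORTS, timeout=2.0):
    """Return a {port: reachable} mapping for the given peer."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # "bzvairsvr2" is the peer node name used in this thread.
    for port, reachable in check_peer("bzvairsvr2").items():
        print(f"tcp/{port}: {'reachable' if reachable else 'not reachable'}")
```

Running it on bzvairsvr against bzvairsvr2, and then the other way round, would show whether the result is the same in both directions.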
[Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary
Hi!

I'm having trouble with a two-node cluster used for DRBD / Apache / Samba and some other services.

Whatever I do, it always ends up in the following state:

Last updated: Fri Oct 14 17:41:38 2016
Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
Stack: corosync
Current DC: bzvairsvr (168430081) - partition with quorum
Version: 1.1.8-9.mga5-394e906
2 Nodes configured, unknown expected votes
13 Resources configured.


Online: [ bzvairsvr bzvairsvr2 ]

 Master/Slave Set: drbdservClone [drbdserv]
     Slaves: [ bzvairsvr bzvairsvr2 ]
 Clone Set: fencing [st-ssh]
     Started: [ bzvairsvr bzvairsvr2 ]

When I reboot bzvairsvr2, it becomes primary again, but after a while it becomes secondary as well.
I use a very basic fencing system based on ssh. It's not optimal, but it is enough for the current tests.

Here is some information about the configuration:

node 168430081: bzvairsvr
node 168430082: bzvairsvr2
primitive apache apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op start interval=0 timeout=120s \
        op stop interval=0 timeout=120s
primitive clusterip IPaddr2 \
        params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
        meta target-role=Started
primitive clusterroute Route \
        params destination="0.0.0.0/0" gateway=192.168.100.254
primitive drbdserv ocf:linbit:drbd \
        params drbd_resource=server \
        op monitor interval=30s role=Slave \
        op monitor interval=29s role=Master start-delay=30s
primitive fsserv Filesystem \
        params device="/dev/drbd/by-res/server" directory="/Server" fstype=ext4 \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s \
        meta target-role=Started
primitive libvirt-guests systemd:libvirt-guests
primitive libvirtd systemd:libvirtd
primitive mysql systemd:mysqld
primitive named systemd:named
primitive samba systemd:smb
primitive st-ssh stonith:external/ssh \
        params hostlist="bzvairsvr bzvairsvr2"
group iphd clusterip clusterroute \
        meta target-role=Started
group services libvirtd libvirt-guests apache named mysql samba \
        meta target-role=Started
ms drbdservClone drbdserv \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
clone fencing st-ssh
colocation fs_on_drbd inf: fsserv drbdservClone:Master
colocation iphd_on_services inf: iphd services
colocation services_on_fsserv inf: services fsserv
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
order services_after_fsserv inf: fsserv services
property cib-bootstrap-options: \
        dc-version=1.1.8-9.mga5-394e906 \
        cluster-infrastructure=corosync \
        no-quorum-policy=ignore \
        stonith-enabled=true \

The cluster logs are flooded with:

Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_trigger_update:Sending flush op to all hosts for: master-drbdserv (1)
Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update:Sent update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:28 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_trigger_update:Sending flush op to all hosts for: master-drbdserv (1)
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update:Sent update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:59 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=1 failed: Transport endpoint is not connected

And here is dmesg:

[34067.547147] block drbd0: peer( Secondary -> Primary )
[34091.023206] block drbd0: peer( Primary -> Secondary )
[34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[34096.616353] drbd server: asender terminated
[34096.616358] drbd server: Terminating drbd_a_server
[34096.682874] drbd server: Connection closed
[34096.682894] drbd server: conn( TearDown -> Unconnected )
[34096.682897] drbd server: receiver terminated
[34096.682900] drbd server: Restarting receiver thread
[34096.682902] drbd server: receiver (re)started
[34096.682915] drbd server: conn( Unconnected -> WFConnection )
[34103.311898] drbd server: Handshake successful: Agreed network protocol version 101
[34103.311903] drbd server: Agreed to support TRIM on protocol level
[34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
[34103.312046] drbd server: conn( WFConnection -> WFReportParams )
[34103.312062] drbd server: Starting asender thread (from drbd_r_server [4344])
[34103.380311] block drbd0:
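For anyone reading along, the DRBD state changes in a dmesg excerpt like the one above can be turned into a timeline, which makes the primary/secondary flapping easier to follow. This is a small sketch only: the helper name and the regular expressions are assumptions derived from the `field( Old -> New )` notation visible in these lines, not from a DRBD specification.

```python
import re

# Kernel timestamp at the start of a dmesg line, e.g. "[34067.547147]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")
# State changes written as "peer( Secondary -> Primary )"; one line can
# carry several of them (peer, conn, pdsk, ...).
TRANSITION_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")

SAMPLE = [
    "[34067.547147] block drbd0: peer( Secondary -> Primary )",
    "[34091.023206] block drbd0: peer( Primary -> Secondary )",
    "[34096.616319] drbd server: peer( Secondary -> Unknown ) "
    "conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )",
    "[34096.682900] drbd server: Restarting receiver thread",
    "[34096.682915] drbd server: conn( Unconnected -> WFConnection )",
]


def drbd_transitions(lines):
    """Return (timestamp, field, old_state, new_state) for each state change."""
    found = []
    for line in lines:
        if "drbd" not in line:
            continue
        ts = TS_RE.search(line)
        if ts is None:
            continue
        for field, old, new in TRANSITION_RE.findall(line):
            found.append((float(ts.group(1)), field, old, new))
    return found


if __name__ == "__main__":
    for ts, field, old, new in drbd_transitions(SAMPLE):
        print(f"{ts:.6f} {field}: {old} -> {new}")
```

Filtering the result on the `peer` field gives just the role changes, which can then be lined up against the attrd failures in the cluster log.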