Yikes. I don't have any suggestions. This is beyond me. Sorry. J.
On Sat, Oct 15, 2016 at 4:48 AM, Anne Nicolas <[email protected]> wrote:

> Anne
> http://mageia.org
>
> On 15 Oct 2016 at 9:02 AM, "Jay Scott" <[email protected]> wrote:
> >
> > Well, I'm a newbie myself. But this:
> > drbdadm primary --force ___the name of the drbd res___
> > has worked for me. But I'm having lots of trouble myself,
> > so...
> > then there's this:
> > drbdadm -- --overwrite-data-of-peer primary bravo
> > (bravo happens to be my drbd res) and that should also
> > strongarm one machine or another to be the primary.
> >
>
> Well, I used those commands. It goes to primary, but I can then see
> pacemaker switching it to secondary after some seconds.
>
> > j.
> >
> > On Fri, Oct 14, 2016 at 3:22 PM, Anne Nicolas <[email protected]> wrote:
> >>
> >> Hi!
> >>
> >> I'm having trouble with a two-node cluster used for DRBD / Apache / Samba
> >> and some other services.
> >>
> >> Whatever I do, it always goes to the following state:
> >>
> >> Last updated: Fri Oct 14 17:41:38 2016
> >> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
> >> Stack: corosync
> >> Current DC: bzvairsvr (168430081) - partition with quorum
> >> Version: 1.1.8-9.mga5-394e906
> >> 2 Nodes configured, unknown expected votes
> >> 13 Resources configured.
> >>
> >>
> >> Online: [ bzvairsvr bzvairsvr2 ]
> >>
> >> Master/Slave Set: drbdservClone [drbdserv]
> >>     Slaves: [ bzvairsvr bzvairsvr2 ]
> >> Clone Set: fencing [st-ssh]
> >>     Started: [ bzvairsvr bzvairsvr2 ]
> >>
> >> When I reboot bzvairsvr2, it goes primary again. But after a while
> >> it becomes secondary as well.
> >> I use a very basic fencing system based on ssh. It's not optimal but
> >> enough for the current tests.
> >>
> >> Here is some information about the configuration:
> >>
> >> node 168430081: bzvairsvr
> >> node 168430082: bzvairsvr2
> >> primitive apache apache \
> >>     params configfile="/etc/httpd/conf/httpd.conf" \
> >>     op start interval=0 timeout=120s \
> >>     op stop interval=0 timeout=120s
> >> primitive clusterip IPaddr2 \
> >>     params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
> >>     meta target-role=Started
> >> primitive clusterroute Route \
> >>     params destination="0.0.0.0/0" gateway=192.168.100.254
> >> primitive drbdserv ocf:linbit:drbd \
> >>     params drbd_resource=server \
> >>     op monitor interval=30s role=Slave \
> >>     op monitor interval=29s role=Master start-delay=30s
> >> primitive fsserv Filesystem \
> >>     params device="/dev/drbd/by-res/server" directory="/Server" fstype=ext4 \
> >>     op start interval=0 timeout=60s \
> >>     op stop interval=0 timeout=60s \
> >>     meta target-role=Started
> >> primitive libvirt-guests systemd:libvirt-guests
> >> primitive libvirtd systemd:libvirtd
> >> primitive mysql systemd:mysqld
> >> primitive named systemd:named
> >> primitive samba systemd:smb
> >> primitive st-ssh stonith:external/ssh \
> >>     params hostlist="bzvairsvr bzvairsvr2"
> >> group iphd clusterip clusterroute \
> >>     meta target-role=Started
> >> group services libvirtd libvirt-guests apache named mysql samba \
> >>     meta target-role=Started
> >> ms drbdservClone drbdserv \
> >>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
> >> clone fencing st-ssh
> >> colocation fs_on_drbd inf: fsserv drbdservClone:Master
> >> colocation iphd_on_services inf: iphd services
> >> colocation services_on_fsserv inf: services fsserv
> >> order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
> >> order services_after_fsserv inf: fsserv services
> >> property cib-bootstrap-options: \
> >>     dc-version=1.1.8-9.mga5-394e906 \
> >>     cluster-infrastructure=corosync \
> >>     no-quorum-policy=ignore \
> >>     stonith-enabled=true \
> >>
> >> The cluster logs are flooded with:
> >> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbdserv (10000)
> >> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update master-drbdserv=10000 failed: Transport endpoint is not connected
> >> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update -107: master-drbdserv=10000
> >> Oct 14 17:42:28 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=10000 failed: Transport endpoint is not connected
> >> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbdserv (10000)
> >> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update master-drbdserv=10000 failed: Transport endpoint is not connected
> >> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice: attrd_perform_update: Sent update -107: master-drbdserv=10000
> >> Oct 14 17:42:59 [3445] bzvairsvr attrd: warning: attrd_cib_callback: Update master-drbdserv=10000 failed: Transport endpoint is not connected
> >>
> >>
> >> And here is dmesg:
> >>
> >> [34067.547147] block drbd0: peer( Secondary -> Primary )
> >> [34091.023206] block drbd0: peer( Primary -> Secondary )
> >> [34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> >> [34096.616353] drbd server: asender terminated
> >> [34096.616358] drbd server: Terminating drbd_a_server
> >> [34096.682874] drbd server: Connection closed
> >> [34096.682894] drbd server: conn( TearDown -> Unconnected )
> >> [34096.682897] drbd server: receiver terminated
> >> [34096.682900] drbd server: Restarting receiver thread
> >> [34096.682902] drbd server: receiver (re)started
> >> [34096.682915] drbd server: conn( Unconnected -> WFConnection )
> >> [34103.311898] drbd server: Handshake successful: Agreed network protocol version 101
> >> [34103.311903] drbd server: Agreed to support TRIM on protocol level
> >> [34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
> >> [34103.312046] drbd server: conn( WFConnection -> WFReportParams )
> >> [34103.312062] drbd server: Starting asender thread (from drbd_r_server [4344])
> >> [34103.380311] block drbd0: drbd_sync_handshake:
> >> [34103.380318] block drbd0: self 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
> >> [34103.380323] block drbd0: peer 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
> >> [34103.380327] block drbd0: uuid_compare()=0 by rule 40
> >> [34103.380335] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
> >> [34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
> >> [34123.802580] drbd server: PingAck did not arrive in time.
> >> [34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> >> [34123.802773] drbd server: asender terminated
> >> [34123.802777] drbd server: Terminating drbd_a_server
> >> [34123.932565] drbd server: Connection closed
> >> [34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
> >> [34123.932588] drbd server: receiver terminated
> >> [34123.932590] drbd server: Restarting receiver thread
> >> [34123.932592] drbd server: receiver (re)started
> >> [34123.932605] drbd server: conn( Unconnected -> WFConnection )
> >> [34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> >> [34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
> >> [34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> >> [34318.675122] drbd server: Handshake successful: Agreed network protocol version 101
> >> [34318.675128] drbd server: Agreed to support TRIM on protocol level
> >> [34318.675218] drbd server: Peer authenticated using 20 bytes HMAC
> >> [34318.675258] drbd server: conn( WFConnection -> WFReportParams )
> >> [34318.675276] drbd server: Starting asender thread (from drbd_r_server [4344])
> >> [34318.738909] block drbd0: drbd_sync_handshake:
> >> [34318.738916] block drbd0: self 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
> >> [34318.738921] block drbd0: peer 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0 bits:0 flags:0
> >> [34318.738924] block drbd0: uuid_compare()=0 by rule 40
> >> [34318.738933] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
> >> [34328.812317] block drbd0: peer( Secondary -> Primary )
> >> [37316.065793] usb 3-11: USB disconnect, device number 3
> >> [52246.642265] block drbd0: peer( Primary -> Secondary )
> >>
> >> Any help would be appreciated
> >>
> >> Cheers
> >>
> >> --
> >> Anne Nicolas
> >> http://mageia.org
> >>
> >> _______________________________________________
> >> Users mailing list: [email protected]
> >> http://clusterlabs.org/mailman/listinfo/users
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
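
For anyone reading the dmesg excerpt quoted above, one rough way to line up the NIC link changes with the DRBD connection-state changes is a small script like the one below. This is only a sketch, not a diagnosis: the regular expressions are assumptions based on the log lines in this thread, `parse_events` is a hypothetical helper (not part of DRBD or Pacemaker), and it assumes the mail-wrapped lines have been joined back into single lines.

```python
import re

# Assumed line shape: "[seconds.micros] message", as in the dmesg excerpt in this thread.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")
# Assumed wording of the bnx2x link messages and the DRBD connection transitions.
NIC_RE = re.compile(r"NIC Link is (Up|Down)")
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")


def parse_events(text):
    """Return (timestamp, kind, detail) tuples for NIC link and DRBD conn changes."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that is not a timestamped kernel line
        stamp, message = float(match.group(1)), match.group(2)
        nic = NIC_RE.search(message)
        if nic:
            events.append((stamp, "nic", nic.group(1)))
            continue
        conn = CONN_RE.search(message)
        if conn:
            events.append((stamp, "drbd", f"{conn.group(1)} -> {conn.group(2)}"))
    return events


# A few lines copied from the dmesg output in this thread.
SAMPLE = """\
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34123.802580] drbd server: PingAck did not arrive in time.
[34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
"""

if __name__ == "__main__":
    for stamp, kind, detail in parse_events(SAMPLE):
        print(f"{stamp:>14.6f}  {kind:<4}  {detail}")
```

Feeding it the full `dmesg` output would give a single timeline of link drops and DRBD reconnects, which may make it easier to see whether the two tend to happen together.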
