Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-21 Thread Vlad
In your post I didn't see any cluster configuration related to the bnx2x
interface, only the IP address resource (clusterip on eno1).

On 18/10/16 10:05, Anne Nicolas wrote:
> 2016-10-18 9:56 GMT+02:00 Vlad :
>> Is something wrong with the network interface?
>>
>> [34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
>> [34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
>> [34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
>> [34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> I don't think so. This interface is part of a cluster resource and is only
> up on the master, so this rather seems to be caused by the resource restarting.

Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-18 Thread Anne Nicolas
2016-10-18 9:56 GMT+02:00 Vlad :
> Is something wrong with the network interface?
>
> [34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
> [34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> [34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
> [34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit

I don't think so. This interface is part of a cluster resource and is only
up on the master, so this rather seems to be caused by the resource restarting.


Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-18 Thread Vlad
Is something wrong with the network interface?

[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
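To see whether these link flaps line up with the DRBD disconnects reported in the original post, the outage windows can be pulled out of dmesg. The sketch below is a hypothetical helper (`link_outages` is not part of any tool); it assumes one kernel message per line, as above, and simply pairs each "Link is Down" with the next "Link is Up" on the same interface.

```python
import re

# The four bnx2x lines quoted above, one kernel message per line.
DMESG = """\
[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
"""

# Captures the kernel timestamp, the interface name and the new link state.
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*?(\S+): NIC Link is (Up|Down)")


def link_outages(text):
    """Return (interface, down_at, up_at, seconds_down) for each Down->Up pair."""
    down_since = {}
    outages = []
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if not m:
            continue
        ts, iface, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "Down":
            down_since[iface] = ts
        elif iface in down_since:
            # An "Up" closes the outage opened by the last "Down".
            start = down_since.pop(iface)
            outages.append((iface, start, ts, ts - start))
    return outages


if __name__ == "__main__":
    for iface, start, end, secs in link_outages(DMESG):
        print(f"{iface}: down at {start:.3f}s, up at {end:.3f}s ({secs:.1f}s)")
```

With the four lines above it reports two outages of roughly 71.7 s and 36.4 s. If both excerpts come from the same node, the DRBD connection teardown in the original dmesg (34096.6) happens before the first Down event (34114.0).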


Re: [Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-17 Thread Anne Nicolas


On 17/10/2016 at 11:42, Kristoffer Grönlund wrote:
> Anne Nicolas  writes:
> 
>> Oct 14 17:42:59 [3445] bzvairsvr  attrd:  warning: attrd_cib_callback:  Update master-drbdserv=1 failed: Transport endpoint is not connected
> 
> Hi Anne,
> 
> Wild guess: one or more ports are being blocked on at least one of the
> nodes, probably by a firewall.
> 
> Here's the list of basic ports that need to be open:
> 
> TCP ports 2224, 3121, and 21064, and UDP port 5405.

Well, to make things easier, this test platform does not have any active
firewall :/

> 
> Cheers,
> Kristoffer
> 
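Kristoffer's port list can be turned into a quick reachability probe to run from each node against its peer. This is a sketch using only Python's standard `socket` module; `probe_tcp_ports` is a hypothetical name, and the port roles in the comment (pcsd, pacemaker_remote, DLM, corosync) follow the usual Red Hat High Availability documentation and are given for orientation only.

```python
import socket

# TCP ports from Kristoffer's list. Usual roles: 2224 pcsd,
# 3121 pacemaker_remote, 21064 DLM. UDP 5405 (corosync) is left out on
# purpose: UDP is connectionless, so a connect() probe proves nothing about it.
CLUSTER_TCP_PORTS = (2224, 3121, 21064)


def probe_tcp_ports(host, ports=CLUSTER_TCP_PORTS, timeout=2.0):
    """Return {port: bool}; True means the peer accepted a TCP connection."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex() returns 0 on success and an errno otherwise;
                # a silently dropped SYN comes back as a timeout error code.
                results[port] = sock.connect_ex((host, port)) == 0
            except OSError:
                # Name resolution failures still raise instead of returning.
                results[port] = False
    return results


if __name__ == "__main__":
    peer = "127.0.0.1"  # replace with the other node, e.g. bzvairsvr2
    for port, is_open in probe_tcp_ports(peer).items():
        print(f"{peer}:{port}/tcp {'open' if is_open else 'closed or filtered'}")
```

A closed port can also simply mean that nothing listens there (pacemaker_remote, for instance, is only needed for remote nodes), so with no firewall active the result mostly tells you which daemons are actually running.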

-- 
Anne Nicolas
http://mageia.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-14 Thread Anne Nicolas
Hi!

I'm having trouble with a two-node cluster used for DRBD / Apache / Samba
and some other services.

Whatever I do, it always ends up in the following state:

Last updated: Fri Oct 14 17:41:38 2016
Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
Stack: corosync
Current DC: bzvairsvr (168430081) - partition with quorum
Version: 1.1.8-9.mga5-394e906
2 Nodes configured, unknown expected votes
13 Resources configured.


Online: [ bzvairsvr bzvairsvr2 ]

 Master/Slave Set: drbdservClone [drbdserv]
     Slaves: [ bzvairsvr bzvairsvr2 ]
 Clone Set: fencing [st-ssh]
     Started: [ bzvairsvr bzvairsvr2 ]

When I reboot bzvairsvr2, this one becomes primary again, but after a while
it also becomes secondary.
I use a very basic fencing system based on ssh. It's not optimal, but it's
enough for the current tests.

Here is some information about the configuration:

node 168430081: bzvairsvr
node 168430082: bzvairsvr2
primitive apache apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op start interval=0 timeout=120s \
op stop interval=0 timeout=120s
primitive clusterip IPaddr2 \
params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
meta target-role=Started
primitive clusterroute Route \
params destination="0.0.0.0/0" gateway=192.168.100.254
primitive drbdserv ocf:linbit:drbd \
params drbd_resource=server \
op monitor interval=30s role=Slave \
op monitor interval=29s role=Master start-delay=30s
primitive fsserv Filesystem \
params device="/dev/drbd/by-res/server" directory="/Server" fstype=ext4 \
op start interval=0 timeout=60s \
op stop interval=0 timeout=60s \
meta target-role=Started
primitive libvirt-guests systemd:libvirt-guests
primitive libvirtd systemd:libvirtd
primitive mysql systemd:mysqld
primitive named systemd:named
primitive samba systemd:smb
primitive st-ssh stonith:external/ssh \
params hostlist="bzvairsvr bzvairsvr2"
group iphd clusterip clusterroute \
meta target-role=Started
group services libvirtd libvirt-guests apache named mysql samba \
meta target-role=Started
ms drbdservClone drbdserv \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
clone fencing st-ssh
colocation fs_on_drbd inf: fsserv drbdservClone:Master
colocation iphd_on_services inf: iphd services
colocation services_on_fsserv inf: services fsserv
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
order services_after_fsserv inf: fsserv services
property cib-bootstrap-options: \
dc-version=1.1.8-9.mga5-394e906 \
cluster-infrastructure=corosync \
no-quorum-policy=ignore \
stonith-enabled=true \
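The `order` and `colocation` lines above form a dependency chain: DRBD must be promoted before fsserv starts, and fsserv must start before the services group. The sketch below models the two `order` constraints as a graph and sorts it topologically with Python's `graphlib` (Python 3.9+). It only illustrates what the constraints require, not how Pacemaker's policy engine builds a transition. It also lists colocations without a matching order constraint, which is the case for `iphd`: colocation alone does not imply start order in Pacemaker. The helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# (first, then) pairs taken from the "order" lines of the configuration.
ORDER = [
    ("drbdservClone:promote", "fsserv:start"),  # order fsserv-after-drbdserv
    ("fsserv:start", "services:start"),         # order services_after_fsserv
]

# (dependent, "with" resource) pairs taken from the "colocation" lines.
COLOCATION = [
    ("fsserv", "drbdservClone:Master"),  # fs_on_drbd
    ("services", "fsserv"),              # services_on_fsserv
    ("iphd", "services"),                # iphd_on_services
]


def start_sequence(order):
    """Return one action sequence that respects every order constraint."""
    graph = {}
    for first, then in order:
        # graphlib expects {node: set of predecessors}.
        graph.setdefault(then, set()).add(first)
        graph.setdefault(first, set())
    return list(TopologicalSorter(graph).static_order())


def unordered_colocations(colocation, order):
    """Colocated pairs where the dependent is never ordered after its base."""
    ordered = {(f.split(":")[0], t.split(":")[0]) for f, t in order}
    return [(dep, base) for dep, base in colocation
            if (base.split(":")[0], dep) not in ordered]


if __name__ == "__main__":
    print(" -> ".join(start_sequence(ORDER)))
    print("colocated but not ordered:", unordered_colocations(COLOCATION, ORDER))
```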

The cluster logs are flooded with:
Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice: attrd_trigger_update:Sending flush op to all hosts for: master-drbdserv (1)
Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:28 [3445] bzvairsvr  attrd:  warning: attrd_cib_callback:  Update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr  attrd:   notice: attrd_trigger_update:Sending flush op to all hosts for: master-drbdserv (1)
Oct 14 17:42:59 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:59 [3445] bzvairsvr  attrd:  warning: attrd_cib_callback:  Update master-drbdserv=1 failed: Transport endpoint is not connected
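The flood boils down to one repeated failure: attrd appears unable to write `master-drbdserv=1` to the CIB, and the return code -107 is the negated Linux errno ENOTCONN ("Transport endpoint is not connected"). To my understanding, `master-drbdserv` is the promotion score that the ocf:linbit:drbd agent publishes through crm_master, so it can help to know how often and since when the write fails. A minimal sketch with a hypothetical helper, assuming one log entry per line:

```python
import os
import re
from collections import Counter

# Excerpt of the attrd messages above, one entry per line.
LOG = """\
Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:28 [3445] bzvairsvr  attrd:  warning: attrd_cib_callback:  Update master-drbdserv=1 failed: Transport endpoint is not connected
Oct 14 17:42:59 [3445] bzvairsvr  attrd:   notice: attrd_perform_update:Sent update -107: master-drbdserv=1
Oct 14 17:42:59 [3445] bzvairsvr  attrd:  warning: attrd_cib_callback:  Update master-drbdserv=1 failed: Transport endpoint is not connected
"""

# Matches only "Sent update <negative rc>: <attribute>=<value>" lines.
SENT_RE = re.compile(r"^(\w{3} +\d+ [\d:]+) .*Sent update (-\d+): ([\w-]+)=(\S+)")


def failed_updates(text):
    """Yield (timestamp, attribute, value, rc) for each failed attrd update."""
    for line in text.splitlines():
        m = SENT_RE.match(line)
        if m:
            stamp, rc, attr, value = m.groups()
            yield stamp, attr, value, int(rc)


if __name__ == "__main__":
    failures = list(failed_updates(LOG))
    print(Counter(attr for _, attr, _, _ in failures))
    # On Linux, errno 107 is ENOTCONN, matching the text of the warnings.
    print(os.strerror(-failures[0][3]))
```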


And here is the dmesg output:

[34067.547147] block drbd0: peer( Secondary -> Primary )
[34091.023206] block drbd0: peer( Primary -> Secondary )
[34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[34096.616353] drbd server: asender terminated
[34096.616358] drbd server: Terminating drbd_a_server
[34096.682874] drbd server: Connection closed
[34096.682894] drbd server: conn( TearDown -> Unconnected )
[34096.682897] drbd server: receiver terminated
[34096.682900] drbd server: Restarting receiver thread
[34096.682902] drbd server: receiver (re)started
[34096.682915] drbd server: conn( Unconnected -> WFConnection )
[34103.311898] drbd server: Handshake successful: Agreed network protocol version 101
[34103.311903] drbd server: Agreed to support TRIM on protocol level
[34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
[34103.312046] drbd server: conn( WFConnection -> WFReportParams )
[34103.312062] drbd server: Starting asender thread (from drbd_r_server [4344])
[34103.380311] block drbd0:
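The dmesg excerpt packs several DRBD state fields (peer role, connection state, peer disk state) into single lines. Flattening them into a timeline makes it easier to compare against the Pacemaker and NIC timestamps. A sketch, assuming kernel messages of the `field( Old -> New )` form shown above; `state_changes` is a hypothetical helper, and the embedded excerpt keeps only the lines that carry state changes.

```python
import re

# State-change lines from the dmesg output above.
DMESG = """\
[34067.547147] block drbd0: peer( Secondary -> Primary )
[34091.023206] block drbd0: peer( Primary -> Secondary )
[34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
[34096.682894] drbd server: conn( TearDown -> Unconnected )
[34096.682915] drbd server: conn( Unconnected -> WFConnection )
[34103.312046] drbd server: conn( WFConnection -> WFReportParams )
"""

# Timestamp, object ("drbd0" device or "server" resource), rest of the line.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\] (?:block|drbd) (\S+): (.*)$")
# One "field( Old -> New )" transition; a line may contain several.
CHANGE_RE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def state_changes(text):
    """Return (timestamp, object, field, old, new) for every DRBD state change."""
    changes = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, obj, rest = float(m.group(1)), m.group(2), m.group(3)
        for field, old, new in CHANGE_RE.findall(rest):
            changes.append((ts, obj, field, old, new))
    return changes


if __name__ == "__main__":
    for ts, obj, field, old, new in state_changes(DMESG):
        print(f"{ts:>14.6f}  {obj:<7} {field:<5} {old} -> {new}")
```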