Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-18 Thread Lars Ellenberg
On Fri, Oct 14, 2016 at 07:07:55AM +, Eric Robinson wrote:
> > > > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> > > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in time.
> > > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> > 
> > has been said before:
> > DRBD ping timeout is apparently too short for the latency in your setup.
> > increase it appropriately.
> > 
> > Where latency in this case involves network rtt plus kernel thread
> > scheduling plus maybe additional synchronous (flush/fua) IO plus
> > whatever else DRBD feels is necessary for a full DRBD to DRBD round-trip.
> > 
> > > > However, I can guarantee that the network connection is solid.
> > > > Running ping flood, I get 30,000 packets sent with no loss or
> > > > latency.
> > 
> > Mind telling us the network characteristics?  IO backend?
> > Virtualized?  Distribution? Kernel and DRBD version(s)?
> > 
> 
> We have a dozen other DRBD clusters and this has never happened to any
> of the others over the past decade or so, and they are all on the same
> switched network. The nodes are in different data centers 22 miles
> apart connected by gigabit fiber. Latency is always sub-millisecond.
> See the following ping test...
> 
> [root@ha14a ~]# ping -f ha14b-cl
> PING ha14b-cl.mycharts.md (198.51.100.43) 56(84) bytes of data.
> .^C
> --- ha14b-cl.mycharts.md ping statistics ---
> 23433 packets transmitted, 23432 received, 0% packet loss, time 15911ms
> rtt min/avg/max/mdev = 0.585/0.659/0.847/0.021 ms, ipg/ewma 0.679/0.658 ms
> 
> The servers are all physical, running RHEL 6.3 kernel 2.6.32-279.el6.x86_64. 
> SSD drives.
> 
> DRBD version is 8.4.3


So, did you try to increase the ping timeout setting?
Did it help?

Did you try to upgrade to DRBD 8.4.8?
Did that help?


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-14 Thread Eric Robinson
> -Original Message-
> From: Viktor Villafuerte [mailto:viktor.villafue...@optusnet.com.au]
> Sent: Wednesday, October 12, 2016 3:19 PM
> To: Eric Robinson 
> Cc: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%,
> starting over. What?
> 
> Hi Eric,
> 
> I've had the pleasure to deal with this exact issue, and in prod too :O
> 
> 
> On Wed 12 Oct 2016 14:04:48, Eric Robinson wrote:
> > This morning we are seeing an issue where drbd is repeatedly resyncing,
> > getting to 100%, and starting over, and never getting to an
> > UpToDate/UpToDate state.
> >
> > On one node, it is logging this sequence over and over…
> >
> > 
> >
> > Oct 12 06:56:11 ha14a kernel: d-con ha02_mysql: Starting asender thread (from drbd_r_ha02_mys [804])
> > Oct 12 06:56:11 ha14a kernel: block drbd1: drbd_sync_handshake:
> > Oct 12 06:56:11 ha14a kernel: block drbd1: self 13FB9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 flags:0
> > Oct 12 06:56:11 ha14a kernel: block drbd1: peer 38E17129E5821B5F:13FB9B08BF812C5B:13FA9B08BF812C5B:13F99B08BF812C5B bits:0 flags:0
> > Oct 12 06:56:11 ha14a kernel: block drbd1: uuid_compare()=-1 by rule 50
> > Oct 12 06:56:11 ha14a kernel: block drbd1: Becoming sync target due to disk states.
> > Oct 12 06:56:11 ha14a kernel: block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> > Oct 12 06:56:11 ha14a kernel: block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> > Oct 12 06:56:11 ha14a kernel: block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> > Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
> > Oct 12 06:56:11 ha14a kernel: block drbd1: updated sync uuid 13FC9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9
> > Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
> > Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
> > Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFSyncUUID -> SyncTarget )
> > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> 
> The two lines below are the important ones: DRBD assumes a network
> failure because the PingAck did not arrive in time.
> 
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in time.
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 
> You need to increase the timeout within which the PingAck is expected.
> 
> drbdadm net-options -v --ping-timeout=10 drbd0
> 
> this is the command I used. --ping-timeout is given in tenths of a second, so
> a value of '10' is actually 1s. Please confirm this in the documentation, as
> the version of DRBD I ran this on was 8.x
> 
> Also you may need to tweak the timeout a bit..
> 
> 
> Hope this helps
> 


Thanks much for the input. It was an emergency, so I rebooted the node. After 
it came back up, all was well. The cluster has been fine for months and now it 
is apparently fine again. I can't imagine why it would suddenly behave this 
way. 


> v
> 
> 
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: asender terminated
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Terminating drbd_a_ha02_mys
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Connection closed
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( NetworkFailure -> Unconnected )
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver terminated
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Restarting receiver thread
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver (re)started
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( Unconnected -> WFConnection )
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Handshake successful: Agreed network protocol version 101
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Peer authenticated using 20 bytes HMAC
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( WFConnection -> WFReportParams )
> >
> > 
> >
> > On the other node, it is saying this over and over…
> >
> > 
> >
> > Oct 12 06:58:51 ha14b kernel: block drbd1: drbd_sync_handshake:
> >

Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-14 Thread Eric Robinson
> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Wednesday, October 12, 2016 11:49 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%,
> starting over. What?
> 
> On Wed, Oct 12, 2016 at 04:35:58PM +0200, Jan Schermer wrote:
> > Shot in the dark - are the drives (or their controller if you're
> > using raid) using any form of caching? It is conceivable that when
> > resync is finished it tries flushing the data to the device, and if
> > this takes way too long it could lead to a timeout of the drbd kernel
> > thread.
> >
> > Is IO happening on those drives when they are resyncing?
> > Try running something like "sync ; sleep 1 ; sync" on the Inconsistent
> > node when it's resyncing (I hope that won't kill your IO)
> 
> sync only affects stuff in the linux (buffer/) page cache,
> DRBD sits below that.
> "no effect" on DRBD IO.
> 
> > > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in time.
> > > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 
> has been said before:
> DRBD ping timeout is apparently too short for the latency in your setup.
> increase it appropriately.
> 
> Where latency in this case involves network rtt plus kernel thread scheduling
> plus maybe additional synchronous (flush/fua) IO plus whatever else DRBD
> feels is necessary for a full DRBD to DRBD round-trip.
> 
> > > However, I can guarantee that the network connection is solid.
> > > Running ping flood, I get 30,000 packets sent with no loss or
> > > latency.
> 
> Mind telling us the network characteristics?  IO backend?
> Virtualized?  Distribution? Kernel and DRBD version(s)?
> 

We have a dozen other DRBD clusters and this has never happened to any of the 
others over the past decade or so, and they are all on the same switched 
network. The nodes are in different data centers 22 miles apart connected by 
gigabit fiber. Latency is always sub-millisecond. See the following ping 
test...

[root@ha14a ~]# ping -f ha14b-cl
PING ha14b-cl.mycharts.md (198.51.100.43) 56(84) bytes of data.
.^C
--- ha14b-cl.mycharts.md ping statistics ---
23433 packets transmitted, 23432 received, 0% packet loss, time 15911ms
rtt min/avg/max/mdev = 0.585/0.659/0.847/0.021 ms, ipg/ewma 0.679/0.658 ms

The servers are all physical, running RHEL 6.3 kernel 2.6.32-279.el6.x86_64. 
SSD drives.

DRBD version is 8.4.3




Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-12 Thread Lars Ellenberg
On Wed, Oct 12, 2016 at 04:35:58PM +0200, Jan Schermer wrote:
> Shot in the dark - are the drives (or their controller if you're
> using raid) using any form of caching? It is conceivable that when
> resync is finished it tries flushing the data to the device, and if
> this takes way too long it could lead to a timeout of the drbd kernel
> thread.
>
> Is IO happening on those drives when they are resyncing?
> Try running something like "sync ; sleep 1 ; sync" on the Inconsistent
> node when it's resyncing (I hope that won't kill your IO)

sync only affects stuff in the linux (buffer/) page cache,
DRBD sits below that.
"no effect" on DRBD IO.

> > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will 
> > sync 0 KB [0 bits set]).
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in 
> > time.
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) 
> > conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

has been said before:
DRBD ping timeout is apparently too short for the latency in your setup.
increase it appropriately.

Where latency in this case involves network rtt plus kernel thread
scheduling plus maybe additional synchronous (flush/fua) IO plus
whatever else DRBD feels is necessary for a full DRBD to DRBD round-trip.
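
To get a feel for how often that timeout actually fires, the kernel log can be tallied per resource. This is only a sketch: the function name count_pingack_timeouts is made up for illustration, /var/log/messages is assumed to be where syslog writes kernel messages on these hosts, and the parsing relies on the "d-con <resource>:" prefix seen in the log lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count "PingAck did not arrive in time" events
# per DRBD resource in a syslog file.
# Assumption: kernel messages land in /var/log/messages unless a path is given.
count_pingack_timeouts() {
    logfile=${1:-/var/log/messages}
    awk '/PingAck did not arrive in time/ {
             # the resource name follows "d-con" in the log lines quoted in this thread
             for (i = 1; i < NF; i++)
                 if ($i == "d-con") {
                     name = $(i + 1)
                     sub(/:$/, "", name)   # strip the trailing colon
                     count[name]++
                 }
         }
         END { for (r in count) print r, count[r] }' "$logfile"
}

# Usage (not run here): count_pingack_timeouts /var/log/messages
```

Running it before and after a ping-timeout change gives a rough indication of whether the change made any difference.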

> > However, I can guarantee that the network connection is solid.
> > Running ping flood, I get 30,000 packets sent with no loss or
> > latency.

Mind telling us the network characteristics?  IO backend?
Virtualized?  Distribution? Kernel and DRBD version(s)?

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker



Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-12 Thread Viktor Villafuerte
Hi Eric,

I've had the pleasure to deal with this exact issue, and in prod too :O


On Wed 12 Oct 2016 14:04:48, Eric Robinson wrote:
> This morning we are seeing an issue where drbd is repeatedly resyncing, 
> getting to 100%, and starting over, and never getting to an UpToDate/UpToDate 
> state.
> 
> On one node, it is logging this sequence over and over…
> 
> 
> 
> Oct 12 06:56:11 ha14a kernel: d-con ha02_mysql: Starting asender thread (from 
> drbd_r_ha02_mys [804])
> Oct 12 06:56:11 ha14a kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:56:11 ha14a kernel: block drbd1: self 
> 13FB9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer 
> 38E17129E5821B5F:13FB9B08BF812C5B:13FA9B08BF812C5B:13F99B08BF812C5B bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: uuid_compare()=-1 by rule 50
> Oct 12 06:56:11 ha14a kernel: block drbd1: Becoming sync target due to disk 
> states.
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer( Unknown -> Primary ) conn( 
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Oct 12 06:56:11 ha14a kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
> Oct 12 06:56:11 ha14a kernel: block drbd1: updated sync uuid 
> 13FC9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1 exit code 0 (0x0)
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFSyncUUID -> SyncTarget )
> Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will 
> sync 0 KB [0 bits set]).

The two lines below are the important ones: DRBD assumes a network
failure because the PingAck did not arrive in time.

> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in 
> time.
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) 
> conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

You need to increase the timeout within which the PingAck is expected.

drbdadm net-options -v --ping-timeout=10 drbd0

this is the command I used. --ping-timeout is given in tenths of a second,
so a value of '10' is actually 1s. Please confirm this in the documentation,
as the version of DRBD I ran this on was 8.x

Also you may need to tweak the timeout a bit..
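
A minimal sketch of the unit conversion, assuming a POSIX shell with awk. The helper name ping_timeout_cmd is hypothetical, and the resource name ha02_mysql is taken from the "d-con ha02_mysql" log lines rather than from any config file, so substitute your own. The helper only prints the command in the same form as the one above; it does not run drbdadm.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the drbdadm call that sets
# ping-timeout for a resource, converting seconds to tenths of a second.
ping_timeout_cmd() {
    resource=$1    # DRBD resource name (assumed here: ha02_mysql)
    seconds=$2     # desired timeout in seconds, may be fractional (e.g. 1.5)
    # round to the nearest whole tenth of a second
    tenths=$(awk -v s="$seconds" 'BEGIN { printf "%d", s * 10 + 0.5 }')
    echo "drbdadm net-options -v --ping-timeout=$tenths $resource"
}

ping_timeout_cmd ha02_mysql 1
# prints: drbdadm net-options -v --ping-timeout=10 ha02_mysql
```

As far as I know, a value set this way applies to the running resource only; to keep it across restarts, the same setting would also have to go into the net section of the resource's configuration.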


Hope this helps

v


> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: asender terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Terminating drbd_a_ha02_mys
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Connection closed
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( NetworkFailure -> 
> Unconnected )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Restarting receiver thread
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver (re)started
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( Unconnected -> 
> WFConnection )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Handshake successful: Agreed 
> network protocol version 101
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Peer authenticated using 20 
> bytes HMAC
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( WFConnection -> 
> WFReportParams )
> 
> 
> 
> On the other node, it is saying this over and over…
> 
> 
> 
> Oct 12 06:58:51 ha14b kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:58:51 ha14b kernel: block drbd1: self 
> 38E17129E5821B5F:148D9B08BF812C5B:148C9B08BF812C5B:148B9B08BF812C5B bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer 
> 148D9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: uuid_compare()=1 by rule 70
> Oct 12 06:58:51 ha14b kernel: block drbd1: Becoming sync source due to disk 
> states.
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer( Unknown -> Secondary ) conn( 
> WFReportParams -> WFBitMapS )
> Oct 12 06:58:51 ha14b kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1 exit code 0 (0x0)
> Oct 12 06:58:51 ha14b kernel: block drbd1: conn( WFBitMapS -> SyncSource )
> Oct 12 06:58:51 ha14b kernel: block drbd1: Began resync as SyncSou

Re: [DRBD-user] DRBD constantly re-syncing, getting to 100%, starting over. What?

2016-10-12 Thread Jan Schermer
Shot in the dark - are the drives (or their controller if you're using raid) 
using any form of caching? It is conceivable that when resync is finished it 
tries flushing the data to the device, and if this takes way too long it 
could lead to a timeout of the drbd kernel thread.
Is IO happening on those drives when they are resyncing?
Try running something like "sync ; sleep 1 ; sync" on the Inconsistent node 
when it's resyncing (I hope that won't kill your IO)

But that's really just a guess.

Jan

> On 12 Oct 2016, at 16:04, Eric Robinson  wrote:
> 
> This morning we are seeing an issue where drbd is repeatedly resyncing, 
> getting to 100%, and starting over, and never getting to an UpToDate/UpToDate 
> state.
>  
> On one node, it is logging this sequence over and over…
>  
> 
>  
> Oct 12 06:56:11 ha14a kernel: d-con ha02_mysql: Starting asender thread (from 
> drbd_r_ha02_mys [804])
> Oct 12 06:56:11 ha14a kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:56:11 ha14a kernel: block drbd1: self 
> 13FB9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer 
> 38E17129E5821B5F:13FB9B08BF812C5B:13FA9B08BF812C5B:13F99B08BF812C5B bits:0 
> flags:0
> Oct 12 06:56:11 ha14a kernel: block drbd1: uuid_compare()=-1 by rule 50
> Oct 12 06:56:11 ha14a kernel: block drbd1: Becoming sync target due to disk 
> states.
> Oct 12 06:56:11 ha14a kernel: block drbd1: peer( Unknown -> Primary ) conn( 
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Oct 12 06:56:11 ha14a kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFBitMapT -> WFSyncUUID )
> Oct 12 06:56:11 ha14a kernel: block drbd1: updated sync uuid 
> 13FC9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1
> Oct 12 06:56:11 ha14a kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-target minor-1 exit code 0 (0x0)
> Oct 12 06:56:11 ha14a kernel: block drbd1: conn( WFSyncUUID -> SyncTarget )
> Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will 
> sync 0 KB [0 bits set]).
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in 
> time.
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) 
> conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: asender terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Terminating drbd_a_ha02_mys
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Connection closed
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( NetworkFailure -> 
> Unconnected )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver terminated
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Restarting receiver thread
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: receiver (re)started
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( Unconnected -> 
> WFConnection )
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Handshake successful: Agreed 
> network protocol version 101
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: Peer authenticated using 20 
> bytes HMAC
> Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: conn( WFConnection -> 
> WFReportParams )
>  
> 
>  
> On the other node, it is saying this over and over…
>  
> 
>  
> Oct 12 06:58:51 ha14b kernel: block drbd1: drbd_sync_handshake:
> Oct 12 06:58:51 ha14b kernel: block drbd1: self 
> 38E17129E5821B5F:148D9B08BF812C5B:148C9B08BF812C5B:148B9B08BF812C5B bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer 
> 148D9B08BF812C5A::4B9700420A3698D8:4B9600420A3698D9 bits:0 
> flags:0
> Oct 12 06:58:51 ha14b kernel: block drbd1: uuid_compare()=1 by rule 70
> Oct 12 06:58:51 ha14b kernel: block drbd1: Becoming sync source due to disk 
> states.
> Oct 12 06:58:51 ha14b kernel: block drbd1: peer( Unknown -> Secondary ) conn( 
> WFReportParams -> WFBitMapS )
> Oct 12 06:58:51 ha14b kernel: block drbd1: send bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: receive bitmap stats 
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1
> Oct 12 06:58:51 ha14b kernel: block drbd1: helper command: /sbin/drbdadm 
> before-resync-source minor-1 exit code 0 (0x0)
> Oct 12 06:58:51 ha14b kernel: block drbd1: conn( WFBitMapS -> SyncSource )
> Oct 12 06:58:51 ha14b kernel: block drbd1: Began resync as SyncSource (will 
> sync 0 KB [0 bits set]).
> Oct 12 06:58:5