On 25.11.2013 at 18:25, Digimer wrote:
> I'd like to see the full logs, starting from a little before the issue
> started.
Here are the logs from Nov 17 to Nov 24 (my pastebin is too small to
handle them):
Node A - https://www.dropbox.com/sh/dj08fbckj9zo104/Ew1QpdRq9A/A.log
Node B - https://www.dropbox.com/sh/dj08fbckj9zo104/p9ldlBkGkG/B.log
> It looks, though, like a stop was called for whatever reason and failed,
> so the node was fenced. This would mean that congestion, as you
> suggested, is not the likely cause.
> Out of curiosity, though: what bonding mode are you using? My testing
> showed that only mode=1 was reliable. Since I tested, corosync has added
> support for mode=0 and mode=2, but I've not re-tested them. When I did
> my bonding tests, I found that all the other modes broke communication
> at some point during use or failure/recovery testing.
I use 802.3ad (mode 4):
auto bond0
iface bond0 inet static
slaves eth4 eth5
bond-mode 802.3ad
bond-lacp_rate fast
bond-miimon 100
bond-downdelay 200
bond-updelay 200
address 10.0.0.1
netmask 255.255.255.0
broadcast 10.0.0.255
Do you think that could be the cause - i.e. the wrong bonding mode
leading to these communication issues?
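For reference, I can check the current bonding and LACP state on each
node with:

cat /proc/net/bonding/bond0

which shows the active mode, the MII status of each slave and, for
802.3ad, the aggregator/partner details. And if the mode itself turns
out to be the problem, I suppose a minimal active-backup (mode 1) stanza
for the same interfaces would look roughly like this - just a sketch
based on my config above, not something I have tested:

# sketch only: same NICs and addressing as the 802.3ad stanza above
auto bond0
iface bond0 inet static
slaves eth4 eth5
bond-mode active-backup
bond-miimon 100
bond-downdelay 200
bond-updelay 200
address 10.0.0.1
netmask 255.255.255.0
broadcast 10.0.0.255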
Thank you once more!
--
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org