On 25.11.2013 at 18:25, Digimer wrote:
I'd like to see the full logs, starting from a little before the issue
started.


Here are the logs from Nov 17 through Nov 24 (my pastebin is too small to handle them):

Node A - https://www.dropbox.com/sh/dj08fbckj9zo104/Ew1QpdRq9A/A.log
Node B - https://www.dropbox.com/sh/dj08fbckj9zo104/p9ldlBkGkG/B.log

It looks, though, like a stop was called for whatever reason and failed, so
the node was fenced. That would mean that congestion, as you suggested,
is not the likely cause.

Out of curiosity, though: what bonding mode are you using? My testing
showed that only mode=1 was reliable. Since I tested, corosync has added
support for mode=0 and mode=2, but I've not re-tested them. When I was
doing my bonding tests, I found that all the other modes broke
communications at some point during use or failure/recovery testing.



I use 802.3ad mode (that is, mode 4):

auto bond0
iface bond0 inet static
        slaves eth4 eth5
        bond-mode 802.3ad
        bond-lacp_rate fast
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200
        address 10.0.0.1
        netmask 255.255.255.0
        broadcast 10.0.0.255

Do you think that could be the reason, i.e. the wrong bonding mode causing communication issues?
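
For reference, if I were to switch to active-backup (mode 1) as you suggest, I assume the stanza would look roughly like this (just a sketch keeping my interface names and addressing; bond-primary is an extra I added to pin eth4 as the preferred slave):

# Sketch: same bond0, but in active-backup (mode 1) instead of 802.3ad
auto bond0
iface bond0 inet static
        slaves eth4 eth5
        bond-mode active-backup
        bond-primary eth4
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200
        address 10.0.0.1
        netmask 255.255.255.0
        broadcast 10.0.0.255

As I understand it, active-backup keeps only one slave active at a time, so it would not depend on the switch negotiating LACP correctly.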

Thank you once more!

--
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]
