Hi,

I know this may seem only loosely related to oVirt, but here is what happened: we have two oVirt datacenters running 3.4.1 (at 3.4 compatibility level), and, among many other things, a ctdb cluster made of two nodes, each in a different datacenter.

A few days ago, I raised the compatibility level from 3.3 to 3.4 and upgraded the hosts of datacenter #1 from CentOS 6.4 to 6.5 (datacenter #2 was already on 6.5). After that, our two ctdb nodes (VMs running CentOS 6.5) kept falling into a cluster partition. Almost three days of googling led me to an issue that seems to have been known for some time: multicast on bridged interfaces is only partially, or inconsistently, supported.

See:
http://lists.corosync.org/pipermail/discuss/2012-November/002208.html

On the affected hosts, I applied the workaround suggested there, and it did stabilize the situation.

I understand this is not directly related to oVirt/RHEV, but I really don't understand why this ctdb cluster worked for months and only recently stopped (I'll have to dig deep into the release notes of the upgraded packages to try to find something useful).

I'm posting this here to:
- ask whether any of you are also running clusters across VMs (not necessarily across datacenters: communication between VMs on different hosts may also be affected)
- leave a trace in case it helps someone debug a similar setup

Regards,

--
Nicolas Ecarnot
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
