Hi Steven, Andrew,

I identified the issue described in "[Openais] I think I have hit a bug, but need confirmation". It was my fault; however, I wonder whether what happens should be handled more "gracefully" by openais and/or pacemaker. (Note for beekhof: I am not sure whether the openais behavior during this misconfiguration is wrong or whether pacemaker should be more resilient to it. Either way, the cause of my problem could also occur on a production network, through a faulty config or as a DoS attack, and it is VERY bad that it crashed pacemaker or rendered it unusable.)
What happens is that, apart from the bare-metal machines used as cluster nodes, I have some compile machines running as linux-vservers on other bare-metal servers (same eth segment, same subnet). On those, I compile openAIS, pacemaker, etc. In order to compile pacemaker, I installed on them the openAIS package that I built, so openAIS was in fact running on those vservers. BUT, in the vserver setup I am using, the mcast addresses cannot be set on the interface from inside the vserver. As a result, those vservers were sending notifications to the configured mcast address (triggering the config change and GATHER from 11 seen in my logs on the bare-metal nodes), but were unable to receive any replies via the mcast group.

Note that I can reproduce this at will by running the aisexec with pacemaker patches on the vserver (Andrew, please have a look!), but not when running the aisexec from the stock openais.org 0.80.3 tarball. I do not really know why and, being short on time, won't investigate for now. All I know is that one misconfigured openais on an eth segment with production machines can wreak havoc in a pacemaker network, and probably in any openais production setup.

Thanks a lot, Steven, for your answers, which definitely helped me nail the issue; it was in fact totally unrelated to 32/64 bits and/or the libc6 version. I would appreciate your thoughts on the potential risk that the resolution of this issue highlights, and your advice/philosophy regarding it. If you consider it not a bug but a part of the openAIS design, please explain the administrative policy/configuration you recommend to avoid this risk.
Regards,
--
Jérôme Martin | LongPhone Responsable Architecture Réseau 122, rue la Boetie | 75008 Paris
Tel : +33 (0)1 56 26 28 44
Fax : +33 (0)1 56 26 28 45
Mail : [EMAIL PROTECTED]
Web : www.longphone.com
