Hi,

I've got a pair of servers running RHEL5 x86_64 with openais-0.80 (an older install), which I want to upgrade to corosync-1.3.0 + pacemaker-1.0.10. Downtime is not an issue. corosync 1.3.0 is needed for UDPU, so I built it from the corosync.org website, and built openais 1.1.4 from the openais.org website.
With pacemaker, we won't be using the heartbeat stack, so I built the pacemaker package from the clusterlabs.org src.rpm without heartbeat support. To be more precise, I used:

    rpmbuild --without heartbeat --with ais --with snmp --with esmtp -ba pacemaker-epel.spec

I've tested the rpm list below on a pair of Xen VMs, and it works just fine:

    cluster-glue-1.0.6-1.6.el5.x86_64.rpm
    cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
    corosync-1.3.0-1.x86_64.rpm
    corosynclib-1.3.0-1.x86_64.rpm
    libesmtp-1.0.4-5.el5.x86_64.rpm
    libibverbs-1.1.2-1.el5.x86_64.rpm
    librdmacm-1.0.8-1.el5.x86_64.rpm
    libtool-ltdl-1.5.22-6.1.x86_64.rpm
    openais-1.1.4-2.x86_64.rpm
    openaislib-1.1.4-2.x86_64.rpm
    openhpi-2.10.2-1.el5.x86_64.rpm
    openib-1.3.2-0.20080728.0355.3.el5.noarch.rpm
    pacemaker-1.0.10-1.4.x86_64.rpm
    pacemaker-libs-1.0.10-1.4.x86_64.rpm
    perl-TimeDate-1.16-5.el5.noarch.rpm
    resource-agents-1.0.3-2.6.el5.x86_64.rpm

The upgrade on the servers running openais-0.80 went differently. First I removed the heartbeat, heartbeat-libs and PyXML rpms (conflicting dependencies issue), then ran rpm -Uvh on the rpm list above. Installation went fine. I removed the existing cib.xml and signatures for a fresh start, configured corosync, started it on both servers, and got nothing.

At first I got an error related to pacemaker mgmt, which was an old package installed with the old rpms. I removed it and tried again. Nothing. I then removed all cluster-related rpms, old and new, plus their deps (except for DRBD), installed the list above again, and still nothing.

What "nothing" means:
- corosync starts, but never elects a DC and never sees the other node, or itself for that matter.
- stopping corosync via the init script never completes: it goes into an endless phase where it just prints dots to the screen, and I have to kill the process to make it stop.
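One way to double-check the cleanup described above is to list which cluster-related packages are still installed and to look for files that no installed RPM owns. The following is only a sketch under assumptions: the function names `find_unowned` and `list_cluster_rpms` are made up for illustration, the package name pattern is a guess at what counts as "cluster related", and the directories mentioned in the usage comments are assumed locations, not confirmed ones for these servers.

```shell
#!/bin/sh
# Sketch only: helper functions for spotting possible leftovers from an older
# cluster stack. Function names and the example paths below are hypothetical.

# find_unowned DIR
# Print every regular file under DIR that `rpm -qf` cannot attribute to an
# installed package. A missing DIR is silently skipped.
find_unowned() {
    dir=$1
    [ -d "$dir" ] || return 0
    find "$dir" -type f | sort | while IFS= read -r f; do
        if ! rpm -qf "$f" >/dev/null 2>&1; then
            printf 'unowned: %s\n' "$f"
        fi
    done
}

# list_cluster_rpms
# Show installed packages whose names match the cluster stack. The pattern is
# an assumption about which package names matter here.
list_cluster_rpms() {
    rpm -qa | grep -Ei 'openais|corosync|heartbeat|pacemaker|cluster-glue' | sort
}

# Example usage (paths are assumptions, adjust to the actual system):
#   list_cluster_rpms
#   find_unowned /usr/libexec/lcrso
#   find_unowned /etc/ais
```

Any file reported as unowned is only a candidate for a leftover; it may equally be a locally created config file, so each hit still needs to be looked at by hand.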
Troubleshooting done so far:
- tested network sockets (nc from side to side) and firewall rules (iptables down); communication is ok
- went through the original RPM list, removed all remaining old RPMs, ran ldconfig, then removed and reinstalled the new RPMs

My guess is that there are some leftovers from the old openais-0.80 installation which interfere with the current installation, since the same set of RPMs works fine on a pair of Xen VMs with the same OS. However, I cannot put my finger on the culprit for the real servers' issue.

Logs: http://pastebin.com/i0maZM4p

Ideas, suggestions?

TIA.

Regards,
Dan

--
Dan Frîncu
CCNA, RHCE
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker