Hi,

On Tue, Mar 16, 2010 at 02:10:25AM -0700, Steven Dake wrote:
> On Tue, 2010-03-09 at 21:17 +0100, Dejan Muhamedagic wrote:
> > Hello,
> >
> > An upgrade of one node from whitetank (openais) to flatiron
> > (corosync) made that node incapable of joining the cluster again.
> > During the upgrade the other nodes were running. The issue turned
> > out to be that between the two releases the node ids are
> > calculated differently on big endian platforms (this was on
> > s390x). Actually, openais/corosync didn't find this a problem,
> > but pacemaker couldn't match the old and the new node id.
> >
> > Reverting the following patch fixed the issue:
> >
> I wanted to think about this more before responding, so please excuse
> the delay.
>
> My initial thoughts are we put this patch in to fix a serious problem
> with endianness coming from the cpg service for nodeids (and affecting
> all other clusters). The root of the problem was that the endianness
> of the nodeid was never known coming out of cpg, since it wasn't stored
> in any particular order.
>
> I still think this patch is correct but it does indeed break backward
> compat. One option is to default to the pre-patch behavior if
> compatibility:whitetank is set.

That sounds like a good idea.

> I'll have to think through how to do that in a non-ABI breaking way. A
> patch that someone else writes to do this is also a good solution :-).

I'm not sure how that would break the ABI. I looked a bit into how to
make such a patch, but the compatibility information is static in
main.c (minimum_sync_mode).

> Another/parallel option is to maintain a revert of the patch in your
> packages until we get this problem sorted out.

Yes, that's what we'll do.

> Would you file a bugzilla against the corosync package in fedora
> rawhide? It is where I track fedora bugs so I don't lose track of it.

https://bugzilla.redhat.com/show_bug.cgi?id=577129

Cheers,

Dejan

> Regards
> -steve
>
> > Index: branches/flatiron/exec/totemip.c
> > ===================================================================
> > --- branches/flatiron/exec/totemip.c	(revision 2428)
> > +++ branches/flatiron/exec/totemip.c	(revision 2429)
> > @@ -376,6 +376,9 @@
> >  		 */
> >  		totemip_sockaddr_to_totemip_convert((struct sockaddr_storage *)sockaddr_in, boundto);
> >  		boundto->nodeid = sockaddr_in->sin_addr.s_addr;
> > +#if __BYTE_ORDER == __BIG_ENDIAN
> > +		boundto->nodeid = swab32 (boundto->nodeid);
> > +#endif
> >  
> >  		if (ioctl(id_fd, SIOCGLIFFLAGS, &lifreq[i]) < 0) {
> >  			printf ("couldn't do ioctl\n");
> > @@ -614,6 +617,9 @@
> >  	if (ipaddr.family == AF_INET && ipaddr.nodeid == 0) {
> >  		unsigned int nodeid = 0;
> >  		memcpy (&nodeid, ipaddr.addr, sizeof (int));
> > +#if __BYTE_ORDER == __BIG_ENDIAN
> > +		nodeid = swab32 (nodeid);
> > +#endif
> >  		if (mask_high_bit) {
> >  			nodeid &= 0x7FFFFFFF;
> >  		}
> >
> > The nodeids with flatiron now do appear the same on both big and
> > little endian platforms, but this regression prevents rolling
> > upgrades of single nodes. Also, the ids are in reversed byte order:
> > for instance, 192.168.100.13 gets the id 224700608 (hex 0D64A8C0).
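To make the byte-order change concrete, here is a minimal C sketch (not corosync code) of the two nodeid derivations the patch switches between, using the 192.168.100.13 address from the report. Assumptions: swap32() stands in for the swab32() used in totemip.c; the host byte order is simulated with a flag rather than detected, so the big-endian case can be checked on any machine; mask_high_bit is left out. nodeid_with_compat() and its compat_whitetank flag are purely hypothetical, illustrating the "compatibility: whitetank" fallback suggested in the thread, not how corosync implements or would implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap; stands in for swab32() from totemip.c. */
uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
	       ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/*
 * Whitetank (pre-patch) rule: the four network-order address bytes are
 * copied straight into an integer, so the value depends on the host byte
 * order. host_is_big_endian simulates the host (e.g. s390x = 1).
 */
uint32_t nodeid_whitetank(const uint8_t addr[4], int host_is_big_endian)
{
	if (host_is_big_endian)
		return ((uint32_t)addr[0] << 24) | ((uint32_t)addr[1] << 16) |
		       ((uint32_t)addr[2] << 8) | (uint32_t)addr[3];
	return ((uint32_t)addr[3] << 24) | ((uint32_t)addr[2] << 16) |
	       ((uint32_t)addr[1] << 8) | (uint32_t)addr[0];
}

/* Flatiron (post-patch) rule: big-endian hosts byte-swap, so all hosts agree. */
uint32_t nodeid_flatiron(const uint8_t addr[4], int host_is_big_endian)
{
	uint32_t id = nodeid_whitetank(addr, host_is_big_endian);
	return host_is_big_endian ? swap32(id) : id;
}

/*
 * HYPOTHETICAL: keep the pre-patch rule when a whitetank compatibility
 * flag is set. The function and flag names are illustrative only.
 */
uint32_t nodeid_with_compat(const uint8_t addr[4], int host_is_big_endian,
			    int compat_whitetank)
{
	return compat_whitetank ? nodeid_whitetank(addr, host_is_big_endian)
				: nodeid_flatiron(addr, host_is_big_endian);
}

/* Worked example for 192.168.100.13. */
void check_example(void)
{
	const uint8_t addr[4] = { 192, 168, 100, 13 };

	/* Little endian: both releases give 0x0D64A8C0 = 224700608. */
	assert(nodeid_whitetank(addr, 0) == 224700608u);
	assert(nodeid_flatiron(addr, 0) == 224700608u);
	/* Big endian: the id changes across the upgrade, 0xC0A8640D -> 0x0D64A8C0. */
	assert(nodeid_whitetank(addr, 1) == 3232261133u);
	assert(nodeid_flatiron(addr, 1) == 224700608u);
}
```

On a simulated big-endian host the sketch reproduces the mismatch: a whitetank node derives 3232261133 (0xC0A8640D) while the upgraded node derives 224700608, so anything that remembers the old id (pacemaker, in this report) no longer matches. On little-endian hosts the two rules agree, which is why the regression only shows up on platforms like s390x.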
> >
> > There is some discussion at the Novell bugzilla:
> > https://bugzilla.novell.com/show_bug.cgi?id=584976
> >
> > Thanks,
> >
> > Dejan
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais
>
