On Wed, 2005-05-25 at 19:06, Troy Benjegerdes wrote:
> I was running a crufty version of opensm (compiled from the
> roland-uverbs branch), and I started getting these kinds of errors for
> no apparent reason:
>
> ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22
> ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22
>
> I'm running 2.6.11 kernels, and 'stock' modules.. I just tried
> rebuilding opensm from the latest SVN, but it apparently needs a new
> umad driver..
>
> warn: [24878] umad_init: wrong ABI version:
> /sys/class/infiniband_mad/abi_version is 2 but library ABI is 3
>
> I suppose I need to rebuild the kernel ib_umad (and maybe everything
> else for good measure).. And if I do that, should I expect OpenSM to
> work better regarding the multicast issue?

Yes, one quick workaround would be to pull the latest user_mad.c and
user_mad.h files from OpenIB svn and rebuild the umad module. There
might be some danger in this if RMPP is used, since what has been pushed
upstream does not include the changes for RMPP (MAD layer). Is this
worth a try? How quickly do you need a solution?

> Also, what will happen if I run opensm on two different nodes? Will they
> fight, or will one of them figure out how to be a backup slave SM if the
> first goes down?

Is this in relation to the umad ABI version? That is just a local issue.
There is no issue with OpenSMs on different nodes using different umad
ABI versions, as any communication between them is via standard MADs.

-- Hal
_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
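
A note on the ABI warning quoted above: it comes from comparing the ABI
number the kernel module exposes in sysfs with the number the userspace
library was built against. The comparison can be sketched roughly as
below. This is only an illustration in Python, not the library's actual
C code; the function name check_umad_abi and the constant names are
hypothetical, and the path and the two version numbers are taken from
the warning itself.

```python
from pathlib import Path

# Path and expected version are copied from the quoted warning;
# both are assumptions about the local setup.
ABI_FILE = Path("/sys/class/infiniband_mad/abi_version")
LIBRARY_ABI = 3


def check_umad_abi(abi_file=ABI_FILE, library_abi=LIBRARY_ABI):
    """Compare the kernel's umad ABI number with the one the library expects.

    Returns a (ok, message) tuple. Hypothetical helper, not part of libibumad.
    """
    try:
        kernel_abi = int(Path(abi_file).read_text().strip())
    except OSError as exc:
        # Typically means the ib_umad module is not loaded.
        return False, f"cannot read {abi_file}: {exc}"
    except ValueError:
        return False, f"{abi_file} does not contain an integer"

    if kernel_abi != library_abi:
        return False, (
            f"wrong ABI version: {abi_file} is {kernel_abi} "
            f"but library ABI is {library_abi}"
        )
    return True, f"ABI versions match ({kernel_abi})"


if __name__ == "__main__":
    ok, message = check_umad_abi()
    print(message)
```

Running this before and after rebuilding ib_umad is a quick way to see
whether the kernel side now reports the version the library expects. As
said above, the check is purely local to each node.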
