> Once this routing is in place, the only thing they need is to enhance
> the MPI job starter/etc to allocate to each job (say) two unique
> multicast --IP-- addresses on the relevant subnet and provide these IP
> addresses to each rank. Now the rank can use the RDMA CM without any
> hack.

I don't think this is as easy as you've made it sound. I see two approaches to preventing address collisions, and both require voluntary participation.

The first is a centralized authority (this has been used for IP multicast-based protocols). It means running some sort of daemon in a location all peers can communicate with. I'm not really keen on the idea of requiring a separate daemon just to support multicast in Open MPI.

The second is a peer-to-peer approach. These are doable, but difficult due to numerous race conditions. It's also highly desirable to minimize the time cost of joining a multicast group, which is especially difficult with a peer-to-peer solution.

Also, I'd rather not assume that a single MPI job requires a constant (small) number of multicast groups/addresses. The obvious mapping is to use one multicast group per MPI communicator. Most applications will use only a few communicators, though some may use hundreds, and the number in use may even vary as the app executes. I've also been considering approaches that use many groups per communicator, so again we could be looking at hundreds of multicast groups per MPI job.

As I've said, implementing solutions at the MPI level is doable but difficult. I knew from earlier discussions that IB is able to allocate new, unused multicast addresses, and I was hoping to expose that functionality and avoid the multicast address allocation problem. However, I hadn't thought of the fact that other networks supported by the RDMA CM might not have similar functionality, so this might not be appropriate there. But maybe it is worth considering how hard it would be for those other networks to provide the functionality?

Andrew
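
P.S. To make the centralized-authority option a bit more concrete, here is a rough sketch of the bookkeeping such a daemon would need, written in Python only for brevity. Everything in it is an assumption for illustration: the `MulticastAllocator` class, the `acquire`/`release` names, the `(job_id, comm_id)` key, and the choice of a pool starting at 239.1.0.0 in the administratively scoped range. It is not Open MPI or RDMA CM code, and it leaves out the hard parts (the network protocol to reach the daemon, lease expiry when a job dies, and the join latency).

```python
import ipaddress


class MulticastAllocator:
    """Hypothetical central authority that leases unique multicast addresses."""

    def __init__(self, first="239.1.0.0", count=1024):
        base = int(ipaddress.IPv4Address(first))
        # Build the free pool; every entry must be a multicast address.
        self._free = [ipaddress.IPv4Address(base + i) for i in range(count)]
        assert all(addr.is_multicast for addr in self._free)
        self._free.reverse()  # pop() then hands out the lowest address first
        self._leases = {}     # (job_id, comm_id) -> leased address

    def acquire(self, job_id, comm_id):
        """Return the address for this communicator, leasing one if needed."""
        key = (job_id, comm_id)
        if key in self._leases:
            return self._leases[key]
        if not self._free:
            raise RuntimeError("multicast address pool exhausted")
        addr = self._free.pop()
        self._leases[key] = addr
        return addr

    def release(self, job_id, comm_id):
        """Return a communicator's address to the pool when it is freed."""
        addr = self._leases.pop((job_id, comm_id))
        self._free.append(addr)
```

Each communicator would call `acquire` when it is created and `release` when it is freed, so the number of groups per job can grow and shrink at run time instead of being fixed by the job starter. Note that this only prevents collisions among peers that actually ask the allocator, which is the voluntary participation problem mentioned above.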
