Sean Hefty wrote:
>>I'm concerned about how rdma_cm abstracts HCAs. It looks like I can use
>>the src_addr argument to rdma_resolve_addr() to select which IP
>>address/HCA (assuming one IP per HCA), but how can I enumerate the
>>available HCAs?
>
> The HCA / RDMA device abstraction is there for device hotplug, but the
> verb call to enumerate HCAs is still usable if you want to get a list of
> all HCAs in the system.
>
> You will likely have one IP address per port, rather than per HCA. You
> probably want to distinguish between locally assigned IP addresses (those
> given to ipoib devices - ib0, etc.), versus multicast IP addresses, and
> verify that your multicast routing tables direct traffic out of ipoib IP
> addresses, rather than Ethernet IP addresses. The IB multicast groups
> will base their local routing the same as the true IP multicast groups.
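For what it's worth, the verbs enumeration call is what I'd still use to get
the device list today - roughly the following (an untested sketch, error
handling trimmed):

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **dev_list;
    int num_devices, i;

    /* Ask libibverbs for every RDMA device (HCA) it knows about. */
    dev_list = ibv_get_device_list(&num_devices);
    if (!dev_list) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (i = 0; i < num_devices; i++)
        printf("HCA %d: %s\n", i, ibv_get_device_name(dev_list[i]));

    ibv_free_device_list(dev_list);
    return 0;
}

What that doesn't give me, of course, is the mapping from a device/port back
to the ipoib IP address I'd hand to rdma_resolve_addr().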
Yes - I'm actually talking about a separate issue here. It looks like using
the RDMA CM for multicast is going to require using it for all of my
connection management, so I'm looking at what that entails. Currently I'm
using only ibverbs and Open MPI's runtime environment layer.

>>This is important for a number of reasons - one, so that I can pass on
>>the available IP addresses to MPI peers out of band. It's also important
>>to know which HCAs are available in the system, and to be able to select
>>which HCA to use when connecting to a peer. This allows us to implement
>>things like load balancing and failover.
>
> HCA / port selection can be controlled by selecting a specific IP address,
> and you can configure your multicast routing tables to direct traffic out
> any desired port. You should have the same control over using a specific
> HCA / port; only the type of address used to identify the port changes.
>
> I might be able to make things a little easier by adding some sort of call
> that identifies all RDMA IP addresses in the system. You could test for
> this today by calling rdma_bind_addr() on all IP addresses assigned to the
> system. This doesn't really help with multicast addresses though, since
> you don't bind to them...

That would be very nice - Open MPI already supports enumeration of IP
interfaces (which I could do rdma_bind_addr() on as you suggested - see the
rough sketch at the end of this mail) in a portable fashion, but I think
being able to get this via the RDMA CM is a better general solution. Right
about the multicast addresses - I should have made it clear that I was
talking about unicast IP.

I understand the RDMA CM is a generic CM intended for other types of
devices (i.e. iWARP), not just InfiniBand. Will all of these devices be
supported under the ibverbs interface? I'm thinking it would be a problem
if we're picking up interfaces that don't support ibverbs, then trying to
use ibverbs to communicate over them.

> I'm not clear on what you mean about passing available IP addresses to
> MPI peers, or why it's done out of band. Are you talking about IP
> addresses of the local ipoib devices? Multicast IP addresses? By out of
> band, do you mean over a socket, as opposed to an IB connection?

Sorry - I'm talking about IP addresses of the local ipoib devices, or
whatever sort of addressing structure a particular network uses. Yes, we
currently send this information out of band, over TCP.

Our network initialization works like this - we have modules written for
each type of network (TCP, InfiniBand, GM, etc.). In the first
initialization stage for each module, available interfaces are enumerated,
initialized, and addressing information for each interface is made
available to our runtime environment layer. This addressing information is
exchanged among all peers in the MPI job via TCP (I believe we have a
framework for supporting other methods, but only TCP is currently
implemented). Finally, each network module takes all the peer addresses for
its network and sets up any necessary data structures for communicating
with each of those peers.

>>Matt Leininger suggested looking at the IB CM as an alternative, as it
>>gives more low-level control. Am I missing something, or does the IB CM
>>not handle multicast like the RDMA CM?
>
> IB multicast groups require SA interaction, and are not associated with
> the IB CM. What control do you feel that the RDMA CM is missing?

At the moment, I'm more concerned about how the RDMA CM API fits with Open
MPI (which I think it will, just need to re-think connection management).
In the future though, one thing that comes to mind is control of
dynamic/multipath routing.
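Here's the sort of rdma_bind_addr() probe I had in mind, per your
suggestion - an untested sketch going from the librdmacm headers (the
helper name and the hard-coded address are only placeholders; in Open MPI
this would loop over the interfaces we already enumerate). Binding to a
specific local IP this way also looks like the natural way to pin a
particular HCA/port before rdma_resolve_addr():

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <rdma/rdma_cma.h>

/*
 * Guess whether a local IPv4 address is backed by an RDMA device: the
 * bind should only succeed (and attach a verbs context to the cm_id)
 * when the address maps to an RDMA port.
 */
static int addr_is_rdma(struct rdma_event_channel *ch, const char *ip)
{
    struct rdma_cm_id *id;
    struct sockaddr_in sin;
    int is_rdma = 0;

    if (rdma_create_id(ch, &id, NULL, RDMA_PS_TCP))
        return 0;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    inet_pton(AF_INET, ip, &sin.sin_addr);

    if (!rdma_bind_addr(id, (struct sockaddr *) &sin) && id->verbs)
        is_rdma = 1;

    rdma_destroy_id(id);
    return is_rdma;
}

int main(void)
{
    struct rdma_event_channel *ch = rdma_create_event_channel();

    if (!ch)
        return 1;

    /* Placeholder address - the real code would walk the interface list. */
    printf("192.168.0.1 rdma-capable: %d\n",
           addr_is_rdma(ch, "192.168.0.1"));

    rdma_destroy_event_channel(ch);
    return 0;
}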
Andrew
