[OMPI devel] RML tags
Hi folks,

I was looking at the RML usage in ompi and noticed that several of the BTLs (udapl, mvapi, and openib) use the same RML tag for their messages. My guess is that this is a mistake, but I want to ask whether there is a reason for it before I correct it.

Thanks,
Tim
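To illustrate why a shared tag is worth questioning, here is a minimal sketch of a tag-keyed ownership table. It is not the RML API: the tag values, the table, and the `sketch_*` names are all hypothetical. It only shows that when the tag is the sole key, two components using one tag cannot be told apart on the receiving side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values for illustration only; the real RML tag
 * constants are defined in the Open MPI source tree. */
enum {
    SKETCH_TAG_UDAPL  = 10,
    SKETCH_TAG_MVAPI  = 11,
    SKETCH_TAG_OPENIB = 12,
    SKETCH_TAG_MAX    = 32
};

/* One slot per tag: which component (by name) receives on that tag. */
static const char *sketch_tag_owner_table[SKETCH_TAG_MAX];

/* Claim a tag for a component. Returns 0 on success, or -1 if the tag
 * is out of range or already claimed by another component. */
int sketch_claim_tag(int tag, const char *owner)
{
    assert(owner != NULL);
    if (tag < 0 || tag >= SKETCH_TAG_MAX) {
        return -1;
    }
    if (sketch_tag_owner_table[tag] != NULL) {
        return -1;  /* two components would share one message stream */
    }
    sketch_tag_owner_table[tag] = owner;
    return 0;
}

/* Look up the component that receives messages sent with this tag.
 * Returns NULL if nobody has claimed it. */
const char *sketch_tag_owner(int tag)
{
    if (tag < 0 || tag >= SKETCH_TAG_MAX) {
        return NULL;
    }
    return sketch_tag_owner_table[tag];
}
```

With one tag per BTL, each claim succeeds and each message has exactly one possible recipient. If udapl tried to claim the tag that openib already holds, this toy table would refuse the second claim.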
Re: [OMPI devel] [RFC] OFED 1.2 and uDAPL
I second this; it's been an annoyance here at LLNL, even for OFED v1.1, which they prefix into /usr.

Andrew
[OMPI devel] [RFC] OFED 1.2 and uDAPL
I just upgraded my Cisco MPI development cluster to OFED 1.2 over the weekend. This morning, I discovered a fun situation with regard to uDAPL...

WHAT: We propose adding a check to the udapl configury that disables automatically building the udapl BTL on Linux/OFED. --with-udapl can be specified to override the check and do the normal udapl configury stuff.

WHY: The udapl BTL is built by default on OFED 1.2 clusters (because the UDAPL libraries are in /lib), but the /etc/dat.conf file that ships in OFED 1.2 is broken such that the UDAPL BTL will emit warnings upon init.

WHERE: config/ompi_check_udapl.m4

WHEN: ASAP -- I want this for v1.2.4 because it affects all OFED 1.2 users

TIMEOUT: Thursday COB (because I think Brian's out today?)

---

Short version:
--------------

Terry, George, and Jeff propose to add a check to ompi_check_udapl.m4 that will disable building the udapl BTL by default on Linux. You can specify --with-udapl on Linux to force the normal check-for-headers-and-libraries udapl configure stuff. When not on Linux (e.g., Solaris), the normal check-for-headers-and-libraries configure stuff will always happen.

Long version:
-------------

Since OFED 1.2 [by default] installs into /usr, Open MPI's configure script finds the header files/libraries for both verbs and uDAPL, and therefore builds both the openib and udapl BTLs. Keep in mind that on Linux/OFED, uDAPL is implemented as a layer on top of verbs, so it is not the "preferred" transport to use -- we want to use verbs (i.e., the openib BTL).

After some poking around (and checking with George/Galen), we found that the BTL exclusivity parameter in the openib BTL is set to MCA_BTL_EXCLUSIVITY_DEFAULT; the udapl BTL sets it to (MCA_BTL_EXCLUSIVITY_DEFAULT-10). So that's good -- if Open MPI loads both BTLs, it's going to effectively ignore the udapl BTL (after initializing it) and use the openib BTL -- which is what we want.
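The exclusivity comparison described above can be sketched roughly as follows. This is a simplified illustration, not Open MPI's actual selection code: the struct, the function name, and the numeric value used for the default are assumptions, and the only rule modelled is "the BTL with the highest exclusivity wins".

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value; stands in for MCA_BTL_EXCLUSIVITY_DEFAULT. */
#define SKETCH_EXCLUSIVITY_DEFAULT 1024

/* Hypothetical, stripped-down view of a BTL: a name and its exclusivity. */
struct sketch_btl {
    const char *name;
    int exclusivity;
};

/* Return the BTL with the highest exclusivity, or NULL if n == 0.
 * On a tie, the earliest entry in the array is kept. */
const struct sketch_btl *sketch_pick_btl(const struct sketch_btl *btls, size_t n)
{
    const struct sketch_btl *best = NULL;

    assert(btls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || btls[i].exclusivity > best->exclusivity) {
            best = &btls[i];
        }
    }
    return best;
}
```

Given one entry for udapl at (SKETCH_EXCLUSIVITY_DEFAULT - 10) and one for openib at SKETCH_EXCLUSIVITY_DEFAULT, the function returns the openib entry, which matches the behavior described in the paragraph above. Note that both BTLs must already be initialized by the time such a comparison happens.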
The problem is that OFED 1.2 ships with an /etc/dat.conf that is effectively broken (dat.conf is the text config file for DAT/DAPL). The udapl BTL attempts to open all DAPL providers, but with the default dat.conf in OFED 1.2, some or all of them will fail (and the UDAPL BTL will print a warning for each failure).

On Solaris, where UDAPL *is* the high performance network, if there are any problems with dat.conf, users will want to know -- they will want to see the warnings from the UDAPL BTL. But on Linux, you likely don't care about these warnings because you don't care about UDAPL anyway (because you almost certainly want to be using the openib/verbs BTL).

Terry, George, and I went through a bunch of different possible scenarios to fix this dichotomy, and concluded that the least evil one was simply to disable building the udapl BTL on Linux by default -- you can override this default by specifying --with-udapl on the configure command line.

This solution has the following properties:

1. Most importantly, the default configure/build/run on Solaris and Linux/OFED clusters works -- it follows the Law of Least Astonishment.
2. It avoids schizophrenia in the UDAPL BTL trying to divine when a user would care about the warning messages or not.

If anyone *wants* the UDAPL BTL built on Linux, they'll likely disagree that we follow the Law of Least Astonishment, but I suspect that that is a fairly small group of people. We'll add something to the FAQ about this issue so that at least the solution is a simple Google search away.

--
Jeff Squyres
Cisco Systems
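For readers who want the proposed default spelled out, the decision can be modelled as a small C function. This is only a sketch of the logic; the real check would live in config/ompi_check_udapl.m4 as m4/shell, and the enum, the function name, and the "host OS string starts with linux" test are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* What the configure step does about the udapl BTL (hypothetical names). */
enum sketch_udapl_action {
    SKETCH_SKIP_UDAPL,        /* do not build the udapl BTL */
    SKETCH_RUN_NORMAL_CHECK   /* normal check for headers and libraries */
};

/* host_os:          an OS string such as "linux-gnu" or "solaris2.10"
 * with_udapl_given: true if --with-udapl was passed to configure */
enum sketch_udapl_action sketch_udapl_decision(const char *host_os,
                                               bool with_udapl_given)
{
    assert(host_os != NULL);

    /* Assumption: a Linux host is recognized by a "linux" prefix. */
    bool on_linux = (strncmp(host_os, "linux", 5) == 0);

    if (on_linux && !with_udapl_given) {
        return SKETCH_SKIP_UDAPL;
    }
    /* Non-Linux hosts, or Linux with an explicit --with-udapl. */
    return SKETCH_RUN_NORMAL_CHECK;
}
```

The only case that changes under the proposal is Linux without --with-udapl; every other combination still runs the normal check.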
[OMPI devel] Last night's MTT
The trunk nightly tarball failed to be created last night, meaning that last night's MTT runs were testing a tarball from 2 days ago. However, the 2-day-old trunk tarball had a bug in the openib BTL that caused all tests (over IB) to fail.

So if your site tests IB, you might as well kill any still-running MTT trunk instances this morning.

--
Jeff Squyres
Cisco Systems