[OMPI devel] RML tags

2007-08-15 Thread Tim Prins

Hi folks,

I was looking at the rml usage in ompi, and noticed that several of the 
btls (udapl, mvapi, and openib) use the same rml tag for their messages. 
My guess is that this is a mistake, but I just want to ask if there is a 
reason for this before I correct it.


Thanks,

Tim


Re: [OMPI devel] [RFC] OFED 1.2 and uDAPL

2007-08-15 Thread Andrew Friedley
I second this; it's been an annoyance here at LLNL, even for OFED v1.1, 
which they prefix into /usr.


Andrew



[OMPI devel] [RFC] OFED 1.2 and uDAPL

2007-08-15 Thread Jeff Squyres
I just upgraded my Cisco MPI development cluster to OFED 1.2 over the  
weekend.  This morning, I discovered a fun situation with regard to  
uDAPL...


WHAT: We propose adding a check into the udapl configury to disable  
automatically building the udapl BTL when on Linux/OFED.  --with-udapl  
can be specified to override the check and do the normal udapl  
configury stuff.


WHY: The udapl BTL is built by default on OFED 1.2 clusters (because  
the UDAPL libraries are in /lib), but the /etc/dat.conf file that  
ships in OFED 1.2 is broken such that the UDAPL BTL will emit  
warnings upon init.


WHERE: config/ompi_check_udapl.m4

WHEN: ASAP -- I want this for v1.2.4 because it affects all OFED 1.2 users

TIMEOUT: Thursday COB (because I think Brian's out today?)

---

Short version:
--------------

Terry, George, and Jeff propose to add a check into  
ompi_check_udapl.m4 that will disable building the udapl BTL by  
default when on Linux.  You can specify --with-udapl when on Linux to  
force the normal check-for-headers-and-libraries udapl configure  
stuff.  When not on Linux (e.g., Solaris), the normal  
check-for-headers-and-libraries configure stuff will always happen.
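
To make the proposed default concrete, here is a rough sketch of the decision  
as a small C function.  This is only a model of the logic described above, not  
the actual m4 in ompi_check_udapl.m4; the names sketch_os and  
sketch_build_udapl are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; the real check is m4 code. */
enum sketch_os { SKETCH_OS_LINUX, SKETCH_OS_OTHER };

/*
 * Returns true if the udapl BTL would be built under the proposed default.
 *   with_udapl_given       - user passed --with-udapl to configure
 *   headers_and_libs_found - result of the normal header/library check
 */
bool sketch_build_udapl(enum sketch_os os,
                        bool with_udapl_given,
                        bool headers_and_libs_found)
{
    assert(os == SKETCH_OS_LINUX || os == SKETCH_OS_OTHER);

    /* Proposed change: on Linux, do not build udapl unless asked to. */
    if (os == SKETCH_OS_LINUX && !with_udapl_given) {
        return false;
    }

    /* Everywhere else (or with --with-udapl), the normal check decides. */
    return headers_and_libs_found;
}
```

In this model, a Linux/OFED box with the uDAPL libraries installed but no  
--with-udapl gets no udapl BTL, while Solaris behaves exactly as before.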


Long version:
-------------

Since OFED 1.2 [by default] installs into /usr, Open MPI's configure  
script finds the header files/libraries for both verbs and uDAPL, and  
therefore builds both the openib and udapl BTLs.  Keep in mind that  
on Linux/OFED, uDAPL is implemented as a layer on top of verbs, so it  
is not the "preferred" transport to use -- we want to use verbs  
(i.e., the openib BTL).


After some poking around (and checking with George/Galen), we found  
that the BTL exclusivity parameter in the openib BTL is set to  
MCA_BTL_EXCLUSIVITY_DEFAULT; the udapl BTL sets it to  
(MCA_BTL_EXCLUSIVITY_DEFAULT-10).  So that's good -- if Open MPI  
loads both BTLs, it's going to effectively ignore the udapl BTL  
(after initializing it) and use the openib BTL -- which is what we want.
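
As a simplified illustration of why openib wins, the sketch below picks the  
usable BTL with the highest exclusivity.  It is not the real selection code:  
the struct, the function name, and the numeric value of the constant are  
placeholders, and the real logic has more cases than "highest value wins".

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value for illustration; the real constant lives in the
 * Open MPI BTL headers. */
#define SKETCH_EXCLUSIVITY_DEFAULT 1024

/* Hypothetical, stripped-down view of a BTL module. */
struct sketch_btl {
    const char *name;
    int exclusivity;
    int reaches_peer;  /* nonzero if the BTL initialized and can reach the peer */
};

/* Return the usable BTL with the highest exclusivity, or NULL if none. */
const struct sketch_btl *sketch_pick_btl(const struct sketch_btl *btls,
                                         size_t n)
{
    const struct sketch_btl *best = NULL;

    assert(btls != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!btls[i].reaches_peer) {
            continue;  /* initialized but unusable for this peer: skip */
        }
        if (best == NULL || btls[i].exclusivity > best->exclusivity) {
            best = &btls[i];
        }
    }
    return best;
}
```

With udapl at (SKETCH_EXCLUSIVITY_DEFAULT - 10) and openib at  
SKETCH_EXCLUSIVITY_DEFAULT, both reaching the peer, the sketch returns the  
openib entry.  Note that udapl still had to be initialized first.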


The problem is that OFED 1.2 ships with an /etc/dat.conf that is  
effectively broken (dat.conf is the text config file for DAT/DAPL).   
The udapl BTL attempts to open all DAPL providers, but with the default  
dat.conf in OFED 1.2, some or all of them will fail (and the UDAPL  
BTL will print warnings for each failure).


On Solaris, where UDAPL *is* the high performance network, if there  
are any problems with dat.conf, users will want to know -- they will  
want to see the warnings from the UDAPL BTL.  But on Linux, you  
likely don't care about these warnings because you don't care about  
UDAPL anyway (because you almost certainly want to be using the  
openib/verbs BTL).


Terry, George, and I went through a bunch of different possible  
scenarios to fix this dichotomy, and concluded that the one that was  
the least evil was simply to disable building the udapl BTL on Linux  
by default -- you can override this default by specifying  
--with-udapl on the configure command line.  This solution has the  
following properties:


1. Most importantly, the default configure/build/run on Solaris and  
Linux/OFED clusters works -- it follows the Law of Least Astonishment.


2. Avoids schizophrenia in the UDAPL BTL trying to divine when a  
user would care about the warning messages or not.


If anyone *wants* the UDAPL BTL built on Linux, they'll likely  
disagree that we follow the Law of Least Astonishment, but I suspect  
that that is a fairly small group of people.  We'll add something to  
the FAQ about this issue so that at least the solution is a simple  
Google search away.


--
Jeff Squyres
Cisco Systems



[OMPI devel] Last night's MTT

2007-08-15 Thread Jeff Squyres
The trunk nightly tarball failed to be created last night, meaning  
that last night's MTT runs were testing a tarball from 2 days ago.


However, the 2-day-old trunk tarball had a bug in the openib BTL that  
caused all tests (over IB) to fail.  So if your site tests IB, you  
might as well kill any still-running MTT trunk instances this morning.


--
Jeff Squyres
Cisco Systems