Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Joshua Hursey


On Jul 11, 2007, at 8:09 AM, Terry D. Dontje wrote:

> Jeff Squyres wrote:
>
>> On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:
>>
>>>> 2. It may be useful to have some high-level parameters to specify a
>>>> specific run-time environment, since ORTE has multiple, related
>>>> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
>>>> somesuch.
>>
>> I was just writing this up in an enhancement ticket when the thought
>> hit me: isn't this aggregate MCA parameters?  I.e.:
>>
>> mpirun --am tm ...
>>
>> Specifically, we'll need to make a "tm" AMCA file (and whatever other
>> ones we want), but my point is: does AMCA already give us what we
>> want?
>
> The above sounds like a possible solution as long as we are going to
> deliver a set of such files and not require each site to create their
> own.  Also, can one pull in multiple AMCA files for one run, so that you
> can specify a tm AMCA file and possibly some other AMCA file that the
> user may want?

Yep. You can put a ':' between different AMCA file names. So:
 shell$ mpirun -am tm:foo:bar ...
will pull in the three AMCA files 'tm', 'foo', and 'bar', in that order
of precedence. That means 'tm' can override an MCA parameter in 'foo',
and 'foo' can override an MCA parameter in 'bar'. Any '-mca'
command-line options take higher precedence than AMCA parameter
files, so they can override MCA parameters set by any of 'tm', 'foo',
or 'bar'.

I'll put it on my list to make a FAQ entry for AMCA usage, as I don't
see one.
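The precedence rule Josh describes can be emulated outside Open MPI in a few lines of POSIX shell. This is not Open MPI's implementation. The helper `amca_lookup`, the sample file contents, and the assumption that an AMCA file holds `name = value` lines (the same style as the MCA parameter files) are all illustrative. The first file in the colon-separated list that sets a parameter wins, and a `-mca` value given on the command line beats every file.

```shell
#!/bin/sh
# Emulation of the AMCA precedence rule; NOT Open MPI's own code.
# Assumption: each AMCA file holds "name = value" lines; '#' lines are comments.

# amca_lookup PARAM FILELIST [CMDLINE_VALUE]
#   FILELIST is colon-separated, highest precedence first ("tm:foo:bar").
#   A non-empty CMDLINE_VALUE stands in for "-mca PARAM value" and always wins.
#   Missing or unreadable files are simply skipped (a simplification).
amca_lookup() {
    param=$1; list=$2; cmdline=${3:-}
    if [ -n "$cmdline" ]; then
        printf '%s\n' "$cmdline"
        return 0
    fi
    old_ifs=$IFS; IFS=:
    for f in $list; do
        IFS=$old_ifs
        [ -r "$f" ] || continue
        # Value of the first "param = value" line in this file, if any.
        val=$(sed -n "s/^[[:space:]]*$param[[:space:]]*=[[:space:]]*//p" "$f" | head -n 1)
        if [ -n "$val" ]; then
            printf '%s\n' "$val"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Demo with three hypothetical AMCA files in a scratch directory.
dir=$(mktemp -d)
printf 'pls = tm\nras = tm\n' > "$dir/tm"
printf 'ras = slurm\nbtl = tcp,self\n' > "$dir/foo"
printf 'btl = sm,self\nmpi_yield_when_idle = 1\n' > "$dir/bar"
list="$dir/tm:$dir/foo:$dir/bar"

amca_lookup ras "$list"                   # tm       ('tm' beats 'foo')
amca_lookup btl "$list"                   # tcp,self ('foo' beats 'bar')
amca_lookup mpi_yield_when_idle "$list"   # 1        (set only in 'bar')
amca_lookup ras "$list" rsh               # rsh      (like "-mca ras rsh")
```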


-- Josh



--td
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/





Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Ralph H Castain
Interesting point - no reason why we couldn't use that functionality for
this purpose. Good idea!


On 7/11/07 5:38 AM, "Jeff Squyres"  wrote:

> On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:
> 
>>> 2. It may be useful to have some high-level parameters to specify a
>>> specific run-time environment, since ORTE has multiple, related
>>> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
>>> somesuch.
> 
> I was just writing this up in an enhancement ticket when the thought
> hit me: isn't this aggregate MCA parameters?  I.e.:
> 
> mpirun --am tm ...
> 
> Specifically, we'll need to make a "tm" AMCA file (and whatever other
> ones we want), but my point is: does AMCA already give us what we want?




Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Terry D. Dontje

Jeff Squyres wrote:

> On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:
>
>>> 2. It may be useful to have some high-level parameters to specify a
>>> specific run-time environment, since ORTE has multiple, related
>>> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
>>> somesuch.
>
> I was just writing this up in an enhancement ticket when the thought
> hit me: isn't this aggregate MCA parameters?  I.e.:
>
> mpirun --am tm ...
>
> Specifically, we'll need to make a "tm" AMCA file (and whatever other
> ones we want), but my point is: does AMCA already give us what we want?

The above sounds like a possible solution as long as we are going to
deliver a set of such files and not require each site to create their
own.  Also, can one pull in multiple AMCA files for one run, so that you can
specify a tm AMCA file and possibly some other AMCA file that the user may want?


--td


Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Jeff Squyres

On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:

>> 2. It may be useful to have some high-level parameters to specify a
>> specific run-time environment, since ORTE has multiple, related
>> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
>> somesuch.

I was just writing this up in an enhancement ticket when the thought
hit me: isn't this aggregate MCA parameters?  I.e.:

mpirun --am tm ...

Specifically, we'll need to make a "tm" AMCA file (and whatever other
ones we want), but my point is: does AMCA already give us what we want?
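The thread does not spell out what the proposed "tm" file would contain. The sketch below is one guess. It assumes AMCA files use the same `name = value` syntax as the other MCA parameter files, and that pinning the RAS and PLS frameworks (the TM-related frameworks named in this thread) is enough. The file location is a placeholder. This thread does not say where `mpirun` searches for an AMCA file given by bare name, so a full path may be needed.

```shell
#!/bin/sh
# Hypothetical site-provided "tm" AMCA file; its contents are an assumption.
# Written to $AMCA_DIR (default: the current directory).
AMCA_DIR=${AMCA_DIR:-.}
cat > "$AMCA_DIR/tm" <<'EOF'
# Pin the TM components so stale LSF/SLURM files on disk are never selected.
ras = tm
pls = tm
EOF
cat "$AMCA_DIR/tm"

# A user would then run, as proposed in the message above:  mpirun --am tm ...
```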


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
I think that is quite accurate and would be helpful in resolving the
problem...


On 7/10/07 10:32 AM, "Jeff Squyres"  wrote:

> Point taken.
> 
> Is this an accurate summary?
> 
> 1. "Best practices" should be documented, to include sysadmins
> specifically itemizing what components should be used on their
> systems (e.g., in an environment variable or the system-wide MCA
> parameters file).
> 
> 2. It may be useful to have some high-level parameters to specify a
> specific run-time environment, since ORTE has multiple, related
> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
> somesuch.
> 
> [...]




Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

Point taken.

Is this an accurate summary?

1. "Best practices" should be documented, to include sysadmins  
specifically itemizing what components should be used on their  
systems (e.g., in an environment variable or the system-wide MCA  
parameters file).


2. It may be useful to have some high-level parameters to specify a  
specific run-time environment, since ORTE has multiple, related  
frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or  
somesuch.



On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:

> Actually, I was talking specifically about configuration at build
> time.
> [...]



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 9:51 AM, Brian Barrett wrote:

>> Actually, there are --with-slurm/--without-slurm options. We default
>> to building SLURM support automatically on Linux and AIX, but not on
>> other platforms.
>
> On a mostly unrelated note...  We should probably also now build the
> SLURM component for OS X, since SLURM is now available for OS X as
> well.  And we should probably also check for SLURM's srun and build
> the component if we find it, even if we aren't on Linux, AIX, or OS X.

Hah.  So SLURM isn't really a SLURM (Simple Linux Utility for
Resource Management) after all, eh?  :-)

Point noted, though -- someone file a feature-enhancement ticket...

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Brian Barrett

On Jul 10, 2007, at 7:09 AM, Tim Prins wrote:

> Jeff Squyres wrote:
>
>> 2. The "--enable-mca-no-build" option takes a comma-delimited list of
>> components that will then not be built.  Granted, this option isn't
>> exactly intuitive, but it was the best that we could think of at the
>> time to present a general solution for inhibiting the build of a
>> selected list of components.  Hence, "--enable-mca-no-build=pls-slurm,ras-slurm"
>> would inhibit building the SLURM RAS and PLS components (note that
>> the SLURM components currently do not require any additional
>> libraries, so a) there is no corresponding --with[out]-slurm option,
>> and b) they are usually always built).
>
> Actually, there are --with-slurm/--without-slurm options. We default to
> building SLURM support automatically on Linux and AIX, but not on other
> platforms.

On a mostly unrelated note...  We should probably also now build the
SLURM component for OS X, since SLURM is now available for OS X as
well.  And we should probably also check for SLURM's srun and build
the component if we find it, even if we aren't on Linux, AIX, or OS X.
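Brian's idea of keying the SLURM build on `srun` rather than on the platform could look roughly like this in configure-style shell. The helper `want_slurm` is hypothetical. The thread does not show the real configure logic. An explicit `--with-slurm`/`--without-slurm` still overrides the probe.

```shell
#!/bin/sh
# want_slurm [WITH]: hypothetical configure-style decision.
#   WITH: "yes"/"no" from --with-slurm/--without-slurm, or "" if neither.
# Without an explicit flag, build SLURM support whenever srun is on PATH,
# on any platform, instead of only on Linux and AIX.
want_slurm() {
    case ${1:-} in
        yes) echo yes ;;
        no)  echo no ;;
        *)
            if command -v srun >/dev/null 2>&1; then
                echo yes
            else
                echo no
            fi ;;
    esac
}
```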


Brian


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Tim Prins

Jeff Squyres wrote:
> 2. The "--enable-mca-no-build" option takes a comma-delimited list of
> components that will then not be built.  Granted, this option isn't
> exactly intuitive, but it was the best that we could think of at the
> time to present a general solution for inhibiting the build of a
> selected list of components.  Hence, "--enable-mca-no-build=pls-slurm,ras-slurm"
> would inhibit building the SLURM RAS and PLS components (note that
> the SLURM components currently do not require any additional
> libraries, so a) there is no corresponding --with[out]-slurm option,
> and b) they are usually always built).

Actually, there are --with-slurm/--without-slurm options. We default to
building SLURM support automatically on Linux and AIX, but not on other
platforms.


Tim



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
Actually, I was talking specifically about configuration at build time. I
realize there are trade-offs here, and suspect we can find a common ground.
The problem with using the options Jeff described is that they require
knowledge on the part of the builder as to what environments have had their
include files/libraries installed on the file system of this particular
machine. And unfortunately, not every component is protected by these
"sentinel" variables, nor does it appear possible to do so in a "guaranteed
safe" manner.

Note that I didn't say "installed on their machine". In most cases, these
alternative environments are not currently installed at all - they are stale
files, or were placed on the file system by someone that wanted to look at
their documentation, or whatever. The problem is that Open MPI blindly picks
them up and attempts to use them, in sometimes disastrous and frequently
unpredictable ways.

Hence, the user can be "astonished" to find that an application that worked
perfectly yesterday suddenly segfaults today - because someone decided one
day, for example, to un-tar the bproc files in a public place where we pick
them up, and then someone else (perhaps a sys admin or the user themselves)
at some later time rebuilt Open MPI to bring in an update.

Now imagine being a software provider who gets the call about a problem with
Open MPI and has to figure out what the heck happened.

My suggested solution may not be the best, which is why I put it out there
for discussion. One alternative might be for us to instruct sys admins to
put MCA params in their default param file that force selection of the
proper components for each framework. Thus, someone with an lsf system would
enter:  pls=lsf ras=lsf sds=lsf in their config file to ensure that only lsf
was used.
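Ralph's example maps onto Open MPI's system-wide parameter file, `$prefix/etc/openmpi-mca-params.conf`, which takes one `name = value` setting per line. The same selections can be made per user or per job with `OMPI_MCA_<name>` environment variables. The sketch below appends his three settings. `OMPI_PREFIX` is a variable name chosen for this sketch. It defaults to a scratch directory so the script can be tried without touching a real installation. The existence of an `lsf` component in each framework is taken from his example, not checked here.

```shell
#!/bin/sh
# Pin LSF in every framework from Ralph's example (pls, ras, sds).
# OMPI_PREFIX = Open MPI install prefix; defaults to a scratch dir for safe testing.
OMPI_PREFIX=${OMPI_PREFIX:-$(mktemp -d)}
mkdir -p "$OMPI_PREFIX/etc"
conf="$OMPI_PREFIX/etc/openmpi-mca-params.conf"

for fw in pls ras sds; do
    # Append only if this framework is not already pinned, so reruns are harmless.
    if ! grep -q "^[[:space:]]*$fw[[:space:]]*=" "$conf" 2>/dev/null; then
        echo "$fw = lsf" >> "$conf"
    fi
done
cat "$conf"

# Per-user or per-job equivalent using environment variables:
#   export OMPI_MCA_pls=lsf OMPI_MCA_ras=lsf OMPI_MCA_sds=lsf
```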

The negative to that approach is that we would have to warn everyone any
time that list changed (e.g., a new component for a new framework). Another
option to help that problem, of course, would be to set one mca param (say
something like "enviro=lsf") that we would use internal to Open MPI to set
the individual components correctly - i.e., we would hold the list of
relevant frameworks internally since (hopefully) we know what they should be
for a given environment.
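Ralph's single-parameter alternative amounts to keeping one table inside Open MPI that maps an environment name to the frameworks that must be pinned. The sketch below is hypothetical. `enviro` is his placeholder name, not an existing parameter, and `expand_enviro` is an invented helper. Each framework list uses only the frameworks this thread mentions for that environment.

```shell
#!/bin/sh
# Hypothetical: expand "enviro=<name>" into per-framework selections.
# Adding a framework later means editing this table, not every site's config.
expand_enviro() {
    case $1 in
        lsf)      frameworks="pls ras sds" ;;   # from Ralph's LSF example
        slurm|tm) frameworks="pls ras" ;;       # RAS and PLS components named in the thread
        *)        echo "unknown environment: $1" >&2; return 1 ;;
    esac
    for fw in $frameworks; do
        echo "$fw = $1"
    done
}

expand_enviro lsf
# pls = lsf
# ras = lsf
# sds = lsf
```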

Anyway, I'm glad people are looking at this and suggesting solutions. It is
a problem that seems to have been biting us recently and may become a bigger
issue as the user community grows.

Ralph


On 7/10/07 6:12 AM, "Bogdan Costescu" wrote:

> On Tue, 10 Jul 2007, Jeff Squyres wrote:
> 
>> Do either of these work for you?
> 
> I will report back in a bit; I'm now in the middle of an OS upgrade on
> the cluster.
> 
> But my question was more like: is this a configuration that should
> theoretically work? Or in other words, are there known dependencies
> on rsh that would make an rsh-less build not work or work with reduced
> functionality?
> 
>> Most batch systems today set a sentinel environment variable that we
>> check for.
> 
> I think that we are talking about slightly different things - my impression
> was that the OP was asking about detection at config time, while your
> statements make perfect sense to me if they are relative to detection
> at run-time. If the OP was indeed asking about run-time detection,
> then I apologize for the time you wasted on reading and replying to my
> questions...
> 
>> That's what the compile-time vs. run-time detection and selection is
>> supposed to be for.
> 
> Yes, I understand that; it's the same type of mechanism as in LAM/MPI,
> which is not that foreign to me ;-)




Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Bogdan Costescu

On Tue, 10 Jul 2007, Jeff Squyres wrote:

> Do either of these work for you?

I will report back in a bit; I'm now in the middle of an OS upgrade on
the cluster.

But my question was more like: is this a configuration that should
theoretically work? Or in other words, are there known dependencies
on rsh that would make an rsh-less build not work or work with reduced
functionality?

> Most batch systems today set a sentinel environment variable that we
> check for.

I think that we are talking about slightly different things - my impression
was that the OP was asking about detection at config time, while your
statements make perfect sense to me if they are relative to detection
at run-time. If the OP was indeed asking about run-time detection,
then I apologize for the time you wasted on reading and replying to my
questions...

> That's what the compile-time vs. run-time detection and selection is
> supposed to be for.

Yes, I understand that; it's the same type of mechanism as in LAM/MPI,
which is not that foreign to me ;-)


--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 6:07 AM, Bogdan Costescu wrote:

>> For example, I can readily find machines that are running TM, but
>> also have LSF and SLURM libraries installed (although those
>> environments are not "active" - the libraries in some cases are old
>> and stale, usually present either because someone wanted to look at
>> them or because they represent an old installation).
>
> Whatever the outcome of this discussion is, please keep in mind that
> this represents an exception rather than the rule. So the common cases
> of no batch environment or one batch environment installed should work
> as effortlessly as possible. Furthermore, keep in mind that there are
> lots of people who don't compile Open MPI themselves, but rely on
> packages compiled by others (Linux distributions, most likely) - so
> don't make life harder for those who produce these packages.

FWIW, this is exactly the reason that we have the "auto as much as
possible" behavior today; back in LAM/MPI, we had the problem that
[many] users would say "I built LAM, but it doesn't support ABC, even
though your manual says that it does!  LAM's a piece of junk!"  The
sad fact is that most people assume that "./configure && make
install" will do all the Right magic for their system; efforts at
education seemed to fail.  So we took the path of least resistance
and assumed that if we can find it on your system, we should use it.
Specifically: it was more of a support issue than anything else.

>> 1. ... we would only build support for those environments that the
>> builder specifies, and error out of the build process if multiple
>> conflicting environments are specified.
>
> I think that Ralf's suggestion (auto unless forced) is better, as it
> allows:
> - a better chance of finding the environments for people who don't
>   have too much experience with building Open MPI or hate to RTFM
> - control over what is built or not for people who know what they
>   are doing
>
>> This raises the issue of what to do with rsh, but I think we can
>> handle that one by simply building it wherever possible.
>
> I've been meaning to ask this for some time: is it possible to get rid
> of rsh support when building/running in an environment where rsh is
> not used (like a TM-based one)? I'm not trying to achieve security by
> doing this (after all, a user can build a separate copy of Open MPI
> with rsh support), but just to make sure that the programs that I
> build are either using the "blessed" start-up mechanism or error out.

Do either of these work for you?

1. Use the --enable-mca-no-build option as I discussed in a mail a
few minutes ago.

2. Remove the "mca_pls_rsh.*" files in $prefix/lib/openmpi.
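Jeff's second option can be scripted so that it is easy to undo. The cautious sketch below assumes the plugin files match `mca_pls_rsh.*` under `$prefix/lib/openmpi`, as described above. The helper name and the `openmpi-disabled` holding directory are inventions for this example. Moving the files aside instead of deleting them lets an administrator restore rsh support later.

```shell
#!/bin/sh
# disable_pls_rsh PREFIX: move the rsh launcher plugin out of the directory
# Open MPI scans for components (PREFIX/lib/openmpi) into a holding directory.
disable_pls_rsh() {
    plugdir="$1/lib/openmpi"
    holddir="$1/lib/openmpi-disabled"      # hypothetical holding location
    mkdir -p "$holddir" || return 1
    moved=0
    for f in "$plugdir"/mca_pls_rsh.*; do
        [ -e "$f" ] || continue            # glob matched nothing
        mv "$f" "$holddir"/ && moved=$((moved + 1))
    done
    if [ "$moved" -eq 0 ]; then
        echo "no mca_pls_rsh.* files under $plugdir" >&2
        return 1
    fi
    echo "moved $moved file(s) to $holddir"
}

# Example (on a real install, with suitable permissions):
#   disable_pls_rsh /opt/openmpi
```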

>> 2. We could laboriously go through all the components and ensure that
>> they check in their selection logic to see if that environment is
>> active.
>
> I might be missing something in the design of batch systems or
> software in general, but how do you decide that an environment is
> active or not?

Most batch systems today set a sentinel environment variable that we
check for.

> Can a library check if it's being used in a program?
> Or if that program actually runs? And if a configuration file exists,
> does it mean that the environment is actually active?

We do not generally assume that the presence of a plugin means that
that plugin can run in the current environment.  I thought that all
framework selection logic was adapted to this philosophy, but
apparently Ralph is indicating that some do not.  :-)

> How do you deal
> with the case where there are several versions of the same batch
> system installed, all using the same configuration files and therefore
> being ready to run?

We assume that Open MPI was built compiling/linking against the Right
version.  There's not much else we can do if you build against the
Wrong version.

> And how about the case where there is a machine
> reserved for compilations, where libraries are made available but
> there is no batch system active?

That's what the compile-time vs. run-time detection and selection is
supposed to be for.  The presence of an OMPI component at run-time is
not supposed to mean that it can run; it's supposed to be queried and
the component can do whatever checks it wants to see if it is
supposed to run, and then report "Yes, I can run" / "No, I cannot
run" back to Open MPI.
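The query-then-report model Jeff describes can be mimicked in shell. Everything here is illustrative. The `query_*` and `select_launcher` functions are invented, and the priority order is arbitrary. The sentinel variables (`PBS_JOBID`, `SLURM_JOBID`, `LSB_JOBID`) are ones those batch systems set inside a job, not necessarily the exact ones Open MPI's components check. The sketch also flags the case Ralph worries about, where more than one environment claims to be active.

```shell
#!/bin/sh
# Each "component" answers yes (status 0) or no (status 1) for the current
# environment, loosely mirroring a run-time component query.
query_tm()    { [ -n "${PBS_JOBID:-}" ]; }     # TM: PBS/Torque job
query_slurm() { [ -n "${SLURM_JOBID:-}" ]; }   # SLURM allocation
query_lsf()   { [ -n "${LSB_JOBID:-}" ]; }     # LSF job
query_rsh()   { command -v ssh >/dev/null 2>&1 || command -v rsh >/dev/null 2>&1; }

# select_launcher: ask the batch components in (arbitrary) priority order,
# warn if more than one says yes, and fall back to rsh only if none does.
select_launcher() {
    chosen=
    for c in tm slurm lsf; do
        if "query_$c"; then
            if [ -z "$chosen" ]; then
                chosen=$c
            else
                echo "warning: $c also reports an active job; keeping $chosen" >&2
            fi
        fi
    done
    if [ -z "$chosen" ] && query_rsh; then
        chosen=rsh
    fi
    if [ -z "$chosen" ]; then
        echo "no usable launcher" >&2
        return 1
    fi
    echo "$chosen"
}

# Example: inside a Torque job PBS_JOBID is set, so this selects tm:
#   PBS_JOBID=123.server select_launcher
```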


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralf Wildenhues
* Jeff Squyres wrote on Tue, Jul 10, 2007 at 01:28:40PM CEST:
> On Jul 10, 2007, at 2:42 AM, Ralf Wildenhues wrote:
> 
> >> 1. The most obvious one (to me, at least) is to require that people
> >> provide "--with-xx" when they build the system.
> >
> > I'll throw in another one for good measure: If --with-xx is given,
> > build with the component.  If --without-xx is given, disable it.
> > If neither is given, do as you currently do: enable it if you find
> > suitable libraries.
> 
> FWIW, we have this already:
[...]

Ah good, I must confess I wasn't aware of this existing functionality
(I don't need it myself).  Thanks.

> > In case the number of components gets too large, have a switch to
> > turn off automatic discovery even in the absence of --with* flags.
> 
> Did you mean the equivalent of the --enable-mca-no-build switch, or  
> disable *all* automatic discovery?

I meant: disable all automatic discovery.  But you guys are in a much
better position to decide which way is more useful.  All I wanted is for
you to be aware of existing possibilities (which you are, but that
wasn't obvious to me).

Cheers,
Ralf


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 2:42 AM, Ralf Wildenhues wrote:

>> 1. The most obvious one (to me, at least) is to require that people
>> provide "--with-xx" when they build the system.
>
> I'll throw in another one for good measure: If --with-xx is given,
> build with the component.  If --without-xx is given, disable it.
> If neither is given, do as you currently do: enable it if you find
> suitable libraries.

FWIW, we have this already:

1. If a particular component needs some additional libraries such
that we added a "--with-foo" switch to specify where those libraries
can be found, there is also an implicit "--without-foo" switch that
will disable that component.  E.g., "--without-tm" will inhibit the
building of the TM RAS and PLS components.

2. The "--enable-mca-no-build" option takes a comma-delimited list of
components that will then not be built.  Granted, this option isn't
exactly intuitive, but it was the best that we could think of at the
time to present a general solution for inhibiting the build of a
selected list of components.  Hence, "--enable-mca-no-build=pls-slurm,ras-slurm"
would inhibit building the SLURM RAS and PLS components (note that
the SLURM components currently do not require any additional
libraries, so a) there is no corresponding --with[out]-slurm option,
and b) they are usually always built).
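On the configure command line, the two switches look like this. Both scenarios are hypothetical. The sketch only assembles and prints the commands, which you would run from the top of an Open MPI source tree. `OMPI_PREFIX` and `/opt/openmpi` are placeholders chosen for this example.

```shell
#!/bin/sh
# Print configure lines for two hypothetical clusters; nothing is built.
OMPI_PREFIX=${OMPI_PREFIX:-/opt/openmpi}   # placeholder install location

# SLURM cluster with stale TM (PBS) libraries on disk:
# the implicit --without-tm drops the TM RAS and PLS components.
slurm_site="./configure --prefix=$OMPI_PREFIX --without-tm"

# TM cluster with stale SLURM files: list the components to skip as
# comma-separated <framework>-<component> names.
tm_site="./configure --prefix=$OMPI_PREFIX --enable-mca-no-build=pls-slurm,ras-slurm"

printf '%s\n' "$slurm_site" "$tm_site"
```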



> In case the number of components gets too large, have a switch to
> turn off automatic discovery even in the absence of --with* flags.

Did you mean the equivalent of the --enable-mca-no-build switch, or
disable *all* automatic discovery?  I'm not sure that disabling all
automatic discovery will be useful -- you'd have to specifically list
each component that would be built, and that list would be pretty
darn long...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Bogdan Costescu

On Mon, 9 Jul 2007, Ralph Castain wrote:

> For example, I can readily find machines that are running TM, but
> also have LSF and SLURM libraries installed (although those
> environments are not "active" - the libraries in some cases are old
> and stale, usually present either because someone wanted to look at
> them or because they represent an old installation).

Whatever the outcome of this discussion is, please keep in mind that
this represents an exception rather than the rule. So the common cases
of no batch environment or one batch environment installed should work
as effortlessly as possible. Furthermore, keep in mind that there are
lots of people who don't compile Open MPI themselves, but rely on
packages compiled by others (Linux distributions, most likely) - so
don't make life harder for those who produce these packages.

> 1. ... we would only build support for those environments that the
> builder specifies, and error out of the build process if multiple
> conflicting environments are specified.

I think that Ralf's suggestion (auto unless forced) is better, as it
allows:
- a better chance of finding the environments for people who don't
  have too much experience with building Open MPI or hate to RTFM
- control over what is built or not for people who know what they
  are doing

> This raises the issue of what to do with rsh, but I think we can
> handle that one by simply building it wherever possible.

I've been meaning to ask this for some time: is it possible to get rid
of rsh support when building/running in an environment where rsh is
not used (like a TM-based one)? I'm not trying to achieve security by
doing this (after all, a user can build a separate copy of Open MPI
with rsh support), but just to make sure that the programs that I
build are either using the "blessed" start-up mechanism or error out.

> 2. We could laboriously go through all the components and ensure that they
> check in their selection logic to see if that environment is active.

I might be missing something in the design of batch systems or
software in general, but how do you decide that an environment is
active or not? Can a library check if it's being used in a program?
Or if that program actually runs? And if a configuration file exists,
does it mean that the environment is actually active? How do you deal
with the case where there are several versions of the same batch
system installed, all using the same configuration files and therefore
being ready to run? And how about the case where there is a machine
reserved for compilations, where libraries are made available but
there is no batch system active?


--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralf Wildenhues
Hello Ralph,

* Ralph Castain wrote on Tue, Jul 10, 2007 at 03:51:06AM CEST:
> 
> The problem is that our Open MPI build system automatically detects the
> presence of those libraries, builds the corresponding components, and then
> links those libraries into our system. Unfortunately, this causes two
> side-effects:
[...]
> A couple of solutions come immediately to mind:
> 
> 1. The most obvious one (to me, at least) is to require that people provide
> "--with-xx" when they build the system.

I'll throw in another one for good measure: If --with-xx is given,
build with the component.  If --without-xx is given, disable it.
If neither is given, do as you currently do: enable it if you find
suitable libraries.

In case the number of components gets too large, have a switch to
turn off automatic discovery even in the absence of --with* flags.
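Ralf's proposal is the familiar three-way --with / --without / auto pattern, plus a global switch to turn auto-discovery off. The plain-shell sketch below shows the decision. `decide_component` and its arguments are hypothetical, not Open MPI's configure code. Failing when an explicit `--with-foo` cannot be satisfied is a common autoconf convention, assumed here rather than taken from the thread.

```shell
#!/bin/sh
# decide_component WITH FOUND [AUTO]   (all hypothetical names)
#   WITH : "yes" (--with-foo), "no" (--without-foo), or "" (neither given)
#   FOUND: "yes" if configure's probe found usable headers/libraries
#   AUTO : "no" if the global switch turning off auto-discovery was used
# Prints "build" or "skip"; fails if --with-foo was given but foo is unusable.
decide_component() {
    with=$1; found=$2; auto=${3:-yes}
    case $with in
        yes)
            if [ "$found" = yes ]; then
                echo build
            else
                echo "error: --with-foo given, but foo was not found" >&2
                return 1
            fi ;;
        no)
            echo skip ;;
        "")
            # Neither flag: today's behavior, unless auto-discovery is off.
            if [ "$auto" = yes ] && [ "$found" = yes ]; then
                echo build
            else
                echo skip
            fi ;;
        *)
            echo "error: unexpected value '$with'" >&2
            return 1 ;;
    esac
}
```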

It may be a bit more work on the Open MPI configury, but it may be
more convenient for your users.

2 cents from somebody who's not going to have to implement it.  ;-)

Cheers,
Ralf


[OMPI devel] Multi-environment builds

2007-07-09 Thread Ralph Castain
Yo all

I have been working on adding/clarifying support for several environments
and have encountered a problem that appears to be fairly common out there.
Namely, machines that have - over the course of history or for specific
reasons - installed libraries to support multiple environments. For example,
I can readily find machines that are running TM, but also have LSF and SLURM
libraries installed (although those environments are not "active" - the
libraries in some cases are old and stale, usually present either because
someone wanted to look at them or because they represent an old installation).

The problem is that our Open MPI build system automatically detects the
presence of those libraries, builds the corresponding components, and then
links those libraries into our system. Unfortunately, this causes two
side-effects:

1. we wind up building and loading a bunch of components that we cannot use
- which impacts memory footprint; and

2. not every component in every framework runs some library function to
determine if that environment is actually active. Hence, our selection logic
can sometimes get confused due to conflicting priorities, resulting in the
selection of components that cause the system to crash.

A couple of solutions come immediately to mind:

1. The most obvious one (to me, at least) is to require that people provide
"--with-xx" when they build the system. Instead of automatically detecting
an include file and library, and then deciding that the existence of those
files dictates that we build support for that environment, we would only
build support for those environments that the builder specifies, and error
out of the build process if multiple conflicting environments are specified.
This raises the issue of what to do with rsh, but I think we can handle that
one by simply building it wherever possible.

2. We could laboriously go through all the components and ensure that they
check in their selection logic to see if that environment is active. This
still causes libraries to be loaded for nothing, but keeps the automatic
nature of the build system. We would have to deal with those environments
that may not have a "safe" function we can call to see if they are "alive",
or have old/stale libraries that may have differing behavior in their APIs,
but perhaps those are few enough to not be a big problem.
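Option 1's build-time rule could be checked with a very small function. The sketch below is hypothetical. `check_envs` is an invented helper that receives the environments named with `--with-<env>`. It assumes any two batch systems count as conflicting. As suggested above, rsh is always allowed.

```shell
#!/bin/sh
# check_envs ENV...: hypothetical configure-time check for option 1.
# Each argument is an environment the builder requested with --with-<env>.
check_envs() {
    batch=
    for e in "$@"; do
        case $e in
            rsh) ;;                                   # compatible with anything
            *)   batch="${batch:+$batch }$e" ;;       # collect batch systems
        esac
    done
    set -- $batch                                     # split the list to count it
    if [ "$#" -gt 1 ]; then
        echo "error: conflicting environments requested: $batch" >&2
        return 1
    fi
    echo "building: ${batch:-none} (plus rsh where possible)"
}
```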

Any thoughts on this? It seems like we should solve this as it is becoming
more prevalent (at least on the machines I test on).

Ralph