[OMPI users] Specifying slots with relative host indexing

2022-12-02 Thread Adams, Brian M via users
We (the Dakota project at Sandia) have gotten a lot of mileage out of “tiling”
multiple MPI executions within a SLURM allocation using the relative host
indexing options (e.g., mpirun -host +n2,+n3). (Thanks for the feature!)
However, it’s been almost exclusively with openmpi-1.x.

I’m attempting to use the relative host indexing feature on a CTS-1/TOSS3
machine under SLURM with an openmpi-4.1.1 application. What I’m seeing, I
believe in contrast to 3.x and earlier, is that the default number of slots per
node is no longer taken from SLURM_TASKS_PER_NODE or the available cores when
relative host indexing is used, and I get the attached (helpful) error.

From the message and from reading the docs, I understand the default is to
assume one slot (N=1) if not specified. It seems I can work around this with
any of:
  -host +n0,+n0,...,+n0              # repeated ${SLURM_TASKS_PER_NODE} times (or num_mpi_tasks times if less than cores per node)
  -host mz52:${SLURM_TASKS_PER_NODE} # or :num_mpi_tasks
  --oversubscribe

But -host +n0:${SLURM_TASKS_PER_NODE} does not work. Should it?

A typical use case might be 2 nodes with 16 cores each, where we want to start
4 MPI jobs of 8 cores each. If we don’t worry about pinning tasks, our dispatch
script is effectively doing:
  mpirun -n 8 --bind-to none -host +n0 mpi_hello.exe config0
  mpirun -n 8 --bind-to none -host +n0 mpi_hello.exe config1
  mpirun -n 8 --bind-to none -host +n1 mpi_hello.exe config2
  mpirun -n 8 --bind-to none -host +n1 mpi_hello.exe config3
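
Concretely, with the repeated-host workaround the same dispatch might look
roughly like this under 4.1.x (just a sketch; the repeat count of 8 here is the
num_mpi_tasks for this 2x16-core example):
  # build "+n0,+n0,...,+n0" with one entry per rank we want on that node
  hosts_n0=$(printf '+n0,%.0s' {1..8}); hosts_n0=${hosts_n0%,}
  hosts_n1=$(printf '+n1,%.0s' {1..8}); hosts_n1=${hosts_n1%,}
  mpirun -n 8 --bind-to none -host ${hosts_n0} mpi_hello.exe config0
  mpirun -n 8 --bind-to none -host ${hosts_n0} mpi_hello.exe config1
  mpirun -n 8 --bind-to none -host ${hosts_n1} mpi_hello.exe config2
  mpirun -n 8 --bind-to none -host ${hosts_n1} mpi_hello.exe config3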

Is there a way to specify the slots when using the relative node indexing? Or,
ideally, another way to do this where I wouldn’t have to worry about slots at
all and they would default to cores per node or to the SLURM job configuration?
(I’d rather not use hostfiles or hostnames if possible, as the relative node
command-line options integrate super smoothly.) If not command-line, even an
environment variable, e.g., OMPI_MCA_rmaps_*, would not be too hard to build
into the workflow.

Thanks for any insights,
Brian



$ mpirun -n 2 -host +n0 /bin/hostname
--
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  /bin/hostname

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.




Re: [OMPI users] Multiple mpiexec's within a job (schedule within a scheduled machinefile/job allocation)

2009-07-30 Thread Adams, Brian M
Thanks Ralph, I wasn't aware of the relative indexing or sequential mapper 
capabilities.  I will check those out and report back if I still have a feature 
request. -- Brian


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Thursday, July 30, 2009 12:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] Multiple mpiexec's within a job (schedule within a 
scheduled machinefile/job allocation)


On Jul 30, 2009, at 11:49 AM, Adams, Brian M wrote:

Apologies if I'm being confusing; I'm probably trying to get at atypical use 
cases.  M and N  need not correspond to the number of nodes/ppn nor ppn/nodes 
available.  By node vs. slot doesn't much matter, as long as in the end I don't 
oversubscribe any node.  By slot might be good for efficiency in some apps, but 
I can't make a general case for it.

I think what you proposed offers some help in the case where N is an integer 
multiple of the number of available nodes, but perhaps not in other cases.  I 
must be missing something here, so instead of being fully general, perhaps 
consider a  specific case.  Suppose we have 4 nodes, 8 ppn (32 slots is I think 
the ompi language).  I might want to schedule, for example

1. M=2 simultaneous N=16 processor jobs: Here I believe what you suggested will 
work since N is a multiple of the available number of nodes.  I could use 
either npernode 4 or just bynode and I think get the same result: an even 
distribution of tasks.  (similar applies to, e.g., 8x4, 4x8)

Yes, agreed


2. M=16 simultaneous N=2 processor jobs: it seems if I use bynode or npernode, 
I would end up with 16 processes on each of the first two nodes (similar 
applies to, e.g., 32x1 or 10x3).  Scheduling many small jobs is a common 
problem for us.

3. M=3 simultaneous, N=10 processor jobs: I think we'd end up with this 
distribution (where A-D are nodes and 0-2 jobs)

A 0 0 0 1 1 1 2 2 2
B 0 0 0 1 1 1 2 2 2
C 0 0   1 1   2 2
D 0 0   1 1   2 2

where A and B are over-subscribed and there are more than the two unused slots 
I'd expect in the whole allocation.

Again, I can manage all these via a script that partitions the machine files, 
just wondering which scenarios OpenMPI can manage.


Have you looked at the relative indexing in 1.3.3? You could specify any of 
these in relative index terms, and have one "hostfile" that would support 16x2 
operations. This would work then for any allocation.

Your launch script could even just do it, something like this:

mpirun -n 2 -host +n0:1,+n1:1 app
mpirun -n 2 -host +n0:2,+n1:2 app

etc. Obviously, you could compute the relative indexing and just stick it in as 
required.
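
Something along those lines might be (just a sketch, assuming 8 slots per node
and that each job's N ranks fit on one node; the variable names are made up):

  ppn=8; N=2                  # slots per node, ranks per job (assumed)
  j=$1                        # job index 0..15 handed to the launch script
  node=$(( j * N / ppn ))     # relative node holding this job's slots
  mpirun -n $N -host +n${node}:${N} app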

Likewise, you could use the new "seq" (sequential) mapper to achieve any 
desired layout, again utilizing relative indexing to avoid having to create a 
special hostfile for each run.

Note that in all cases, you can specify a -n N that will tell OMPI to only 
execute N processes, regardless of what is in the sequential mapper file or 
-host.

If none of those work well, please let me know. I'm happy to create the 
required capability as I'm sure LANL will use it too (know of several similar 
cases here, but the current options seem okay for them).

Thanks!
Brian

-Original Message-
From: users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org>
[mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 29, 2009 4:19 PM
To: Open MPI Users
Subject: Re: [OMPI users] Multiple mpiexec's within a job
(schedule within a scheduled machinefile/job allocation)

Oh my - that does take me back a long way! :-)

Do you need these processes to be mapped byslot (i.e., do you
care if the process ranks are sharing nodes)? If not, why not
add "-bynode" to your cmd line?

Alternatively, given the mapping you want, just do

mpirun -npernode 1 application.exe

This would launch one copy on each of your N nodes. So if you
fork M times, you'll wind up with the exact pattern you
wanted. And, as each one exits, you could immediately launch
a replacement without worrying about oversubscription.

Does that help?
Ralph

PS. we dropped that "persistent" operation - caused way too
many problems with cleanup and other things. :-)

On Jul 29, 2009, at 3:46 PM, Adams, Brian M wrote:

Hi Ralph (all),

I'm resurrecting this 2006 thread for a status check.  The
new 1.3.x
machinefile behavior is great (thanks!) -- I can use
machinefiles to
manage multiple simultaneous mpiruns within a single torque
allocation (where the hosts are a subset of $PBS_NODEFILE).
However, this requires some careful management of machinefiles.

I'm curious if OpenMPI now directly supports the behavior I need,
described in general in the quote below.  Specifically,
given a single
PBS/Torque allocation of M*N processors, I will run a
serial program
that will fork M times.  Each of the M forked processes
calls 'mpirun -np N applicatio

Re: [OMPI users] Multiple mpiexec's within a job (schedule within a scheduled machinefile/job allocation)

2009-07-30 Thread Adams, Brian M
Apologies if I'm being confusing; I'm probably trying to get at atypical use 
cases.  M and N  need not correspond to the number of nodes/ppn nor ppn/nodes 
available.  By node vs. slot doesn't much matter, as long as in the end I don't 
oversubscribe any node.  By slot might be good for efficiency in some apps, but 
I can't make a general case for it.

I think what you proposed offers some help in the case where N is an integer 
multiple of the number of available nodes, but perhaps not in other cases.  I 
must be missing something here, so instead of being fully general, perhaps 
consider a  specific case.  Suppose we have 4 nodes, 8 ppn (32 slots is I think 
the ompi language).  I might want to schedule, for example

1. M=2 simultaneous N=16 processor jobs: Here I believe what you suggested will 
work since N is a multiple of the available number of nodes.  I could use 
either npernode 4 or just bynode and I think get the same result: an even 
distribution of tasks.  (similar applies to, e.g., 8x4, 4x8)

2. M=16 simultaneous N=2 processor jobs: it seems if I use bynode or npernode, 
I would end up with 16 processes on each of the first two nodes (similar 
applies to, e.g., 32x1 or 10x3).  Scheduling many small jobs is a common 
problem for us.

3. M=3 simultaneous, N=10 processor jobs: I think we'd end up with this 
distribution (where A-D are nodes and 0-2 jobs)

A 0 0 0 1 1 1 2 2 2
B 0 0 0 1 1 1 2 2 2
C 0 0   1 1   2 2
D 0 0   1 1   2 2

where A and B are over-subscribed and there are more than the two unused slots 
I'd expect in the whole allocation.

Again, I can manage all these via a script that partitions the machine files, 
just wondering which scenarios OpenMPI can manage.

Thanks!
Brian

> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, July 29, 2009 4:19 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Multiple mpiexec's within a job 
> (schedule within a scheduled machinefile/job allocation)
> 
> Oh my - that does take me back a long way! :-)
> 
> Do you need these processes to be mapped byslot (i.e., do you 
> care if the process ranks are sharing nodes)? If not, why not 
> add "-bynode" to your cmd line?
> 
> Alternatively, given the mapping you want, just do
> 
> mpirun -npernode 1 application.exe
> 
> This would launch one copy on each of your N nodes. So if you 
> fork M times, you'll wind up with the exact pattern you 
> wanted. And, as each one exits, you could immediately launch 
> a replacement without worrying about oversubscription.
> 
> Does that help?
> Ralph
> 
> PS. we dropped that "persistent" operation - caused way too 
> many problems with cleanup and other things. :-)
> 
> On Jul 29, 2009, at 3:46 PM, Adams, Brian M wrote:
> 
> > Hi Ralph (all),
> >
> > I'm resurrecting this 2006 thread for a status check.  The 
> new 1.3.x 
> > machinefile behavior is great (thanks!) -- I can use 
> machinefiles to 
> > manage multiple simultaneous mpiruns within a single torque
> > allocation (where the hosts are a subset of $PBS_NODEFILE).   
> > However, this requires some careful management of machinefiles.
> >
> > I'm curious if OpenMPI now directly supports the behavior I need, 
> > described in general in the quote below.  Specifically, 
> given a single 
> > PBS/Torque allocation of M*N processors, I will run a 
> serial program 
> > that will fork M times.  Each of the M forked processes
> > calls 'mpirun -np N application.exe' and blocks until completion.   
> > This seems akin to the case you described of "mpiruns executed in 
> > separate windows/prompts."
> >
> > What I'd like to see is the M processes "tiled" across the 
> available 
> > slots, so all M*N processors are used.  What I see instead 
> appears at 
> > face value to be the first N resources being oversubscribed M times.
> >
> > Also, when one of the forked processes returns, I'd like to 
> be able to 
> > spawn another and have its mpirun schedule on the resources 
> freed by 
> > the previous one that exited.  Is any of this possible?
> >
> > I tried starting an orted (1.3.3, roughly as you suggested 
> below), but 
> > got this error:
> >
> >> orted --daemonize
> > [gy8:25871] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
> > runtime/orte_init.c at line 125
> > 
> --
> >  It looks like orte_init failed for some reason; your parallel 
> > process is likely to abort.  There are many reasons that a parallel 

Re: [OMPI users] Multiple mpiexec's within a job (schedule within a scheduled machinefile/job allocation)

2009-07-29 Thread Adams, Brian M
Hi Ralph (all),

I'm resurrecting this 2006 thread for a status check.  The new 1.3.x 
machinefile behavior is great (thanks!) -- I can use machinefiles to manage 
multiple simultaneous mpiruns within a single torque allocation (where the 
hosts are a subset of $PBS_NODEFILE).  However, this requires some careful 
management of machinefiles.

I'm curious if OpenMPI now directly supports the behavior I need, described in 
general in the quote below.  Specifically, given a single PBS/Torque allocation 
of M*N processors, I will run a serial program that will fork M times.  Each of 
the M forked processes calls 'mpirun -np N application.exe' and blocks until 
completion.  This seems akin to the case you described of "mpiruns executed in 
separate windows/prompts."

What I'd like to see is the M processes "tiled" across the available slots, so 
all M*N processors are used.  What I see instead appears at face value to be 
the first N resources being oversubscribed M times.  

Also, when one of the forked processes returns, I'd like to be able to spawn 
another and have its mpirun schedule on the resources freed by the previous one 
that exited.  Is any of this possible?

I tried starting an orted (1.3.3, roughly as you suggested below), but got this 
error:

> orted --daemonize
[gy8:25871] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init.c at line 125
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
[gy8:25871] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
orted/orted_main.c at line 323

I spared the debugging info as I'm not even sure this is a correct invocation...

Thanks for any suggestions you can offer!
Brian
--
Brian M. Adams, PhD (bria...@sandia.gov)
Optimization and Uncertainty Quantification
Sandia National Laboratories, Albuquerque, NM
http://www.sandia.gov/~briadam


> From: Ralph Castain (rhc_at_[hidden])
> Date: 2006-12-12 00:46:59
> 
> Hi Chris
> 
> 
> Some of this is doable with today's codeand one of these 
> behaviors is not. :-(
> 
> 
> Open MPI/OpenRTE can be run in "persistent" mode - this 
> allows multiple jobs to share the same allocation. This works 
> much as you describe (syntax is slightly different, of 
> course!) - the first mpirun will map using whatever mode was 
> requested, then the next mpirun will map starting from where 
> the first one left off.
> 
> 
> I *believe* you can run each mpirun in the background. 
> However, I don't know if this has really been tested enough 
> to support such a claim. All testing that I know about 
> to-date has executed mpirun in the foreground - thus, your 
> example would execute sequentially instead of in parallel.
> 
> 
> I know people have tested multiple mpirun's operating in 
> parallel within a single allocation (i.e., persistent mode) 
> where the mpiruns are executed in separate windows/prompts. 
> So I suspect you could do something like you describe - just 
> haven't personally verified it.
> 
> 
> Where we definitely differ is that Open MPI/RTE will *not* 
> block until resources are freed up from the prior mpiruns. 
> Instead, we will attempt to execute each mpirun immediately - 
> and will error out the one(s) that try to execute without 
> sufficient resources. I imagine we could provide the kind of 
> "flow control" you describe, but I'm not sure when that might happen.
> 
> 
> I am (in my copious free time...haha) working on an 
> "orteboot" program that will startup a virtual machine to 
> make the persistent mode of operation a little easier. For 
> now, though, you can do it by:
> 
> 
> 1. starting up the "server" using the following command:
> orted --seed --persistent --scope public [--universe foo]
> 
> 
> 2. do your mpirun commands. They will automagically find the 
> "server" and connect to it. If you specified a universe name 
> when starting the server, then you must specify the same 
> universe name on your mpirun commands.
> 
> 
> When you are done, you will have to (unfortunately) manually 
> "kill" the server and remove its session directory. I have a 
> program called "ortehalt"
> in the trunk that will do this cleanly for you, but it isn't 
> yet in the release distributions. You are welcome to use it, 
> though, if you are working with the trunk - I can't promise 
> it is bulletproof yet, but it seems to be working.
> 
> 
> Ralph
> 
> 
> On 12/11/06 8:07 PM, "Maestas, Christopher Daniel" 
> 
> 

Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-24 Thread Adams, Brian M
> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, October 22, 2008 8:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI runtime-specific
> environment variable?
>
> What I think Brian is trying to do is detect that his code
> was not launched by mpirun -prior- to calling MPI_Init so he
> can decide if he wants to do that at all. Checking for the
> enviro params I suggested is a good way to do it - I'm not
> sure that adding another one really helps. The key issue is
> having something he can rely on, and I think the ones I
> suggested are probably his best bet for OMPI.

Just closing the loop on this thread -- again, thanks for all the good 
discussion.  Ralph's comment here is exactly right.  On some platforms, e.g., 
AIX/POE/IBM MPI, we've historically been bitten because it's not safe to call 
MPI_Init when not running inside a job submitted to the queue and running in an 
MPI environment.  I realize it's our (perhaps stubborn) choice to distribute 
MPI-linked binaries that have to work correctly in both serial (not just mpirun 
-np 1) and MPI-parallel modes, and that complicates things.

Brian




Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-21 Thread Adams, Brian M
> I'm not sure I understand the problem. The ale3d program from
> LLNL operates exactly as you describe and it can be built
> with mpich, lam, or openmpi.

Hi Doug,

I'm not sure what reply would be most helpful, so here's an attempt.

It sounds like we're on the same page with regard to the desired behavior.  
Historically, we've been able to detect serial vs. parallel launch of the 
binary foo, with a whole host of implementations, including those you mention, 
as well as some vendor-specific implementations (possibly including DEC/OSF, 
SGI, Sun, and AIX/poe, though I don't know all the details).  We typically 
distinguish serial from parallel executions on the basis of environment 
variables set only in the MPI runtime environment.  I was just trying to 
ascertain what variable would be best to test for in an OpenMPI environment, 
and I think Ralph helped with that.

If the ale3d code takes a different approach, I'd love to hear about it, 
off-list if necessary.

Brian




Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-21 Thread Adams, Brian M
> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
> Sent: Tuesday, October 21, 2008 11:36 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI runtime-specific
> environment variable?
>
> Hi,
>
> Am 21.10.2008 um 18:52 schrieb Ralph Castain:
>
> > On Oct 21, 2008, at 10:37 AM, Adams, Brian M wrote:
> >
> >> Doug is right that we could use an additional command line flag to
> >> indicate MPI runs, but at this point, we're trying to hide
> that from
> >> the user, such that all they have to do is run the binary vs.
> >> orterun/mpirun the binary and we detect whether it's a serial or
> >> parallel run.
>
> And when you have this information you decide for your user,
> whether to use mpirun (and the correct version to use) or
> just the plain binary?

I might have created some confusion here too.  The goal is to build an 
MPI-enabled binary 'foo' which a user may invoke as

(1) ./foo
-OR-
(2) mpirun -np 4 ./foo

The binary foo then determines at run-time whether it is to run in (1) serial, 
where MPI_Init will never be called; or (2) parallel, calling MPI_Init and so 
on.  This is a historical behavior which we need to preserve, at least for our 
present software release.

> You are making something like "strings the_binary" and grep
> for indications of the compilation type? For the standard
> Open MPI with shared libraries a "ldd the_binary" might
> reveal some information.

Hadn't thought to do that actually, since it addresses a slightly different 
problem than I propose above.  Thanks for the suggestion.  This is another 
possibility if instead of doing this detection directly in our binary, we 
decide to change to a wrapper script approach.

In any case, I appreciate all the discussion -- I believe I have a reasonable 
path forward using a combination of pre-processor defines that the OMPI 
wrappers and headers make with the runtime environment variables Ralph 
suggested (I'll just check for both the <1.3 and >= 1.3 environment cases).

Brian




Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-21 Thread Adams, Brian M
> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Tuesday, October 21, 2008 10:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI runtime-specific
> environment variable?
>
>
> On Oct 21, 2008, at 10:37 AM, Adams, Brian M wrote:
>
> > think it will help here.  While MPICH implementations
> typically left
> > args like -p4pg -p4amslave on the command line, I don't see that
> > coming from OpenMPI-launched jobs.
>
> Really? That doesn't sound right - we don't touch the
> arguments to your application. We test that pretty regularly
> and I have always seen the args come through.
>
> Can you provide an example of where it isn't?

Sorry Ralph, I caused some confusion here.  You are correct that OMPI does 
nothing to muck with any arguments passed to the user application -- they come 
through as expected.

I meant that when launching a binary with MPICH's mpirun, it would _append_ 
arguments such as -p4pg and -p4wd, and possibly -p4amslave, to the command 
line, which we could detect at run-time in the application to determine that it 
had indeed been launched with mpirun.

Brian




Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-21 Thread Adams, Brian M
Thank you Doug, Ralph, and Mattijs for the helpful input.  Some replies to 
Ralph's message and a question inlined here. -- Brian

> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Monday, October 20, 2008 5:38 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI runtime-specific
> environment variable?
>
> It depends on what you are trying to do. If you intend to use
> this solely with OMPI 1.2.x, then you could use some of
> those. However, they are risky as they are in general
> internal to OMPI's infrastructure - and thus, subject to
> change from one release to another.

Ok, sounds like the variables I called out aren't good choices.

> We do have some environmental variables that we guarantee to
> be "stable" across releases. You could look for
> OMPI_COMM_WORLD_SIZE, or OMPI_UNIVERSE_SIZE (there are a
> couple of others as well, but any of these would do).

Q: I just wrote a simple C++ program that includes mpi.h and uses getenv to 
check for these two variables, compiled with the mpicxx wrapper (openmpi-1.2.5 
as distributed with RHEL5).  When I run this program with orterun, these 
variables come back NULL from the environment.  The same is true if I just 
orterun a shell script that dumps the environment to a file.  Am I making an 
obvious mistake here?
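
For reference, a minimal version of such an environment dump (the exact set of
OMPI_* variables printed will vary by version):

  # show whatever OMPI_* variables orterun exports to launched processes
  orterun -np 1 env | grep '^OMPI_' | sort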

> However, these will only tell you that the job was launched
> via OMPI's mpirun - it won't tell you that it was a parallel
> job. It could be a serial job that just happened to be
> launched by mpirun. For example, we set the same
> environmental params when we execute "mpirun hostname" -
> mpirun has no way of knowing the parallel vs serial nature of
> the app it is launching, so it sets all the variables
> required by a parallel job just-in-case.

Understood -- we have some other logic to (hopefully) handle this case.

> Likewise, these variables will only tell you it is a parallel
> job launched by OMPI. If you use another MPI (e.g., MVAPICH),
> none of these would be set - yet it would still be a parallel job.

Also understood.  While ultimately we'll probably redesign the code base, right 
now we have tests specific to each MPI implementation for which we have known 
use cases.  So adding an OpenMPI-specific test is actually what I'm after in 
the short term.

> So it boils down to your particular mode of operation. If you
> only run with OMPI, and you would only launch via OMPI's
> mpirun if you wanted to execute in a parallel mode, then you
> could look for either of those two environmental params.
> Otherwise, you may have to do as Doug suggests and create
> your own "flag".

Doug is right that we could use an additional command line flag to indicate MPI 
runs, but at this point, we're trying to hide that from the user, such that all 
they have to do is run the binary vs. orterun/mpirun the binary and we detect 
whether it's a serial or parallel run.

As for parsing the command line $argv[0] before MPI_Init, I don't think it will 
help here.  While MPICH implementations typically left args like -p4pg 
-p4amslave on the command line, I don't see that coming from OpenMPI-launched 
jobs.

Brian




[OMPI users] OpenMPI runtime-specific environment variable?

2008-10-20 Thread Adams, Brian M
I work on an application (DAKOTA) that has opted for single binaries with 
in-source detection of serial vs. MPI execution at run-time.  While I realize 
there are many other ways to handle this (wrapper scripts, command-line 
switches, different binaries for serial vs. MPI, etc.), I'm looking for a 
reliable way to detect (in source) whether a binary has been launched in serial 
or with orterun.

We typically do this via detecting environment variables, so the easiest path 
for me would be to know an environment variable present when an application is 
invoked with orterun that is not typically present outside that MPI runtime 
environment.  Some candidates that came up in my particular environment include 
the following, but I don't know if any is a safe bet:

OMPI_MCA_gpr_replica_uri
OMPI_MCA_mpi_paffinity_processor
OMPI_MCA_mpi_yield_when_idle
OMPI_MCA_ns_nds
OMPI_MCA_ns_nds_cellid
OMPI_MCA_ns_nds_jobid
OMPI_MCA_ns_nds_num_procs
OMPI_MCA_ns_nds_vpid
OMPI_MCA_ns_nds_vpid_start
OMPI_MCA_ns_replica_uri
OMPI_MCA_orte_app_num
OMPI_MCA_orte_base_nodename
OMPI_MCA_orte_precondition_transports
OMPI_MCA_pls
OMPI_MCA_ras
OMPI_MCA_rds
OMPI_MCA_rmaps
OMPI_MCA_rmgr
OMPI_MCA_universe

I'd also welcome suggestions for other in-source tests that might reliably 
detect run via orterun.  Thanks!

Brian
--
Brian M. Adams, PhD (bria...@sandia.gov)
Optimization and Uncertainty Estimation
Sandia National Laboratories, Albuquerque, NM
http://www.sandia.gov/~briadam




Re: [OMPI users] Torque and OpenMPI 1.2

2007-12-19 Thread Adams, Brian M
Ralph,

Thanks for the clarification as I'm dealing with workarounds for this at
Sandia as well...

I might have missed this earlier in the dialog, but is this capability
in the SVN trunk right now, or still on the TODO list?

Brian

Brian M. Adams, PhD (bria...@sandia.gov)
Optimization and Uncertainty Estimation
Sandia National Laboratories
P.O. Box 5800, Mail Stop 1318
Albuquerque, NM 87185-1318
Voice: 505-284-8845, FAX: 505-284-2518

> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
> Sent: Wednesday, December 19, 2007 2:35 PM
> To: Open MPI Users ; pat.o'bry...@exxonmobil.com
> Cc: Castain, Ralph H. (LANL)
> Subject: Re: [OMPI users] Torque and OpenMPI 1.2
> 
> 
> Open MPI 1.3 will support use of the hostfile and the tm 
> launcher simultaneously. It will work slightly differently, 
> though, with respect to the hostfile:
> 
> 1. PBS_NODEFILE will be read to obtain a complete list of 
> what has been allocated to us
> 
> 2. you will be allowed to provide a hostfile for each 
> app_context as a separate entry to define the hosts to be 
> used for that specific app_context.
> The hosts in your hostfile, however, must be included in the 
> PBS_NODEFILE.
> 
> Basically, the hostfile argument will serve as a filter to 
> the hosts provided via PBS_NODEFILE. We will use the TM 
> launcher (unless, of course, you tell us to do otherwise), so 
> the issues I mentioned before will go away.
> 
> There will be a FAQ entry describing the revised hostfile 
> behavior in some detail. We think the change will help 
> rationalize the behavior so it is more consistent across all 
> the different use-cases people have invented. ;-)
> 
> Hope that helps
> Ralph
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
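
For illustration, the filtering behavior described above might be exercised
roughly like this under 1.3 inside a Torque job (a sketch only; the host count,
process count, and file name are made up):

  # launch on a two-host subset of the Torque allocation
  sort -u $PBS_NODEFILE | head -2 > sub_hosts
  mpirun -np 16 --hostfile sub_hosts application.exe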




[OMPI users] OpenMPI with system call -- openib error on SNL tbird

2007-04-16 Thread Adams, Brian M
Hello,

I am attempting to port Sandia's DAKOTA code from MVAPICH to the default
OpenMPI/Intel environment on Sandia's thunderbird cluster.  I can
successfully build DAKOTA in the default tbird software environment, but
I'm having runtime problems when DAKOTA attempts to make a system call.
Typical output looks like:

[0,1,1][btl_openib_component.c:897:mca_btl_openib_component_progress]
from an64 to: an64 error polling HP CQ with status LOCAL LENGTH ERROR
status number 1 for wr_id 5714048 opcode 0

I'm attaching a tarball containing output from `ompi_info --all` as well
as two simple sample programs with output to demonstrate the problem
behavior.  I built them in the default tbird MPI environment
(openmpi-1.1.2-ofed-intel-9.1) with 

  mpicc mpi_syscall.c -i_dynamic -o mpi_syscall
  mpicc mpi_nosyscall.c -i_dynamic -o mpi_nosyscall

where `which mpicc` =
/apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpicc The
latter has no system call and runs fine on two processors, whereas the
former gives the openib error (not in the attached output, though dumped
to the screen).  The problem exists regardless of whether -i_dynamic is
included.  I am executing from within an interactive 2 processor job
using 

  /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpiexec ->
orterun

I know some OpenMPI developers have access to thunderbird for testing,
but if you require additional information on the build or runtime
environment, please advise and I will attempt to send it along. 

Note:  Both programs run fine with MVAPICH on tbird, and with OpenMPI or
MPICH on my Linux x86_64 SMP workstation.

Thanks,
Brian

Brian M. Adams, PhD (bria...@sandia.gov) 
Optimization and Uncertainty Estimation 
Sandia National Laboratories 
P.O. Box 5800, Mail Stop 1318 
Albuquerque, NM 87185-1318
Voice: 505-284-8845, FAX: 505-284-2518






ompi_tbird_system.tgz
Description: ompi_tbird_system.tgz