> On May 15, 2019, at 7:18 PM, Adam Sylvester via users
> wrote:
>
> Up to this point, I've been running a single MPI rank per physical host
> (using multithreading within my application to use all available cores). I
> use this command:
> mpirun -N 1 --bind-to none --hostfile hosts.txt
>
Hi Lee Ann
I fear so - I'll assign it to @hoopoepg, @brminich and @yosefe
Ralph
> On May 17, 2019, at 11:14 AM, Riesen, Lee Ann via users
> wrote:
>
> I haven't received a reply to this. Should I submit a bug report? Lee Ann
>
> -
> Lee Ann Riesen, Enterprise and Government Group,
On Jun 21, 2019, at 1:52 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
On Jun 21, 2019, at 4:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
Hilarious - I wrote that code and I have no idea who added that option or what
it is supposed to do. I can assure, however,
, at 1:43 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
On Jun 21, 2019, at 4:04 PM, Ralph Castain via users <users@lists.open-mpi.org> wrote:
I’m unaware of any “map-to cartofile” option, nor do I find it in mpirun’s help
or man page. Are you seeing it somewhere?
On Jun 21, 2019, at 12:43 PM, Noam Bernstein via users
<users@lists.open-mpi.org> wrote:
Hi - are there any examples of the cartofile format? Or is there some
Take a look at "man orte_hosts" for a full explanation of how to use hostfile -
/etc/hosts is not a properly formatted hostfile.
You really just want a file that lists the names of the hosts, one per line, as
that is the simplest hostfile.
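A sketch of that simplest form (the hostnames and application name here are placeholders):

```shell
# Create a minimal hostfile: one hostname per line.
cat > hosts.txt <<'EOF'
node01
node02
node03
EOF

# It can then be passed to mpirun, e.g. one rank per host:
# mpirun -N 1 --bind-to none --hostfile hosts.txt ./my_app
```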
> On Sep 7, 2019, at 4:23 AM, Sepinoud Azimi via users
Yeah, we do currently require that to be true. Process mapping is distributed
across the daemons - i.e., the daemon on each node independently computes the
map. We have talked about picking up the hostfile on the head node and sending
out the contents, but haven't implemented that yet.
On Aug
I just wanted to address a question to the SGE users and/or developers on this
list. As you may know, we have been developing PMIx for the last few years and
have now integrated it into various RMs. This allows the RMs to directly launch
application processes without going through mpirun and
I'm afraid I cannot replicate this problem on OMPI master, so it could be
something different about OMPI 4.0.1 or your environment. Can you download and
test one of the nightly tarballs from the "master" branch and see if it works
for you?
https://www.open-mpi.org/nightly/master/
Ralph
On
Did you configure Slurm to use PMIx? If so, then you simply need to set the
"--mpi=pmix" or "--mpi=pmix_v2" (depending on which version of PMIx you used)
flag on your srun cmd line so it knows to use it.
If not (and you can't fix it), then you have to explicitly configure OMPI to
use Slurm's
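A sketch of that selection (plugin names are from this thread; whether pmix or pmix_v2 applies depends on the PMIx version Slurm was built against):

```shell
# List the MPI/PMI plugins this Slurm build knows about:
# srun --mpi=list

# Select PMIx on the srun command line ...
# srun --mpi=pmix -n 4 ./my_app

# ... or via the environment, which srun also honors:
export SLURM_MPI_TYPE=pmix
echo "$SLURM_MPI_TYPE"
```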
Artem - do you have any suggestions?
On Aug 8, 2019, at 12:06 PM, Jing Gong <gongj...@kth.se> wrote:
Hi Ralph,
> Did you remember to add "--mpi=pmix" to your srun cmd line?
On the cluster,
$ srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2
srun: pmix
srun:
ailed
--> Returned value Not found (-13) instead of ORTE_SUCCESS
--
What is the issue?
Thanks a lot.
/Jing
From: users <users-boun...@lists.open-mpi.org> on behalf of Ra
rent MODEX keys
> are used. It seems like MODEX can not fetch messages in another order
> than it was sent. Is that so?
>
> Not sure how to tell the other processes to not use CMA, while some
> processes are still transmitting their user namespace ID to PROC 0.
>
>
If that works, then it might be possible to include the namespace ID in the
job-info provided by PMIx at startup - would have to investigate, so please
confirm that the modex option works first.
> On Jul 22, 2019, at 1:22 AM, Gilles Gouaillardet via users
> wrote:
>
> Adrian,
>
>
> An
Upgrade to OMPI v4 or at least something in the v3 series. If you continue to
have a problem, then set PMIX_MCA_ptl=tcp in your environment.
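A minimal sketch of that fallback:

```shell
# Force the PMIx messaging layer onto TCP, as suggested above:
export PMIX_MCA_ptl=tcp

# then relaunch the job, e.g.:
# mpirun -n 4 ./my_app
```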
On Jul 26, 2019, at 12:12 PM, Kulshrestha, Vipul via users
<users@lists.open-mpi.org> wrote:
Hi,
I am trying to setup my open-mpi application
I'm afraid I don't know how to advise you on this - you may need to talk to the
Slurm folks. When you start your application with mpirun, we use "srun" to
start our own daemons on the job's nodes. The application processes, however,
are subsequently started by those daemons using our own
I _think_ what the user is saying is that their "hello world" program is
returning rank=0 for all procs when started with mpirun, but not when started
with MPICH's mpiexec.hydra.
The most likely problem is that your "hello" program wasn't built against OMPI
- are you trying to run the same
Difficult to know what to say here. I have no idea what your program does after
validating the license. Does it execute some kind of MPI collective operation?
Does only one proc validate the license and all others just use it?
All I can tell from your output is that the procs all launched okay.
The man page is simply out of date - see
https://github.com/open-mpi/ompi/issues/7095 for further thinking
On Nov 12, 2019, at 1:26 AM, Max Sagebaum via users <users@lists.open-mpi.org> wrote:
Hello @ all,
Short question: How to select what is the behavior of --output-filename?
Long
It's a different code path, that's all - just a question of what path gets
traversed.
Would you mind posting a little more info on your two use-cases? For example,
do you have a default hostfile telling mpirun what machines to use?
On Sep 25, 2019, at 12:41 PM, Martín Morales
Yes, of course it can - however, I believe there is a bug in the add-hostfile
code path. We can address that problem far easier than moving to a different
interconnect.
On Sep 25, 2019, at 11:39 AM, Martín Morales via users
<users@lists.open-mpi.org> wrote:
Thanks Steven. So,
Urrr...this problem has been resolved, Howard.
On Jan 29, 2020, at 2:51 PM, Howard Pritchard via users
<users@lists.open-mpi.org> wrote:
Collin,
A couple of things to try. First, could you just configure without using the
mellanox platform file and see if you can run the app with 100
It is also wise to create a "tmp" directory under your home directory, and
reset TMPDIR to point there. Avoiding use of the system tmpdir is highly
advisable under Mac OS, especially Catalina.
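A sketch of that setup:

```shell
# Create a private tmp directory under $HOME and point TMPDIR at it,
# avoiding the system tmpdir under macOS (especially Catalina):
mkdir -p "$HOME/tmp"
export TMPDIR="$HOME/tmp"
```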
On Feb 6, 2020, at 4:09 PM, Gutierrez, Samuel K. via users
<users@lists.open-mpi.org> wrote:
Does it work with pbs but not Mellanox? Just trying to isolate the problem.
On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users
<users@lists.open-mpi.org> wrote:
Hello,
I have done some additional testing and I can say that it works correctly with
gcc8 and no mellanox or pbs
Can you send the output of a failed run, including your command line?
Josh
On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users
<users@lists.open-mpi.org> wrote:
Okay, so this is a problem with the Mellanox software - copying Artem.
On Jan 28, 2020, at 8:15 AM, Collin S
users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Tuesday, January 28, 2020 11:02 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] [External] Re: OMPI returns e
Okay, debug-daemons isn't going to help as we aren't launching any daemons.
This is all one node. So try adding "--mca odls_base_verbose 10 --mca
state_base_verbose 10" to the cmd line and let's see what is going on.
I agree with Josh - neither mpirun nor hostname are invoking the Mellanox
Okay, that nailed it down - the problem is the number of open file descriptors
is exceeding your system limit. I suspect the connection to the Mellanox
drivers is solely due to it also having some descriptors open, and you are just
close enough to the boundary that it causes you to hit it.
See
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Monday, 6 April 2020 16:32
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mp
Currently, mpirun takes that second SIGINT to mean "you seem to be stuck trying
to cleanly abort - just die", which means mpirun exits immediately without
doing any cleanup. The individual procs all commit suicide when they see their
daemons go away, which is why you don't get zombies left
I updated the message to explain the flags (instead of a numerical value) for
OMPI v5. In brief:
#define PRRTE_NODE_FLAG_DAEMON_LAUNCHED 0x01 // whether or not the daemon
on this node has been launched
#define PRRTE_NODE_FLAG_LOC_VERIFIED 0x02 // whether or not the
FWIW: I have replaced those flags in the display option output with their
string equivalent to make interpretation easier. This is available in OMPI
master and will be included in the v5 release.
> On Nov 21, 2019, at 2:08 AM, Peter Kjellström via users
> wrote:
>
> On Mon, 18 Nov 2019
Hi Nathan
Sorry for the long, long delay in responding - no reasonable excuse (just busy,
switching over support areas, etc.). Hopefully, you already found the solution.
You can specify the signals to forward to children using an MCA parameter:
OMPI_MCA_ess_base_forward_signals=SIGINT
should
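A sketch of using that parameter (my assumption is that additional signals can be listed comma-separated; SIGUSR1 here is an illustrative addition):

```shell
# Ask the OMPI runtime to forward these signals to the child processes:
export OMPI_MCA_ess_base_forward_signals=SIGINT,SIGUSR1

# mpirun -n 4 ./my_app
```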
I'm afraid the short answer is "no" - there is no way to do that today.
> On Mar 30, 2020, at 1:45 PM, Jean-Baptiste Skutnik via users
> wrote:
>
> Hello,
>
> I am writing a wrapper around `mpirun` which requires pre-processing of the
> user's program. To achieve this, I need to isolate the
srun: none
srun: pmi2
srun: openmpi
I did launch the job with srun --mpi=pmi2
Does OpenMPI 4 need PMIx specifically?
On 4/23/20 10:23 AM, Ralph Castain via users wrote:
Is Slurm built with PMIx support? Did you tell srun to use it?
On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
--mpi=pmi2
>
> Does OpenMPI 4 need PMIx specifically?
>
>
> On 4/23/20 10:23 AM, Ralph Castain via users wrote:
>> Is Slurm built with PMIx support? Did you tell srun to use it?
>>
>>
>>> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
Is Slurm built with PMIx support? Did you tell srun to use it?
> On Apr 23, 2020, at 7:00 AM, Prentice Bisbal via users
> wrote:
>
> I'm using OpenMPI 4.0.3 with Slurm 19.05.5 I'm testing the software with a
> very simple hello, world MPI program that I've used reliably for years. When
> I
> Why is that? Can I not trust the output
> of --mpi=list?
>
> Prentice
>
> On 4/23/20 10:43 AM, Ralph Castain via users wrote:
>> No, but you do have to explicitly build OMPI with non-PMIx support if that
>> is what you are going to use. In this case, you need to conf
The difference between the working node flag (0x11) and the
non-working nodes’ flags (0x13) is the flag PRRTE_NODE_FLAG_LOC_VERIFIED.
What does that imply? The location of the daemon has NOT been verified?
Kurt
From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Mo
I'm not sure I understand why you are trying to build CentOS rpms for PMIx,
Slurm, or OMPI - all three are readily available online. Is there some
particular reason you are trying to do this yourself? I ask because it is
non-trivial to do and requires significant familiarity with both the
Try adding --without-psm2 to the PMIx configure line - sounds like you have
that library installed on your machine, even though you don't have omnipath.
On May 12, 2020, at 4:42 AM, Leandro via users <users@lists.open-mpi.org> wrote:
HI,
I compile it statically to make sure compilers
The following (from what you posted earlier):
$ srun --mpi=list
srun: MPI types are...
srun: none
srun: pmix_v3
srun: pmi2
srun: openmpi
srun: pmix
would indicate that Slurm was built against a PMIx v3.x release. Using OMPI
v4.0.3 with pmix=internal should be just fine so long as you set
Sorry for the incredibly late reply. Hopefully, you have already managed to
find the answer.
I'm not sure what your comm_spawn command looks like, but it appears you
specified the host in it using the "dash_host" info-key, yes? The problem is
that this is interpreted the same way as the "-host
I fear those cards are past end-of-life so far as support is concerned. I'm not
sure if anyone can really advise you on them. It sounds like the fabric is
experiencing failures, but that's just a guess.
On May 8, 2020, at 12:56 PM, Prentice Bisbal via users
<users@lists.open-mpi.org>
Your use-case sounds more like a workflow than an application - in which case,
you probably should be using PRRTE to execute it instead of "mpirun" as PRRTE
will "remember" the multiple jobs and avoid the overload scenario you describe.
This link will walk you thru how to get and build it:
chemist and not a sysadmin (I sorely miss having
a specialized sysadmin in our Department!).
Carlo
On Thu, Aug 20, 2020 at 6:45 PM Ralph Castain via users
<users@lists.open-mpi.org> wrote:
Your use-case sounds more like a workflow than an application - in which case,
you pro
I'm not sure where you are looking, but those params are indeed present in the
opal/mca/btl/tcp component:
/*
* Called by MCA framework to open the component, registers
* component parameters.
*/
static int mca_btl_tcp_component_register(void)
{
char* message;
/* register TCP
The messages about the daemons is coming from two different sources. Grid is
saying it was able to spawn the orted - then the orted is saying it doesn't
know how to communicate and fails.
I think the root of the problem lies in the plm output that shows the qrsh it
will use to start the job.
_base).
> Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
> (e.g., on Cray). Please check your configure cmd line and consider using
> one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
> lack of common ne
Afraid I have no real ideas here. Best I can suggest is taking the qrsh cmd
line from the prior debug output and try running it manually. This might give
you a chance to manipulate it and see if you can identify what is causing it an
issue, if anything. Without mpirun executing, the daemons
By default, OMPI will bind your procs to a single core. You probably want to at
least bind to socket (for NUMA reasons), or not bind at all if you want to use
all the cores on the node.
So either add "--bind-to socket" or "--bind-to none" to your cmd line.
On Aug 3, 2020, at 1:33 AM, John
The Java bindings were added specifically to support the Spark/Hadoop
communities, so I see no reason why you couldn't use them for Akka or whatever.
Note that there are also Python wrappers for MPI at mpi4py that you could build
upon.
There is plenty of evidence out there for a general
Well, we aren't really that picky :-) While I agree with Gilles that we are
unlikely to be able to help you resolve the problem, we can give you a couple
of ideas on how to chase it down
First, be sure to build OMPI with "--enable-debug" and then try adding "--mca
oob_base_verbose 100" to your
Howard - if there is a problem in PMIx that is causing this problem, then we
really could use a report on it ASAP as we are getting ready to release v3.1.6
and I doubt we have addressed anything relevant to what is being discussed here.
On Aug 11, 2020, at 4:35 PM, Martín Morales via users
My apologies - I should have included "--debug-daemons" for the mpirun cmd line
so that the stderr of the backend daemons would be output.
> On Aug 10, 2020, at 10:28 AM, John Duffy via users
> wrote:
>
> Thanks Ralph
>
> I will do all of that. Much appreciated.
Setting aside the known issue with comm_spawn in v4.0.4, how are you planning
to forward stdin without the use of "mpirun"? Something has to collect stdin of
the terminal and distribute it to the stdin of the processes
> On Aug 12, 2020, at 9:20 AM, Alvaro Payero Pinto via users
> wrote:
>
>
Add "--mca pml cm" to your cmd line
On Jul 31, 2020, at 9:54 PM, Supun Kamburugamuve via users
<users@lists.open-mpi.org> wrote:
Hi all,
I'm trying to setup OpenMPI on a cluster with the Omni-Path network. When I try
the following command it gives an error.
mpirun -n 2 --hostfile
18:29, Ralph Castain via users <users@lists.open-mpi.org> wrote:
You cannot cascade mpirun cmds like that - the child mpirun picks up envars
that causes it to break. You'd have to either use comm_spawn to start the child
job, or do a fork/exec where you can set the environment to be some pristine
set of values.
> On Jul 11, 2020, at 1:12 PM, John Retterer
Note that you can also resolve it by adding --use-hwthread-cpus to your cmd
line - it instructs mpirun to treat the HWTs as independent cpus so you would
have 4 slots in this case.
> On Jun 8, 2020, at 11:28 AM, Collin Strassburger via users
> wrote:
>
> Hello David,
>
> The slot
While possible, it is highly unlikely that your desktop version is going to be
binary compatible with your cluster...
On Jul 24, 2020, at 9:55 AM, Lana Deere via users <users@lists.open-mpi.org> wrote:
I have open-mpi 4.0.4 installed on my desktop and my small test programs are
Just a point to consider. OMPI does _not_ want to get in the mode of modifying
imported software packages. That is a blackhole of effort we simply cannot
afford.
The correct thing to do would be to flag Rob Latham on that PR and ask that he
upstream the fix into ROMIO so we can absorb it. We
You want to use the "sequential" mapper and then specify each proc's location,
like this for your hostfile:
host1
host1
host2
host2
host3
host3
host1
host2
host3
and then add "--mca rmaps seq" to your mpirun cmd line.
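Putting the two pieces together (hostnames and application name are placeholders):

```shell
# Hostfile for the sequential mapper: rank i runs on the host on line i.
cat > seq_hosts.txt <<'EOF'
host1
host1
host2
host2
host3
host3
host1
host2
host3
EOF

# mpirun --mca rmaps seq --hostfile seq_hosts.txt -n 9 ./my_app
```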
Ralph
On Dec 21, 2020, at 5:22 AM, Vineet Soni via users
Did you remember to build the Slurm pmi and pmi2 libraries? They aren't built
by default - IIRC, you have to manually go into a subdirectory and do a "make
install" to have them built and installed. You might check the Slurm
documentation for details.
You also might need to add a
> expected. I just want to make sure that this was the case, and the error
> below wasn't a sign of another issue with the job.
>
> Prentice
>
> On 11/11/20 5:47 PM, Ralph Castain via users wrote:
>> Looks like it is coming from the Slurm PMIx plugin, not OMPI.
>>
Looks like it is coming from the Slurm PMIx plugin, not OMPI.
Artem - any ideas?
Ralph
> On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users
> wrote:
>
> One of my users recently reported a failed job that was using OpenMPI 4.0.4
> compiled with PGI 20.4. There two different errors
That would be very kind of you and most welcome!
> On Nov 14, 2020, at 12:38 PM, Alexei Colin wrote:
>
> On Sat, Nov 14, 2020 at 08:07:47PM +0000, Ralph Castain via users wrote:
>> IIRC, the correct syntax is:
>>
>> prun -host +e ...
>>
>> Thi
Afraid I would have no idea - all I could tell them is that there was a bug and
it has been fixed
On Nov 2, 2020, at 12:18 AM, Andrea Piacentini via users
<users@lists.open-mpi.org> wrote:
I installed version 4.0.5 and the problem appears to be fixed.
Can you please help us
Could you please tell us what version of OMPI you are using?
On Oct 28, 2020, at 11:16 AM, Andrea Piacentini via users
<users@lists.open-mpi.org> wrote:
Good morning, we need to launch an MPMD application with two Fortran executables
and one interpreted Python (mpi4py) application.
I think you mean add "--mca mtl ofi" to the mpirun cmd line
> On Jan 25, 2021, at 10:18 AM, Heinz, Michael William via users
> wrote:
>
> What happens if you specify -mtl ofi ?
>
> -Original Message-
> From: users On Behalf Of Patrick Begou via
> users
> Sent: Monday, January 25,
There should have been an error message right above that - all this is saying
is that the same error message was output by 7 more processes besides the one
that was output. It then indicates that process 3 (which has pid 0?) was killed.
Looking at the help message tag, it looks like no NICs
Okay, I can't promise when I'll get to it, but I'll try to have it in time for
OMPI v5.
On Jan 29, 2021, at 1:30 AM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:
Hi Ralph,
It would be great to have it for load balancing issues. Ideally one could do
something like
the app-contexts wind up in MPI_COMM_WORLD.
On Jan 28, 2021, at 3:18 PM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:
That's right Ralph!
On 28/01/2021 23:13, Ralph Castain via users wrote:
Trying to wrap my head around this, so let me try a 2-node example. You want
(each rank bound to a single core):
ranks 0-3 to be mapped onto node1
ranks 4-7 to be mapped onto node2
ranks 8-11 to be mapped onto node1
ranks 12-15 to be mapped onto node2
etc.etc.
Correct?
> On Jan 28, 2021, at
You can still use "map-by" to get what you want since you know there are four
interfaces per node - just do "--map-by ppr:8:node". Note that you definitely
do NOT want to list those multiple IP addresses in your hostfile - all you are
doing is causing extra work for mpirun as it has to DNS
To answer your specific questions:
The backend daemons (orted) will not exit until all locally spawned procs exit.
This is not configurable - for one thing, OMPI procs will suicide if they see
the daemon depart, so it makes no sense to have the daemon fail if a proc
terminates. The logic
The original configure line is correct ("--without-orte") - just a typo in the
later text.
You may be running into some issues with Slurm's built-in support for OMPI. Try
running it with OMPI's "mpirun" instead and see if you get better performance.
You'll have to reconfigure to remove the
I'm not sure we support what you are wanting to do.
You can direct mpiexec to use a specified script to launch its daemons on
remote nodes. The daemons will need to connect back via TCP to mpiexec. The
daemons are responsible for fork/exec'ing the local MPI application procs on
each node.
Apologies for the very long delay in response. This has been verified fixed in
OMPI's master branch that is to be released as v5.0 in the near future.
Unfortunately, there are no plans to backport that fix to earlier release
series. We therefore recommend that you upgrade to v5.0 if you retain
Hmmm...disturbing. The changes I made have somehow been lost. I'll have to redo
it - will get back to you when it is restored.
On Mar 25, 2021, at 2:54 PM, L Lutret <lu.lut...@gmail.com> wrote:
Hi Ralph,
Thanks for your response. I tried with the master branch a very simple spawn
(pure default), it just doesn’t function (I’m guessing
because it chose “bad” or in-use ports).
On 18 Mar 2021, at 14:11, Ralph Castain via users <users@lists.open-mpi.org> wrote:
Hard to say - unless there is some reason, why not make it large enough to not
be an issue?
That range resulted in
the issue I posted about here before, where mpirun just does nothing for 5mins
and then terminates itself, without any error messages.)
Cheers,
Sendu.
On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:
What you are missing is that there are _two_ messaging layers in the system.
You told the btl/tcp layer to use the specified ports, but left the oob/tcp one
unspecified. You need to add
oob_tcp_dynamic_ipv4_ports = 46207-46239
or whatever range you want to specify
Note that if you want the
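One way to sketch covering both layers is an MCA params file (the oob line is from the message above; the two btl_tcp port parameters are my assumption for how the btl/tcp range would be pinned to the same window):

```shell
# Pin both messaging layers to the same firewall-friendly port range,
# e.g. in $HOME/.openmpi/mca-params.conf:
cat > mca-params.conf <<'EOF'
btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 33
oob_tcp_dynamic_ipv4_ports = 46207-46239
EOF
```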
ailable ports, but is it checking those
ports are also available on all the other hosts it’s going to run on?
On 18 Mar 2021, at 15:57, Ralph Castain via users <users@lists.open-mpi.org> wrote:
Hmmm...then you have something else going on. By default, OMPI will ask the
loca
You did everything right - the OSHMEM implementation in OMPI only supports UCX
as it is essentially a Mellanox offering. I think the main impediment to
broadening it is simply interest and priority on the part of the non-UCX
developers.
> On Mar 22, 2021, at 7:51 AM, Michael Di Domenico via
[../BB/../.. ...][.. ...]  (truncated binding-map output: only "BB" on the second core of the first socket is marked bound; all other cores on both sockets are unmarked)
On 28/02/2021 16:24, Ralph Castain via users wrote:
Did you read the documentation on rankfile? The "slot=N" directive says to
"put this proc on
Excuse me, but would you please ensure that you do not send mail to a mailing
list containing this label:
[AMD Official Use Only - Internal Distribution Only]
Thank you
Ralph
On Mar 4, 2021, at 4:55 AM, Raut, S Biplab via users <users@lists.open-mpi.org> wrote:
[AMD Official Use Only
e core,
and the second bound to all the rest, with no use of hyperthreads.
Would this be
--map-by ppr:2:node --bind-to core --cpu-list 0,1-31
?
Thx
On 2/28/21 5:44 PM, Ralph Castain via users wrote:
The only way I know of to do what you want is
--map-by ppr:32:socket
other policies. I have also tried with
--cpu-set with identical results. Probably rankfile is my only option too.
On 28/02/2021 22:44, Ralph Castain via users wrote:
The only way I know of to do what you want is
--map-by ppr:32:socket --bind-to core --cpu-list 0,2,4,6,...
whe
Do I need a rankfile listing all the hosts?
John
On 3/1/21 10:26 AM, Ralph Castain via users wrote:
I'm afraid not - you have simply told us that all cpus are available. I don't
know of any way to accomplish what John wants other than with a rankfile.
On Mar 1, 2021, at 7:13 AM, Luis Ceb
Your command line is incorrect:
--map-by ppr:32:socket:PE=4 --bind-to hwthread
should be
--map-by ppr:32:socket:PE=2 --bind-to core
On Feb 28, 2021, at 5:57 AM, Luis Cebamanos via users <users@lists.open-mpi.org> wrote:
I should have said, "I would like to run 128 MPI processes on 2
And this is still different from the output produced using the rankfile.
Cheers,
Luis
On 28/02/2021 14:06, Ralph Castain via users wrote:
Your command line is incorrect:
--map-by ppr:32:socket:PE=4 --bind-to hwthread
> Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: gpu004: tasks 0-1: Exited with exit code 1
>
Ryan - I suspect what Sergey was trying to say was that you need to ensure OMPI
doesn't try to use the OpenIB driver, or at least that it doesn't attempt to
initialize it. Try adding
OMPI_MCA_pml=ucx
to your environment.
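A minimal sketch of that environment setup (the second line is my assumption of the usual companion setting for keeping the openib BTL from even opening):

```shell
# Force the UCX PML so the OpenIB driver is not initialized:
export OMPI_MCA_pml=ucx

# Optionally also exclude the openib BTL explicitly (assumption):
export OMPI_MCA_btl=^openib
```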
On Jul 29, 2021, at 1:56 AM, Sergey Oblomov via users
You just need to tell mpirun that you want your procs to be bound to cores, not
socket (which is the default).
Add "--bind-to core" to your mpirun cmd line
On Oct 10, 2021, at 11:17 PM, Chang Liu via users <users@lists.open-mpi.org> wrote:
Yes they are. This is an interactive job from
d that? Thanks.
>Ray
>
>
> From: users on behalf of Ralph Castain via
> users
> Sent: Monday, October 11, 2021 1:49 PM
> To: Open MPI Users
> Cc: Ralph Castain
> Subject: Re: [OMPI users] [External] Re: cpu bi
two processes sharing a
physical core.
I guess there is a way to do that by playing with mapping. I just want to know
if this is a bug in mpirun, or this feature for interacting with slurm was
never implemented.
Chang
On 10/11/21 10:07 AM, Ralph Castain via users wrote:
You just need to tell
<users@lists.open-mpi.org> wrote:
OK thank you. Seems that srun is a better option for normal users.
Chang
On 10/11/21 1:23 PM, Ralph Castain via users wrote:
Sorry, your output wasn't clear about cores vs hwthreads. Apparently, your
Slurm config is setup to use hwthreads as indep
Could you please include (a) what version of OMPI you are talking about, and
(b) the binding patterns you observed from both srun and mpirun?
> On Oct 9, 2021, at 6:41 PM, Chang Liu via users
> wrote:
>
> Hi,
>
> I wonder if mpirun can follow the cpu binding settings from slurm, when
>