Hi,
I am evaluating OpenMPI 5.0.0 and I am experiencing a race condition when
spawning a different number of processes on different nodes.
With:
$ cat hostfile
node00
node01
node02
node03
If I run this code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]){
MPI_Init(&a
Hi there,
I have an issue with OpenMPI 4.0.2 and 4.1.1 where MPI_COMM_SPAWN() cannot
spawn across nodes, while I could successfully use this function in
OpenMPI 2.1.1. I am testing on a cluster with CentOS 7.9, the LSF batch
system, and GCC 6.3.0.
I used this code for testing (called it "spawn_examp
Sorry for the incredibly late reply. Hopefully, you have already managed to
find the answer.
I'm not sure what your comm_spawn command looks like, but it appears you
specified the host in it using the "dash_host" info-key, yes? The problem is
that this is interpreted the same way as the "-host
I am trying to launch a number of manager processes, one per node, and then have
each of those managers spawn, on its own same node, a number of workers. For
this example,
I have 2 managers and 2 workers per manager. I'm following the instructions at
this link
https://stackoverflow.com/questi
was read by OpenMPI. Is this correct?
Thanks,
Kurt
From: Ralph Castain <r...@open-mpi.org>
Subject: [EXTERNAL] Re: [OMPI users] MPI_Comm_Spawn failure: All nodes already
filled
I'm afraid I cannot replicate this problem on OMPI master, so it could be
something different about OMPI 4.0.1 or your environment. Can you download and
test one of the nightly tarballs from the "master" branch and see if it works
for you?
https://www.open-mpi.org/nightly/master/
Ralph
On Au
Hi,
MPI_Comm_spawn() is failing with the error message "All nodes which are
allocated for this job are already filled". I compiled OpenMPI 4.0.1 with the
Portland Group C++ compiler, v. 19.5.0, both with and without Torque/Maui
support. I thought that not using Torque/Maui support would gi
Best wishes,
> Thomas Pak
>
> *From: *"Thomas Pak"
> *To: *users@lists.open-mpi.org
> *Sent: *Friday, 7 December, 2018 17:51:29
> *Subject: *[OMPI users] MPI_Comm_spawn leads to pipe leak and other errors
>
> Dear all,
>
> My MPI application spawns a lar
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_spawn leads to pipe leak and other errors
Dear Jeff,
I did find a way to circumvent this issue for my specific application by
spawning less frequently. However, I wanted to at least bring attention to this
issue for the OpenMPI community, as it can
that there is a fundamental flaw in how OpenMPI handles dynamic
> process creation.
>
> Best wishes,
> Thomas Pak
>
> From: "Thomas Pak" <thomas@maths.ox.ac.uk>
> To: users@lists.open-mpi.org
> Sent
Dear all,
My MPI application spawns a large number of MPI processes using MPI_Comm_spawn
over its total lifetime. Unfortunately, I have experienced that this results in
problems for all currently supported OpenMPI versions (2.1, 3.0, 3.1 and 4.0).
I have written a short, self-contained program
Andrew,
The 2-second timeout is very likely a bug that was fixed, so I strongly
suggest you try the latest 2.0.2, which was released earlier this
week.
Ralph is referring to another timeout which is hard-coded (FWIW, the MPI
standard says nothing about timeouts, so we hardcoded one to preve
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one
of those. Regardless, there is indeed a timeout mechanism in there. It was
added because people would execute a comm_spawn, and then would hang and eat up
their entire allocation time for nothing.
In v2.0.2, I see i
I am using Open MPI version 2.0.1.
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
What version of OMPI are you using?
> On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote:
>
> Hi,
>
> I am trying to write trivial master-slave program. Master simply creates
> slaves, sends them a string, they print it out and exit. Everything works
> just fine, however, when I add a d
Hi,
I am trying to write a trivial master-slave program. The master simply creates
slaves, sends them a string, they print it out and exit. Everything works
just fine; however, when I add a delay (more than 2 sec) before calling
MPI_Init on the slave, MPI fails with MPI_ERR_SPAWN. I am pretty sure that
MPI_
PSM_DEVICES -> TrueScale
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
r...@open-mpi.org
Sent: Thursday, September 29, 2016 7:12 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_spawn
Ah, that may be why it wouldn’t show up in the OMPI code base itself. If that
i
The solution was to use the "tcp", "sm" and "self" BTLs for the transport
of MPI messages, restricting TCP to the eth0 interface and using ob1 as the
point-to-point messaging layer (PML):
mpirun --mca btl_tcp_if_include eth0 --mca pml ob1 --mca btl tcp,sm,self
-np 1 --hostfile my_hosts ./manager
Ah, that may be why it wouldn’t show up in the OMPI code base itself. If that
is the case here, then no - OMPI v2.0.1 does not support comm_spawn for PSM. It
is fixed in the upcoming 2.0.2
> On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet
> wrote:
>
> Ralph,
>
> My guess is that ptl.c comes
Ralph,
My guess is that ptl.c comes from PSM lib ...
Cheers,
Gilles
On Thursday, September 29, 2016, r...@open-mpi.org wrote:
> Spawn definitely does not work with srun. I don’t recognize the name of
> the file that segfaulted - what is “ptl.c”? Is that in your manager program?
>
>
> On Sep 2
Spawn definitely does not work with srun. I don’t recognize the name of the
file that segfaulted - what is “ptl.c”? Is that in your manager program?
> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet
> wrote:
>
> Hi,
>
> I do not expect spawn can work with direct launch (e.g. srun)
>
> Do y
Hi,
I do not expect spawn to work with direct launch (e.g. srun).
Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the
failure.
Can you please try
mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts
./manager 1
and see if it helps?
Note if you have the poss
Hello,
I am using MPI_Comm_spawn to dynamically create new processes from a single
manager process. Everything works fine when all the processes are running
on the same node. But imposing a restriction to run only a single process per
node does not work. Below are the errors produced during multinode
Hi Gilles,
Thanks for your answer.
BR,
Radek
On Thu, May 14, 2015 at 9:12 AM, Gilles Gouaillardet
wrote:
> This is a known limitation of the sm btl.
>
> FWIW, the vader btl (available in Open MPI 1.8) has the same limitation,
> though I heard there is some work in progress to get rid of this
This is a known limitation of the sm btl.
FWIW, the vader btl (available in Open MPI 1.8) has the same limitation,
though I heard there is some work in progress to get rid of this
limitation.
Cheers,
Gilles
On 5/14/2015 3:52 PM, Radoslaw Martyniszyn wrote:
Dear developers of Open MPI,
I
Dear developers of Open MPI,
I've created two applications: parent and child. Parent spawns children
using MPI_Comm_spawn. I would like to use shared memory when they
communicate. However, applications do not start when I try using sm. Please
comment on that issue. If this feature is not supported
> "George" == George Bosilca writes:
George> Why are you using system() the second time ? As you want
George> to spawn an MPI application, calling MPI_Comm_spawn would
George> make everything simpler.
Yes, this works! Very good trick... The system routine would be more
flexible, b
Why are you using system() the second time? As you want to spawn an MPI
application, calling MPI_Comm_spawn would make everything simpler.
George
On Jul 3, 2014 4:34 PM, "Milan Hodoscek" wrote:
>
> Hi,
>
> I am trying to run the following setup in fortran without much
> success:
>
> I have an MP
Unfortunately, that has never been supported. The problem is that the embedded
mpirun picks up all those MCA params that were provided to the original
application process, and gets hopelessly confused. We have tried in the past to
figure out a solution, but it has proved difficult to separate th
Hi,
I am trying to run the following setup in fortran without much
success:
I have an MPI program, that uses mpi_comm_spawn which spawns some
interface program that communicates with the one that spawned it. This
spawned program then prepares some data and uses call system()
statement in fortran.
Funny, but I couldn't find the code path that supported that in the latest 1.6
series release (didn't check earlier ones) - but no matter, it seems logical
enough. Fixed in the trunk and cmr'd to 1.7.4
Thanks!
Ralph
On Dec 19, 2013, at 8:08 PM, Tim Miller wrote:
> Hi Ralph,
>
> That's correc
Hi Ralph,
That's correct. All of the original processes see the -x values, but
spawned ones do not.
Regards,
Tim
On Thu, Dec 19, 2013 at 6:09 PM, Ralph Castain wrote:
>
> On Dec 19, 2013, at 2:57 PM, Tim Miller wrote:
>
> > Hi All,
> >
> > I have a question similar (but not identical to) the
On Dec 19, 2013, at 2:57 PM, Tim Miller wrote:
> Hi All,
>
> I have a question similar (but not identical to) the one asked by Tom Fogel a
> week or so back...
>
> I have a code that uses MPI_Comm_spawn to launch different processes. The
> executables for these use libraries in non-standard
Hi All,
I have a question similar (but not identical to) the one asked by Tom Fogel
a week or so back...
I have a code that uses MPI_Comm_spawn to launch different processes. The
executables for these use libraries in non-standard locations, so what I've
done is add the directories containing the
One further point that I missed in my earlier note: if you are starting the
parent as a singleton, then you are fooling yourself about the "without mpirun"
comment. A singleton immediately starts a local daemon to act as mpirun so that
comm_spawn will work. Otherwise, there is no way to launch t
On 6/16/2012 8:03 AM, Roland Schulz wrote:
Hi,
I would like to start a single process without mpirun and then use
MPI_Comm_spawn to start up as many processes as required. I don't want
the parent process to take up any resources, so I tried to disconnect
the inter communicator and then finali
I'm afraid there is no option to keep the job alive if the parent exits. I
could give you several reasons for that behavior, but the bottom line is that
it can't be changed.
Why don't you have the parent loop across "sleep", waking up periodically to
check for a "we are done" message from a chi
Hi,
I would like to start a single process without mpirun and then use
MPI_Comm_spawn to start up as many processes as required. I don't want the
parent process to take up any resources, so I tried to disconnect the inter
communicator and then finalize mpi and exit the parent. But as soon as I do
Hi,
I'm working with MPI_Comm_spawn and I have some error messages.
The code is relatively simple:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char ** argv){
int i;
int rank, size, child_rank;
char nomehost[20];
MPI_Comm parent, intercom
Try using MPI_COMM_REMOTE_SIZE to get the size of the remote group in an
intercommunicator. MPI_COMM_SIZE returns the size of the local group.
On Jan 7, 2011, at 6:22 PM, Pierre Chanial wrote:
> Hello,
>
> When I run this code:
>
> program testcase
>
> use mpi
> implicit none
>
>
Hello,
When I run this code:
program testcase
use mpi
implicit none
integer :: rank, lsize, rsize, code
integer :: intercomm
call MPI_INIT(code)
call MPI_COMM_GET_PARENT(intercomm, code)
if (intercomm == MPI_COMM_NULL) then
call MPI_COMM_SPAWN ("./testcase"
I'm not sure why the group communicator would make a difference - the code area
in question knows nothing about the mpi aspects of the job. It looks like you
are hitting a race condition that causes a particular internal recv to not
exist when we subsequently try to cancel it, which generates th
Hi,
I am a long-time happy user of the mpi_comm_spawn() routine. But so far I
used it only with the MPI_COMM_WORLD communicator. Now I want to
execute more mpi_comm_spawn() calls, by creating and using group
communicators. However, this seems to have some problems. I can get it
to run about 50% of the time
Hi Ralph,
I have confirmed that openmpi-1.4a1r22335 works with my master, slave
example. The temporary directories are cleaned up properly.
Thanks for the help!
nick
On Thu, Dec 17, 2009 at 13:38, Nicolas Bock wrote:
> Ok, I'll give it a try.
>
> Thanks, nick
>
>
>
> On Thu, Dec 17, 2009 at
Ok, I'll give it a try.
Thanks, nick
On Thu, Dec 17, 2009 at 12:44, Ralph Castain wrote:
> In case you missed it, this patch should be in the 1.4 nightly tarballs -
> feel free to test and let me know what you find.
>
> Thanks
> Ralph
>
> On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
>
> Th
In case you missed it, this patch should be in the 1.4 nightly tarballs - feel
free to test and let me know what you find.
Thanks
Ralph
On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
> That was quick. I will try the patch as soon as you release it.
>
> nick
>
>
> On Wed, Dec 2, 2009 at 21:
On Fri, Dec 4, 2009 at 12:10, Eugene Loh wrote:
> Nicolas Bock wrote:
>
> On Fri, Dec 4, 2009 at 10:29, Eugene Loh wrote:
>
>> I think you might observe a world of difference if the master issued some
>> non-blocking call and then intermixed MPI_Test calls with sleep calls. You
>> should see *
Nicolas Bock wrote:
On Fri, Dec 4, 2009 at 10:29, Eugene Loh wrote:
I think you might observe a world of difference if the master issued some
non-blocking call and then intermixed MPI_Test calls with sleep calls. You
should see *much* more subservient behavior.
On Fri, Dec 4, 2009 at 10:29, Eugene Loh wrote:
> Nicolas Bock wrote:
>
> On Fri, Dec 4, 2009 at 10:10, Eugene Loh wrote:
>
>> Yield helped, but not as effectively as one might have imagined.
>>
>
> Yes, that's the impression I get as well, the master process might be
> yielding, but it doesn't
Nicolas Bock wrote:
On Fri, Dec 4, 2009 at 10:10, Eugene Loh wrote:
Yield helped, but not as effectively as one might have imagined.
Yes, that's the impression I get as well, the master process might be
yielding, but it doesn't appear to be a lot. Maybe
On Fri, Dec 4, 2009 at 10:10, Eugene Loh wrote:
> Nicolas Bock wrote:
>
> On Fri, Dec 4, 2009 at 08:21, Ralph Castain wrote:
>
>> You used it correctly. Remember, all that cpu number is telling you is the
>> percentage of use by that process. So bottom line is: we are releasing it as
>> much as
Nicolas Bock wrote:
On Fri, Dec 4, 2009 at 08:21, Ralph Castain wrote:
You used it correctly. Remember, all that cpu number is telling you is the
percentage of use by that process. So bottom line is: we are releasing it as
much as we possibly can, but no other
proc
On Fri, Dec 4, 2009 at 08:21, Ralph Castain wrote:
> You used it correctly. Remember, all that cpu number is telling you is the
> percentage of use by that process. So bottom line is: we are releasing it as
> much as we possibly can, but no other process wants to use the cpu, so we go
> ahead and
You used it correctly. Remember, all that cpu number is telling you is the
percentage of use by that process. So bottom line is: we are releasing it as
much as we possibly can, but no other process wants to use the cpu, so we go
ahead and use it.
If any other process wanted it, then the percent
On Fri, Dec 4, 2009 at 08:03, Ralph Castain wrote:
>
>
> It is polling at the barrier. This is done aggressively by default for
> performance. You can tell it to be less aggressive if you want via the
> yield_when_idle mca param.
>
>
How do I use this parameter correctly? I tried
/usr/local/open
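For reference, MCA parameters of this era are passed on the mpirun command line or via the environment; the knob discussed here is spelled `mpi_yield_when_idle` (verify the exact name on your build with `ompi_info --param mpi all`):

```shell
# Ask OMPI procs to yield the CPU when idle (e.g. spinning at a barrier)
mpirun --mca mpi_yield_when_idle 1 -np 3 ./master

# Equivalently, via the environment:
export OMPI_MCA_mpi_yield_when_idle=1
mpirun -np 3 ./master
```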
On Dec 4, 2009, at 7:46 AM, Nicolas Bock wrote:
> Hello list,
>
> when I run the attached example, which spawns a "slave" process with
> MPI_Comm_spawn(), I see the following:
>
> nbock 19911 0.0 0.0 53980 2288 pts/0 S+ 07:42 0:00
> /usr/local/openmpi-1.3.4-gcc-4.4.2/bin/mpirun
Hello list,
when I run the attached example, which spawns a "slave" process with
MPI_Comm_spawn(), I see the following:
nbock 19911 0.0 0.0 53980 2288 pts/0 S+ 07:42 0:00
/usr/local/openmpi-1.3.4-gcc-4.4.2/bin/mpirun -np 3 ./master
nbock 19912 92.1 0.0 158964 3868 pts/0 R+
That was quick. I will try the patch as soon as you release it.
nick
On Wed, Dec 2, 2009 at 21:06, Ralph Castain wrote:
> Patch is built and under review...
>
> Thanks again
> Ralph
>
> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>
> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain
Patch is built and under review...
Thanks again
Ralph
On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain wrote:
> Yeah, that's the one all right! Definitely missing from 1.3.x.
>
> Thanks - I'll build a patch for the next bug-fix release
>
Thanks
On Wed, Dec 2, 2009 at 17:04, Ralph Castain wrote:
> Yeah, that's the one all right! Definitely missing from 1.3.x.
>
> Thanks - I'll build a patch for the next bug-fix release
>
>
> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>
> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain
Yeah, that's the one all right! Definitely missing from 1.3.x.
Thanks - I'll build a patch for the next bug-fix release
On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
> On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain wrote:
>> Indeed - that is very helpful! Thanks!
>> Looks like we aren't
On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain wrote:
> Indeed - that is very helpful! Thanks!
> Looks like we aren't cleaning up high enough - missing the directory level.
> I seem to recall seeing that error go by and that someone fixed it on our
> devel trunk, so this is likely a repair that did
Indeed - that is very helpful! Thanks!
Looks like we aren't cleaning up high enough - missing the directory level. I
seem to recall seeing that error go by and that someone fixed it on our devel
trunk, so this is likely a repair that didn't get moved over to the release
branch as it should have
On Wed, Dec 2, 2009 at 14:23, Ralph Castain wrote:
> Hmm... if you are willing to keep trying, could you perhaps let it run for
> a brief time, ctrl-z it, and then do an ls on a directory from a process
> that has already terminated? The pids will be in order, so just look for an
> early number (
Hmm... if you are willing to keep trying, could you perhaps let it run for a
brief time, ctrl-z it, and then do an ls on a directory from a process that has
already terminated? The pids will be in order, so just look for an early number
(not mpirun or the parent, of course).
It would help if yo
On Wed, Dec 2, 2009 at 12:12, Ralph Castain wrote:
>
> On Dec 2, 2009, at 10:24 AM, Nicolas Bock wrote:
>
>
>
> On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote:
>
>>
>>
>> On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote:
>>
>>> You may want to check your limits as defined by the shell/system
On Dec 2, 2009, at 10:24 AM, Nicolas Bock wrote:
>
>
> On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote:
>
>
> On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote:
> You may want to check your limits as defined by the shell/system. I can also
> run this for as long as I'm willing to let it r
On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote:
>
>
> On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote:
>
>> You may want to check your limits as defined by the shell/system. I can
>> also run this for as long as I'm willing to let it run, so something else
>> appears to be going on.
>>
>>
>>
On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote:
> You may want to check your limits as defined by the shell/system. I can
> also run this for as long as I'm willing to let it run, so something else
> appears to be going on.
>
>
>
Is that with 1.3.3? I found that with 1.3.4 I can run the exampl
You may want to check your limits as defined by the shell/system. I can also
run this for as long as I'm willing to let it run, so something else appears to
be going on.
On Dec 1, 2009, at 4:38 PM, Nicolas Bock wrote:
>
>
> On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote:
> On Tue, De
On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote:
> On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock
> wrote:
> > After reading Anthony's question again, I am not sure now that we are
> having
> > the same problem, but we might. In any case, the attached example
> programs
> > trigger the issue
On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock wrote:
> After reading Anthony's question again, I am not sure now that we are having
> the same problem, but we might. In any case, the attached example programs
> trigger the issue of running out of pipes. I don't see how orted could, even
> if it was
Linux mujo 2.6.30-gentoo-r5 #1 SMP PREEMPT Thu Sep 17 07:47:12 MDT 2009
x86_64 Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz GenuineIntel GNU/Linux
On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote:
> It really does help if we have some idea what OMPI version you are talking
> about, and on what k
Sorry,
openmpi-1.3.3 compiled with gcc-4.4.2
nick
On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote:
> It really does help if we have some idea what OMPI version you are talking
> about, and on what kind of platform.
>
> This issue was fixed to the best of my knowledge (not all the pipes were
It really does help if we have some idea what OMPI version you are talking
about, and on what kind of platform.
This issue was fixed to the best of my knowledge (not all the pipes were
getting closed), but I would have to look and see what release might contain
the fix...would be nice to know w
After reading Anthony's question again, I am not sure now that we are having
the same problem, but we might. In any case, the attached example programs
trigger the issue of running out of pipes. I don't see how orted could, even
if it was reused. There is only a very limited number of processes run
Hello list,
a while back in January of this year, a user (Anthony Thevenin) had the
problem of running out of open pipes when he tried to use MPI_Comm_spawn a
few times. As I found the thread he started in the mailing list archives and
have just joined the mailing list myself, I unfortunately can't rep
On Sep 22, 2009, at 8:20 AM, Blesson Varghese wrote:
I am fairly new to MPI. I have a few queries regarding spawning
processes that I am listing below:
a. How can processes send data to a spawned process?
See the descriptions for MPI_COMM_SPAWN; you get an inter-communicator
back from
Hi,
I am fairly new to MPI. I have a few queries regarding spawning processes
that I am listing below:
a. How can processes send data to a spawned process?
b. Can any process (that is not a parent process) send data to a
spawned process?
c. Can MPI_Send or MPI_Recv be used to
Thanks for the info.
meanwhile I have set:
mpi_param_check = 0
in my system-wide configuation file on workers
and
mpi_param_check = 1
on the master.
Jerome
Ralph Castain wrote:
Thanks! That does indeed help clarify.
You should also then configure OMPI with
--disable-per-user-config-f
Thanks! That does indeed help clarify.
You should also then configure OMPI with --disable-per-user-config-files.
MPI procs will automatically look at the default MCA parameter
file, which is probably on your master node (wherever mpirun was
executed). However, they also look at the user's h
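The per-user MCA parameter file mentioned here lives at `$HOME/.openmpi/mca-params.conf` on each node; a sketch of the setup being discussed (master checks MPI parameters, workers skip the check):

```
# $HOME/.openmpi/mca-params.conf on the workers
# (read unless OMPI was configured with --disable-per-user-config-files)
mpi_param_check = 0

# and on the master:
mpi_param_check = 1
```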
Hi,
thanks for the reply.
Ralph Castain wrote:
The orteds don't pass anything from MPI_Info to srun during a
comm_spawn. What the orteds do is to chdir to the specified wdir before
spawning the child process to ensure that the child has the correct
working directory, then the orted changes ba
The orteds don't pass anything from MPI_Info to srun during a
comm_spawn. What the orteds do is to chdir to the specified wdir
before spawning the child process to ensure that the child has the
correct working directory, then the orted changes back to its default
working directory.
The or
Hi !
finally I got it:
passing the mca key/value `"plm_slurm_args"/"--chdir /local/folder"' does the
trick.
As a matter of fact, my code passes the MPI_Info key/value
`"wdir"/"/local/folder"'
to MPI_Comm_spawn as well: the working directories on the nodes of the spawned
programs
are `nodes:/loc
Hello Again,
Jerome BENOIT wrote:
Hello List,
I have just noticed that, when MPI_Comm_spawn is used to launch programs
around,
the orted working directory on the nodes is the working directory of the
spawning program:
can we ask orted to use another directory ?
Changing the working th
Hello List,
I have just noticed that, when MPI_Comm_spawn is used to launch programs around,
the orted working directory on the nodes is the working directory of the spawning
program:
can we ask orted to use another directory ?
Thanks in advance,
Jerome
Hi Joao,
Unfortunately, spawn is broken on the development trunk right now. We
are working on a major revamp of the runtime system which should fix
these problems, but it is not ready yet.
Sorry about that :(
Tim
Joao Vicente Lima wrote:
Hi all,
I'm getting errors with spawn in the situat
Hi all,
I'm getting errors with spawn in the situations:
1) spawn1.c - spawning 2 process on localhost, one by one, the error is:
spawning ...
[localhost:31390] *** Process received signal ***
[localhost:31390] Signal: Segmentation fault (11)
[localhost:31390] Signal code: Address not mapped (1)
0
> #12 0x4027a748 in mca_oob_tcp_msg_data () from
> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/openmpi/mca_oob_tcp.so
> #13 0x4027bb12 in mca_oob_tcp_peer_recv_handler () from
> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/openmpi/mca_oob_tcp.so
> #14 0x400703f9 in o
nmpi-1.1.4-noBproc-noThread/lib/openmpi/mca_rmgr_urm.so
>> #9 0x4004f277 in orte_rmgr_base_cmd_dispatch () from
>> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/liborte.so.0
>> #10 0x402b10ae in orte_rmgr_urm_recv () from
>> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/
b/openmpi/mca_oob_tcp.so
> #13 0x4027bb12 in mca_oob_tcp_peer_recv_handler () from
> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/openmpi/mca_oob_tcp.so
> #14 0x400703f9 in opal_event_loop () from
> /usr/local/Mpi/openmpi-1.1.4-noBproc-noThread/lib/libopal.so.0
> #15 0x4006adfa in o
) at main.c:13
(gdb)
-Original message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On behalf
of Tim Prins
Sent: Monday, 5 March 2007 22:34
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_Spawn
Never mind, I was just able to replicate it. I'll lo