Just to help reduce the scope of the problem, can you retest with a
non-CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the
configure line to help with the stack trace?
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
gpu-k20-08:46045] *** End of error message ***
>--
>mpiexec noticed that process rank 1 with PID 46045 on node gpu-k20-08
>exited on signal 11 (Segmentation fault).
>---
nodes, I had
>CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
>
>instead of
>CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
>
>Sorry for the false bug and thanks for directing me toward the solution.
>
>Maxime
>
>
>Le 2014-08-19 09:15, Rolf vandeVaart wrote:
>>
Hi Christoph:
I will try and reproduce this issue and will let you know what I find. There
may be an issue with CUDA IPC support with certain traffic patterns.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Christoph Winter
Sent: Tuesday, August 26, 2014 2:46 AM
To:
If you are utilizing the CUDA-aware support in Open MPI, can you send me an
email with some information about the application and the cluster you are on.
I will consolidate information.
Thanks,
Rolf (rvandeva...@nvidia.com)
The error 304 corresponds to CUDA_ERROR_OPERATING_SYSTEM, which means an OS call
failed. However, I am not sure how that relates to the call that is getting
the error.
Also, the last error you report is from MVAPICH2-GDR, not from Open MPI. I
guess then I have a few questions.
1. Can
.
Also, our defaults for openmpi-mca-params.conf are:
mtl=^mxm
btl=^usnic,tcp
btl_openib_flags=1
service nv_peer_mem status
nv_peer_mem module is loaded.
Kindest Regards,
-
Steven Eliuk,
From: Rolf vandeVaart <rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>>
Reply-To: Op
The CUDA person is now responding. I will try and reproduce. I looked through
the zip file but did not see the mpirun command. Can this be reproduced with
-np 4 running across four nodes?
Also, in your original message you wrote "Likewise, it doesn't matter if I
enable CUDA support or not."
That is strange, not sure why that is happening. I will try to reproduce with
your program on my system. Also, perhaps you could rerun with --mca
mpi_common_cuda_verbose 100 and send me that output.
Thanks
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Xun Gong
Sent: Sunday,
I think I found a bug in your program with how you were allocating the GPU
buffers. I will send you a version offlist with the fix.
Also, there is no need to rerun with the flags I had mentioned below.
Rolf
From: Rolf vandeVaart
Sent: Monday, January 12, 2015 9:38 AM
To: us...@open-mpi.org
Let me try to reproduce this. This should not have anything to do with GPU
Direct RDMA. However, to eliminate it, you could run with:
--mca btl_openib_want_cuda_gdr 0.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Aulwes, Rob
Sent: Wednesday, February 11, 2015 2:17 PM
To:
retry with a pre-release
version of Open MPI 1.8.5 that is available here and confirm it fixes your
issue. Any of the ones listed on that page should be fine.
http://www.open-mpi.org/nightly/v1.8/
Thanks,
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent
Hi Jason:
The issue is that Open MPI is (presumably) a 64-bit application and it is
trying to load up a 64-bit libcuda.so.1 but not finding one. Making the link
as you did will not fix the problem (as you saw). In all my installations, I
also have a 64-bit driver installed in
Hi Lev:
I am not sure what is happening here but there are a few things we can do to
try and narrow things down.
1. If you run with --mca btl_smcuda_use_cuda_ipc 0 then I assume this error
will go away?
2. Do you know if when you see this error it happens on the first pass through
your
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Sunday, March 29, 2015 10:11 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] segfault during MPI_Isend when transmitting GPU
>arrays between multiple GPUs
>
>Recei
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Rolf
>vandeVaart
>Sent: Monday, March 30, 2015 9:37 AM
>To: Open MPI Users
>Subject: Re: [OMPI users] segfault during MPI_Isend when transmitting GPU
>arrays between multiple GPUs
>
It is my belief that you cannot do this, at least with the openib BTL. The IB
card to be used for communication is selected during the MPI_Init() phase
based on where the CPU process is bound to. You can see some of this selection
by using the --mca btl_base_verbose 1 flag. There is a bunch
Hi Lev:
Any chance you can try Open MPI 1.8.5rc3 and see if you see the same behavior?
That code has changed a bit from the 1.8.4 series and I am curious if you will
still see the same issue.
http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.5rc3.tar.gz
Thanks,
Rolf
I am not sure why you are seeing this. One thing that is clear is that you
have found a bug in the error reporting. The error message is a little garbled
and I see a bug in what we are reporting. I will fix that.
If possible, could you try running with --mca btl_smcuda_use_cuda_ipc 0. My
-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Tuesday, May 19, 2015 10:25 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] cuIpcOpenMemHandle failure when using
>OpenMPI 1.8.5 with CUDA 7.0 and Multi-Process Service
>
&
ti-Process Service
>
>Received from Lev Givon on Thu, May 21, 2015 at 11:32:33AM EDT:
>> Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT:
>>
>> (snip)
>>
>> > I see that you mentioned you are starting 4 MPS daemons. Are
I think we bumped up a default value in Open MPI 1.8.5. To go back to the old
64Mbyte value try running with:
--mca mpool_sm_min_size 67108864
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Aurélien Bouteiller
Sent: Tuesday, May 26, 2015 10:10 AM
To: Open MPI Users
Subject:
-aware MPI_Reduce problem in Openmpi 1.8.5
Hi Rolf,
Thank you very much for clarifying the problem. Is there any plan to support
GPU RDMA for reduction in the future?
On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart
<rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>> wro
how you observed the behavior. Does the code need to run for a while to
see this?
Any suggestions on how I could reproduce this?
Thanks,
Rolf
From: Steven Eliuk [mailto:s.el...@samsung.com]
Sent: Tuesday, June 30, 2015 6:05 PM
To: Rolf vandeVaart
Cc: Open MPI Users
Subject: 1.8.6 w/ CUDA 7.0
Hi Stefan (and Steven who reported this earlier with CUDA-aware program)
I have managed to observe the leak when running LAMMPS as well. Note that
this has nothing to do with CUDA-aware features. I am going to move this
discussion to the Open MPI developer’s list to dig deeper into this
Just an FYI that this issue has been found and fixed and will be available in
the next release.
https://github.com/open-mpi/ompi-release/pull/357
Rolf
From: Rolf vandeVaart
Sent: Wednesday, July 01, 2015 4:47 PM
To: us...@open-mpi.org
Subject: RE: [OMPI users] 1.8.6 w/ CUDA 7.0 & GDR
Hi Shahzeb:
I believe another colleague of mine may have helped you with this issue (I was
not around last week). However, to help me better understand the issue you are
seeing, could you send me your config.log file from when you did the
configuration? You can just send to
I talked with Jeremia off list and we figured out what was going on. There is
the ability to use the cuMemcpyAsync/cuStreamSynchronize rather than the
cuMemcpy but it was never made the default for Open MPI 1.8 series. So, to get
that behavior you need the following:
--mca
com>
www.ibm.com<http://www.ibm.com>
- Original message -
From: Rolf vandeVaart <rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>>
Sent by: "users" <users-boun...@open-mpi.org<mailto:users-boun...@open-mpi.org>>
To: Open MPI Users <
No, it is not. You have to use pml ob1, which will pull in the smcuda and
openib BTLs, which have CUDA-aware support built into them.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Subhra Mazumdar
Sent: Friday, August 21, 2015 12:18 AM
To: Open MPI Users
Subject: [OMPI users] cuda
I am not sure why the distances are being computed as you are seeing. I do not
have a dual rail card system to reproduce with. However, short term, I think
you could get what you want by running like the following. The first argument
tells the selection logic to ignore locality, so both cards
/ where I can
>look, I could help to find the issue.
>
>Thanks a lot!
>
>Marcin
>
>
>On 08/28/2015 05:28 PM, Rolf vandeVaart wrote:
>> I am not sure why the distances are being computed as you are seeing. I do
>not have a dual rail card system to reproduce with
Lev:
Can you run with --mca mpi_common_cuda_verbose 100 --mca mpool_rgpusm_verbose
100 and send me (rvandeva...@nvidia.com) the output of that.
Thanks,
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Wednesday, September 02, 2015
Hello Yang:
It is not clear to me if you are asking about a CUDA-aware build of Open MPI
where you do the MPI_Allreduce() on the GPU buffer, or if you are handling
staging the GPU buffer into host memory and then calling the MPI_Allreduce(). Either
way, they are somewhat similar. With CUDA-aware, the
>
>Sent by Apple Mail
>
>Yang ZHANG
>
>PhD candidate
>
>Networking and Wide-Area Systems Group
>Computer Science Department
>New York University
>
>715 Broadway Room 705
>New York, NY 10003
>
>> On Sep 25, 2015, at 11:07 AM, Rolf vandeVaart <rva
I can speak to part of your issue. There are no CUDA-aware features in the 1.6
series of Open MPI. Therefore, the various configure flags you tried would not
affect Open MPI itself. Those configure flags are relevant with the 1.7 series
and later, but as the FAQ says, the CUDA-aware feature
Ed, how large are the messages that you are sending and receiving?
Rolf
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Ed Blosch
Sent: Thursday, June 27, 2013 9:01 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Application hangs on mpi_waitall
It ran a
With respect to the CUDA-aware support, Ralph is correct. The ability to send
and receive GPU buffers is in the Open MPI 1.7 series. And incremental
improvements will be added to the Open MPI 1.7 series. CUDA 5.0 is supported.
From: users-boun...@open-mpi.org
It is looking for the libcuda.so file, not the libcudart.so file. So, maybe
--with-libdir=/usr/lib64
You need to be on a machine with the CUDA driver installed. What was your
configure command?
http://www.open-mpi.org/faq/?category=building#build-cuda
Rolf
>-Original Message-
3 2:59 PM
>To: Open MPI Users
>Cc: Rolf vandeVaart
>Subject: Re: [OMPI users] Trouble configuring 1.7.2 for Cuda 5.0.35
>
>Thank you for the quick reply Rolf,
> I personally don't know the Cuda libraries. I was hoping there had been a
>name change. I am on a Cray XT-7.
We have done some work over the last year or two to add some CUDA-aware support
into the Open MPI library. Details on building and using the feature are here.
http://www.open-mpi.org/faq/?category=building#build-cuda
http://www.open-mpi.org/faq/?category=running#mpi-cuda-support
I am looking
That might be a bug. While I am checking, you could try configuring with this
additional flag:
--enable-mca-no-build=pml-bfo
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Hammond,
>Simon David (-EXP)
>Sent: Monday, October 07, 2013 3:30 PM
>To:
>Laboratories, NM, USA
>
>
>
>
>
>
>On 10/7/13 1:47 PM, "Rolf vandeVaart" <rvandeva...@nvidia.com> wrote:
>
>>That might be a bug. While I am checking, you could try configuring with
>>this additional flag:
>>
>>--enable-mca-no-bu
Let me try this out and see what happens for me. But yes, please go ahead and
send me the complete backtrace.
Rolf
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of KESTENER Pierre
Sent: Wednesday, October 30, 2013 11:34 AM
To: us...@open-mpi.org
Cc: KESTENER Pierre
Subject: [OMPI
The CUDA-aware support is only available when running with the verbs interface
to Infiniband. It does not work with the PSM interface which is being used in
your installation.
To verify this, you need to disable the usage of PSM. This can be done in a
variety of ways, but try running like
Thanks for the report. CUDA-aware Open MPI does not currently support doing
reduction operations on GPU memory.
Is this a feature you would be interested in?
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Peter Zaspel
>Sent: Friday, November 29,
asoning for this? Is there some documentation,
>which MPI calls are CUDA-aware and which not?
>
>Best regards
>
>Peter
>
>
>
>On 12/02/2013 02:18 PM, Rolf vandeVaart wrote:
>> Thanks for the report. CUDA-aware Open MPI does not currently support
>do
Yes, this was a bug with Open MPI 1.7.3. I could not reproduce it, but it was
definitely an issue in certain configurations.
Here was the fix. https://svn.open-mpi.org/trac/ompi/changeset/29762
We fixed it in Open MPI 1.7.4 and the trunk version, so as you have seen, they
do not have the
I assume your first issue is happening because you configured hwloc with cuda
support which creates a dependency on libcudart.so. Not sure why that would
mess up Open MPI. Can you send me how you configured hwloc?
I am not sure I understand the second issue. Open MPI puts everything in lib
Can you try running with --mca coll ^ml and see if things work?
Rolf
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Filippo Spiga
>Sent: Monday, March 03, 2014 7:14 PM
>To: Open MPI Users
>Subject: [OMPI users] 1.7.5rc1, error "COLL-ML
.12
>1048576 765.65
>
>
>Can you clarify exactly where the problem come from?
>
>Regards,
>Filippo
>
>
>On Mar 4, 2014, at 12:17 AM, Rolf vandeVaart <rvandeva...@nvidia.com>
>wrote:
>> Can you try running with --mca coll ^ml and see if
Answers inline...
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Friday, May 23, 2014 4:31 PM
>To: Open MPI Users
>Subject: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
>
>Hi,
>I am currently configuring a GPU
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
>Boissonneault
>Sent: Tuesday, May 27, 2014 4:07 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
>
>Answers inline too.
>>> 2) Is the absence of
Do you need the vampire support in your build? If not, you could add this to
configure.
--disable-vt
>-Original Message-
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of
>jcabe...@computacion.cs.cinvestav.mx
>Sent: Monday, June 16, 2014 1:40 PM
>To: us...@open-mpi.org
With Open MPI 1.8.1, the library will use the NIC that is "closest" to the CPU.
There was a bug in earlier versions of Open MPI 1.8 so that did not happen.
You can see this by running with some verbosity using the "btl_base_verbose"
flag. For example, this is what I observed on a two node
Ethan:
Can you run just "hostname" successfully? In other words, a non-MPI
program.
If that does not work, then we know the problem is in the runtime. If
it does work, then
there is something with the way the MPI library is setting up its
connections.
Is there more than one interface on
This problem looks a lot like a thread from earlier today. Can you look
at this
ticket and see if it helps? It has a workaround documented in it.
https://svn.open-mpi.org/trac/ompi/ticket/2632
Rolf
On 11/29/10 16:13, Prentice Bisbal wrote:
No, it looks like ld is being called with the
what is in the ticket.
Rolf
On 11/29/10 16:26, Nehemiah Dacres wrote:
that looks about right. So the suggestion:
./configure LDFLAGS="-notpath ... ... ..."
-notpath should be replaced by whatever the proper flag should be, in my case -L ?
On Mon, Nov 29, 2010 at 3:1
Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk. I have submitted a ticket with
the information.
https://svn.open-mpi.org/trac/ompi/ticket/2656
Rolf
On 12/13/10 18:44, James Dinan wrote:
Hi,
I'm getting strange behavior using datatypes in a one-sided
Hi Brice:
Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You definitely
need the patch or you will see the behavior just as you described, a hang. One
thing you could try is disabling the large message RDMA in OMPI and see if that
works. That can be done by adjusting the
] anybody tried OMPI with gpudirect?
On 28/02/2011 17:30, Rolf vandeVaart wrote:
> Hi Brice:
> Yes, I have tried OMPI 1.5 with gpudirect and it worked for me. You
> definitely need the patch or you will see the behavior just as you described,
> a hang. One thing you could try
-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Brice Goglin
Sent: Monday, February 28, 2011 2:14 PM
To: Open MPI Users
Subject: Re: [OMPI users] anybody tried OMPI with gpudirect?
On 28/02/2011 19:49, Rolf vandeVaart wrote
Hi Fengguang:
That is odd that you see the problem even when running with the openib flags
set as Brice indicated. Just to be extra sure there are no typo errors in your
flag settings, maybe you can verify with the ompi_info command like this?
ompi_info -mca btl_openib_flags 304 -param btl
>> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
>happen until the third iteration. I take that to mean that the basic
>communication works, but that something is saturating. Is there some notion
>of buffer size somewhere in the MPI system that could explain this?
>
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Chris Cooper
>Sent: Friday, October 14, 2011 1:28 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] gpudirect p2p?
>
>Hi,
>
>Are the recent peer to peer capabilities of cuda leveraged by
Actually, that is not quite right. From the FAQ:
"This feature currently only exists in the trunk version of the Open MPI
library."
You need to download and use the trunk version for this to work.
http://www.open-mpi.org/nightly/trunk/
Rolf
From: users-boun...@open-mpi.org
, December 14, 2011 10:47 AM
To: Open MPI Users
Cc: Rolf vandeVaart
Subject: Re: [OMPI users] How "CUDA Init prior to MPI_Init" co-exists with
unique GPU for each MPI process?
Hi,
Processes are not spawned by MPI_Init. They are spawned before by some
applications between your mpirun cal
Open MPI cannot handle having two interfaces on a node on the same subnet. I
believe it has to do with our matching code when we try to match up a
connection.
The result is a hang as you observe. I also believe it is not good practice to
have two interfaces on the same subnet.
If you put them
Yes, they are supported in the sense that they can work together. However, if
you want to have the ability to send/receive GPU buffers directly via MPI
calls, then I recommend you get CUDA 4.1 and use the Open MPI trunk.
http://www.open-mpi.org/faq/?category=building#build-cuda
Rolf
From:
I am not sure about everything that is going wrong, but there are at least two
issues I found.
First, you are skipping the first line that you read from integers.txt. Maybe
something like this instead.
while(fgets(line, sizeof line, fp)!= NULL){
sscanf(line,"%d",[k]);
sum = sum +
I tried your program on a single node and it worked fine. Yes, TCP message
passing in Open MPI has been working well for some time.
I have a few suggestions.
1. Can you run something like hostname successfully (mpirun -np 10 -hostfile
yourhostfile hostname)
2. If that works, then you can also
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Don Armstrong
>Sent: Thursday, May 03, 2012 5:43 PM
>To: us...@open-mpi.org
>Subject: Re: [OMPI users] MPI over tcp
>
>On Thu, 03 May 2012, Rolf vandeVaar
You should be running with one GPU per MPI process. If I understand correctly,
you have a 3 node cluster and each node has a GPU so you should run with np=3.
Maybe you can try that and see if your numbers come out better.
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Hi Dmitry:
Let me look into this.
Rolf
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Dmitry N. Mikushin
Sent: Monday, June 18, 2012 10:56 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does
not
en-mpi.org] On Behalf
Of Rolf vandeVaart
Sent: Monday, June 18, 2012 11:00 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does
not take arguments
Hi Dmitry:
Let me look into this.
Rolf
From: users-boun...@open-mpi.org [mailto:users-
Yes, this feature is in Open MPI 1.7. It is implemented in the "smcuda" btl.
If you configure as outlined in the FAQ, then things should just work. The
smcuda btl will be selected and P2P will be used between GPUs on the same node.
This is only utilized on transfers of buffers that are
The current implementation does assume that the GPUs are on the same IOH and
therefore can use the IPC features of the CUDA library for communication.
One of the initial motivations for this was that to be able to detect whether
GPUs can talk to one another, the CUDA library has to be
>-Original Message-
>From: Jeff Squyres [mailto:jsquy...@cisco.com]
>Sent: Thursday, August 09, 2012 9:45 AM
>To: Open MPI Users
>Cc: Rolf vandeVaart
>Subject: CUDA in v1.7? (was: Compilation of OpenMPI 1.5.4 & 1.6.X fail for PGI
>compiler...)
>
>On Aug 9,
To answer the original questions, Open MPI will look at taking advantage of the
RDMA CUDA when it is available. Obviously, work needs to be done to figure out
the best way to integrate into the library. Much like there are a variety of
protocols under the hood to support host transfer of data
And just to give a little context, ompi-clean was created initially to "clean"
up a node, not for cleaning up a specific job. It was for the case where MPI
jobs would leave some files behind or leave some processes running. (I do not
believe this happens much at all anymore.) But, as was
Not sure. I will look into this. And thank you for the feedback Jens!
Rolf
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Jeff Squyres
>Sent: Thursday, November 08, 2012 8:49 AM
>To: Open MPI Users
>Subject: Re: [OMPI users]
Yes, unfortunately, that issue is still unfixed. I just created the ticket and
included a possible workaround.
https://svn.open-mpi.org/trac/ompi/ticket/3531
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>On Behalf Of Russell Power
>Sent:
As Lenny said, you should use the if_include parameter. Specifically,
it would look like this depending on which ones you want to select.
-mca btl_openib_if_include mthca0
or
-mca btl_openib_if_include mthca1
Rolf
On 07/15/09 09:33, nee...@crlindia.com wrote:
Thanks Ralph,
i
I think what you are looking for is this:
--mca plm_rsh_disable_qrsh 1
This means we will disable the use of qrsh and use rsh or ssh instead.
The --mca pls ^sge does not work anymore for two reasons. First, the
"pls" framework was renamed "plm". Secondly, the gridgengine plm was
folded
I assume it is working with np=8 because the 8 processes are getting
launched on the same node as mpirun and therefore there is no call to
qrsh to start up any remote processes. When you go beyond 8, mpirun
calls qrsh to start up processes on some of the remote nodes.
I would suggest first
This message is telling you that you have run out of file descriptors.
I am surprised that the -mca parameter setting did not fix the problem.
Can you run limit or ulimit on your shell and send the information? I
typically set my limit to 65536 assuming the system allows it.
burl-16 58
Hi Paul:
I tried the running the same way as you did and I saw the same thing. I
was using ClusterTools 8.2 (Open MPI 1.3.3r21324) and running on
Solaris. I looked at the mpirun process and it was definitely consuming
approximately 12 file descriptors per a.out process.
burl-ct-v440-0 59
Hi, how exactly do you run this to get this error? I tried and it
worked for me.
burl-ct-x2200-16 50 =>mpirun -mca btl_openib_warn_default_gid_prefix 0
-mca btl self,sm,openib -np 2 -host burl-ct-x2200-16,burl-ct-x2200-17
-mca btl_openib_ib_timeout 16 a.out
I am 0 at 1252670691
I am 1 at
On 03/01/10 11:51, Ralph Castain wrote:
On Mar 1, 2010, at 8:41 AM, David Turner wrote:
On 3/1/10 1:51 AM, Ralph Castain wrote:
Which version of OMPI are you using? We know that the 1.2 series was unreliable
about removing the session directories, but 1.3 and above appear to be quite
good
Hi Eloi:
To select the different bcast algorithms, you need to add an extra mca
parameter that tells the library to use dynamic selection.
--mca coll_tuned_use_dynamic_rules 1
One way to make sure you are typing this in correctly is to use it with
ompi_info. Do the following:
ompi_info -mca
My guess is you must have a mismatched MPI_Bcast somewhere
in the code. Presumably, there is a call to MPI_Bcast on the head
node that broadcasts something larger than 1 MPI_INT and does not
have the matching call on the worker nodes. Then, when the MPI_Bcast
on the worker nodes is called,
Hi:
I managed to run a 256 process job on a single node. I ran a simple test
in which all processes send a message to all others.
This was using Sun's Binary Distribution of Open MPI on Solaris which is
based on r16572 of the 1.2 branch. The machine had 8 cores.
burl-ct-v40z-0 49
Hello:
Have you actually tried this and got it to work? It did not work for me.
burl-ct-v440-0 50 =>mpirun -host burl-ct-v440-0,burl-ct-v440-1 -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c : -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c
This worked for me although I am not sure how extensive our 32/64
interoperability support is. I tested on Solaris using the TCP
interconnect and a 1.2.5 version of Open MPI. Also, we configure with
the --enable-heterogeneous flag which may make a difference here. Also
this did not work
And if you want to stop seeing it in the short term, you have at least
two choices I know of.
At configure time, add this to your configure line.
--enable-mca-no-build=vprotocol
This will prevent that component from being built, and will eliminate
the message.
If it is in there, you can
One other option which should kill of processes and cleanup is the
orte-clean command. In your case, you could do the following:
mpirun -hostfile ~/hostfile --pernode orte-clean
There is a man page for it also.
Rolf
Brock Palen wrote:
You would be much better off to not use nohup, and
Ashley Pittman wrote:
On Sat, 2008-08-16 at 08:03 -0400, Jeff Squyres wrote:
- large all to all operations are very stressful on the network, even
if you have very low latency / high bandwidth networking such as DDR IB
- if you only have 1 IB HCA in a machine with 8 cores, the problem
I have submitted a ticket on this issue.
https://svn.open-mpi.org/trac/ompi/ticket/1468
Rolf
On 08/18/08 18:27, Mostyn Lewis wrote:
George,
I'm glad you changed the scheduling and my program seems to work.
Thank you.
However, to stress it a bit more I changed
#define NUM_ITERS 1000
to
Hi Paul:
I can comment on why you are seeing the mpicxx problem, but I am not
sure what to do about it.
In the file mpicxx.cc there is a declaration near the bottom that looks
like this.
const int LOCK_SHARED = MPI_LOCK_SHARED;
The preprocessor is going through that file and replacing
mpicxx.cc issue.
Rolf
On 08/29/08 13:48, Rolf Vandevaart wrote:
Hi Paul:
I can comment on why you are seeing the mpicxx problem, but I am not
sure what to do about it.
In the file mpicxx.cc there is a declaration near the bottom that looks
like this.
const int LOCK_SHARED = MPI_LOCK_SHARED