Hi OMPI_Developers,
It seems that I am unable to establish an MPI communication between two
independently started MPI programs using the simplest client/server call
sequence I can imagine (see the two attached files) when the client and server
processes are started on different machines. Note
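The attached files are not reproduced in this archive; a minimal sketch of the kind of call sequence described (file layout, the single-binary structure, and passing the port name on the client's command line are my assumptions, not the poster's actual code) might look like:

```cpp
// Sketch only: requires an MPI installation (compile with mpicxx, launch
// the server and the client separately with mpirun on each machine).
#include <mpi.h>
#include <cstdio>
#include <cstring>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    if (argc > 1) {                       // client: port name given as argv[1]
        std::strncpy(port_name, argv[1], MPI_MAX_PORT_NAME);
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {                              // server
        MPI_Open_port(MPI_INFO_NULL, port_name);
        std::printf("port: %s\n", port_name);  // copy this string to the client
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Close_port(port_name);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```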
, 2015 5:29 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between
two different machines
Try running it with "--mca oob_base_verbose 100" on both client and server - it
will tell us why the connection was refused.
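The flag goes on the mpirun command line of both jobs; assuming executables named server and client (hypothetical names), the invocations would look something like:

```shell
# Hypothetical executable names; the MCA flag is the one suggested above.
mpirun --mca oob_base_verbose 100 -np 1 ./server
mpirun --mca oob_base_verbose 100 -np 1 ./client <port-name>
```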
> On Jul 13, 2015, at 2:14
between
two different machines
I see the problem - it's a race condition, actually. I'll try to provide a
patch for you to test, if you don't mind.
> On Jul 13, 2015, at 3:03 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca>
> wrote:
>
> Thanks Ralph for this quick response
Yes, this patch applied over OpenMPI 1.8.6 solves my problem.
Attached are the new output files for the server and the client when started
with "--mca oob_base_verbose 100".
Will this patch be included in 1.8.7?
Thanks again,
Martin Audet
From: users
Hi OpenMPI_Users and OpenMPI_Developers,
I'm unable to connect a client application using MPI_Comm_connect() to a server
job (the server job calls MPI_Open_port() before calling MPI_Comm_accept())
when the server job uses MX MTL (although it works without problems when the
server uses MX
Sorry,
I forgot the attachments...
Martin
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
Audet, Martin [martin.au...@imi.cnrc-nrc.gc.ca]
Sent: June 4, 2010 19:18
To: us...@open-mpi.org
Subject: [OMPI users] Unable
Objet : [OMPI users] Unable
Thanks to both Scott and Jeff !
Next time I have a problem, I will check the README file first (Doh !).
Also we might mitigate the problem by connecting the workstation to the Myrinet
switch.
Martin
-----Original Message-----
From: users-boun...@open-mpi.org
Hi,
I'm experimenting with the MPI-2 functions for supporting the client/server
model in MPI (e.g. server and client are independently created MPI jobs
establishing an intercommunicator between them at run time, see section 5.4
"Establishing Communication" of the MPI-2 standard document) and
Hi again,
Thanks Pak for the link and for suggesting to start an "orted" daemon; by doing
so my client and server jobs were able to establish an intercommunicator
between them.
However I modified my programs to perform an MPI_Allgather() of a single "int"
over the new intercommunicator to test
re, valgrind correctly detects the uninitialized data transfer.
On Mar 12, 2008, at 3:58 PM, Audet, Martin wrote:
> Hi again,
>
> Thanks Pak for the link and for suggesting to start an "orted" daemon,
> by doing so my client and server jobs were able to establish a
Hi Georges,
Thanks for your patch, but I'm not sure I got it correctly. The patch I got
modifies a few arguments passed to isend()/irecv()/recv() in
coll_basic_allgather.c. Here is the patch I applied:
Index: ompi/mca/coll/basic/coll_basic_allgather.c
for example the server is started on 3 processes and the client on 2 processes.
Feel free to try yourself the two small client and server programs I posted in
my first message.
Thanks,
Martin
Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
ses in the two
>> groups passes for me -- and did fail with the previous version.
>>
>>
>> Thanks
>> Edgar
>>
>>
>> Audet, Martin wrote:
>>> Hi Jeff,
>>>
>>> As I said in my last message (see below) the patch (or at least
Hi,
I don't know if it is my sample code or if it is a problem with MPI_Scatter()
on an inter-communicator (maybe similar to the problem we found with
MPI_Allgather() on an inter-communicator a few weeks ago), but a simple
program I wrote freezes during its second iteration of a loop doing an
This has nothing to do with the segmentation fault you got, but in addition to
Brian's comment, I would suggest not forgetting that in ISO C++ (the C++98
standard and the upcoming C++0x) a constant expression known at compile time is
needed for the dimensions of local arrays.
In other words, a
4) Well, this sounds reasonable, but according to the MPI-1 standard
(see page 40 for non-blocking send/recv, with a more detailed explanation on
page 30):
"A nonblocking send call indicates that the system may start copying
data out of the send buffer. The sender
Hi,
After reading a little of the FAQ on the methods used by Open MPI to deal with
memory registration (or pinning) with InfiniBand adapters, it seems that we
could avoid all the overhead and complexity of memory
registration/deregistration, registration cache access and update, memory
management
Thanks Jeff for your answer.
It is sad that the approach I mentioned of having all memory registered for
user processes on cluster nodes didn't become more popular.
I still believe that such an approach would shorten the executed code path in
MPI libraries, reduce message latency, increase the
Thanks Jeff and Alex for your answers and comments.
mlockall(), especially with the MCL_FUTURE argument, is indeed interesting.
Thanks Jeff for your clarification of what memory registration really means
(e.g. locking and telling the network stack the virtual to physical mapping).
Also
Hi,
I just compiled my library with version 1.0 of OpenMPI and I had two problems.
First, the MPI_Offset datatype is defined as a preprocessor macro as follows in
mpi.h:
/* Type of MPI_Offset */
#define MPI_Offset long long
This generates a syntax error when MPI_Offset is used in C++ for
Hi,
I just tried OpenMPI 1.0.1 and this time I had many fewer warnings related to
the C++ API than I had with version 1.0.0 (I compile with g++ -Wall).
I nonetheless looked at the C++ headers and found that those warnings were
still related to the use of C-style casts.
Some of them were simply
Hi,
It looks like there is a problem in OpenMPI 1.0.2 with how MPI_COMM_SELF
attributes callback functions are handled by MPI_Finalize().
The following C program registers a callback function associated with the
MPI_COMM_SELF communicator to be called during the first steps of
MPI_Finalize().
Hi,
When I tried to install OpenMPI on the front node of a cluster using the
OpenPBS batch system (e.g. the --with-tm=/usr/open-pbs argument to configure),
it didn't work and I got the error message:
--- MCA component pls:tm (m4 configuration macro)
checking for MCA component pls:tm compile mode...
Hi,
I sometimes use OpenMPI and it looks like the mpicc wrapper gives gcc a
nonexistent directory with the -I option. If I ask mpicc how it calls gcc it
prints the following:
[audet@linux15 libdfem]$ mpicc -show
gcc -I/usr/local/openmpi-1.1.2/include
Hi,
I would like to suggest adding macros indicating the version of the OpenMPI
library in the C/C++ header file mpi.h, analogous to the parameter constants in
the Fortran header file:
parameter (OMPI_MAJOR_VERSION=1)
parameter (OMPI_MINOR_VERSION=1)
parameter
Hi OMPI_Users && OMPI_Developers,
Is there an equivalent to the MCA parameter btl_openib_include_if when using
MXM over InfiniBand (e.g. either (pml=cm mtl=mxm) or (pml=yalla))?
I ask this question because I'm working on a cluster where LXC containers are
used on compute nodes (with SR-IOV I
Hi Josh,
Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my MPI
processes
and it did improve performance but the performance I obtain isn't completely
satisfying.
When I use IMB 4.1 pingpong and sendrecv benchmarks between two nodes I get
using
Open MPI 1.10.3:
Hi Devendar,
Thank you for your answer.
Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency and
bandwidth):
without MXM_TLS:
comm   lat_min    bw_max     bw_max
       pingpong   pingpong   sendrecv
       (us)       (MB/s)     (MB/s)
Hi Devendar,
Thank you again for your answer.
I searched a little and found that UD stands for "Unreliable Datagram"
while RC is for the "Reliable Connected" transport mechanism. I found another
one, called DC for "Dynamically Connected", which is not supported on our HCA.
Do you know what is