Re: [OMPI users] An equivalent to btl_openib_include_if when MXM over Infiniband ?

2016-08-22 Thread Audet, Martin
Hi Devendar,

Thank you again for your answer.

I searched a little bit and found that UD stands for "Unreliable Datagram"
while RC is for "Reliable Connected" transport mechanism. I found another
called DC for "Dynamically Connected" which is not supported on our HCA.

Do you know what the basic difference between them is ?

I didn't find any information about this.

Which one is used by btl=openib (ibverbs)? Is it RC ?

Also, are they all standard, or are some of them supported only by Mellanox ?

I will try to convince the admin of the system I'm using to increase the
maximal shared segment size (SHMMAX). I guess what we have (i.e. 32 MB) is the
default. But I didn't find any document suggesting that SHMMAX should be
increased to help MXM. This is a bit odd: if it's important, it should at least
be mentioned in the Mellanox documentation.

I will certainly check the message rate benchmark osu_mbw_mr to see whether its
results are improved by MXM.

After looking at the MPI performance results published at the URL you gave
(e.g. latencies around 1 us in native mode), I'm more and more convinced that
our results are suboptimal.

And after seeing the impact of SR-IOV shown at that URL, I suspect more and
more that our mediocre latency is caused by this mechanism.

But our cluster is different: SR-IOV is not used in the context of Virtual
Machines running under a host VMM. SR-IOV is used with Linux LXC containers.


Martin Audet


> Hi Martin
>
> MXM default transport is UD (MXM_TLS=*ud*,shm,self), which is scalable when
> running with large applications.  RC (MXM_TLS=*rc*,shm,self) is recommended
> for microbenchmarks and very small-scale applications.
>
> yes, max seg size setting is too small.
>
> Did you check any message rate benchmarks (like osu_mbw_mr) with MXM?
>
> virtualization env will have some overhead.  see some perf comparison here
> with mvapich
> http://mvapich.cse.ohio-state.edu/performance/v-pt_to_pt/ .




Re: [OMPI users] An equivalent to btl_openib_include_if when MXM over Infiniband ?

2016-08-19 Thread Audet, Martin
Hi Devendar,

Thank you for your answer.

Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency and 
bandwidth):

 without MXM_TLS

comm     lat_min    bw_max     bw_max
         pingpong   pingpong   sendrecv
         (us)       (MB/s)     (MB/s)
----------------------------------------
openib   1.79       5827.93    11552.4
mxm      2.23       5191.77     8201.76
yalla    2.18       5200.55     8109.48


 with MXM_TLS=rc,shm,self

comm     lat_min    bw_max     bw_max
         pingpong   pingpong   sendrecv
         (us)       (MB/s)     (MB/s)
----------------------------------------
openib   1.79       6021.83    11529
mxm      1.78       5936.92    11168.5
yalla    1.78       5944.86    11375


Note 1: MXM_RDMA_PORTS=mlx4_0:1 and the MCA parameter
btl_openib_include_if=mlx4_0 are set in both cases.

Note 2: The bandwidths reported are not very accurate. Bandwidth results can
easily vary by 7% from one run to another.

We see that the performance of MXM is now very similar to the performance of 
openib for these IMB tests.

However an error is now reported a few times when MXM_TLS is set:

sys.c:468  MXM  ERROR A new segment was to be created and size < SHMMIN
or size > SHMMAX, or the new segment was to be created. A segment with
given key existed, but size is greater than the size of that segment.
Please check limits by 'ipcs -l'.

"ipcs -l" reports among other things that:

  max seg size (kbytes) = 32768

By the way, is it too small ?
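
To make sure I understand where the limit bites, I wrote the small sketch below
(my own illustration, not something from Mellanox): a plain shmget() request
above SHMMAX fails the same way, with EINVAL.

   #include <stdio.h>
   #include <errno.h>
   #include <string.h>
   #include <sys/ipc.h>
   #include <sys/shm.h>

   int main(void)
   {
      size_t req = 64UL * 1024 * 1024;   /* 64 MB, above the 32 MB reported by "ipcs -l" */
      int id = shmget(IPC_PRIVATE, req, IPC_CREAT | 0600);

      if (id < 0) {
         /* EINVAL is what the kernel returns when size > SHMMAX (or < SHMMIN) */
         printf("shmget(%zu bytes) failed: %s\n", req, strerror(errno));
         return 1;
      }
      printf("shmget(%zu bytes) succeeded, id=%d\n", req, id);
      shmctl(id, IPC_RMID, NULL);        /* remove the segment again */
      return 0;
   }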


Now if we run /opt/mellanox/mxm/mxm_perftest we get:

                                        without   with
                                        MXM_TLS   MXM_TLS

  avg send_lat (us)                     1.626     1.321

  avg send_bw -s 400 (MB/s)             5219.51   5514.04
  avg bidir send_bw -s 400 -b (MB/s)    5283.13   5514.45

Note: the -b for bidirectional bandwidth doesn't seem to affect the result.

Again it is an improvement in terms of both latency and bandwidth.

However, a warning is reported on the server side when MXM_TLS is set and the
send_lat test is run:

 icb_ep.c:287   MXM  WARN  The min value for CIB_RX_QUEUE_LEN is 2048.

Note: setting the undocumented env variable MXM_CIB_RX_QUEUE_LEN=2048 removes
the warning but doesn't affect the send latency.


 * * *

So now the results are better: MXM performs as well as the regular openib in
terms of latency and bandwidth (I didn't check the overlap capability though).
But I'm not really impressed. I was expecting MXM (especially when used through
yalla) to be a little better than openib. Also, the latency of openib, mxm and
yalla, at 1.8 us, seems too high. With a configuration like ours, we should get
something closer to 1 us.

Does anyone have an idea ?

Don't forget that this cluster uses LXC containers with SR-IOV enabled for the 
Infiniband adapter.

Martin Audet


> Hi Martin,
>
> Can you check if it is any better with  "-x MXM_TLS=rc,shm,self" ?
>
> -Devendar



Re: [OMPI users] An equivalent to btl_openib_include_if when MXM over Infiniband ?

2016-08-16 Thread Audet, Martin
Hi Josh,

Thanks for your reply. I did try setting MXM_RDMA_PORTS=mlx4_0:1 for all my MPI
processes and it did improve performance, but the performance I obtain still
isn't completely satisfactory.

When I run the IMB 4.1 pingpong and sendrecv benchmarks between two nodes using
Open MPI 1.10.3, I get:

 without MXM_RDMA_PORTS

   comm     lat_min    bw_max     bw_max
            pingpong   pingpong   sendrecv
            (us)       (MB/s)     (MB/s)
   ----------------------------------------
   openib   1.79       5947.07    11534
   mxm      2.51       5166.96     8079.18
   yalla    2.47       5167.29     8278.15


 with MXM_RDMA_PORTS=mlx4_0:1

   comm     lat_min    bw_max     bw_max
            pingpong   pingpong   sendrecv
            (us)       (MB/s)     (MB/s)
   ----------------------------------------
   openib   1.79       5827.93    11552.4
   mxm      2.23       5191.77     8201.76
   yalla    2.18       5200.55     8109.48


openib means: pml=ob1        btl=openib,vader,self  btl_openib_include_if=mlx4_0
mxm    means: pml=cm,ob1     mtl=mxm  btl=vader,self
yalla  means: pml=yalla,ob1  btl=vader,self

lspci reports for our FDR Infiniband HCA:
  Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]

and 16 lines like:
  Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

The nodes have two octa-core Xeon E5-2650 v2 (Ivy Bridge-EP, 2.67 GHz) sockets.

ofed_info reports that mxm version is 3.4.3cce223-0.32200

As you can see, the results are not very good. I would expect mxm and yalla to
perform better than openib in terms of both latency and bandwidth (note: the
sendrecv bandwidth is full duplex). I would also expect the yalla latency to be
around 1.1 us, as shown here:
https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf (page 33).

I also ran mxm_perftest (located in /opt/mellanox/bin) and it reports the 
following
latency between two nodes:

 without MXM_RDMA_PORTS            1.92 us
 with    MXM_RDMA_PORTS=mlx4_0:1   1.65 us

Again, I think we can expect better latency with our configuration; 1.65 us is
not a very good result.

Note however that the 0.27 us reduction in raw mxm latency (1.92 - 1.65 = 0.27)
corresponds to the reduction in the Open MPI latencies above observed with mxm
(2.51 - 2.23 = 0.28) and yalla (2.47 - 2.18 = 0.29).

Another detail: everything is run inside LXC containers. Also SR-IOV is 
probably used.

Does anyone have any idea what's wrong with our cluster ?

Martin Audet


> Hi, Martin
>
> The environment variable:
>
> MXM_RDMA_PORTS=device:port
>
> is what you're looking for. You can specify a device/port pair on your OMPI
> command line like:
>
> mpirun -np 2 ... -x MXM_RDMA_PORTS=mlx4_0:1 ...
>
>
> Best,
>
> Josh


[OMPI users] An equivalent to btl_openib_include_if when MXM over Infiniband ?

2016-08-12 Thread Audet, Martin
Hi OMPI_Users && OMPI_Developers,

Is there an equivalent to the MCA parameter btl_openib_include_if when using 
MXM over Infiniband (e.g. either (pml=cm  mtl=mxm) or (pml=yalla)) ?

I ask this question because I'm working on a cluster where LXC containers are
used on compute nodes (with SR-IOV I think) and multiple mlx4 interfaces are
reported by lstopo (e.g. mlx4_0, mlx4_1, ..., mlx4_16) even though a single
physical Mellanox ConnectX-3 HCA is present per node.

I found that when I use the plain openib btl (e.g. (pml=ob1  btl=openib)), it
is much faster if I specify the MCA parameter btl_openib_include_if=mlx4_0 to
force Open MPI to use a single interface. By doing that, the latency is lower
and the bandwidth higher. I guess it is because otherwise Open MPI gets
confused by trying to use all the "virtual" interfaces at once.

However we all know that MXM is better than plain openib since it allows the
HCAs to perform message matching, transfer messages in the background and
provide communication progress.

So in this case is there a way to use only mlx4_0 ?

I mean when using the mxm mtl (pml=cm  mtl=mxm) or, preferably, using MXM more
directly through the yalla pml (pml=yalla).

Note that I'm using Open MPI 1.10.3, which I compiled myself, for now, but I
could use Open MPI 2.0 instead if necessary.

Thanks,

Martin Audet


[OMPI users] Ability to overlap communication and computation on Infiniband

2016-07-08 Thread Audet, Martin
Hi OMPI_Users and OMPI_Developers,

I would like someone to verify whether my understanding is correct concerning
Open MPI's ability to overlap communication and computation on Infiniband when
using the non-blocking MPI_Isend() and MPI_Irecv() functions (i.e. the
computation is done between the non-blocking MPI_Isend() on the sender, or
MPI_Irecv() on the receiver, and the corresponding MPI_Wait()).
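
To be concrete, the pattern I have in mind looks like the minimal sketch below
(my own code, not taken from the FAQ; exchange_with_overlap() and
do_local_computation() are just hypothetical names):

   #include <mpi.h>

   /* hypothetical placeholder for the work done between the posts and the waits */
   static void do_local_computation(void) { /* ... */ }

   /* post the non-blocking calls, compute, then wait for completion */
   void exchange_with_overlap(double *sendbuf, double *recvbuf, int count,
                              int peer, MPI_Comm comm)
   {
      MPI_Request reqs[2];

      MPI_Irecv(recvbuf, count, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
      MPI_Isend(sendbuf, count, MPI_DOUBLE, peer, 0, comm, &reqs[1]);

      /* computation that touches neither buffer; overlap happens here only if
         the library/hardware can progress the transfer on its own */
      do_local_computation();

      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
   }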

After reading the following FAQ entries:

   https://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.2
   https://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.3

and the paper:

   https://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols/

about the algorithm used on OpenFabrics to send large messages, my
understanding is that:

1-  When the "RDMA Direct" message protocol is used, the communication is done
by an RDMA read on the receiver side. So if the receiver calls MPI_Irecv()
after it has received a matching message envelope (tag, communicator) from the
sender, then the receiver can start the RDMA read, let the Infiniband HCA
operate, and return from MPI_Irecv() to let the receiving process compute.
Then, the next time the MPI library is called on the receiver side (or maybe in
the corresponding MPI_Wait() call), the receiver sends a short ACK message to
the sender to tell it that the receive is completed and that it is now free to
do whatever it wants with the send buffer. When things happen this way (e.g.
the sender's envelope is received before MPI_Irecv() is called on the receiver
side), it offers a great overlap potential on both the receiver and sender side
(because the sender's MPI_Isend() only has to send the envelope eagerly and its
MPI_Wait() only waits for the ACK).

However, when the receiver calls MPI_Irecv() before the sender's envelope is
received, the RDMA read cannot start until the envelope arrives and the MPI
library realizes it can start the RDMA read. If the receiver only realizes this
in the corresponding MPI_Wait(), there will be no overlap on the receiver side.
The overlap potential is still good on the sender side, for the same reason as
in the previous case.
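
If that is the case, the only mitigation I can think of on the receiver side
(my own sketch, not something the FAQ prescribes) is to sprinkle the
computation with MPI_Test() calls so that the library gets earlier
opportunities to notice the envelope and start the RDMA read:

   #include <mpi.h>

   /* hypothetical slice of the receiver-side computation */
   static void compute_one_chunk(void) { /* ... */ }

   /* give the library regular opportunities to notice the arrived envelope
      and start the transfer before the final wait */
   void recv_with_manual_progress(double *recvbuf, int count, int src,
                                  MPI_Comm comm, int nchunks)
   {
      MPI_Request req;
      int done = 0;
      int i;

      MPI_Irecv(recvbuf, count, MPI_DOUBLE, src, 0, comm, &req);

      for (i = 0; i < nchunks; ++i) {
         compute_one_chunk();
         if (!done) {
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* lets the library progress */
         }
      }

      if (!done) {
         MPI_Wait(&req, MPI_STATUS_IGNORE);
      }
   }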

2-  When the "RDMA Pipeline" protocol is used, both the sender and receiver
side have to actively cooperate to transfer data using multiple Infiniband
send/receives and RDMA writes. On the receiver side, as the article says, the
"protocol effectively overlaps the cost of registration/deregistration with
RDMA writes". This allows communication to be overlapped with registration
overhead on the receiver side, but not with computation. On the sender side I
don't see how overlap with computation could be possible either. In practice,
when this protocol is used between a pair of MPI_Isend() and MPI_Irecv(), I
fear that all the communication will happen when the sender and receiver reach
their corresponding MPI_Wait() calls (which means no overlap).

So if someone could tell me whether this is correct or not, I would greatly
appreciate it.

I guess that the two above protocols correspond to the basic BTL/openib 
framework/component.

When a more modern MTL/mxm or PML/yalla framework/component is used, I hope
things are different and result in more communication/computation overlap
potential.

Thanks in advance,

Martin Audet



[OMPI users] Experience with MXM, yalla, FCA and HCOLL with Mellanox HCA ?

2016-06-27 Thread Audet, Martin
Hi Open MPI Users and Developers,

I would like to know your experience with the optional middleware and the
corresponding Open MPI frameworks/components for recent Mellanox Infiniband
HCAs, especially concerning MXM and FCA (the latest versions bring HCOLL I
think), and the related Open MPI frameworks/components such as MTL/mxm,
PML/yalla, COLL/fca and COLL/hcoll.

Does MXM, when used with MTL/mxm or PML/yalla, really improve communication
speed over the plain BTL/openib ?

Especially since MXM supports matching message tags, I suppose that, in
addition to improving the usual latency/bandwidth metrics a little, it would
increase the communication/computation overlap potential when used with
non-blocking MPI calls, since the adapter is more autonomous.

I remember that with old Myrinet networks, the matching MX middleware for our 
application was way better than the earlier non-matching GM middleware. I guess 
it is the same thing now with Infiniband / OpenFabric networks. Matching 
middleware should therefore be better.


Also concerning FCA and HCOLL,  do they really improve the speed of the 
collective operations ?

From the Mellanox documentation I saw they are supposed to use hardware
broadcast and take into account the topology to favor the faster connections
between processes located on the same nodes. I also saw in these documents that
recent versions of FCA are able to perform the reduction operations on the HCA
itself, even the floating point ones. This should greatly improve the speed of
MPI_Allreduce() in our codes !

So, for those lucky enough to have access to a recent, well-configured Mellanox
Infiniband cluster with recent middleware and an Open MPI library configured to
take advantage of it: does it deliver on its promises ?

The only documentation/reports I could find on the Internet on these subjects
are from Mellanox, in addition to this one for PML/yalla and MTL/mxm (slide 32):

  https://www.open-mpi.org/papers/sc-2014/Open-MPI-SC14-BOF.pdf

Thanks in advance,


Martin Audet



Re: [OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-27 Thread Audet, Martin
Thanks Jeff and Alex for your answers and comments.

mlockall(), especially with the MCL_FUTURE argument, is indeed interesting.
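
Just to make sure we are talking about the same thing, here is the minimal
usage I have in mind (my own sketch, nothing more):

   #include <stdio.h>
   #include <sys/mman.h>

   int main(void)
   {
      /* MCL_CURRENT locks the pages already mapped, MCL_FUTURE locks pages
         mapped later (heap growth, new mmap regions, ...) */
      if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
         perror("mlockall");   /* typically fails if "max locked memory" (ulimit -l) is too low */
         return 1;
      }
      printf("all current and future memory of this process is now locked\n");
      return 0;
   }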

Thanks Jeff for your clarification of what memory registration really means
(i.e. locking pages and telling the network stack the virtual-to-physical
mapping).

Also, concerning the ummunotify kernel module, I would like to point out that
while the GitHub bug report you linked to suggests it is problematic, the
top-level Open MPI README file still recommends it. Should the README file be
updated ?

Regards,

Martin Audet



Re: [OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-20 Thread Audet, Martin
Thanks Jeff for your answer.

It is sad that the approach I mentioned, of having all memory registered for
user processes on cluster nodes, didn't become more popular.

I still believe that such an approach would shorten the executed code path in
MPI libraries, reduce message latency, increase the communication/computation
overlap potential and allow communication to progress more naturally.

But since we have to live with memory registration issues, what changes should
be made to a standard Linux distro so that Open MPI can best use a recent
Mellanox Infiniband network ?

I guess that installing the ummunotify kernel module is a good idea ?

Is removing the limit on "max locked memory" (ulimit -l) also a good idea ?
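
For reference, here is a small sketch of mine to check the current limit from
within a program (RLIMIT_MEMLOCK is the limit behind "ulimit -l"):

   #include <stdio.h>
   #include <sys/resource.h>

   int main(void)
   {
      struct rlimit rl;

      /* RLIMIT_MEMLOCK is the limit reported by "ulimit -l" */
      if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
         perror("getrlimit");
         return 1;
      }
      if (rl.rlim_cur == RLIM_INFINITY) {
         printf("max locked memory: unlimited\n");
      }
      else {
         printf("max locked memory: %llu kB\n",
                (unsigned long long)(rl.rlim_cur / 1024));
      }
      return 0;
   }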

Besides that, I guess that installing the latest OFED (to have the latest
middleware) instead of using the default one coming with the Linux distro is a
good idea ?

Also, is the XPMEM kernel module, for more efficient intra-node transfers of
large messages, worth installing now that kernels include the CMA API ?

Thanks,

Martin Audet



[OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-16 Thread Audet, Martin
Hi,

After reading the FAQ a little on the methods used by Open MPI to deal with
memory registration (or pinning) with Infiniband adapters, it seems that we
could avoid all the overhead and complexity of memory
registration/deregistration, registration cache access and update, and memory
management (ummunotify), in addition to allowing a better overlap of
communications with computations (we could let the communication hardware do
its job independently without resorting to
registration/transfer/deregistration pipelines), by simply having all user
process memory registered all the time.

Of course a configuration like that is not appropriate in a general setting
(e.g. a desktop environment) as it would make swapping almost impossible.

But in the context of an HPC node, where the processes are not supposed to swap
and the OS is not supposed to overcommit memory, not being able to swap doesn't
appear to be a problem.

Moreover, since the maximal total memory used per process is often predefined
at application start as a resource specified to the queuing system, the OS
could easily keep a defined amount of extra memory for its own needs instead of
swapping out user process memory.

I guess that specialized (non-Linux) compute node OSes do this.

But is it possible and does it make sense with Linux ?

Thanks,

Martin Audet



Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-14 Thread Audet, Martin
Yes, this patch applied over OpenMPI 1.8.6 solves my problem.

Attached are the new output files for the server and the client when started 
with "--mca oob_base_verbose 100".

Will this patch be included in 1.8.7 ?

Thanks again,

Martin Audet

From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain 
[r...@open-mpi.org]
Sent: Tuesday, July 14, 2015 11:10 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail
between two different machines

This seems to fix the problem when using your example on my cluster - please 
let me know if it solves things for you



server_out2.txt.bz2
Description: server_out2.txt.bz2


client_out2.txt.bz2
Description: client_out2.txt.bz2


Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-14 Thread Audet, Martin
I will happily test any patch you send me to fix this problem.

Thanks,

Martin

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: July 13, 2015 22:55
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between 
two different machines

I see the problem - it's a race condition, actually. I'll try to provide a 
patch for you to test, if you don't mind.


> On Jul 13, 2015, at 3:03 PM, Audet, Martin  
> wrote:
> 
> Thanks Ralph for this quick response.
> 
> In the two attachements you will find the output I got when running the 
> following commands:
> 
> [audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 
> ./simpleserver 2>&1 | tee server_out.txt
> 
> [audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 
> ./simpleclient 
> '227264.0;tcp://172.17.15.20:56377+227265.0;tcp://172.17.15.20
> :34776:300' 2>&1 | tee client_out.txt
> 
> Martin
> 
> From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Monday, July 13, 2015 5:29 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail   
> between two different machines
> 
> Try running it with "-mca oob_base_verbose 100" on both client and server - 
> it will tell us why the connection was refused.
> 
> 
>> On Jul 13, 2015, at 2:14 PM, Audet, Martin  
>> wrote:
>> 
>> Hi OMPI_Developers,
>> 
>> It seems that I am unable to establish an MPI communication between two 
>> independently started MPI programs using the simplest client/server call 
>> sequence I can imagine (see the two attached files) when the client and 
>> server process are started on different machines. Note that I have no 
>> problems when the client and server program run on the same machine.
>> 
>> For example if I do the following on the server machine (running on fn1):
>> 
>> [audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
>> [audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver Server port = 
>> '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>> 
>> The server prints its port (created with MPI_Open_port()) and wait for a 
>> connection by calling MPI_Comm_accept().
>> 
>> Now on the client machine (running on linux15) if I compile the client and 
>> run it with the above port address on the command line, I get:
>> 
>> [audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
>> [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient 
>> '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>> trying to connect...
>> 
>> A process or daemon was unable to complete a TCP connection to 
>> another process:
>> Local host:linux15
>> Remote host:   linux15
>> This is usually caused by a firewall on the remote host. Please check 
>> that any firewall (e.g., iptables) has been disabled and try again.
>> 
>> [linux15:24193] [[13075,0],0]-[[46606,0],0] 
>> mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 
>> 16
>> 
>> And then I have to stop the client program by pressing ^C (and also the 
>> server which doesn't seems affected).
>> 
>> What's wrong ?
>> 
>> And I am almost sure there is no firewall running on linux15.
>> 
>> It is not the first MPI client/server application I am developing (with both 
>> OpenMPI and mpich).
>> These simple MPI client/server programs work well with mpich (version 3.1.3).
>> 
>> This problem happens with both OpenMPI 1.8.3 and 1.8.6
>> 
>> linux15 and fn1 run both on Fedora Core 12 Linux (64 bits) and are connected 
>> by a Gigabit Ethernet (the normal network).
>> 
>> And again if client and server run on the same machine (either fn1 or 
>> linux15) no such problems happens.
>> 
>> Thanks in advance,
>> 
>> Martin 
>> Audet


Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Audet, Martin
Thanks Ralph for this quick response.

In the two attachements you will find the output I got when running the 
following commands:

[audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleserver 2>&1 | 
tee server_out.txt

[audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleclient 
'227264.0;tcp://172.17.15.20:56377+227265.0;tcp://172.17.15.20:34776:300'
 2>&1 | tee client_out.txt

Martin

From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain 
[r...@open-mpi.org]
Sent: Monday, July 13, 2015 5:29 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail   between 
two different machines

Try running it with "--mca oob_base_verbose 100" on both client and server - it
will tell us why the connection was refused.


> On Jul 13, 2015, at 2:14 PM, Audet, Martin  
> wrote:
>
> Hi OMPI_Developers,
>
> It seems that I am unable to establish an MPI communication between two 
> independently started MPI programs using the simplest client/server call 
> sequence I can imagine (see the two attached files) when the client and 
> server process are started on different machines. Note that I have no 
> problems when the client and server program run on the same machine.
>
> For example if I do the following on the server machine (running on fn1):
>
> [audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
> [audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
> Server port = 
> '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>
> The server prints its port (created with MPI_Open_port()) and wait for a 
> connection by calling MPI_Comm_accept().
>
> Now on the client machine (running on linux15) if I compile the client and 
> run it with the above port address on the command line, I get:
>
> [audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
> [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient 
> '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
> trying to connect...
> 
> A process or daemon was unable to complete a TCP connection
> to another process:
>  Local host:linux15
>  Remote host:   linux15
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> 
> [linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: 
> invalid connection state (6) on socket 16
>
> And then I have to stop the client program by pressing ^C (and also the 
> server which doesn't seems affected).
>
> What's wrong ?
>
> And I am almost sure there is no firewall running on linux15.
>
> It is not the first MPI client/server application I am developing (with both 
> OpenMPI and mpich).
> These simple MPI client/server programs work well with mpich (version 3.1.3).
>
> This problem happens with both OpenMPI 1.8.3 and 1.8.6
>
> linux15 and fn1 run both on Fedora Core 12 Linux (64 bits) and are connected 
> by a Gigabit Ethernet (the normal network).
>
> And again if client and server run on the same machine (either fn1 or 
> linux15) no such problems happens.
>
> Thanks in advance,
>
> Martin 
> Audet

[fn1:07315] mca: base: components_register: registering oob components
[fn1:07315] mca: base: components_register: found loaded component tcp
[fn1:07315] mca: base: components_register: component tcp register function 
successful
[fn1:07315] mca: base: components_open: opening oob components
[fn1:07315] mca: base: components_open: found loaded component tcp
[fn1:07315] mca: base: components_open: component tcp open function successful
[fn1:07315] mca:oob:select: checking available component tcp
[fn1:07315] mca:oob:select: Querying component [tcp]
[fn1:07315] oob:tcp: component_available called
[fn1:07315] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[fn1:07315] [[37299,0],0] oob:tcp:init rejecting loopback interface lo
[fn1:07315] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[fn1:07315] [[37299,0],0] oob:tcp:init adding 172.17.15.20 to our list of V4 
connections
[fn1:07315] [[37299,0],0] TCP STARTUP
[

[OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines

2015-07-13 Thread Audet, Martin
Hi OMPI_Developers,

It seems that I am unable to establish an MPI communication between two
independently started MPI programs using the simplest client/server call
sequence I can imagine (see the two attached files) when the client and server
processes are started on different machines. Note that I have no problem when
the client and server programs run on the same machine.

For example if I do the following on the server machine (running on fn1):

[audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
[audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
Server port = 
'3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'

The server prints its port (created with MPI_Open_port()) and waits for a
connection by calling MPI_Comm_accept().

Now on the client machine (running on linux15) if I compile the client and run 
it with the above port address on the command line, I get:

[audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
[audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient 
'3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
trying to connect...

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:linux15
  Remote host:   linux15
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.

[linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: 
invalid connection state (6) on socket 16

And then I have to stop the client program by pressing ^C (and also the server,
which doesn't seem affected).

What's wrong ?

And I am almost sure there is no firewall running on linux15.

This is not the first MPI client/server application I have developed (with both
OpenMPI and mpich).
These simple MPI client/server programs work well with mpich (version 3.1.3).

This problem happens with both OpenMPI 1.8.3 and 1.8.6

linux15 and fn1 both run on Fedora Core 12 Linux (64 bits) and are connected by
Gigabit Ethernet (the normal network).

And again, if the client and server run on the same machine (either fn1 or
linux15), no such problem happens.

Thanks in advance,

Martin Audet

#include <stdio.h>
#include <stdlib.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int   comm_rank;
   char  port_name[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int  ok_flag;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0) || (argc == 1);
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
  if (comm_rank == 0) {
 fprintf(stderr,"Usage: %s\n",argv[0]);
  }
  MPI_Abort(MPI_COMM_WORLD, 1);
   }

   MPI_Open_port(MPI_INFO_NULL, port_name);

   if (comm_rank == 0) {
  printf("Server port = '%s'\n", port_name);
   }
   MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   MPI_Close_port(port_name);

   if (comm_rank == 0) {
  printf("MPI_Comm_accept() sucessful...\n");
   }

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int  comm_rank;
   int  ok_flag;
   MPI_Comm intercomm;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0)  || ((argc == 2)  &&  argv[1]  &&  (*argv[1] != '\0'));
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
  if (comm_rank == 0) {
 fprintf(stderr,"Usage: %s mpi_port\n", argv[0]);
  }
  MPI_Abort(MPI_COMM_WORLD, 1);
   }

   if (comm_rank == 0) {
  printf("trying to connect...\n");
   }
   while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm) != MPI_SUCCESS) {
  if (comm_rank == 0) {
 printf("MPI_Comm_connect() failled, sleeping and retrying...\n");
  }
  sleep(1);
   }
   if (comm_rank == 0) {
  printf("MPI_Comm_connect() sucessful...\n");
   }

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}


Re: [OMPI users] Unable to connect to a server using MX MTL with TCP

2010-06-09 Thread Audet, Martin
Thanks to both Scott and Jeff !

Next time I have a problem, I will check the README file first (Doh !).

Also we might mitigate the problem by connecting the workstation to the Myrinet 
switch.

Martin

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: June 9, 2010 15:34
To: Open MPI Users
Subject: Re: [OMPI users] Unable to connect to a server using MX MTL with TCP

On Jun 5, 2010, at 7:52 AM, Scott Atchley wrote:

> I do not think this is a supported scenario. George or Jeff can correct me, 
> but when you use the MX MTL you are using the pml cm and not the pml ob1. The 
> BTLs are part of ob1. When using the MX MTL, it cannot use the TCP BTL.
> 
> You only solution would be to use the MX BTL.

Sorry for the delayed reply.

Scott is correct; the MX MTL uses the "cm" PML.  The "cm" PML can only use 
*one* MTL at a time (little known fact of Open MPI lore: "cm" stands for 
several things, one of which is "Connor MacLeod" -- there can only be one).

Here's a chunk of text from the README:

- There are three MPI network models available: "ob1", "csum", and
  "cm".  "ob1" and "csum" use BTL ("Byte Transfer Layer") components
  for each supported network.  "cm" uses MTL ("Matching Transport
  Layer") components for each supported network.

  - "ob1" supports a variety of networks that can be used in
combination with each other (per OS constraints; e.g., there are
reports that the GM and OpenFabrics kernel drivers do not operate
well together):
- OpenFabrics: InfiniBand and iWARP
- Loopback (send-to-self)
- Myrinet: GM and MX (including Open-MX)
- Portals
- Quadrics Elan
- Shared memory
- TCP
- SCTP
- uDAPL

  - "csum" is exactly the same as "ob1", except that it performs
additional data integrity checks to ensure that the received data
is intact (vs. trusting the underlying network to deliver the data
correctly).  csum supports all the same networks as ob1, but there
is a performance penalty for the additional integrity checks.

  - "cm" supports a smaller number of networks (and they cannot be
used together), but may provide better overall MPI
performance:
- Myrinet MX (including Open-MX, but not GM)
- InfiniPath PSM
- Portals

  Open MPI will, by default, choose to use "cm" when the InfiniPath
  PSM MTL can be used.  Otherwise, "ob1" will be used and the
  corresponding BTLs will be selected.  "csum" will never be selected
  by default.  Users can force the use of ob1 or cm if desired by
  setting the "pml" MCA parameter at run-time:

shell$ mpirun --mca pml ob1 ...
or
shell$ mpirun --mca pml csum ...
or
shell$ mpirun --mca pml cm ...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/





[OMPI users] RE : Unable to connect to a server using MX MTL with TCP

2010-06-04 Thread Audet, Martin
Sorry,

I forgot the attachments...

Martin


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
Audet, Martin [martin.au...@imi.cnrc-nrc.gc.ca]
Sent: June 4, 2010 19:18
To: us...@open-mpi.org
Subject: [OMPI users] Unable to connect to a server using MX MTL with TCP


[OMPI users] Unable to connect to a server using MX MTL with TCP

2010-06-04 Thread Audet, Martin
Hi OpenMPI_Users and OpenMPI_Developers,

I'm unable to connect a client application using MPI_Comm_connect() to a server
job (the server job calls MPI_Open_port() before calling MPI_Comm_accept())
when the server job uses the MX MTL (although it works without problems when
the server uses the MX BTL). The server job runs on a cluster connected to a
Myrinet 10G network (MX 1.2.11) in addition to an ordinary Ethernet network.
The client runs on a different machine, not connected to the Myrinet network
but accessible via the Ethernet network.

Attached to this message are the simple server and client programs (87 lines
total), called simpleserver.c and simpleclient.c.

Note we are using OpenMPI 1.4.2 on x86_64 Linux (server: Fedora 7, client:
Fedora 12).

Compiling these programs with mpicc on the server front node (fn1) and client 
workstation (linux15) works well:

   [audet@fn1 bench]$ mpicc simpleserver.c -o simpleserver

   [audet@linux15 mpi]$ mpicc simpleclient.c -o simpleclient

Then if we start the server on the cluster (the job is started on cluster node
cn18), asking it to use the MX MTL:

   [audet@fn1 bench]$ mpiexec -x MX_RCACHE=2 -machinefile machinefile_cn18 
--mca mtl mx --mca pml cm -n 1 ./simpleserver

It prints the server port (note: we use MX_RCACHE=2 to avoid a warning, but it
doesn't affect the current issue):

   Server port = 
'3548905472.0;tcp://172.17.15.20:39517+3548905473.0;tcp://172.17.10.18:47427:300'

Then starting the client on the workstation with this port number:

   [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient 
'3548905472.0;tcp://172.17.15.20:39517+3548905473.0;tcp://172.17.10.18:47427:300'

The server process core dumps as follows:

   MPI_Comm_accept() sucessful...
   [cn18:24582] *** Process received signal ***
   [cn18:24582] Signal: Segmentation fault (11)
   [cn18:24582] Signal code: Address not mapped (1)
   [cn18:24582] Failing at address: 0x38
   [cn18:24582] [ 0] /lib64/libpthread.so.0 [0x305de0dd20]
   [cn18:24582] [ 1] /usr/local/openmpi-1.4.2/lib/openmpi/mca_mtl_mx.so 
[0x2d6a7e6d]
   [cn18:24582] [ 2] /usr/local/openmpi-1.4.2/lib/openmpi/mca_pml_cm.so 
[0x2d4a319d]
   [cn18:24582] [ 3] 
/usr/local/openmpi/lib/libmpi.so.0(ompi_dpm_base_disconnect_init+0xbf) 
[0x2ab1403f]
   [cn18:24582] [ 4] /usr/local/openmpi-1.4.2/lib/openmpi/mca_dpm_orte.so 
[0x2ed0eb19]
   [cn18:24582] [ 5] 
/usr/local/openmpi/lib/libmpi.so.0(PMPI_Comm_disconnect+0xa0) [0x2aaf4f20]
   [cn18:24582] [ 6] ./simpleserver(main+0x14c) [0x400d04]
   [cn18:24582] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x305ce1daa4]
   [cn18:24582] [ 8] ./simpleserver [0x400b09]
   [cn18:24582] *** End of error message ***
   --
   mpiexec noticed that process rank 0 with PID 24582 on node cn18 exited on 
signal 11 (Segmentation fault).
   --
   [audet@fn1 bench]$

And the client stops with the following error message:

   --
   At least one pair of MPI processes are unable to reach each other for
   MPI communications.  This means that no Open MPI device has indicated
   that it can be used to communicate between these processes.  This is
   an error; Open MPI requires that all MPI processes be able to reach
   each other.  This error can sometimes be the result of forgetting to
   specify the "self" BTL.

 Process 1 ([[31386,1],0]) is on host: linux15
 Process 2 ([[54152,1],0]) is on host: cn18
 BTLs attempted: self sm tcp

   Your MPI job is now going to abort; sorry.
   --
   MPI_Comm_connect() sucessful...
   Error in comm_disconnect_waitall
   [audet@linux15 mpi]$

I really don't understand this message because the client can connect with the 
server using tcp on Ethernet.

Moreover, if I add MCA options when I start the server to include the TCP BTL,
the same problem happens (the argument list then becomes: '--mca mtl mx --mca
pml cm --mca btl tcp,shared,self').

However, if I remove all MCA options when I start the server (i.e. when the MX
BTL is used), no such problem appears. Everything also goes fine if I start the
server with an explicit request to use the MX and TCP BTLs (e.g. with options
'--mca btl mx,tcp,sm,self').

For running our server application we really prefer to use the MX MTL over the
MX BTL, since it is much faster with the MTL (although the usual ping pong test
is only slightly faster with the MTL).

Also enclosed is the output of ompi_info --all run on the cluster node (cn18)
and the workstation (linux15).

Please help me. I think my problem is only a question of wrong MCA parameters
(which are obscure to me).

Thanks,

Martin Audet, Research Officer
Industrial Material Institute
National Research Council of Canada
75 de Mortagne, Boucherville, QC, J4B 6Y4, Canada



Re: [OMPI users] Memchecker report on v1.3b2 (includes potential bug reports)

2008-11-19 Thread Audet, Martin



 4)  Well, this sounds reasonable, but according to the MPI-1 standard
 (see page 40 for non-blocking send/recv, a more detailed explanation in
 page 30):

 "A nonblocking send call indicates that the system may start copying
 data out of the send buffer. The sender should */not access*/ any part
 of the send buffer after a nonblocking send operation is called, until
 the send completes."

 So before calling MPI_Wait to complete an isend operation, any 
 access to
 the send buffer is illegal. It might be a little strict, but we have to
 do what the standard says.
>>
>> This have been changed in the new version of the MPI standard (2.1). 
>> There is no restriction anymore regarding the read operations on the 
>> buffers used for non-blocking sends.
>Do you mean the next coming version of MPI standard? Because checking 
>again standard 2.1 , I didn't see any changes of those paragraphs. See 
>MPI Standard 2.1 (PDF version), page 52, and page 41.

The (non-modifying) access to a send buffer was agreed upon for MPI Standard
2.2, not version 2.1; see the MPI 2.2 Wiki:

   https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MpiTwoTwoWikiPage

   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/45

Martin



Re: [OMPI users] Memory question and possible bug in 64bit addressing under Leopard!

2008-04-25 Thread Audet, Martin
This has nothing to do with the segmentation fault you got, but in addition to
Brian's comment, I would suggest that you not forget that in ISO C++ (the C++98
standard and the upcoming C++0x) a constant expression known at compile time is
needed for the dimensions of local arrays.

In other words, a construct like:

int n = 1000;
float X[n];

isn't standard compliant because n isn't a constant expression. It compiles
only because it is a g++ extension (try this with Visual C++ for example). A
construct like:

const int n = 1000;
float X[n];

however is standard compliant since n is a constant expression known at compile 
time.

Variable length arrays allow setting the dimensions of local arrays using any
integral expression (whether or not it is constant or known at compile time).
This feature was added to the ISO C language in the C99 standard, but not to C++.
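
To illustrate the difference, here is a small sketch of mine (it must be
compiled as C99, not C++, because of the VLA):

   #include <stdio.h>
   #include <stdlib.h>

   /* the VLA form needs C99; the heap form is valid in both C and C++
      and does not grow the stack */
   static void demo(int n)
   {
      float X_vla[n];                              /* C99 variable length array (on the stack) */
      float *X_heap = malloc(n * sizeof *X_heap);  /* portable, allocated on the heap */

      if (X_heap == NULL) {
         return;
      }
      X_vla[0] = X_heap[0] = 0.0f;
      printf("allocated %d floats on the stack and %d on the heap\n", n, n);
      free(X_heap);
   }

   int main(void)
   {
      demo(1000);
      return 0;
   }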

Martin



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Brian Barrett
Sent: April 25, 2008 16:11
To: Open MPI Users
Subject: Re: [OMPI users] Memory question and possible bug in 64bit addressing 
under Leopard!

On Apr 25, 2008, at 2:06 PM, Gregory John Orris wrote:

> produces a core dump on a machine with 12Gb of RAM.
>
> and the error message
>
> mpiexec noticed that job rank 0 with PID 75545 on node mymachine.com
> exited on signal 4 (Illegal instruction).
>
> However, substituting in
>
> float *X = new float[n];
> for
> float X[n];
>
> Succeeds!


You're running off the end of the stack, because of the large amount
of data you're trying to put there.  OS X by default has a tiny stack
size, so codes that run on Linux (which defaults to a much larger
stack size) sometimes show this problem.  Your best bets are either to
increase the max stack size or (more portably) just allocate
everything on the heap with malloc/new.

Hope this helps,

Brian

--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/





[OMPI users] Problem with MPI_Scatter() on inter-communicator...

2008-04-04 Thread Audet, Martin
Hi,

I don't know if it is my sample code or if it is a problem with MPI_Scatter()
on an inter-communicator (maybe similar to the problem we found with
MPI_Allgather() on an inter-communicator a few weeks ago), but a simple program
I wrote freezes during the second iteration of a loop doing an MPI_Scatter()
over an inter-communicator.

For example if I compile as follows:

  mpicc -Wall scatter_bug.c -o scatter_bug

I get no error or warning. Then if I start it with np=2 as follows:

mpiexec -n 2 ./scatter_bug

it prints:

   beginning Scatter i_root_group=0
   ending Scatter i_root_group=0
   beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of 
the second iteration (e.g. replacing "i_root_group=0;" by "i_root_group=1;"), 
it prints:

beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device)
for many different numbers of processes (np), whether the executable is run
with or without valgrind.

The OpenMPI version I use is 1.2.6rc3 and was configured as follows:

   ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 
--disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions 
--with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (when using OpenMPI or mpich2) were started on the
same machine.

Also, if you look at the source code, you will notice that some arguments to
MPI_Scatter() are NULL or 0. This may look strange and problematic when using a
normal intra-communicator. However, according to the book "MPI - The Complete
Reference" vol. 2 about MPI-2, for MPI_Scatter() with an inter-communicator:

  "The sendbuf, sendcount and sendtype arguments are significant only at the 
root process. The recvbuf, recvcount, and recvtype arguments are significant 
only at the processes of the leaf group."

If anyone else could have a look at this program and try it, it would be helpful.

Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int ret_code = 0;
   int comm_size, comm_rank;

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   if (comm_size > 1) {
  MPI_Comm subcomm, intercomm;
  const int group_id = comm_rank % 2;
  int i_root_group;

  /* split process in two groups:  even and odd comm_ranks. */
  MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

  /* The remote leader comm_rank for even and odd groups are respectively: 
1 and 0 */
  MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0, 
&intercomm);

  /* for i_root_group==0 process with comm_rank==0 scatter data to all 
process with odd  comm_rank */
  /* for i_root_group==1 process with comm_rank==1 scatter data to all 
process with even comm_rank */
  for (i_root_group=0; i_root_group < 2; i_root_group++) {
 if (comm_rank == 0) {
printf("beginning Scatter i_root_group=%d\n",i_root_group);
 }
 if (group_id == i_root_group) {
const int  is_root  = (comm_rank == i_root_group);
int   *send_buf = NULL;
if (is_root) {
   const int nbr_other = (comm_size+i_root_group)/2;
   int   ii;
   send_buf = malloc(nbr_other*sizeof(*send_buf));
   for (ii=0; ii < nbr_other; ii++) {
   send_buf[ii] = ii;
   }
}
MPI_Scatter(send_buf, 1, MPI_INT,
NULL, 0, MPI_INT, (is_root ? MPI_ROOT : 
MPI_PROC_NULL), intercomm);

if (is_root) {
   free(send_buf);
}
 }
 else {
int an_int;
MPI_Scatter(NULL,0, MPI_INT,
&an_int, 1, MPI_INT, 0, intercomm);
 }
 if (comm_rank == 0) {
printf("ending Scatter i_root_group=%d\n",i_root_group);
 }
  }

  MPI_Comm_free(&intercomm);
  MPI_Comm_free(&subcomm);
   }
   else {
  fprintf(stderr, "%s: error this program must be started np > 1\n", 
argv[0]);
  ret_code = 1;
   }

   MPI_Finalize();

   return ret_code;
}



[OMPI users] RE : RE : MPI_Comm_connect() fails

2008-03-17 Thread Audet, Martin
Edgar,

I merged the changes you made in -r17848:17849 on the trunk into OpenMPI
version 1.2.6rc2 with George's patch, and my small examples now work.

Martin

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
Edgar Gabriel [gabr...@cs.uh.edu]
Sent: March 17, 2008 15:59
To: Open MPI Users
Subject: Re: [OMPI users] RE: MPI_Comm_connect() fails

already working on it, together with a move_request
Thanks
Edgar

Jeff Squyres wrote:
> Edgar --
>
> Can you make a patch for the 1.2 series?
>
> On Mar 17, 2008, at 3:45 PM, Edgar Gabriel wrote:
>
>> Martin,
>>
>> I found the problem in the inter-allgather, and fixed it in patch
>> 17849.
>> The same test using however MPI_Intercomm_create (just to simplify my
>> life compared to Connect/Accept) using 2 vs 4 processes in the two
>> groups passes for me -- and did fail with the previous version.
>>
>>
>> Thanks
>> Edgar
>>
>>
>> Audet, Martin wrote:
>>> Hi Jeff,
>>>
>>> As I said in my last message (see bellow) the patch (or at least
>>> the patch I got) don't fixes the problem for me. Whether I apply it
>>> over OpenMPI 1.2.5 or 1.2.6rc2, I still get the same problem:
>>>
>>>  The client aborts with a truncation error message while the server
>>> freeze when for example the server is started on 3 process and the
>>> client on 2 process.
>>>
>>> Feel free to try yourself the two small client and server programs
>>> I posted in my first message.
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>
>>> Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
>>> From: Audet, Martin (Martin.Audet_at_[hidden])
>>> Date: 2008-03-13 17:04:25
>>>
>>> Hi Georges,
>>>
>>> Thanks for your patch, but I'm not sure I got it correctly. The
>>> patch I got modify a few arguments passed to isend()/irecv()/recv()
>>> in coll_basic_allgather.c. Here is the patch I applied:
>>>
>>> Index: ompi/mca/coll/basic/coll_basic_allgather.c
>>> ===
>>> --- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
>>> +++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
>>> @@ -149,7 +149,7 @@
>>> }
>>>
>>> /* Do a send-recv between the two root procs. to avoid
>>> deadlock */
>>> - err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
>>> + err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
>>>  MCA_COLL_BASE_TAG_ALLGATHER,
>>>  MCA_PML_BASE_SEND_STANDARD,
>>>  comm, &reqs[rsize]));
>>> @@ -157,7 +157,7 @@
>>> return err;
>>> }
>>>
>>> - err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
>>> + err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
>>>  MCA_COLL_BASE_TAG_ALLGATHER, comm,
>>>  &reqs[0]));
>>> if (OMPI_SUCCESS != err) {
>>> @@ -186,14 +186,14 @@
>>> return err;
>>> }
>>>
>>> - err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
>>> + err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
>>>  MCA_COLL_BASE_TAG_ALLGATHER,
>>>  MCA_PML_BASE_SEND_STANDARD, comm,
>>> &req));
>>> if (OMPI_SUCCESS != err) {
>>> goto exit;
>>> }
>>>
>>> - err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
>>> + err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
>>> MCA_COLL_BASE_TAG_ALLGATHER, comm,
>>> MPI_STATUS_IGNORE));
>>> if (OMPI_SUCCESS != err) {
>>>
>>> However with this patch, I still have the problem. Suppose I start
>>> the server with three process and the client with two, the clients
>>> prints:
>>>
>>> [audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./
>>> aclient '0.2.0:2000'
>>> intercomm_flag = 1
>>> intercomm_remote_size = 3
>>> rem_rank_tbl[3] = { 0 1 2}
>>> [linux15:26114] *** An error occurred in MPI_Allgather
>>> [linux15:26114] *** on communicator
>>> [linux15:2

Re: [OMPI users] RE : MPI_Comm_connect() fails

2008-03-17 Thread Audet, Martin
Hi Jeff,

As I said in my last message (see below), the patch (or at least the patch I
got) doesn't fix the problem for me. Whether I apply it over OpenMPI 1.2.5 or
1.2.6rc2, I still get the same problem:

  The client aborts with a truncation error message while the server freezes
when, for example, the server is started on 3 processes and the client on 2
processes.

Feel free to try yourself the two small client and server programs I posted in 
my first message.

Thanks,

Martin


Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
List-Post: users@lists.open-mpi.org
Date: 2008-03-13 17:04:25

Hi Georges,

Thanks for your patch, but I'm not sure I got it correctly. The patch I got 
modifies a few arguments passed to isend()/irecv()/recv() in 
coll_basic_allgather.c. Here is the patch I applied:

Index: ompi/mca/coll/basic/coll_basic_allgather.c
===
--- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
@@ -149,7 +149,7 @@
 }

 /* Do a send-recv between the two root procs. to avoid deadlock */
- err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+ err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER,
  MCA_PML_BASE_SEND_STANDARD,
  comm, &reqs[rsize]));
@@ -157,7 +157,7 @@
 return err;
 }

- err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+ err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER, comm,
  &reqs[0]));
 if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
 return err;
 }

- err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+ err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER,
  MCA_PML_BASE_SEND_STANDARD, comm, &req));
 if (OMPI_SUCCESS != err) {
 goto exit;
 }

- err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+ err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER, comm,
 MPI_STATUS_IGNORE));
 if (OMPI_SUCCESS != err) {

However with this patch, I still have the problem. Suppose I start the server 
with three processes and the client with two; the client prints:

[audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient 
'0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal 
15 (Terminated).
[audet_at_linux15 dyn_connect]$

and aborts. The server on the other side simply hangs (as before).

Regards,

Martin

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: March 14, 2008 19:45
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

Yes, please let us know if this fixes it.  We're working on a 1.2.6
release; we can definitely put this fix in there if it's correct.

Thanks!


On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:

> I dug into the sources and I think you correctly pinpointed the bug.
> It seems we have a mismatch between the local and remote sizes in
> the inter-communicator allgather in the 1.2 series (which explains
> the message truncation error when the local and remote groups have a
> different number of processes). Attached to this email you can find
> a patch that [hopefully] solves this problem. If you can, please test
> it and let me know if this solves your problem.
>
>  Thanks,
>george.
>
> 
>
>
> On Mar 13, 2008, at 1:11 PM, Audet, Martin wrote:
>
>>
>> Hi,
>>
>> After re-checking the MPI standard (www.mpi-forum.org and MPI - The
>> Complete Reference), I'm more and more convinced that my small
>> example programs establishing an intercommunicator with
>> MPI_Comm_connect()/MPI_Comm_accept() over an MPI port and
>> exchanging data over it with MPI_Allgather() are correct. Especially
>> calling MPI_Allgather() with recvcount=1 (its fifth argument)
>> instead of the total number of MPI_INT that will be received (e.g.
>> intercomm_remote_size in the examples) is both correct and
>> consistent with MPI_Allgather() behavior on an intracommunicator (e.g.
>> "normal" communica

[OMPI users] RE : users Digest, Vol 841, Issue 3

2008-03-13 Thread Audet, Martin
Hi Georges,

Thanks for your patch, but I'm not sure I got it correctly. The patch I got 
modifies a few arguments passed to isend()/irecv()/recv() in 
coll_basic_allgather.c. Here is the patch I applied:

Index: ompi/mca/coll/basic/coll_basic_allgather.c
===
--- ompi/mca/coll/basic/coll_basic_allgather.c  (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c  (working copy)
@@ -149,7 +149,7 @@
 }

 /* Do a send-recv between the two root procs. to avoid deadlock */
-err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER,
  MCA_PML_BASE_SEND_STANDARD,
  comm, &reqs[rsize]));
@@ -157,7 +157,7 @@
 return err;
 }

-err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER, comm,
  &reqs[0]));
 if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
 return err;
 }

-err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
  MCA_COLL_BASE_TAG_ALLGATHER,
  MCA_PML_BASE_SEND_STANDARD, comm, &req));
 if (OMPI_SUCCESS != err) {
 goto exit;
 }

-err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER, comm,
 MPI_STATUS_IGNORE));
 if (OMPI_SUCCESS != err) {



However with this patch, I still have the problem. Suppose I start the server 
with three processes and the client with two; the client prints:

[audet@linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient 
'0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal 
15 (Terminated).
[audet@linux15 dyn_connect]$

and aborts. The server on the other side simply hangs (as before).

Regards,

Martin



[OMPI users] RE : MPI_Comm_connect() fails

2008-03-13 Thread Audet, Martin

Hi,

After re-checking the MPI standard (www.mpi-forum.org and MPI - The Complete 
Reference), I'm more and more convinced that my small example programs 
establishing an intercommunicator with MPI_Comm_connect()/MPI_Comm_accept() over 
an MPI port and exchanging data over it with MPI_Allgather() are correct. 
Especially, calling MPI_Allgather() with recvcount=1 (its fifth argument) 
instead of the total number of MPI_INT that will be received (e.g. 
intercomm_remote_size in the examples) is both correct and consistent with 
MPI_Allgather() behavior on an intracommunicator (i.e. a "normal" communicator).

   MPI_Allgather(&comm_rank,   1, MPI_INT,
 rem_rank_tbl, 1, MPI_INT,
 intercomm);

Also the recvbuf argument (the fourth argument) of MPI_Allgather() in the 
examples should have a size of intercomm_remote_size (i.e. the size of the 
remote group), not the sum of the local and remote group sizes in the client and 
server processes. The standard says that for all-to-all types of operations over an 
intercommunicator, each process sends to and receives data from the remote group 
only (anyway it is not possible to exchange data with processes of the local 
group over an intercommunicator).

So, for me there is no reason for stopping the process with an error message 
complaining about message truncation. There should be no truncation: the sendcount, 
sendtype, recvcount and recvtype arguments of MPI_Allgather() are correct and 
consistent.
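
To make the bookkeeping explicit, here is how the counts work out for the
failing 2-client / 3-server run (my own sketch, using the same variable names
as in the example programs below):

   /* client side: local group size = 2, remote group size = 3            */
   /* each client sends    sendcount = 1 MPI_INT                          */
   /* each client receives recvcount = 1 MPI_INT from *each* of the 3     */
   /*                      remote (server) processes                      */
   /* so the receive buffer only needs remote_size = 3 ints:              */
   int rem_rank_tbl[3];        /* sized to the remote group only          */
   MPI_Allgather(&comm_rank,   1, MPI_INT,
                 rem_rank_tbl, 1, MPI_INT,
                 intercomm);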

So again for me the OpenMPI behavior with my example looks more and more like a 
bug...

Concerning George's comment about valgrind and TCP/IP, I totally agree: messages 
reported by valgrind are only a clue of a bug, especially in this context, not 
proof of a bug. Another clue is that my small examples work perfectly with mpich2 
ch3:sock.

Regards,

Martin Audet


--

Message: 4
List-Post: users@lists.open-mpi.org
Date: Thu, 13 Mar 2008 08:21:51 +0100
From: jody 
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
To: "Open MPI Users" 
Message-ID:
<9b0da5ce0803130021l4ead0f91qaf43e4ac7d332...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

HI
I think the recvcount argument you pass to MPI_Allgather should not be
1 but instead
the number of MPI_INTs your buffer rem_rank_tbl can contain.
As it stands now, you tell MPI_Allgather that it may only receive 1 MPI_INT.

Furthermore, I'm not sure, but I think your receive buffer should be
large enough
to contain messages from *all* processes, and not just from the "far side".

Jody

. 


--

Message: 6
List-Post: users@lists.open-mpi.org
Date: Thu, 13 Mar 2008 09:06:47 -0500
From: George Bosilca 
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
To: Open MPI Users 
Message-ID: <82e9ff28-fb87-4ffb-a492-dde472d5d...@eecs.utk.edu>
Content-Type: text/plain; charset="us-ascii"

I am not aware of any problems with the allreduce/allgather. But, we
are aware of the problem with valgrind that reports non-initialized
values when used with TCP. It's a long story, but I can guarantee that
this should not affect a correct MPI application.

   george.

PS: For those who want to know the details: we have to send a header
over TCP which contains some very basic information, including the size
of the fragment. Unfortunately, we have a 2-byte gap in the header.
As we never initialize these 2 unused bytes, but we send them over the
wire, valgrind correctly detects the non-initialized data transfer.
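
To illustrate the kind of padding described here (this is not the actual
Open MPI fragment header, just a hypothetical struct with the same property):

   /* hypothetical layout only, not the real header */
   #include <stdint.h>

   struct frag_hdr {
      uint16_t type;        /* bytes 0-1                                   */
                            /* bytes 2-3: compiler-inserted padding, never */
                            /* assigned by the sender                      */
      uint32_t frag_size;   /* bytes 4-7                                   */
   };

Writing sizeof(struct frag_hdr) bytes -- typically 8 -- with writev() therefore
sends two uninitialised bytes, which valgrind reports even though the receiver
never reads them.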


On Mar 12, 2008, at 3:58 PM, Audet, Martin wrote:

> Hi again,
>
> Thanks Pak for the link and for suggesting to start an "orted" daemon;
> by doing so my client and server jobs were able to establish an
> intercommunicator between them.
>
> However I modified my programs to perform an MPI_Allgather() of a
> single "int" over the new intercommunicator to test communication a
> little bit and I did encounter problems. I am now wondering if
> there is a problem in MPI_Allgather() itself for intercommunicators.
> Note that the same program runs without problems with mpich2
> (ch3:sock).
>
> For example if I start orted as follows:
>
>   orted --persistent --seed --scope public --universe univ1
>
> and then start the server with three process:
>
>  mpiexec --universe univ1 -n 3 ./aserver
>
> it prints:
>
>  Server port = '0.2.0:2000'
>
> Now if I start the client with two process as follow (using the
> server port):
>
>   mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
>
> The server prints:
>
>  intercomm_flag = 1
>  intercomm_remote_size = 2
>  rem_rank_tbl[2] = { 0 1}
>
> which is the correct output. The client then prints:
>
>  intercomm_flag = 1
>  intercomm_remote_size = 3
>  rem_rank_tbl[3] = { 0 1 2}
>  [linux15:3089

[OMPI users] RE : MPI_Comm_connect() fails

2008-03-12 Thread Audet, Martin
Hi again,

Thanks Pak for the link and for suggesting to start an "orted" daemon; by doing so 
my client and server jobs were able to establish an intercommunicator 
between them.

However I modified my programs to perform an MPI_Allgather() of a single "int" 
over the new intercommunicator to test communication a little bit and I did 
encounter problems. I am now wondering if there is a problem in 
MPI_Allgather() itself for intercommunicators. Note that the same program runs 
without problems with mpich2 (ch3:sock).

For example if I start orted as follows:

   orted --persistent --seed --scope public --universe univ1

and then start the server with three process:

  mpiexec --universe univ1 -n 3 ./aserver

it prints:

  Server port = '0.2.0:2000'

Now if I start the client with two process as follow (using the server port):

   mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'

The server prints:

  intercomm_flag = 1
  intercomm_remote_size = 2
  rem_rank_tbl[2] = { 0 1}

which is the correct output. The client then prints:

  intercomm_flag = 1
  intercomm_remote_size = 3
  rem_rank_tbl[3] = { 0 1 2}
  [linux15:30895] *** An error occurred in MPI_Allgather
  [linux15:30895] *** on communicator
  [linux15:30895] *** MPI_ERR_TRUNCATE: message truncated
  [linux15:30895] *** MPI_ERRORS_ARE_FATAL (goodbye)
  mpiexec noticed that job rank 0 with PID 30894 on node linux15 exited on 
signal 15 (Terminated).

As you can see the first messages are correct but the client job terminates with 
an error (and the server hangs).

After re-reading the documentation about MPI_Allgather() over an 
intercommunicator, I don't see anything wrong in my simple code. Also if I run 
the client and server processes with valgrind, I get a few messages like:

  ==29821== Syscall param writev(vector[...]) points to uninitialised byte(s)
  ==29821==at 0x36235C2130: writev (in /lib64/libc-2.3.5.so)
  ==29821==by 0x7885583: mca_btl_tcp_frag_send (in 
/home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
  ==29821==by 0x788501B: mca_btl_tcp_endpoint_send (in 
/home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
  ==29821==by 0x7467947: mca_pml_ob1_send_request_start_prepare (in 
/home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
  ==29821==by 0x7461494: mca_pml_ob1_isend (in 
/home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
  ==29821==by 0x798BF9D: mca_coll_basic_allgather_inter (in 
/home/publique/openmpi-1.2.5/lib/openmpi/mca_coll_basic.so)
  ==29821==by 0x4A5069C: PMPI_Allgather (in 
/home/publique/openmpi-1.2.5/lib/libmpi.so.0.0.0)
  ==29821==by 0x400EED: main (aserver.c:53)
  ==29821==  Address 0x40d6cac is not stack'd, malloc'd or (recently) free'd

in both MPI_Allgather() and MPI_Comm_disconnect() calls for client and server, 
with valgrind always reporting that the addresses in question are "not stack'd, 
malloc'd or (recently) free'd".

So is there a problem with MPI_Allgather() on intercommunicators or am I doing 
something wrong ?

Thanks,

Martin


/* aserver.c */
#include <stdio.h>
#include <stdlib.h>

#include <assert.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int   comm_rank,comm_size;
   char  port_name[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int  ok_flag;

   int  intercomm_flag;
   int  intercomm_remote_size;
   int *rem_rank_tbl;
   int  ii;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

   ok_flag = (comm_rank != 0) || (argc == 1);
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
  if (comm_rank == 0) {
 fprintf(stderr,"Usage: %s\n",argv[0]);
  }
  MPI_Abort(MPI_COMM_WORLD, 1);
   }

   MPI_Open_port(MPI_INFO_NULL, port_name);

   if (comm_rank == 0) {
  printf("Server port = '%s'\n", port_name);
   }
   MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   MPI_Close_port(port_name);

   MPI_Comm_test_inter(intercomm, &intercomm_flag);
   if (comm_rank == 0) {
  printf("intercomm_flag = %d\n", intercomm_flag);
   }
   assert(intercomm_flag != 0);
   MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
   if (comm_rank == 0) {
  printf("intercomm_remote_size = %d\n", intercomm_remote_size);
   }
   rem_rank_tbl = malloc(intercomm_remote_size*sizeof(*rem_rank_tbl));
   MPI_Allgather(&comm_rank,   1, MPI_INT,
 rem_rank_tbl, 1, MPI_INT,
 intercomm);
   if (comm_rank == 0) {
  printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
  for (ii=0; ii < intercomm_remote_size; ii++) {
  printf(" %d", rem_rank_tbl[ii]);
  }
  printf("}\n");
   }
   free(rem_rank_tbl);

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return 0;
}

/* aclient.c */
#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>

#include <assert.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int  comm_rank,comm_size;
   int  ok_flag;
   MPI_Comm intercomm;

   int  intercomm_flag;
   int 

[OMPI users] MPI_Comm_connect() fails.

2008-03-11 Thread Audet, Martin
Hi,

I'm experimenting with the MPI-2 functions for supporting the client/server 
model in MPI (e.g. server and client are independently created MPI jobs 
establishing an intercommunicator between them at run time, see section 5.4 
"Establishing Communication" of the MPI-2 standard document) and it looks like 
if MPI_Comm_connect() always fail...

That is if I compile simple client/server programs as follow (for the source, 
see bellow):

  mpicc aserver.c -o aserver
  mpicc aclient.c -o aclient

I first start the server with:

  mpiexec -n 1 ./aserver

it prints:

  Server port = '0.1.0:2000'

and then start the client as follow (and provide the port name printed by the 
server):

  mpiexec -n 1 ./aclient '0.1.0:2000'

I get the following error with the client (the server continues to run 
unperturbed):

  [linux15:27660] [0,1,0] ORTE_ERROR_LOG: Not found in file dss/dss_unpack.c at 
line 209
  [linux15:27660] [0,1,0] ORTE_ERROR_LOG: Not found in file 
communicator/comm_dyn.c at line 186
  [linux15:27660] *** An error occurred in MPI_Comm_connect
  [linux15:27660] *** on communicator MPI_COMM_WORLD
  [linux15:27660] *** MPI_ERR_INTERN: internal error
  [linux15:27660] *** MPI_ERRORS_ARE_FATAL (goodbye)

Note that both are started on the same machine (hostname linux15).

The same programs seem to work fine with mpich2 (ch3:sock device), so my 
question is: am I doing something wrong or is there a bug in OpenMPI ?

I use OpenMPI version 1.2.5 configured as follow:

   ./configure --prefix=/usr/local/openmpi-1.2.5 --disable-mpi-f77 
--disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions 
--with-io-romio-flags=--with-file-system=ufs+nfs

on a Linux x86_64 machine runing Fedora Core 4.

Thanks,

Martin Audet



/* aserver.c */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int   comm_rank,comm_size;
   char  port_name[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int  ok_flag;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

   ok_flag = (comm_rank != 0) || (argc == 1);
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
  if (comm_rank == 0) {
 fprintf(stderr,"Usage: %s\n",argv[0]);
  }
  MPI_Abort(MPI_COMM_WORLD, 1);
   }

   MPI_Open_port(MPI_INFO_NULL, port_name);

   if (comm_rank == 0) {
  printf("Server port = '%s'\n", port_name);
   }
   MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   MPI_Close_port(port_name);

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return 0;
}


/* aclient.c */

#include <stdio.h>
#include <unistd.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int  comm_rank,comm_size;
   int  ok_flag;
   MPI_Comm intercomm;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

   ok_flag = (comm_rank != 0)  || ((argc == 2)  &&  argv[1]  &&  (*argv[1] != 
'\0'));
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
  if (comm_rank == 0) {
 fprintf(stderr,"Usage: %s mpi_port\n", argv[0]);
  }
  MPI_Abort(MPI_COMM_WORLD, 1);
   }

   while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0, MPI_INFO_NULL, 0, 
MPI_COMM_WORLD, &intercomm) != MPI_SUCCESS) {
  if (comm_rank == 0) {
 printf("MPI_Comm_connect() failled, sleeping and retrying...\n");
  }
  sleep(1);
   }

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return 0;
}



Re: [OMPI users] Suggestion: adding OMPI_ versions macros in mpi.h

2007-02-02 Thread Audet, Martin
Thanks Bert for the reply but having these macros in ompi/version.h only if a 
special option is given to configure is useless for what I would like to enable 
in OpenMPI with the present suggestion.

This is because the whole idea is to make it possible to write portable, 
MPI-compliant C/C++ programs that are able to choose whether or not to use 
workarounds for eventual bugs in OpenMPI at compile time, based on the exact 
OpenMPI version.

Declaring the version macros I suggested would make it possible to detect at 
compile time whether the current OpenMPI version is affected by a specific bug 
and to activate a workaround if possible (or terminate compilation with an 
#error preprocessor directive if no workaround exists). With the help of the 
existing OPEN_MPI macro these checks could be easily ignored when using an MPI 
implementation other than OpenMPI.

And this would be very convenient since the application would adjust itself to 
the OpenMPI implementation without any user intervention.

What I am describing is a common practice. I have checks in my code that test, 
for example, ROMIO_VERSION to activate workarounds for known bugs, or __GNUC__ 
and __INTEL_COMPILER to activate features in newer gcc or icc compiler versions 
(like the "restrict" pointer qualifier).

But to do similar things with OpenMPI we need these OMPI_ version macros defined 
by default in mpi.h. They have to be always defined, otherwise they save no 
burden for users.

Regards,

Martin



> Hello,
>
> you can build your ompi with --with-devel-headers and use the header
> :
>
> #define OMPI_MAJOR_VERSION 1
> #define OMPI_MINOR_VERSION 1
> #define OMPI_RELEASE_VERSION 4
> #define OMPI_GREEK_VERSION ""
> 
> Bert
>
> Audet, Martin wrote:
> > Hi,
> >
> > I would like to suggest you to add macros indicating the version of the
> > OpenMPI library in the C/C++ header file mpi.h analogous to the
> > parameter constants in the Fortran header file:
> >
> > parameter (OMPI_MAJOR_VERSION=1)
> > parameter (OMPI_MINOR_VERSION=1)
> > parameter (OMPI_RELEASE_VERSION=4)
> > parameter (OMPI_GREEK_VERSION="")
> > parameter (OMPI_SVN_VERSION="r13362")
> >
> > This would be very handy if someone discover a bug XYZ and a workaround
> > for it in OpenMPI versions before (and not including) 1.1.4 for example
> > and wants his code to be portable on many OpenMPI versions and also on
> > other MPI-2 implementations. In this situation he could do something
> > like this in a common C header file:
> >
> > #ifdef OPEN_MPI
> >
> > /* true iff (x.y.z < u.v.w) */
> > #define DOTTED_LESS_THAN(x,y,z, u,v,w) \
> > (((x) < (u)) || (((x) == (u)) && (((y) < (v)) || (((y) == (v)) &&
> > ((z) < (w))))))
> >
> > # if DOTTED_LESS_THAN(OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
> > OMPI_RELEASE_VERSION, 1,1,4)
> > # define USE_MPI_WORKAROUND_XYZ
> > # endif
> >
> > #endif
> >
> >
> > And later in the C source code:
> >
> > #ifdef USE_MPI_WORKAROUND_XYZ
> > /* use the workaround */
> > #else
> > /* use the normal method */
> > #endif
> >
> >
> > Thanks,
> >
> > Martin
> >



[OMPI users] Suggestion: adding OMPI_ versions macros in mpi.h

2007-02-01 Thread Audet, Martin
Hi,

I would like to suggest adding macros indicating the version of the OpenMPI 
library in the C/C++ header file mpi.h analogous to the parameter constants in 
the Fortran header file:

   parameter (OMPI_MAJOR_VERSION=1)
   parameter (OMPI_MINOR_VERSION=1)
   parameter (OMPI_RELEASE_VERSION=4)
   parameter (OMPI_GREEK_VERSION="")
   parameter (OMPI_SVN_VERSION="r13362")

This would be very handy if someone discovers a bug XYZ and a workaround for it 
in OpenMPI versions before (and not including) 1.1.4, for example, and wants his 
code to be portable on many OpenMPI versions and also on other MPI-2 
implementations. In this situation he could do something like this in a common 
C header file:

   #ifdef OPEN_MPI

   /* true iff (x.y.z < u.v.w) */
   #define DOTTED_LESS_THAN(x,y,z, u,v,w)   \
      (((x) < (u)) || (((x) == (u)) && (((y) < (v)) || (((y) == (v)) && ((z) < (w))))))

   # if DOTTED_LESS_THAN(OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, 
OMPI_RELEASE_VERSION, 1,1,4)
   #  define USE_MPI_WORKAROUND_XYZ
   # endif

   #endif


And later in the C source code:

   #ifdef USE_MPI_WORKAROUND_XYZ
 /* use the workaround */
   #else
 /* use the normal method */
   #endif


Thanks,

Martin


[OMPI users] mpicc adds an inexitant directory in the include path.

2007-01-24 Thread Audet, Martin
Hi,

I sometimes use OpenMPI and it looks like the mpicc wrapper gives gcc a 
nonexistent directory with the -I option. If I ask mpicc how it calls gcc it prints 
the following:

   [audet@linux15 libdfem]$ mpicc -show
   gcc -I/usr/local/openmpi-1.1.2/include 
-I/usr/local/openmpi-1.1.2/include/openmpi -pthread 
-L/usr/local/openmpi-1.1.2/lib -lmpi -lorte -lopal -ldl -Wl,--export-dynamic 
-lnsl -lutil -lm -ldl
   [audet@linux15 libdfem]$ ls /usr/local/openmpi-1.1.2/include 
/usr/local/openmpi-1.1.2/include/openmpi
   ls: /usr/local/openmpi-1.1.2/include/openmpi: No such file or directory
   /usr/local/openmpi-1.1.2/include:
   mpi.h  mpif-common.h  mpif-config.h  mpif.h
   [audet@linux15 libdfem]$   

The directory '/usr/local/openmpi-1.1.2/include/openmpi' doesn't exist. And this 
explains the annoying warnings I get when I compile my sources (I like to have 
no warnings):
 
   cc1plus: warning: /usr/local/openmpi-1.1.2/include/openmpi: No such file or 
directory

This happens with OpenMPI 1.1.2 configured as follow:

   ./configure --prefix=/usr/local/openmpi-1.1.2 --disable-mpi-f90 
--disable-mpi-cxx --disable-cxx-exceptions   
--with-io-romio-flags=--with-file-system=ufs+nfs

Thanks,

Martin Audet


[OMPI users] configure script not hapy with OpenPBS

2006-10-19 Thread Audet, Martin
Hi,

When I tried to install OpenMPI on the front node of a cluster using OpenPBS 
batch system (e.g. --with-tm=/usr/open-pbs argument to configure), it didn't 
work and I got the error message:

--- MCA component pls:tm (m4 configuration macro)
checking for MCA component pls:tm compile mode... dso
checking tm.h usability... yes
checking tm.h presence... yes
checking for tm.h... yes
looking for library in lib
checking for tm_init in -lpbs... no
looking for library in lib64
checking for tm_init in -lpbs... no
checking tm.h usability... yes
checking tm.h presence... yes
checking for tm.h... yes
looking for library in lib
checking for tm_finalize in -ltorque... no
looking for library in lib64
checking for tm_finalize in -ltorque... no
configure: error: TM support requested but not found.  Aborting

By looking in the very long configure script I found two typo errors in 
variable name:

  "ompi_check_tm_hapy" is set at lines 68164 and 76084
  "ompi_check_loadleveler_hapy" is set at line 73086

where the correct names are obviously "ompi_check_tm_happy" and 
"ompi_check_loadleveler_happy" (e.g. "happy" not "hapy") when looking to the 
variables used arround.

I corrected the variable names but unfortunately it didn't fix my problem; 
configure stopped with the same error message (maybe you should also correct it 
in your "svn" repository since this may be a "latent" bug).

I'm now questioning why the configure script didn't find the 'tm_init' 
symbol in libpbs.a, since the following command:

nm /usr/open-pbs/lib/libpbs.a  | grep -e '\<tm_init\>' -e '\<tm_finalize\>'

prints:

0cd0 T tm_finalize
1270 T tm_init

Is it possible that on an EM64T Linux system the configure script requires that 
lib/libpbs.a or lib64/libpbs.a be a 64 bit library to be happy (lib64/libpbs.a 
doesn't exist and lib/libpbs.a is a 32 bit library on our system since the 
OpenPBS version we use is a bit old (2.3.x) and didn't appear to be 64 bit 
clean) ?


Martin Audet


[OMPI users] MPI_LONG_LONG_INT != MPI_LONG_LONG

2006-04-24 Thread Audet, Martin

Hi,

The current and the previous versions of OpenMPI define MPI_LONG_LONG_INT and 
MPI_LONG_LONG constants as the address of two distinct global variables 
(&ompi_mpi_long_long_int and &ompi_mpi_long_long respectively) which makes the 
following expression true: MPI_LONG_LONG_INT != MPI_LONG_LONG.

After consulting the MPI standards, I noticed the following:

 - The optional datatype corresponding to the optional C/C++ "long long" type 
is MPI_LONG_LONG_INT according to article 3.2.2. "Message data" of the MPI 1.1 
standard (www.mpi-forum.org/docs/mpi-11-html/node32.html) and article 10.2. 
"Defined Constants for C and Fortran" 
(www.mpi-forum.org/docs/mpi-11-html/node169.html) of the MPI 1.1 standard.

 - The MPI_LONG_LONG optional datatype appeared for the first time in section 
9.5.2. "External Data Representation: ``external32''" of the MPI 2.0 standard 
(www.mpi-forum.org/docs/mpi-20-html/node200.htm). This paragraph state that 
with the external32 data representation, this datatype is eight (8) bytes long.

 - However the previous statement was recognized as an error in the MPI 2.0 
errata document (www.mpi-forum.org/docs/errata-20-2.html). The MPI 2.0 document 
should have used MPI_LONG_LONG_INT instead of MPI_LONG_LONG. It also states the 
following:

   In addition, the type MPI_LONG_LONG should be added as an optional type; it 
is a synonym for MPI_LONG_LONG_INT.

This means that the real optional datatype corresponding to the C/C++ "long 
long" datatype is MPI_LONG_LONG_INT and that since MPI_LONG_LONG was mentioned 
by mistake in the MPI 2.0 standard document, the MPI_LONG_LONG predefined 
datatype constant is also accepted as a synonym to MPI_LONG_LONG_INT.

We should therefore have MPI_LONG_LONG_INT == MPI_LONG_LONG which is not the 
case in OpenMPI.
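
A minimal program to check this (my own sketch, not from any official test
suite) could be:

   /* check_long_long.c -- sketch only */
   #include <stdio.h>
   #include <mpi.h>

   int main(int argc, char **argv)
   {
      MPI_Init(&argc, &argv);
      printf("MPI_LONG_LONG_INT %s MPI_LONG_LONG\n",
             (MPI_LONG_LONG_INT == MPI_LONG_LONG) ? "==" : "!=");
      MPI_Finalize();
      return 0;
   }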

So please have a look at this issue.

Note that MPICH and MPICH2 implementations satisfy: MPI_LONG_LONG_INT == 
MPI_LONG_LONG.

Regards,


Martin Audet            E: martin DOT audet AT imi cnrc-nrc gc ca
Research Officer        T: 450-641-5034
Industrial Material Institute / National Research Council
75 de Mortagne, Boucherville, QC, J4B 6Y4, Canada


[OMPI users] Incorrect behavior for attributes attached to MPI_COMM_SELF.

2006-04-10 Thread Audet, Martin
Hi,

It looks like there is a problem in OpenMPI 1.0.2 with how MPI_COMM_SELF 
attribute callback functions are handled by MPI_Finalize().

The following C program registers a callback function associated with the 
MPI_COMM_SELF communicator to be called during the first steps of 
MPI_Finalize(). As shown in this example, this can be used to make sure that 
global MPI_Datatype variables associated with global datatypes are freed by 
calling MPI_Type_free() before program exit (thus preventing ugly memory 
leaks/outstanding allocations when run under valgrind, for example). This mechanism 
is used by the library I'm working on as well as by the PETSc library.

The program works by taking advantage of the MPI-2 Standard Section 4.8 
"Allowing User Function at Process Termination". As it says, the MPI_Finalize() 
function calls the delete callback associated with the MPI_COMM_SELF attribute 
"before any other part of MPI are affected". It also says that "calling 
MPI_Finalized() will return false in any of these callback functions".

Section 4.9 of the MPI-2 Standard: "Determining Whether MPI Has Finished" 
moreover says that it can be determined if MPI is active by calling 
MPI_Finalized(). It also reaffirms that MPI is active in the callback functions 
invoked by MPI_Finalize().

I think that an "active" MPI library here means that basic MPI functions like 
MPI_Type_free() can be called.

The following small program therefore seems to conform to the MPI standard.

However when I run it (compiled with OpenMPI 1.0.2 mpicc), I get the following 
message:

*** An error occurred in MPI_Type_free
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (goodbye)

Note that this program works well with mpich2.

Please have a look at this problem.

Thanks,

Martin Audet



#include <assert.h>
#include <stddef.h>

#include <mpi.h>

static int attr_delete_function(MPI_Comm p_comm, int p_keyval, void 
*p_attribute_val, void * p_extra_state)
{
   assert(p_attribute_val != NULL);

   /* Get a reference on the datatype received. */
   MPI_Datatype *const cur_datatype = (MPI_Datatype *)(p_attribute_val);

   /* Free it if non null.  */
   if (*cur_datatype != MPI_DATATYPE_NULL) {
      MPI_Type_free(cur_datatype);
      assert(*cur_datatype == MPI_DATATYPE_NULL);
   }

   return MPI_SUCCESS;
}


/* If p_datatype refers to a non null MPI datatype, this function will register */
/* a callback function to free p_datatype and set it to MPI_DATATYPE_NULL. This */
/* callback will be called during the first steps of the MPI_Finalize() function */
/* when the state of the MPI library still allows MPI functions to be called. */
/* This is done by associating an attribute to the MPI_COMM_SELF communicator */
/* as allowed by the MPI 2 standard (section 4.8). */
static void add_type_free_callback(MPI_Datatype *p_datatype)
{
   int keyval;

   assert(p_datatype != NULL);

   /* First create the keyval.   */
   /*  No callback function will be called when MPI_COMM_SELF is duplicated  */
   /*  and attr_delete_function() will be called when MPI_COMM_SELF is   */
   /*  freed (e.g. during MPI_Finalize()).   */
   /*  Since many callback can be associated with MPI_COMM_SELF to free many */
   /*  datatypes, a new keyval has to be created every time. */
   MPI_Keyval_create(MPI_NULL_COPY_FN, &attr_delete_function, &keyval, NULL);

   /* Then associate this keyval to MPI_COMM_SELF and make sure the pointer  */
   /* to the datatype p_datatype is passed to the callback.  */
   MPI_Attr_put(MPI_COMM_SELF, keyval, p_datatype);

   /* Free the keyval because it is no longer needed.*/
   MPI_Keyval_free(&keyval);
}

typedef struct {
   short ss;
   int   ii;
} glb_struct_t;

MPI_Datatype glb_dtype = MPI_DATATYPE_NULL;

static void calc_glb_dtype(void)
{
   const int NB_MEM = 3;
   static int          len_tbl[3]  = { 1, 1, 1 };
   static MPI_Aint     disp_tbl[3] = { offsetof(glb_struct_t, ss),
                                       offsetof(glb_struct_t, ii),
                                       sizeof(glb_struct_t) };
   static MPI_Datatype type_tbl[3] = { MPI_SHORT, MPI_INT, MPI_UB };

   MPI_Type_struct(NB_MEM, len_tbl, disp_tbl, type_tbl, &glb_dtype);

   MPI_Type_commit(&glb_dtype);

   add_type_free_callback(&glb_dtype);
}

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv);
   
   calc_glb_dtype();

   MPI_Finalize();

   return 0;
}


[O-MPI users] const_cast<>(), Alltoallw() and Spawn_multiple()

2005-12-15 Thread Audet, Martin
Hi,

I just tried OpenMPI 1.0.1 and this time I had far fewer warnings related to 
the C++ API than I had with version 1.0.0 (I compile with g++ -Wall).

I nonetheless looked at the C++ headers and found that those warnings were 
still related to the use of C-style casts.

Some of them were simply casting away the const type qualifier to call the C 
API MPI functions. Those casts could easily be converted to the const_cast<>() 
operator specially designed to do this.

I however found that some others were simply wrong and lead to faulty 
operations. Those casts are located in the Intracomm::Alltoallw() and 
Intracomm::Spawn_multiple() methods.

In the first method, a pointer to a table of const MPI::Datatype objects is 
cast into a pointer to a table of MPI_Datatype types, and in the second one, a 
pointer to a table of const MPI::Info objects is cast into a pointer to a 
table of MPI_Info types.

That is it is assumed that the MPI::Datatype and MPI::Info have respectively 
the same memory layout as the corresponding C types MPI_Datatype and MPI_Info.

This assumption is incorrect in both cases, even though the MPI::Datatype class 
contains only a single data member of type MPI_Datatype and the MPI::Info class 
contains only a single data member of type MPI_Info.

It is incorrect because the MPI::Datatype and MPI::Info classes are polymorphic. 
That is, each of them contains at least one virtual method. Since polymorphic 
classes need to access the virtual method table (pointers to members and an 
offset to adjust "this"), the C++ compiler needs to insert at least one extra 
member. In all the implementations I've seen this is done by adding a member 
pointing to the virtual table of the exact class (named "__vtbl"). The 
resulting classes are then larger than they appear (e.g. on my IA32 Linux 
machine sizeof(MPI::Datatype)==8 and sizeof(MPI::Info)==8 even though 
sizeof(MPI_Datatype)==4 and sizeof(MPI_Info)==4), the memory layout differs, and 
therefore corresponding pointers cannot be converted by simple type casts.

A table of MPI::Datatype objects then has to be converted into a table of 
MPI_Datatype by a temporary table and a small loop. The same is true for 
MPI::Info and MPI_Info.
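
Something along these lines would do (a simplified sketch of the idea only;
the variable names are illustrative and the real code is in the patch below):

   // Sketch: convert an array of MPI::Datatype objects into the MPI_Datatype
   // handles expected by the C API, via a temporary table and a small loop.
   MPI_Datatype *const c_types = new MPI_Datatype[comm_size];
   for (int i = 0; i < comm_size; ++i) {
      c_types[i] = recvtypes[i];   // uses the MPI::Datatype -> MPI_Datatype
                                   // conversion operator
   }
   // ... pass c_types to the C function MPI_Alltoallw() ...
   delete [] c_types;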

I modified the errhandler.h, intracomm.h and intracomm_inln.h files to implement 
those corrections. As expected this removes the warnings during compilation and 
should correct the conversion problems in the Intracomm::Alltoallw() and 
Intracomm::Spawn_multiple() methods.

Below is the difference between my modified version of OpenMPI and the 
original one.

Please consider this patch for your next release.

Thanks,

Martin Audet, Research Officer
E: martin.au...@imi.cnrc-nrc.gc.ca  T: 450-641-5034
Industrial Material Institute, National Research Council
75 de Mortagne,
Boucherville, QC
J4B 6Y4, Canada 





diff --recursive --unified openmpi-1.0.1/ompi/mpi/cxx/errhandler.h 
openmpi-1.0.1ma/ompi/mpi/cxx/errhandler.h
--- openmpi-1.0.1/ompi/mpi/cxx/errhandler.h 2005-11-11 14:21:36.0 
-0500
+++ openmpi-1.0.1ma/ompi/mpi/cxx/errhandler.h   2005-12-14 15:29:56.0 
-0500
@@ -124,7 +124,7 @@
 #if ! 0 /* OMPI_ENABLE_MPI_PROFILING */
 // $%%@#%# AIX/POE 2.3.0.0 makes us put in this cast here
 (void)MPI_Errhandler_create((MPI_Handler_function*) 
&ompi_mpi_cxx_throw_excptn_fctn,
-   (MPI_Errhandler *) &mpi_errhandler); 
+   const_cast<MPI_Errhandler *>(&mpi_errhandler)); 
 #else
 pmpi_errhandler.init();
 #endif
@@ -134,7 +134,7 @@
   //this is called from MPI::Finalize
   inline void free() const {
 #if ! 0 /* OMPI_ENABLE_MPI_PROFILING */
-(void)MPI_Errhandler_free((MPI_Errhandler *) &mpi_errhandler); 
+(void)MPI_Errhandler_free(const_cast<MPI_Errhandler *>(&mpi_errhandler)); 
 #else
 pmpi_errhandler.free();
 #endif
diff --recursive --unified openmpi-1.0.1/ompi/mpi/cxx/intracomm.h 
openmpi-1.0.1ma/ompi/mpi/cxx/intracomm.h
--- openmpi-1.0.1/ompi/mpi/cxx/intracomm.h  2005-11-11 14:21:36.0 
-0500
+++ openmpi-1.0.1ma/ompi/mpi/cxx/intracomm.h2005-12-14 16:09:29.0 
-0500
@@ -228,6 +228,10 @@
   PMPI::Intracomm pmpi_comm;
 #endif
 
+  // Convert an array of p_nbr Info object into an array of MPI_Info.
+  // A pointer to the allocated array is returned and must be eventually 
deleted.
+  static inline MPI_Info *convert_info_to_mpi_info(int p_nbr, const Info 
p_info_tbl[]);
+
 public: // JGS see above about friend decls
 #if ! 0 /* OMPI_ENABLE_MPI_PROFILING */
   static Op* current_op;
diff --recursive --unified openmpi-1.0.1/ompi/mpi/cxx/intracomm_inln.h 
openmpi-1.0.1ma/ompi/mpi/cxx/intracomm_inln.h
--- openmpi-1.0.1/ompi/mpi/cxx/intracomm_inln.h 2005-11-30 06:06:07.0 
-0500
+++ openmpi-1.0.1ma/ompi/mpi/cxx/intracomm_inln.h   2005-12-14 
16:09:35.0 -0500
@@ -144,13 +144,26 @@
void *recvbuf, const int recvcounts[],
const int rdispls[], const Datatype recvtypes[]) const
 {
+  const int comm_size = Get_size();
+  MPI_Datatype *const data_type_tbl 

[O-MPI users] MPI_Offset and C++ interface

2005-11-25 Thread Audet, Martin
Hi,

I just compiled my library with version 1.0 of OpenMPI and I had two problems.

First, the MPI_Offset datatype is defined as a preprocessor macro as follows in 
mpi.h:

   /* Type of MPI_Offset */
   #define MPI_Offset long long

This generates a syntax error when MPI_Offset is used in C++ for what Stroustrup 
calls a value construction (i.e. type ( expr_list ), cf. section 6.2 of The C++ 
Programming Language).

For example the following code:

   MPI_Offset   ofs,size;
   int  nbr;

   // compute ofs, size and nbr.

   ofs +=  MPI_Offset(nbr)*size;

cannot compile if MPI_Offset is defined as it is currently.
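
The reason is simply what the preprocessor produces; for illustration:

   ofs += MPI_Offset(nbr)*size;    /* what the source says                  */
   ofs += long long(nbr)*size;     /* what the compiler sees after          */
                                   /* preprocessing: a syntax error, since  */
                                   /* a functional-style cast needs a       */
                                   /* single type name (or a typedef), not  */
                                   /* the two-token "long long"             */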

The obvious solution is to define MPI_Offset as a typedef as follows:

   /* Type of MPI_Offset */
   typedef long long MPI_Offset;

Note that a similar typedef is used for MPI_Aint:

   typedef long MPI_Aint;


The second problem is related to the C++ interface: it uses direct C-style 
type casts that remove constness. Since ISO C++ has the const_cast operator 
especially for this situation, the compiler generates TONS of warnings (I 
compile my code with -Wall and many other warnings activated) and this is 
really annoying.

The solution to this problem is to replace the C-style casts with the const_cast 
operator. For example, the MPI::Comm::Send method defined in 
openmpi/ompi/mpi/cxx/comm_inln.h as follows:

   inline void
   MPI::Comm::Send(const void *buf, int count, 
   const MPI::Datatype & datatype, int dest, int tag) const
   {
 (void)MPI_Send((void *)buf, count, datatype, dest, tag, mpi_comm);
   }

becomes:

   inline void
   MPI::Comm::Send(const void *buf, int count, 
   const MPI::Datatype & datatype, int dest, int tag) const
   {
 (void)MPI_Send(const_cast<void *>(buf), count, datatype, dest, tag, mpi_comm);
   }

This fixes the annoying warning problem because the const_cast operator is the 
intended method to remove constness.



Martin Audet   E: matin.au...@imi.cnrc-nrc.gc.ca
Research Officer   T: 450-641-5034
Industrial Material Institute
National Research Council of Canada
75 de Mortagne, Boucherville, QC, J4B 6Y4