Re: [OMPI users] Question about RDMA

2008-06-05 Thread Jeff Squyres
Note that "eager" RDMA is only used for short messages -- it's not  
really relevant to whether the same user buffers are re-used or not  
(the mpi_leave_pinned parameter for long messages is only useful if  
long buffers are re-used).  See this FAQ item:


http://www.open-mpi.org/faq/?category=openfabrics#ib-small-message-rdma

For benchmarks (like SKAMPI) that re-use long buffers, you might want  
to use the mpi_leave_pinned MCA parameter:


http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers
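
As an aside, a minimal sketch (hypothetical code, not SKaMPI's) of the
two buffer-usage patterns at issue -- registration caching via
mpi_leave_pinned can only help the first:

   #include <mpi.h>
   #include <stdlib.h>

   /* Reused buffer: the memory is registered (pinned) once, so with
      mpi_leave_pinned=1 later iterations skip the registration cost.
      Matching receives on the peer are assumed. */
   void bench_reused(int peer, int n, int iters)
   {
       char *buf = malloc(n);
       for (int i = 0; i < iters; i++)
           MPI_Send(buf, n, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
       free(buf);
   }

   /* Fresh buffer each iteration: every send can pay the memory
      registration cost again, masking any RDMA benefit. */
   void bench_fresh(int peer, int n, int iters)
   {
       for (int i = 0; i < iters; i++) {
           char *buf = malloc(n);
           MPI_Send(buf, n, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
           free(buf);
       }
   }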


On Jun 5, 2008, at 9:47 AM, Gabriele Fatigati wrote:



Hi,
I'm testing the SKaMPI benchmark on an IBM blade system over
InfiniBand. The current version of OpenMPI is 1.2.6.
I have tried to disable RDMA by setting btl_openib_use_eager_rdma = 0,
but I have noticed little difference in MPI collective execution times
between RDMA enabled and disabled. Before the tests, I expected that
execution times would be longer with RDMA off.


So I suppose that the SKaMPI benchmark continuously reallocates its
buffers, which prevents the RDMA protocol from showing any benefit.
Indeed, if the buffer address changes every time, many memory-page
registrations are required, and performance decays accordingly.


I used the RDMA pipeline protocol. This protocol should make no
assumptions about the application's reuse of source and target
buffers. But is that always true?

The relevant network parameters are listed below.

MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4")
MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16")
MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16")
MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16")
MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
MCA btl: parameter "btl_openib_max_rdma_size" (current value: "1048576")


--
Gabriele Fatigati

CINECA Systems & Technologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it Tel: +39 051 6171722

g.fatig...@cineca.it



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Documentation on running under slurm

2008-06-05 Thread Jeff Squyres

Excellent point; thanks for the heads-up!

I've updated the SLURM docs in the FAQ accordingly.


On Jun 5, 2008, at 6:15 PM, Bill Johnstone wrote:


Hello all.

It would seem that the documentation, at least the FAQ page at
http://www.open-mpi.org/faq/?category=slurm, is a little out of date
with respect to running on newer versions of SLURM (I just got things
working with version 1.3.3).


According to the SLURM documentation, srun -A is deprecated, and
even if you look in the man page for salloc, -A is not directly
mentioned; it's just discussed in the --no-shell section.


I was able to successfully submit/run using:
salloc -n <# procs> mpirun 

without needing an interactive shell.  So doesn't this seem like the  
more up-to-date way of doing things rather than srun -A?  Also, it  
would seem sbatch replaces srun -b, but I don't use this mode of  
operation, so I'm not sure.


Perhaps the OpenMPI documentation should be updated accordingly?

Thanks.






--
Jeff Squyres
Cisco Systems



[OMPI users] Documentation on running under slurm

2008-06-05 Thread Bill Johnstone
Hello all.

It would seem that the documentation, at least the FAQ page at 
http://www.open-mpi.org/faq/?category=slurm is a little out of date with 
respect to running on newer versions of SLURM (I just got things working with 
version 1.3.3).

According to the SLURM documentation, srun -A is deprecated, and even if you
look in the man page for salloc, -A is not directly mentioned; it's just
discussed in the --no-shell section.

I was able to successfully submit/run using:
salloc -n <# procs> mpirun 

without needing an interactive shell.  So doesn't this seem like the more 
up-to-date way of doing things rather than srun -A?  Also, it would seem sbatch 
replaces srun -b, but I don't use this mode of operation, so I'm not sure.

Perhaps the OpenMPI documentation should be updated accordingly?

Thanks.





Re: [OMPI users] libibverbs

2008-06-05 Thread Brock Palen

On Jun 5, 2008, at 9:21 AM, Jeff Squyres wrote:

On Jun 4, 2008, at 4:28 PM, Brock Palen wrote:


We have two installs of openmpi-1.2.3:
one with the PGI compilers, the other with gcc/Nagf90.

The one with the PGI compilers does not link against libibverbs, but
ompi_info shows the openib btl and we see traffic on the fabric.

The other, built with Nagware, links against libibverbs.  It also shows
the openib btl in ompi_info.

What would cause this?  It just pointed out that our login nodes
(most of our cluster does not have IB) don't have libibverbs, making
code fail to link.  Any insight from an OFED master would be great.



Did you build the gcc/Nagf90 version with --enable-static or
--disable-dlopen?

See the thread starting here:

  http://www.open-mpi.org/community/lists/users/2008/06/5821.php



Yes,
looks like Nag requires a library that was not compiled with -fPIC,
and the linker complains.
Looks more like a bug on our end: our login nodes were missing
libibverbs.a while the compute nodes do have it.



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



--
Jeff Squyres
Cisco Systems







[OMPI users] Question about RDMA

2008-06-05 Thread Gabriele Fatigati
Hi,
I'm testing the SKaMPI benchmark on an IBM blade system over
InfiniBand. The current version of OpenMPI is 1.2.6.
I have tried to disable RDMA by setting btl_openib_use_eager_rdma = 0,
but I have noticed little difference in MPI collective execution times
between RDMA enabled and disabled. Before the tests, I expected that
execution times would be longer with RDMA off.

So I suppose that the SKaMPI benchmark continuously reallocates its
buffers, which prevents the RDMA protocol from showing any benefit.
Indeed, if the buffer address changes every time, many memory-page
registrations are required, and performance decays accordingly.

I used the RDMA pipeline protocol. This protocol should make no
assumptions about the application's reuse of source and target
buffers. But is that always true?
The relevant network parameters are listed below.

MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4")
MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16")
MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16")
MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16")
MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
MCA btl: parameter "btl_openib_max_rdma_size" (current value: "1048576")

-- 
Gabriele Fatigati

CINECA Systems & Technologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it Tel: +39 051 6171722

g.fatig...@cineca.it


Re: [OMPI users] libibverbs

2008-06-05 Thread Jeff Squyres

On Jun 4, 2008, at 4:28 PM, Brock Palen wrote:


We have two installs of openmpi-1.2.3:
one with the PGI compilers, the other with gcc/Nagf90.

The one with the PGI compilers does not link against libibverbs, but
ompi_info shows the openib btl and we see traffic on the fabric.

The other, built with Nagware, links against libibverbs.  It also shows
the openib btl in ompi_info.

What would cause this?  It just pointed out that our login nodes
(most of our cluster does not have IB) don't have libibverbs, making
code fail to link.  Any insight from an OFED master would be great.



Did you build the gcc/Nagf90 version with --enable-static or
--disable-dlopen?


See the thread starting here:

http://www.open-mpi.org/community/lists/users/2008/06/5821.php

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] libibverbs and openmpi 1.2.6

2008-06-05 Thread Samuel Sarholz

Hi Jeff,

This suggests that perhaps 1.2.6 was configured with --enable-static
or --disable-dlopen.  If either of these two options was used, OMPI
will suck all the plugins into libmpi (and friends), and the plugins'
dependencies will therefore become dependencies of libmpi (and
friends).


thanks for the hint.
You are right we used --enable-static with 1.2.6.


best regards
Samuel

P.S.: I liked your videos, btw.
--
Dipl.-Inform. Samuel Sarholz - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication,
Seffenter Weg 23, 52074 Aachen (Germany), Office: 2.13
Tel: +49 241/80-24915 - Fax: +49 241/80-22134
mailto:sarh...@rz.rwth-aachen.de www.rz.rwth-aachen.de




Re: [OMPI users] libibverbs and openmpi 1.2.6

2008-06-05 Thread Jeff Squyres

On Jun 5, 2008, at 7:56 AM, Samuel Sarholz wrote:

Nothing changed between 1.2.5 and 1.2.6 with regards to depending  
on  libibverbs.


Ok then I need to check what we did differently when installing both  
versions.


The mpicc wrapper from 1.2.6 links -libverbs.
(It shows up with mpicc -V).


This suggests that perhaps 1.2.6 was configured with --enable-static
or --disable-dlopen.  If either of these two options was used, OMPI
will suck all the plugins into libmpi (and friends), and the plugins'
dependencies will therefore become dependencies of libmpi (and
friends).


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] libibverbs and openmpi 1.2.6

2008-06-05 Thread Samuel Sarholz

Hi,


Nothing changed between 1.2.5 and 1.2.6 with regards to depending on  
libibverbs.


Ok then I need to check what we did differently when installing both 
versions.


The mpicc wrapper from 1.2.6 links -libverbs.
(It shows up with mpicc -V).



Does ldd show that your apps depend on libibverbs?


Yes, ldd shows a dependency.

Best regards,
Samuel

--
Dipl.-Inform. Samuel Sarholz - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication,
Seffenter Weg 23, 52074 Aachen (Germany), Office: 2.13
Tel: +49 241/80-24915 - Fax: +49 241/80-22134
mailto:sarh...@rz.rwth-aachen.de www.rz.rwth-aachen.de




Re: [OMPI users] libibverbs and openmpi 1.2.6

2008-06-05 Thread Jeff Squyres

On Jun 5, 2008, at 4:37 AM, Samuel Sarholz wrote:


we have some issues with OpenMPI 1.2.6 and libibverbs.

We have some machines with InfiniBand and some without.
We compiled OpenMPI with IB support.
With OpenMPI 1.2.5 there was no problem running that version on the
machines without IB.
However, with 1.2.6 the libibverbs library is linked in, and it
doesn't exist on some of the machines.


Nothing changed between 1.2.5 and 1.2.6 with regards to depending on  
libibverbs.


Is there a way to get programs compiled on an IB machine to run on
machines without IB? (i.e., remove the dependency on libibverbs from
OpenMPI 1.2.6)



How did you build OMPI?  The default configuration in the 1.2 series  
should have all the IB support in plugin-modules (i.e., not directly  
part of libmpi), and therefore apps should not depend on libibverbs at  
all.


Does ldd show that your apps depend on libibverbs?

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Communicators in Fortran and C

2008-06-05 Thread Terry Dontje

You can translate the communicator from Fortran to C using the MPI_COMM_F2C 
routine.
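
For example, a minimal sketch of a C library routine receiving a
communicator from Fortran (the function name is hypothetical):

   #include <mpi.h>

   /* Called from Fortran; the communicator arrives as a Fortran
      handle (MPI_Fint, passed by reference), not as a C MPI_Comm. */
   void my_c_library_call(MPI_Fint *f_comm)
   {
       MPI_Comm comm = MPI_Comm_f2c(*f_comm);  /* translate the handle */
       int rank;
       MPI_Comm_rank(comm, &rank);
       /* ... use the communicator as usual ... */
   }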

--td

Message: 4
Date: Thu, 05 Jun 2008 08:53:55 +0200
From: Samuel Sarholz 
Subject: [OMPI users] Communicators in Fortran and C
To: us...@open-mpi.org
Message-ID: <48478d83.6080...@rz.rwth-aachen.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I have problems running a Fortran program which calls a C library
with OpenMPI.


The problem is that the Fortran part has a communicator which is
passed to the C library.
And if I understand the headers right, a communicator is an integer in
Fortran, but a struct on the C side of OpenMPI.


Is there a way to translate/cast these communicators?




best regards,
Samuel




Re: [OMPI users] HPMPI versus OpenMPI performance

2008-06-05 Thread George Bosilca
If I correctly understand how you run your application, I think I know
where the problem is coming from. In a few words: you're using
buffered send over shared memory.


First, buffered send has one main drawback: it doubles the amount of
memory required for the communication. A side effect is that it
increases the number of memory copies. The original buffer has to be
copied into the attached buffer; then from this attached buffer the
data is moved into the shared memory region; and from there the
receiver can finally copy the data into the receive buffer. In total
there are 3 memory copies involved in this operation, which
automatically limits the achievable bandwidth to 1/3 of the available
memory bandwidth on the architecture. Additionally, if the amount of
data involved in this communication is large enough, the cache will be
completely thrashed by the end of the communication.


Second, using buffered send requires asynchronous progress. If your
code doesn't call any MPI communication functions, there is no
guarantee that the data transfer takes place until the
MPI_Buffer_detach function (or any other communication-related MPI
function) is called.
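
For illustration, a minimal sketch of the pattern in question (not
your application's code; run with at least two ranks): each MPI_Bsend
below first copies the payload into the attached buffer, and the
transfer is only guaranteed to complete by the time MPI_Buffer_detach
returns.

   #include <mpi.h>
   #include <stdlib.h>

   int main(int argc, char **argv)
   {
       MPI_Init(&argc, &argv);

       int rank, n = 1 << 20;                  /* 1 MiB payload */
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       /* Attach a buffer large enough for one bsend of n chars. */
       int size;
       MPI_Pack_size(n, MPI_CHAR, MPI_COMM_WORLD, &size);
       size += MPI_BSEND_OVERHEAD;
       char *attached = malloc(size);
       char *payload  = malloc(n);
       MPI_Buffer_attach(attached, size);

       if (rank == 0)
           MPI_Bsend(payload, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD); /* extra copy */
       else if (rank == 1)
           MPI_Recv(payload, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                    MPI_STATUS_IGNORE);

       MPI_Buffer_detach(&attached, &size);    /* blocks until sends drain */
       free(attached);
       free(payload);
       MPI_Finalize();
       return 0;
   }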


  george.


On Jun 4, 2008, at 1:55 PM, Jeff Squyres wrote:


Thanks for all the detailed information!

It is quite likely that our bsend performance has never been tuned; we
simply implemented it, verified that it works, and then moved on -- we
hadn't considered that real applications would actually use it.  :-\

But that being said, 60% difference is a bit odd.  Have you tried
running with "--mca mpi_leave_pinned 1"?  If all your sends are
MPI_BSEND, it *may* not make a difference, but it could make a
difference on the receive side.

What are the typical communication patterns for your application?



On Jun 2, 2008, at 3:39 PM, Ayer, Timothy C. wrote:





We are performing a comparison of HPMPI versus OpenMPI using
InfiniBand and seeing a performance hit in the vicinity of 60%
(OpenMPI is slower) on controlled benchmarks.  Since everything else
is similar, we suspect a problem with the way we are using or have
installed OpenMPI.

Please find attached the following info as requested from
http://www.open-mpi.org/community/help/


Application: in-house CFD solver using both point-to-point and
collective operations.  Also, for historical reasons it makes
extensive use of BSEND.  We recognize that BSENDs can be inefficient,
but it is not practical to change them at this time.  We are trying to
understand why the performance is so significantly different from
HPMPI.  The application is mixed FORTRAN 90 and C, built with Portland
Group compilers.

HPMPI Version info:

mpirun: HP MPI 02.02.05.00 Linux x86-64
major version 202 minor version 5

OpenMPI Version info:

mpirun (Open MPI) 1.2.4
Report bugs to http://www.open-mpi.org/community/help/




Configuration info:

The benchmark was a 4-processor job run on a single dual-socket,
dual-core HP DL140G3 (Woodcrest 3.0) with 4 GB of memory.  Each rank
requires approximately 250 MB of memory.

1) Output from ompi_info --all

See attached file ompi_info_output.txt
<< File: ompi_info_output.txt >>

Below is the output requested in the FAQ section:

In order for us to help you, it is most helpful if you can run a few
steps before sending an e-mail to both perform some basic
troubleshooting and provide us with enough information about your
environment to help you.  Please include answers to the following
questions in your e-mail:


1.  Which OpenFabrics version are you running? Please specify where
you got the software from (e.g., from the OpenFabrics community web
site, from a vendor, or it was already included in your Linux
distribution).

We obtained the software from www.openfabrics.org



Output from ofed_info command:

OFED-1.1

openib-1.1 (REV=9905)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace

Git:
ref: refs/heads/ofed_1_1
commit a083ec1174cb4b5a5052ef5de9a8175df82e864a

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm



2.  What distro and version of Linux are you running? What is your
kernel version?

Linux  2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19 11:28:12
EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


3.  Which subnet manager are you running? (e.g., OpenSM, a
vendor-specific subnet manager, etc.)

We believe this to be HP or Voltaire but we are not certain how to
determine this.


4.  What is the output of the ibv_devinfo command on a known "good"
node and a known "bad" node? (NOTE: there must be at least one port
listed as "PORT_ACTIVE" for Open MPI to work. If there is not at least
one PORT_ACTIVE port, something is wrong with your OpenFabrics
environment and Open MPI will not be able to run.)

hca_id: mthca0
  fw_ver: 

[OMPI users] libibverbs and openmpi 1.2.6

2008-06-05 Thread Samuel Sarholz

Hi,

we have some issues with OpenMPI 1.2.6 and libibverbs.

We have some machines with InfiniBand and some without.
We compiled OpenMPI with IB support.
With OpenMPI 1.2.5 there was no problem running that version on the
machines without IB.
However, with 1.2.6 the libibverbs library is linked in, and it
doesn't exist on some of the machines.


Is there a way to get programs compiled on an IB machine to run on
machines without IB? (i.e., remove the dependency on libibverbs from
OpenMPI 1.2.6)



best regards,
Samuel

--
Dipl.-Inform. Samuel Sarholz - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication,
Seffenter Weg 23, 52074 Aachen (Germany), Office: 2.13
Tel: +49 241/80-24915 - Fax: +49 241/80-22134
mailto:sarh...@rz.rwth-aachen.de www.rz.rwth-aachen.de




[OMPI users] Communicators in Fortran and C

2008-06-05 Thread Samuel Sarholz

Hi,

I have problems running a Fortran program which calls a C library
with OpenMPI.


The problem is that the Fortran part has a communicator which is
passed to the C library.
And if I understand the headers right, a communicator is an integer in
Fortran, but a struct on the C side of OpenMPI.


Is there a way to translate/cast these communicators?




best regards,
Samuel

