Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-25 Thread Peter Kjellström
FYI, just noticed this post from the HDF group:

https://forum.hdfgroup.org/t/hdf5-and-openmpi/5437

/Peter K



Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-21 Thread Peter Kjellström
On Wed, 20 Feb 2019 10:46:10 -0500
Adam LeBlanc  wrote:

> Hello,
> 
> When I do a run with OpenMPI v4.0.0 on Infiniband with this command:
> mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node
> --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca
> pml ob1 --mca btl_openib_allow_ib 1 -np 6
>  -hostfile /home/aleblanc/ib-mpi-hosts IMB-MPI1
> 
> I get this error:
...
> # Benchmarking Reduce_scatter
...
>   2097152   20  8738.08  9340.50  9147.89
> [pandora:04500] *** Process received signal ***
> [pandora:04500] Signal: Segmentation fault (11)

This is very likely a bug in IMB, not in OpenMPI. It's been discussed on
the list before, thread name:

 MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1
 Compilers on OPA-1...

You can work around it by using an older IMB version (the bug is in the
newest version).

/Peter K


Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-10 Thread Peter Kjellström
On Thu, 10 Jan 2019 21:51:03 +0900
Gilles Gouaillardet  wrote:

> Eduardo,
> 
> You have two options to use OmniPath
> 
> - “directly” via the psm2 mtl
> mpirun --mca pml cm --mca mtl psm2 ...
> 
> - “indirectly” via libfabric
> mpirun --mca pml cm --mca mtl ofi ...
> 
> I do invite you to try both. By explicitly requesting the mtl you will
> avoid potential conflicts.
> 
> libfabric is used in production by Cisco and AWS (both major
> contributors to both Open MPI and libfabric) so this is clearly not
> something to stay away from.

Both a second person and I investigated 4.0.0rc on Omni-Path (see the devel
list thread "Re: [OMPI devel] Announcing Open MPI v4.0.0rc1").

At first both psm2 and ofi seemed broken, but it turned out psm2 only had
problems because ofi got in the way. And ofi was not that easily
excluded since it also had a btl component.

Essentially I got it working by deleting all mca files matching *ofi*.
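(A less drastic workaround, assuming the usual MCA component-exclusion
syntax catches everything involved, might be to disable ofi at run time
instead of deleting files:

 mpirun --mca pml cm --mca mtl psm2 --mca btl ^ofi ...

I have not verified that this covers every ofi-related component.)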

YMMV,
 Peter K

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-10 Thread Peter Kjellström
On Thu, 10 Jan 2019 11:20:12 +
ROTHE Eduardo - externe  wrote:

> Hi Gilles, thank you so much for your support!
> 
> For now I'm just testing the software, so it's running on a single
> node.
> 
> Your suggestion was very precise. In fact, choosing the ob1 component
> leads to a successful execution! The tcp component had no effect.
> 
> mpirun --mca pml ob1 --mca btl tcp,self -np 2 ./a.out > Success
> mpirun --mca pml ob1 -np 2 ./a.out > Success
> 
> But... our cluster is equipped with Intel Omni-Path interconnects and
> we are aiming to use psm2 through the ofi component in order to take full
> advantage of this technology.

Ofi support in openmpi has been something to stay away from in my
experience. You should just use the psm2 mtl instead.
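For example, following the selection syntax Gilles gave earlier in this
thread (adjust to your own launch options):

 mpirun --mca pml cm --mca mtl psm2 -np 2 ./a.out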

/Peter K

Re: [OMPI users] open-mpi.org 3.1.3.tar.gz needs a refresh?

2018-12-31 Thread Peter Kjellström
On Sat, 22 Dec 2018 12:42:24 -0500
Bennet Fauber  wrote:

> Maybe the distribution tar ball at
> 
> https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.gz
> 
> did not get refreshed after the fix in
> 
> https://github.com/bosilca/ompi/commit/b902cd5eb765ada57f06c75048509d0716953549
> 
> was implemented? I downloaded the tarball from open-mpi.org today, 22
> Dec, and compiled and I get the warnings.

The 3.1.3 tarball will always be exactly what it was when released
(i.e. the 3.1.3 tag in git).

The commit you refer to was merged to the 3.1.x branch after 3.1.3 and
will as such be available in 3.1.4 if nothing unexpected happens.

If you want an unreleased 3.1.x you can use the corresponding nightly
build found at:

 https://www.open-mpi.org/nightly/v3.1.x/

Also the commit on the v3.1.x branch is:

commit 9cce716e75b15c2fd7b1a017d807fe2e733e6ee6
Merge: 1704063162 00ab40cd79
Author: Ralph Castain 
Date:   Tue Dec 4 06:14:41 2018 -0800

Merge pull request #6038 from hppritcha/topic/swat_issue5810_v3.1.x

btl/openib: fix a problem with ib query


/Peter

Re: [OMPI users] MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1 Compilers on OPA-1

2018-12-04 Thread Peter Kjellström
On Tue, 4 Dec 2018 09:15:13 -0500
George Bosilca  wrote:

> I'm trying to replicate using the same compiler (icc 2019) on my OSX
> over TCP and shared memory with no luck so far. So either the
> segfault it's something specific to OmniPath or to the memcpy
> implementation used on Skylake.

Note that it's IMB 2019.1 that is the problem (I think). And I did
get it to crash even on a single node (Skylake / CentOS 7).

/Peter


Re: [OMPI users] MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1 Compilers on OPA-1

2018-12-04 Thread Peter Kjellström
On Mon, 3 Dec 2018 19:41:25 +
"Hammond, Simon David via users"  wrote:

> Hi Open MPI Users,
> 
> Just wanted to report a bug we have seen with OpenMPI 3.1.3 and 4.0.0
> when using the Intel 2019 Update 1 compilers on our
> Skylake/OmniPath-1 cluster. The bug occurs when running the Github
> master src_c variant of the Intel MPI Benchmarks.

I've noticed this also when using Intel MPI (2018 and 2019u1). I
classified it as a bug in IMB but didn't look too deeply (new
reduce_scatter code).

/Peter K


Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Peter Kjellström
On Wed, 2 May 2018 08:39:30 -0400
Charles Antonelli  wrote:

> This seems to be crying out for MPI_Reduce.

No, the described reduction cannot be implemented with MPI_Reduce (note
the need for partial sums along the axis).
 
> Also in the previous solution given, I think you should do the
> MPI_Sends first.  Doing the MPI_Receives first forces serialization.

It needs that ordering. The first thing that happens is that the first rank
skips the recv and sends its SCAL to the second process, which has just
posted its recv.

Each process needs to complete the recv to know what to send (unless
you split it out into many more sends, which is possible).

What the best solution is depends on whether this part is performance
critical and how large K is.

/Peter K

> Regards,
> Charles
...
> > Something like (simplified pseudo code):
> >
> > if (not_first_along_K)
> >  MPI_RECV(SCAL_tmp, previous)
> >  SCAL += SCAL_tmp
> >
> > if (not_last_along_K)
> >  MPI_SEND(SCAL, next)
> >
> > /Peter K



Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Peter Kjellström
On Wed, 02 May 2018 06:32:16 -0600
Nathan Hjelm  wrote:

> Hit send before I finished. If each proc along the axis needs the
> partial sum (ie proc j gets sum for i = 0 -> j-1 SCAL[j]) then
> MPI_Scan will do that. 

I must confess that I had forgotten about MPI_Scan when I replied to
the OP. In fact, I don't think I've ever used it... :-)
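For the record, a minimal C sketch of Nathan's suggestion (MPI_COMM_WORLD
here stands in for the communicator along the K axis; variable names are
made up for illustration):

 #include <mpi.h>
 #include <stdio.h>

 int main(int argc, char **argv)
 {
     MPI_Init(&argc, &argv);

     int rank;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     /* each rank contributes its local value; after MPI_Scan every rank
        holds the inclusive partial sum over ranks 0..rank */
     double scal = (double)(rank + 1);
     double psum = 0.0;
     MPI_Scan(&scal, &psum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

     printf("rank %d partial sum %g\n", rank, psum);

     MPI_Finalize();
     return 0;
 }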

/Peter K 


Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Peter Kjellström
On Wed, 2 May 2018 11:15:09 +0200
Pierre Gubernatis  wrote:

> Hello all...
> 
> I am using a *cartesian grid* of processors which represents a spatial
> domain (a cubic geometrical domain split into several smaller
> cubes...), and I have communicators to address the procs, as for
> example a comm along each of the 3 axes I,J,K, or along a plane
> IK,JK,IJ, etc..).
> 
> *I need to cumulate a scalar value (SCAL) through the procs which
> belong to a given axis* (let's say the K axis, defined by I=J=0).
> 
> Precisely, the origin proc 0-0-0 has a given value for SCAL (say
> SCAL000). I need to update the 'following' proc (0-0-1) by doing SCAL
> = SCAL + SCAL000, and I need to *propagate* this updating along the K
> axis. At the end, the last proc of the axis should have the total sum
> of SCAL over the axis. (and of course, at a given rank k along the
> axis, the SCAL value = sum over 0,1,   K of SCAL)
> 
> Please, do you see a way to do this ? I have tried many things (with
> MPI_SENDRECV and by looping over the procs of the axis, but I get
> deadlocks that prove I don't handle this correctly...)
> Thank you in any case.

Why did you try SENDRECV? As far as I understand your description above,
data only flows in one direction (along K).

There is no MPI collective to support the kind of reduction you
describe, but it should not be hard to do using normal SEND and RECV.
Something like (simplified pseudo code):

if (not_first_along_K)
 MPI_RECV(SCAL_tmp, previous)
 SCAL += SCAL_tmp

if (not_last_along_K)
 MPI_SEND(SCAL, next)
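Fleshed out into a minimal, compilable C sketch (rank order along
MPI_COMM_WORLD stands in for position along K and for the axis
communicator; names are made up for illustration):

 #include <mpi.h>
 #include <stdio.h>

 int main(int argc, char **argv)
 {
     MPI_Init(&argc, &argv);

     int rank, nproc;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &nproc);

     double scal = (double)(rank + 1);   /* this rank's local value */

     /* all but the first rank wait for the running sum from the
        previous rank and add it to their own value */
     if (rank > 0) {
         double tmp;
         MPI_Recv(&tmp, 1, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
                  MPI_STATUS_IGNORE);
         scal += tmp;
     }

     /* all but the last rank pass the running sum on */
     if (rank < nproc - 1)
         MPI_Send(&scal, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);

     printf("rank %d cumulative value %g\n", rank, scal);

     MPI_Finalize();
     return 0;
 }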

/Peter K


Re: [OMPI users] About my GPU performance using Openmpi-2.0.4

2017-12-14 Thread Peter Kjellström
On Wed, 13 Dec 2017 20:34:52 +0330
Mahmood Naderan  wrote:

> >Currently I am using two Tesla K40m cards for my computational work
> >on quantum espresso (QE) suit http://www.quantum-espresso.org/. My
> >GPU enabled QE code running very slower than normal version  
> 
> Hi,
> When I hear such words, I would say, yeah it is quite natural!
> 
> My personal experience with a GPU (Quadro M2000) was actually a
> failure and loss of money. With various models, configs and companies,
> it is very hard to determine if a GPU product really boosts the
> performance

Agreed. GPU performance is not a given. It depends on the app, version,
input files, hardware, job geometry, ...

> At the end of the day, I think companies put all good features in
> their high-end products (multi thousand dollar ones). So, I think the
> K40m version, where it uses passive cooling, misses many good features
> although it has 12GB of GDDR5.

The K40m is the very high end (of the previous generation, Kepler). The
only higher-specced GPU is the K80, which is essentially two slightly less
impressive K40s in one package.

As far as "passively cooled" goes, it's a server component: the server is
expected to provide the needed airflow. The K40m is a high-TDP part.

/Peter

Re: [OMPI users] IMB-MPI1 hangs after 30 minutes with Open MPI 3.0.0 (was: Openmpi 1.10.4 crashes with 1024 processes)

2017-12-04 Thread Peter Kjellström
On Fri, 1 Dec 2017 21:32:35 +0100
Götz Waschk  wrote:
...
> # Benchmarking Alltoall
> # #processes = 1024
> #
>#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> 0 1000 0.04 0.09 0.05
> 1 1000   253.40   335.35   293.06
> 2 1000   266.93   346.65   306.23
> 4 1000   303.52   382.41   342.21
> 8 1000   383.89   493.56   439.34
>16 1000   501.27   627.84   569.80
>32 1000  1039.65  1259.70  1163.12
>64 1000  1710.12  2071.47  1910.62
>   128 1000  3051.68  3653.44  3398.65

As a potentially interesting data point, I dug through my archive of
IMB output and found an example that also showed something strange
happening at the 128 to 256 byte transition on alltoall @ 1024 ranks
(although in my case it didn't completely hang):

# Benchmarking Alltoall 
# #processes = 1024 
#
    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1         1000       417.44       417.59       417.54
         2         1000       410.50       410.72       410.67
         4         1000       365.92       366.21       365.99
         8         1000       583.21       583.51       583.37
        16         1000       652.90       653.09       652.98
        32         1000       982.09       982.42       982.28
        64         1000      2090.70      2091.11      2090.90
       128         1000      2590.91      2591.93      2591.44
       256           93     70077.42     70219.70     70174.85
       512           93     88611.39     88711.53     88672.84

My output was run on OpenMPI-1.7.6 on CentOS-6 on Mellanox FDR ib
(using the normal verbs/openib transport).

/Peter K


Re: [OMPI users] alltoallv

2017-10-11 Thread Peter Kjellström
On Tue, 10 Oct 2017 11:57:51 -0400
Michael Di Domenico  wrote:

> i'm getting stuck trying to run some fairly large IMB-MPI alltoall
> tests under openmpi 2.0.2 on rhel 7.4

What IB stack is used, just the RHEL inbox one?

Do you run openmpi with the psm mtl for QLogic and the openib btl for
Mellanox, or something different?

> i have two different clusters, one running mellanox fdr10 and one
> running qlogic qdr
> 
> if i issue
> 
> mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

Does it work if you run with something that more obviously fits in RAM?
Like "-mem 0.2"

/Peter K


Re: [OMPI users] Hybrid MPI+OpenMP benchmarks (looking for)

2017-10-10 Thread Peter Kjellström
HPGMG-FV is easy to build and to run as serial, MPI, OpenMP and
MPI+OpenMP.

/Peter

On Mon, 9 Oct 2017 17:54:02 +
"Sasso, John (GE Digital, consultant)"  wrote:

> I am looking for a decent hybrid MPI+OpenMP benchmark utility which I
> can easily build and run with OpenMPI 1.6.5 (at least) and OpenMP
> under Linux, using GCC build of OpenMPI as well as the Intel Compiler
> suite.  I have looked at CP2K but that is much too complex a build
> for its own good (I managed to build all the prerequisite libraries,
> only to have the build of cp2k itself just fail).  Also looked at
> HOMB 1.0.
> 
> I am wondering what others have used.  The build should be simple and
> not require a large # of prereq libraries to build beforehand.
> Thanks!
> 
> --john



Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Peter Kjellström
On Thu, 14 Sep 2017 19:01:08 +0900
Gilles Gouaillardet  wrote:

> Peter and all,
> 
> an easier option is to configure Open MPI with
> --mpirun-prefix-by-default this will automagically add rpath to the
> libs.

Yes, that sorts out the OpenMPI libs, but I was imagining a more general
situation (and the OP later tried adding OpenBLAS).

It's also only available if the OpenMPI in question was built with it,
or if you can rebuild OpenMPI.

The OP seems at least partially interested in additional libraries and
in not rebuilding the system-provided OpenMPI.

/Peter


Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Peter Kjellström
On Thu, 14 Sep 2017 14:28:08 +0430
Mahmood Naderan  wrote:

> >In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
> >compile a dynamic binary "hello" with built in search path
> >to "/my/lib/path".  
> 
> Excuse me... Is that a path or file? I get this:

It should be a path, i.e. a directory.
 
> mpif90 -g -pthread -Wl,rpath=/share/apps/computer/OpenBLAS-0.2.18 -o
> iotk_print_kinds.x iotk_print_kinds.o libiotk.a
> /usr/bin/ld: rpath=/share/apps/computer/OpenBLAS-0.2.18: No such
> file: No such file or directory

I think it didn't like that; note that the leading dash went missing
("-Wl,rpath=..." instead of "-Wl,-rpath=..."), so ld treated the argument
as an input file. Try this variant:

-Wl,-rpath,/path/to/directory

or even:

-Wl,-rpath -Wl,/path/to/directory
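Applied to your command line, that would look something like (assuming the
OpenBLAS directory is the one you want on the search path):

 mpif90 -g -pthread -Wl,-rpath,/share/apps/computer/OpenBLAS-0.2.18 \
   -o iotk_print_kinds.x iotk_print_kinds.o libiotk.a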

/Peter


Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread Peter Kjellström
On Wed, 13 Sep 2017 20:13:54 +0430
Mahmood Naderan  wrote:
...
> `/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libc.a(strcmp.o)'
> can not be used when making an executable; recompile with -fPIE and
> relink with -pie collect2: ld returned 1 exit status
> 
> 
> With such an error, I thought it is better to forget static linking!
> (as it is related to libc) and work with the shared libs and
> LD_LIBRARY_PATH

First, I think giving up on static linking is the right choice.

If the main thing you were after was the convenience of a binary that
will run without the need to set up LD_LIBRARY_PATH correctly, you should
have a look at passing -rpath to the linker.

In short, "mpicc -Wl,-rpath=/my/lib/path helloworld.c -o hello", will
compile a dynamic binary "hello" with built in search path
to "/my/lib/path".

With OpenMPI this will be added as a "runpath" due to how the wrappers
are designed. Both rpath and runpath work for finding "/my/lib/path"
without LD_LIBRARY_PATH but the difference is in priority: rpath is
higher priority than LD_LIBRARY_PATH etc. and runpath is lower.

You can check the rpath or runpath in a binary using the command
chrpath (the package on rhel/centos/... is called chrpath):

$ chrpath hello
hello: RUNPATH=/my/lib/path
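If chrpath isn't installed, readelf from binutils shows roughly the same
thing (output trimmed to the relevant line):

$ readelf -d hello | grep -i -e rpath -e runpath
 0x000000000000001d (RUNPATH)            Library runpath: [/my/lib/path]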

If what you really wanted is the rpath behavior (winning over any
LD_LIBRARY_PATH in the environment etc.) then you need to modify the
openmpi wrappers (rebuild openmpi) such that it does NOT pass
"--enable-new-dtags" to the linker.

/Peter


Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-15 Thread Peter Kjellström
On Wed, 15 Jun 2016 15:00:05 +0530
Sreenidhi Bharathkar Ramesh 
wrote:

> hi Mehmet / Llolsten / Peter,
> 
> Just curious to know what is the NIC or fabric you are using in your
> respective clusters.
> 
> If it is Mellanox, is it not better to use the MLNX_OFED ?

We run both Mellanox ConnectX-3 based clusters and Intel TrueScale.

Today it may be warranted to look into a vendor-specific driver stack if
you're using Omni-Path or newer Mellanox HCAs using the mlx5 driver
(ConnectX-4/Connect-IB).

/Peter K

> This information may help us build our cluster. Hence, asking.
> 
> Thanks,
> - Sreenidhi.


Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-15 Thread Peter Kjellström
On Tue, 14 Jun 2016 13:18:33 -0400
"Llolsten Kaonga"  wrote:

> Hello Grigory,
> 
> I am not sure what Redhat does exactly but when you install the OS,
> there is always an InfiniBand Support module during the installation
> process. We never check/install that module when we do OS
> installations because it is usually several versions of OFED behind
> (almost obsolete).

It's not as bad as you assume. Also, as I said before, it's not an OFED
version at all.

We (and many other medium+ HPC centers) run the redhat stack for the
reason that it is 1) good enough and 2) not an extra complication for the
system environment.

/Peter K (with ~3000 hpc nodes on rhel-ib for many years)


Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-15 Thread Peter Kjellström
On Tue, 14 Jun 2016 16:20:42 +
Grigory Shamov <grigory.sha...@umanitoba.ca> wrote:

> On 2016-06-14, 3:42 AM, "users on behalf of Peter Kjellström"
> <users-boun...@open-mpi.org on behalf of c...@nsc.liu.se> wrote:
> 
> >On Mon, 13 Jun 2016 19:04:59 -0400
> >Mehmet Belgin <mehmet.bel...@oit.gatech.edu> wrote:
> >  
> >> Greetings!
> >> 
> >> We have not upgraded our OFED stack for a very long time, and still
> >> running on an ancient version (1.5.4.1, yeah we know). We are now
> >> considering a big jump from this version to a tested and stable
> >> recent version and would really appreciate any suggestions from the
> >> community.  
> >
> >Some thoughts on the subject.
> >
> >* Not installing an external ibstack is quite attractive imo.
> >  RHEL/CentOS stack (not based on any direct OFED version) works fine
> >  for us. It simplifies cluster maintenance (kernel updates etc.).  
> 
> 
> I am curious on how Redhat stack is "not based on any direct OFED
> version"?
> Doesn't Redhat just ship an old OFED build, or they do their own
> changes to it like to the kernel?

No, let's define things a bit.

OFED is a packaging of many open-source components with various
upstreams. Simplified, it draws upon kernel.org/linux-rdma for the kernel
side and on many spread-out user-side projects (mostly under the
openfabrics umbrella).

If you run an upstream kernel and pull+build, for example, the current
master branch of the libraries you need, you're not running any form of
OFED.

OFED does (mainly) three things in my view: 1) pick a set of versions
and test them together, 2) backport the kernel side to popular enterprisy
kernels, and 3) put it all in a complete package.

Redhat does not base its ib stack on a specific OFED release.
Functionality is cherry-picked and backported from upstream (kernel),
and user-space packages are pulled directly from their respective
upstreams (and updated when needed).

/Peter K


Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Peter Kjellström
On Mon, 13 Jun 2016 19:04:59 -0400
Mehmet Belgin  wrote:

> Greetings!
> 
> We have not upgraded our OFED stack for a very long time, and still 
> running on an ancient version (1.5.4.1, yeah we know). We are now 
> considering a big jump from this version to a tested and stable
> recent version and would really appreciate any suggestions from the
> community.

Some thoughts on the subject.

* Not installing an external ibstack is quite attractive imo.
  RHEL/CentOS stack (not based on any direct OFED version) works fine
  for us. It simplifies cluster maintenance (kernel updates etc.).

* If you use an external IB-stack consider the constraints it may put
  on your update plans (for example, you want to update to CentOS-7.3
  but your OFED only supports 7.2...).

* Also consider updates for the stack itself wrt. security. Upstream
  OFED has been quite good at patching security bugs but they DO NOT
  maintain older releases (-> you may have to run a nightly build of the
  latest). Mellanox has patched when poked at, but also only for the
  latest version. Intel does not seem to do security afaict and with a
  dist stack it's covered by the normal dist updates.

/Peter K


Re: [OMPI users] #cpus/socket

2011-09-13 Thread Peter Kjellström
On Tuesday, September 13, 2011 09:07:32 AM nn3003 wrote:
> Hello !
>  
> I am running wrf model on 4x AMD 6172 which is 12 core CPU. I use OpenMPI
> 1.4.3 and libgomp 4.3.4. I have binaries compiled for shared-memory and
> distributed-memory (OpenMP and OpenMPI) I use following command
> mpirun -np 4 --cpus-per-proc 6 --report-bindings --bysocket wrf.exe
> It works ok and in top i see there are 4 wrf.exe and each has 6 threads on
> cpu0-5 12-17 24-29 36-41 However, if I want to run 8 or more e.g.
> mpirun -np 4 --cpus-per-proc 12 --report-bindings --bysocket wrf.exe
> I get error
> Your job has requested more cpus per process(rank) than there
> are cpus in a socket:
>   Cpus/rank: 8
>   #cpus/socket: 6
>  
> Why is that ? There are 12 cores per socket in AMD 6172.

In reality, a 12-core Magny-Cours is two 6-core dies on a socket. I'm
guessing that the topology code sees your 4x 12-core as 8x 6-core.
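If you have hwloc installed, running lstopo on a node should show how
the hardware is detected (sockets, dies, cores) and would confirm or
refute that guess:

 $ lstopo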

/Peter




Re: [OMPI users] MPI_Allgather with derived type crash

2011-05-25 Thread Peter Kjellström
On Wednesday, May 25, 2011 01:16:04 PM Andrew Senin wrote:
> Hello list,
> 
> I have an application which uses MPI_Allgather with derived types. It works
> correctly with mpich2 and mvapich2. However it crashes periodically with
> openmpi2. After investigation I found that the crash takes place when I use
> derived datatypes with MPI_AllGather and number of ranks greater than 8.

Would 8 happen to be the number of cores you have per node, so that what
we're seeing is: single node OK, multi-node FAIL?

If so, what kind of inter-node network are you (trying to) use?

/Peter




Re: [OMPI users] MPI_Allgather with derived type crash

2011-05-25 Thread Peter Kjellström
On Wednesday, May 25, 2011 01:16:04 PM Andrew Senin wrote:
> Hello list,
> 
> I have an application which uses MPI_Allgather with derived types. It works
> correctly with mpich2 and mvapich2. However it crashes periodically with
> openmpi2.

Which version of OpenMPI are you using? There is no such thing as openmpi2...

/Peter




Re: [OMPI users] Error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD

2011-05-04 Thread Peter Kjellström
On Wednesday, May 04, 2011 04:04:37 PM hi wrote:
> Greetings !!!
> 
> I am observing following error messages when executing attached test
> program...
> 
> 
> C:\test>mpirun mar_f.exe
...
> [vbgyor:9920] *** An error occurred in MPI_Allreduce
> [vbgyor:9920] *** on communicator MPI_COMM_WORLD
> [vbgyor:9920] *** MPI_ERR_OP: invalid reduce operation
> [vbgyor:9920] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

I'm not a Fortran programmer, but it seems to me that placing the
MPI_Allreduce call in a subroutine like that broke the meaning of MPI_SUM
and MPI_REAL in that scope. Adding:

 include 'mpif.h'

after SUBROUTINE PAR_BLAS2(m, n, a, b, c, comm) helps.

/Peter




Re: [OMPI users] bizarre failure with IMB/openib

2011-03-21 Thread Peter Kjellström
On Monday, March 21, 2011 12:25:37 pm Dave Love wrote:
> I'm trying to test some new nodes with ConnectX adaptors, and failing to
> get (so far just) IMB to run on them.
...
> I'm using gcc-compiled OMPI 1.4.3 and the current RedHat 5 OFED with IMB
> 3.2.2, specifying `btl openib,sm,self' (or `mtl psm' on the Qlogic
> nodes).  I'm not sure what else might be relevant.  The output from
> trying to run IMB follows, for what it's worth.
> 
>  
> --
> At least one pair of MPI processes are unable to reach each other for MPI
> communications.  This means that no Open MPI device has indicated that it
> can be used to communicate between these processes.  This is an error;
> Open MPI requires that all MPI processes be able to reach each other. 
> This error can sometimes be the result of forgetting to specify the "self"
> BTL.
> 
> Process 1 ([[25307,1],2]) is on host: lvgig116
> Process 2 ([[25307,1],12]) is on host: lvgig117
> BTLs attempted: self sm

Are you sure you launched it correctly and that you have (re)built OpenMPI 
against your Redhat-5 ib stack?
 
>   Your MPI job is now going to abort; sorry.
...
>   [lvgig116:07931] 19 more processes have sent help message
> help-mca-bml-r2.txt / unreachable proc [lvgig116:07931] Set MCA parameter

Seems to me that OpenMPI gave up because it didn't succeed in initializing any 
inter-node btl/mtl.

I'd suggest you try (roughly in order):

 1) ibstat on all nodes to verify that your ib interfaces are up
 2) a verbs-level test (like ib_write_bw, see the example below) to verify
    that data can flow
 3) making sure your OpenMPI was built with the redhat libibverbs-devel
    present (=> a suitable openib btl is built).
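For step 2, a minimal check could look like this (hostnames are just
placeholders):

 node1$ ib_write_bw          # server side, waits for a connection
 node2$ ib_write_bw node1    # client side, prints the measured bandwidth

If that does not move data at IB speed, the problem is below MPI.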

/Peter

> "orte_base_help_aggregate" to 0 to see all help / error messages
> [lvgig116:07931] 19 more processes have sent help message help-mpi-runtime
> / mpi_init:startup:internal-failure




Re: [OMPI users] OpenMPI without IPoIB

2011-03-15 Thread Peter Kjellström
On Monday, March 14, 2011 09:37:54 pm Bernardo F Costa wrote:
> Ok. Native ibverbs/openib is preferable although cannot be used by all
> applications (those who do not have a native ip interface).

Applications (in this context at least) use the MPI interface. MPI in
general, and OpenMPI in particular, can and should run on top of verbs
(btl: openib) or psm (mtl: psm) (Mellanox or Qlogic respectively).

/Peter




Re: [OMPI users] QLogic Infiniband : Run switch from ib0 to eth0

2011-03-11 Thread Peter Kjellström
On Thursday, March 10, 2011 08:30:19 pm Thierry LAMOUREUX wrote:
> Hello,
> 
> We have recently enhanced our network with Infiniband modules on a
> six-node cluster.
> 
> We have installed all OFED drivers related to our hardware
> 
> We have set network IP like following :
> - eth : 192.168.1.0 / 255.255.255.0
> - ib : 192.168.70.0 / 255.255.255.0
> 
> After first tests all seems good. IB interfaces ping each other, ssh and
> other kinds of exchanges over IB work well.

A very important thing to realise is that TCP/IP on Infiniband, while quite 
possible and sometimes useful, has very little to do with running MPI/OpenMPI 
"using" Infiniband.

MPI data transport can run on either TCP/IP (btl: tcp) or natively on IB (for 
Mellanox btl: openib, for Qlogic mtl: psm).

On top of this job startup uses TCP/IP.
 
> Then we started to run our job thought openmpi (building with --with-openib
> option) and our first results were very bad.

This builds the openib btl but it won't be used at runtime if there's no
active ib interface (I'm _NOT_ talking about the interface as listed by
ifconfig). Check your IB with ibstat or similar.

Also, while it's possible to run MPI traffic on the openib btl (verbs) on
Qlogic cards, you'll have to use the psm mtl (psm) for good performance.
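Something like the following, keeping your hostfile and script as-is
(hedged, since I don't know the rest of your setup):

 mpirun -hostfile hostfile.list -mca pml cm -mca mtl psm \
   /common_gfs2/script-test.sh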

/Peter

> After investigations, our system have the following behaviour :
> - job starts over ib network (few packet are sent)
> - job switch to eth network (all next packet sent to these interfaces)
> 
> We never specified the IP Address of our eth interfaces.
> 
> We tried to launch our jobs with the following options :
> - mpirun -hostfile hostfile.list -mca blt openib,self
> /common_gfs2/script-test.sh
> - mpirun -hostfile hostfile.list -mca blt openib,sm,self
> /common_gfs2/script-test.sh
> - mpirun -hostfile hostfile.list -mca blt openib,self -mca
> btl_tcp_if_exclude lo,eth0,eth1,eth2 /common_gfs2/script-test.sh
> 
> The final behaviour remain the same : job is initiated over ib and runs
> over eth.
> 
> We grab performance tests file (osu_bw and osu_latency) and we got not so
> bad results (see attached files).
> 
> We had tried plenty of different things but we are stuck : we don't have
> any error message...
> 
> Thanks per advance for your help.
> 
> Thierry.




Re: [OMPI users] CQ errors

2011-01-10 Thread Peter Kjellström
On Monday, January 10, 2011 03:06:06 pm Michael Di Domenico wrote:
> I'm not sure if these are being reported from OpenMPI or through
> OpenMPI from OpenFabrics, but i figured this would be a good place to
> start
> 
> On one node we received the below errors, i'm not sure i understand the
> error sequence, hopefully someone can shed some light on what
> happened.
> 
> [[5691,1],49][btl_openib_component.c:3294:handle_wc] from node27 to:
...
> network is qlogic qdr end to end, openmpi 1.5 and ofed 1.5.2 (q stack)

Not really addressing your problem, but with Qlogic you should be using
psm, not verbs (btl_openib).

That said, openib should work (slowly).

/Peter




Re: [OMPI users] difference between single and double precision

2010-12-06 Thread Peter Kjellström
On Monday 06 December 2010 15:03:13 Mathieu Gontier wrote:
> Hi,
> 
> A small update.
> My colleague made a mistake and there is no arithmetic performance
> issue. Sorry for bothering you.
> 
> Nevertheless, one can observe some differences between MPICH and
> OpenMPI from 25% to 100% depending on the options we are using in our
> software. Tests are run on a single SGI node on 6 or 12 processes, and
> thus, I am focused on the sm option.

A few previous threads on sm performance have been related to what /tmp is. 
OpenMPI relies on (or at least used to rely on) this being backed by page 
cache (tmpfs, a local ext3 or similar). I'm not sure what the behaviour is in 
the latest version but then again you didn't say which version you've tried.
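A quick way to check what backs /tmp on a node (a generic check, not
OpenMPI-specific):

 $ df -T /tmp

tmpfs or a local ext3/ext4 there matches what OpenMPI expects; anything
else would be worth mentioning along with the version you tried.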

/Peter




Re: [OMPI users] Infiniband problem, kernel mismatch

2010-12-03 Thread Peter Kjellström
On Friday 19 November 2010 01:03:35 HeeJin Kim wrote:
...
> *   mlx4: There is a mismatch between the kernel and the userspace
> libraries: Kernel does not support XRC. Exiting.*
...
> What I'm thinking is that the infiniband card is installed but it doesn't
> work in correct mode.
> My linux kernel version is *2.6.18-164.el5*, and installed ofed
> version is *kernel-ib-pp-1.4.1-ofed20090528r1.4.1sgi605r1.rhel5

Why don't you, as a first step, try the ib software that is included with
EL5.4 (that is, don't install OFED)? We run several clusters this way.

Also, consider updating to 5.5 (the version you're on includes several
security vulnerabilities).

/Peter




Re: [OMPI users] Does current Intel dual processor support MPI?

2006-09-05 Thread Peter Kjellström
On Tuesday 05 September 2006 09:19, Aidaros Dev wrote:
> Nowadays we hear about the Intel dual-core processor. An Intel dual-core
> processor consists of two complete execution cores in one physical
> processor, both running at the same frequency. Both cores share the same
> packaging and the same interface with the chipset/memory.
> Can I use the MPI library to communicate between these processors? Can we
> consider them as if they are separate?

You can treat one dual-core processor as if it were two normal single-core
processors. As such, MPI works fine, as it does on any SMP.

/Peter




Re: [O-MPI users] Performance of all-to-all on Gbit Ethernet

2006-01-04 Thread Peter Kjellström
Hello Carsten,

Have you considered the possibility that this is the effect of a non-optimal
ethernet switch? I don't know how many nodes you need to reproduce it on, or
if you even have physical access (and the opportunity), but popping in
another decent 16-port switch for a test run might be interesting.

just my .02 euros,
 Peter 

On Tuesday 03 January 2006 18:45, Carsten Kutzner wrote:
> On Tue, 3 Jan 2006, Graham E Fagg wrote:
> > Do you have any tools such as Vampir (or its Intel equivalent) available
> > to get a time line graph ? (even jumpshot of one of the bad cases such as
> > the 128/32 for 256 floats below would help).
>
> Hi Graham,
>
> I have attached an slog file of an all-to-all run for 1024 floats (ompi
> tuned alltoall). I could not get clog files for >32 processes - is this
> perhaps a limitation of MPE? So I decided to take the case 32 CPUs on
> 32 nodes which is performance-critical as well. From the run output you
> can see that 2 of the 5 tries yield a fast execution while the others
> are slow (see below).
>
> Carsten
>
>
>
> ckutzne@node001:~/mpe> mpirun -hostfile ./bhost1 -np 32 ./phas_mpe.x
> Alltoall Test on 32 CPUs. 5 repetitions.
> --- New category (first test not counted) ---
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.00690 seconds
> -
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.00320 seconds
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.26392 seconds !
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.26868 seconds !
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.26398 seconds !
> MPI: sending    1024 floats (4096 bytes) to 32 processes (  1 times) took ...0.00339 seconds
> Summary (5-run average, timer resolution 0.01):
>   1024 floats took 0.160632 (0.143644) seconds. Min: 0.003200  max: 0.268681
> Writing logfile
> Finished writing logfile.

-- 

  Peter Kjellström   |
  National Supercomputer Centre  |
  Sweden | http://www.nsc.liu.se




[O-MPI users] how do you select which network/transport to use at run-time?

2005-08-23 Thread Peter Kjellström
Hello,

First I'd like to say that I'm really happy and excited that public access to 
svn is now open :-)

Here is what went fine: check-out, autogen, configure, make, ompi_info and 
simple mpi app (both build and run!!!)

Now I'd like control over which channels/transports/networks the data
flows through... I configured and built ompi against mvapi (mellanox
ibgd-1.8.0) and as far as I can tell it went well. Judging by the behaviour
of the tests I have done, it defaults to tcp (over ethernet in my case).
How do I select mvapi?

Here's some detailed information:
ompi-version: 1.0a1r6976
configure  : --prefix=/usr/local/openmpi-svn6976/intel-8.1e-027 \
 --with-btl-mvapi=/opt/ibgd/driver/infinihost
compilers  : icc, ifort 8.1.027 (64-bit for em64t)
os : centos-4.1 64-bit (el4u1 rebuild)
kernel : 2.6.9-11smp
mvapi  : mellanox ibgd-1.8.0
ompi_info | grep -i mvapi:
 MCA mpool : mvapi (MCA v1.0, API v1.0, Component v1.0)
 MCA btl   : mvapi (MCA v1.0, API v1.0, Component v1.0)
hardware   : dual Xeon Nocona 2 GiB mem, mell. pci-exress HCAs

tia,
 Peter

-- 

  Peter Kjellström   |
  National Supercomputer Centre  |
  Sweden | http://www.nsc.liu.se

