[OMPI users] Some nodes have ucx over IB failures

2021-01-12 Thread Brock Palen via users
We are seeing odd behavior after an update; the most severe issue is that if UCX is
allowed to use IB, it fails for anything except small messages.

Using PingPong from IMB
OMPI_MCA_pml_ucx_verbose=100 mpirun -x UCX_LOG_LEVEL=DEBUG -x
UCX_MODULE_LOG_LEVEL=DEBUG IMB-MPI1 PingPong

[lh.arc-ts.umich.edu:09155] pml_ucx.c:130 mca_pml_ucx_open
[lh1112.arc-ts.umich.edu:09551] pml_ucx.c:130 mca_pml_ucx_open
[lh.arc-ts.umich.edu:09155] pml_ucx.c:194 mca_pml_ucx_init
[lh1112.arc-ts.umich.edu:09551] pml_ucx.c:194 mca_pml_ucx_init
[lh.arc-ts.umich.edu:09155] pml_ucx.c:247 created ucp context
0x1d00de0, worker 0x2ad833eba010
[lh1112.arc-ts.umich.edu:09551] pml_ucx.c:247 created ucp context 0xc1fe40,
worker 0x2ada73f80010
[lh.arc-ts.umich.edu:09155] pml_ucx.c:286 connecting to proc. 0
[1610507831.995613] [lh:9155 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[1]: tag(self/memory knem/memory);
[lh1112.arc-ts.umich.edu:09551] pml_ucx.c:286 connecting to proc. 1
[1610507832.007926] [lh1112:9551 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[1]: tag(self/memory knem/memory);
[lh.arc-ts.umich.edu:09155] pml_ucx.c:286 connecting to proc. 1
[1610507832.008479] [lh:9155 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[2]: tag(rc_verbs/mlx4_0:1);
[1610507832.045284] [lh1112:9551 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[2]: tag(rc_verbs/mlx5_0:1);

  256 1000 3.38    75.74
  512 1000 3.41   150.05
 1024 1000 3.79   270.43
[lh:9155 :0:9155] rc_verbs_iface.c:65   send completion with error:
remote invalid request error qpn 0x2aa wrid 0x28 vendor_err 0x8a
 backtrace (tid:   9155) 


So you can see it starts up and runs fine until the message size goes over 1K,
and then it fails.
Other nodes don't do this. Any idea what is causing this? The UCX logging
isn't helping much.

We have UCX 1.8.0, provided by MOFED.

Thanks. Here is a working pair of nodes; all were rebuilt from scratch
using automation.


[brockp@lh0057 src]$ OMPI_MCA_pml_ucx_verbose=100 mpirun -x
UCX_LOG_LEVEL=DEBUG -x UCX_MODULE_LOG_LEVEL=DEBUG IMB-MPI1 PingPong
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:130 mca_pml_ucx_open
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:130 mca_pml_ucx_open
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:194 mca_pml_ucx_init
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:194 mca_pml_ucx_init
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:247 created ucp context
0x18a1e00, worker 0x2b3ef3e92010
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:247 created ucp context
0x1606e30, worker 0x2b6703edd010
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:286 connecting to proc. 0
[1610508044.471312] [lh0057:4804 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[1]: tag(self/memory knem/memory);
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:286 connecting to proc. 1
[1610508044.471757] [lh0058:5126 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[1]: tag(self/memory knem/memory);
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:286 connecting to proc. 1
[1610508044.472420] [lh0057:4804 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[2]: tag(rc_mlx5/mlx5_0:1);
[1610508044.520315] [lh0058:5126 :0] ucp_worker.c:1543 UCX  INFO
 ep_cfg[2]: tag(rc_mlx5/mlx5_0:1);



  1048576   40    92.21 11371.43
  2097152   20   179.31 11695.83
  4194304   10   352.87 11886.16


# All processes entering MPI_Finalize

[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:423 disconnecting from rank 0
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:423 disconnecting from rank 0
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:373 waiting for 1 disconnect
requests
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:423 disconnecting from rank 1
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:423 disconnecting from rank 1
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:373 waiting for 1 disconnect
requests
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:373 waiting for 0 disconnect
requests
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:373 waiting for 0 disconnect
requests
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:253 mca_pml_ucx_cleanup
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:253 mca_pml_ucx_cleanup
[lh0058.arc-ts.umich.edu:05126] pml_ucx.c:178 mca_pml_ucx_close
[lh0057.arc-ts.umich.edu:04804] pml_ucx.c:178 mca_pml_ucx_close

Brock Palen
IG: brockpalen1984
www.umich.edu/~brockp
Director Advanced Research Computing - TS
bro...@umich.edu
Office: (734)936-1985   (not in use during Covid)
Cell:  (989)277-6075


Re: [hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
Thanks,

Yeah, I was looking for an API that would take most cases into consideration,
like what I get with hwloc-bind --get, where I can find the number of cores the
process has access to, whether that comes from cgroups, other sorts of affinity
settings, etc.
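For what it's worth, a minimal pure-Python sketch of that kind of check (an
assumption on my part: Linux and Python >= 3.3; os.sched_getaffinity reflects
cgroup cpusets and explicit affinity masks, roughly what hwloc-bind --get
reports, but not CPU-time shares or quotas):

import os

def usable_cpu_count():
    # CPUs this process may actually run on, honoring cgroup cpusets and
    # affinity masks, unlike multiprocessing.cpu_count()
    try:
        return len(os.sched_getaffinity(0))  # Linux, Python 3.3+
    except AttributeError:
        return os.cpu_count() or 1           # fallback: all online CPUs

print(usable_cpu_count())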

Brock Palen
IG: brockpalen1984
www.umich.edu/~brockp
Director Advanced Research Computing - TS
bro...@umich.edu
(734)936-1985


On Mon, Aug 31, 2020 at 12:37 PM Guy Streeter 
wrote:

> I forgot that the cpuset value is still available in cgroups v2. You
> would want the cpuset.cpus.effective value.
> More information is available here:
> https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
>
> On Mon, Aug 31, 2020 at 11:19 AM Guy Streeter 
> wrote:
> >
> > As I said, cgroups doesn't limit the group to a number of cores, it
> > limits processing time, either as an absolute amount or as a share of
> > what is available.
> > A docker process can be restricted to a set of cores, but that is done
> > with cpu affinity, not cgroups.
> >
> > You could try to figure out an equivalency. For instance if you are
> > using cpu.shares to limit the cgroups, then figure the ratio of a
> > cgroup's share to the shares of all the cgroups at that level, and
> > apply that ratio to the number of available cores to get an estimated
> > number of threads you should start.
> >
> > On Mon, Aug 31, 2020 at 10:40 AM Brock Palen  wrote:
> > >
> > > Sorry if wasn't clear, I'm trying to find out what is available to my
> process before it starts up threads.  If the user is jailed in a cgroup
> (docker, slurm, other)  and the program tries to start 36 threads, when it
> only has access to 4 cores, it's probably not a huge deal, but not
> desirable.
> > >
> > > I do allow the user to specify number of threads, but would like to
> automate it for least astonishment.
> > >
> > > Brock Palen
> > > IG: brockpalen1984
> > > www.umich.edu/~brockp
> > > Director Advanced Research Computing - TS
> > > bro...@umich.edu
> > > (734)936-1985
> > >
> > >
> > > On Mon, Aug 31, 2020 at 11:34 AM Guy Streeter 
> wrote:
> > >>
> > >> My very basic understanding of cgroups is that it can be used to limit
> > >> cpu processing time for a group, and to ensure fair distribution of
> > >> processing time within the group, but I don't know of a way to use
> > >> cgroups to limit the number of CPUs available to a cgroup.
> > >>
> > >> On Mon, Aug 31, 2020 at 8:56 AM Brock Palen  wrote:
> > >> >
> > >> > Hello,
> > >> >
> > >> > I have a small utility,  it is currently using
> multiprocess.cpu_count()
> > >> > Which currently ignores cgroups etc.
> > >> >
> > >> > I see https://gitlab.com/guystreeter/python-hwloc
> > >> > But appears stale,
> > >> >
> > >> > How would you detect number of threads that are safe to start in a
> cgroup from Python3 ?
> > >> >
> > >> > Thanks!
> > >> >
> > >> > Brock Palen
> > >> > IG: brockpalen1984
> > >> > www.umich.edu/~brockp
> > >> > Director Advanced Research Computing - TS
> > >> > bro...@umich.edu
> > >> > (734)936-1985
> > >> > ___
> > >> > hwloc-users mailing list
> > >> > hwloc-users@lists.open-mpi.org
> > >> > https://lists.open-mpi.org/mailman/listinfo/hwloc-users
> > >> ___
> > >> hwloc-users mailing list
> > >> hwloc-users@lists.open-mpi.org
> > >> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
> > >
> > > ___
> > > hwloc-users mailing list
> > > hwloc-users@lists.open-mpi.org
> > > https://lists.open-mpi.org/mailman/listinfo/hwloc-users
> ___
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
Sorry if I wasn't clear: I'm trying to find out what is available to my
process before it starts up threads. If the user is jailed in a cgroup
(Docker, Slurm, other) and the program tries to start 36 threads when it
only has access to 4 cores, it's probably not a huge deal, but it's not
desirable.

I do allow the user to specify the number of threads, but I would like to
automate it for least astonishment.

Brock Palen
IG: brockpalen1984
www.umich.edu/~brockp
Director Advanced Research Computing - TS
bro...@umich.edu
(734)936-1985


On Mon, Aug 31, 2020 at 11:34 AM Guy Streeter 
wrote:

> My very basic understanding of cgroups is that it can be used to limit
> cpu processing time for a group, and to ensure fair distribution of
> processing time within the group, but I don't know of a way to use
> cgroups to limit the number of CPUs available to a cgroup.
>
> On Mon, Aug 31, 2020 at 8:56 AM Brock Palen  wrote:
> >
> > Hello,
> >
> > I have a small utility,  it is currently using  multiprocess.cpu_count()
> > Which currently ignores cgroups etc.
> >
> > I see https://gitlab.com/guystreeter/python-hwloc
> > But appears stale,
> >
> > How would you detect number of threads that are safe to start in a
> cgroup from Python3 ?
> >
> > Thanks!
> >
> > Brock Palen
> > IG: brockpalen1984
> > www.umich.edu/~brockp
> > Director Advanced Research Computing - TS
> > bro...@umich.edu
> > (734)936-1985
> > ___
> > hwloc-users mailing list
> > hwloc-users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/hwloc-users
> ___
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

[hwloc-users] hwloc Python3 Bindings - Correctly Grab number cores available

2020-08-31 Thread Brock Palen
Hello,

I have a small utility; it is currently using multiprocess.cpu_count(),
which ignores cgroups etc.

I see https://gitlab.com/guystreeter/python-hwloc
but it appears stale.

How would you detect the number of threads that are safe to start in a cgroup
from Python3?

Thanks!

Brock Palen
IG: brockpalen1984
www.umich.edu/~brockp
Director Advanced Research Computing - TS
bro...@umich.edu
(734)936-1985
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users

[OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread Brock Palen
Is it possible to use mpirun / orte as a load balancer for running serial
jobs in parallel similar to GNU Parallel?
https://www.biostars.org/p/63816/

The reason is that on any major HPC system you normally want to use a resource
manager launcher (TM, Slurm, etc.) and not ssh like GNU Parallel does.

I recall there being a way to give OMPI a stack of work to do from the talk
at SC this year, but I can't figure out whether it does what I think it
should do.
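For reference, one crude way to fan serial commands out under mpirun without
ssh is a tiny wrapper in which each rank runs a static slice of a task file (a
sketch, assuming mpi4py is available; task_farm.py and tasks.txt are made-up
names, and this is static slicing, not the dynamic balancing GNU Parallel
does):

from mpi4py import MPI
import subprocess, sys

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# one shell command per line in the task file named on the command line
with open(sys.argv[1]) as f:
    tasks = [line.strip() for line in f if line.strip()]

# rank r runs tasks r, r+size, r+2*size, ... (round-robin slice)
for cmd in tasks[rank::size]:
    subprocess.run(cmd, shell=True)

Launched as something like: mpirun python task_farm.py tasks.txt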

Thanks,

Brock Palen
www.umich.edu/~brockp <http://www.umich.edu/%7Ebrockp>
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Building against XLC XLF

2016-04-03 Thread Brock Palen
1.10.1 failed in make with xlc/xlf; it (1.10.1) works with gcc.

master (cloned today from GitHub) works with xlc/xlf.

I wanted to be more confident it worked than just make check. I have built
a few codes and they appear good so far. Just reporting that master appears to
work with xlc/xlf.

On Sun, Apr 3, 2016 at 10:34 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Brock,
>
> which version are you using ? v1.10 ? v2.x ? master ?
>
> what is failing ?
> configure ? make ? make install ? make check ?
>
> is this issue specific to xlc/xlf ?
> (e.g. does it work with gcc compilers ?)
>
> Cheers,
>
> Gilles
>
>
> On 4/4/2016 11:28 AM, Brock Palen wrote:
>
> I recently needed to build an OpenMPI build on Power 8 where I had access
> to xlc / xlf
>
> The current release fails (as noted in the readme)
> But a clone as of April 4th from git worked, here is my (simple) configure
> script:
>
> COMPILERS='CC=xlc FC=xlf CXX=xlc++'
> ./configure \
> --prefix=$PREFIX\
> --mandir=$PREFIX/man\
> $COMPILERS
>
> Is there a better way to check other than
> make check
>
> It had not failures.
>
> xlc --version
> IBM XL C/C++ for Linux, V13.1.3 (5725-C73, 5765-J08)
> xlf -qversion
> IBM XL Fortran for Linux, V15.1.3 (5725-C75, 5765-J10)
> Version: 15.01.0003.0001
>
>
> Thanks!
>
> --
>
> Brock Palen
> www.umich.edu/~brockp <http://www.umich.edu/%7Ebrockp>
> Assoc. Director Advanced Research Computing - TS
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
> ___
> users mailing listus...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/04/28881.php
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/04/28882.php
>



-- 

Brock Palen
www.umich.edu/~brockp
Assoc. Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985


[OMPI users] Building against XLC XLF

2016-04-03 Thread Brock Palen
I recently needed to build Open MPI on POWER8, where I had access
to xlc / xlf.

The current release fails (as noted in the README),
but a clone as of April 4th from git worked. Here is my (simple) configure
script:

COMPILERS='CC=xlc FC=xlf CXX=xlc++'
./configure \
--prefix=$PREFIX\
--mandir=$PREFIX/man\
$COMPILERS

Is there a better way to check, other than
make check?

It had no failures.

xlc --version
IBM XL C/C++ for Linux, V13.1.3 (5725-C73, 5765-J08)
xlf -qversion
IBM XL Fortran for Linux, V15.1.3 (5725-C75, 5765-J10)
Version: 15.01.0003.0001


Thanks!

-- 

Brock Palen
www.umich.edu/~brockp
Assoc. Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985


Re: [OMPI users] configuring a code with MPI/OPENMPI

2015-02-03 Thread Brock Palen
I'll hit you off list with my Abinit OpenMPI build notes,

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Feb 3, 2015, at 2:26 PM, Nick Papior Andersen <nickpap...@gmail.com> wrote:
> 
> I also concur with Jeff about asking software specific questions at the 
> software-site, abinit already has a pretty active forum: 
> http://forum.abinit.org/
> So any questions can also be directed there.
> 
> 2015-02-03 19:20 GMT+00:00 Nick Papior Andersen <nickpap...@gmail.com>:
> 
> 
> 2015-02-03 19:12 GMT+00:00 Elio Physics <elio-phys...@live.com>:
> Hello,
> 
> thanks for your help. I have tried:
> 
> ./configure  --with-mpi-prefix=/usr FC=ifort CC=icc
> 
> But i still get the same error.  Mind you if I compile it serially, that is 
> ./configure   FC=ifort CC=icc
> 
> It works perfectly fine.
> 
> We do have MPI installed.. I am using Quantum Espresso code with mpirun.
> Sorry, I thought you were also compiling your own MPI. 
> 
> I am attaching the config.log file. I appreciate your help.
> I see you are trying to install abinit, I would highly recommend you to 
> utilize their build.ac module method.
> Instead of then passing arguments to the command line you create a build.ac 
> file and configure like this:
> ./configure --with-config-file 
> (I would recommend you to build abinit in a sub-folder)
> 
> However, your problem is that your used MPI version is compiled against gcc 
> (the 4.1) so that will never work, even if you specify FC/CC
> Either:
> A) Use an MPI version installed using the intel compiler (if not provided by 
> your cluster administrator you need to install it)
> B) Get a new gcc compiler
> 
> Regards
> 
> Elio
> 
> 
> From: nickpap...@gmail.com
> Date: Tue, 3 Feb 2015 17:21:51 +
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] configuring a code with MPI/OPENMPI
> 
> 
> First try and correct your compilation by using the intel c-compiler AND the 
> fortran compiler. You should not mix compilers. 
> CC=icc FC=ifort
> Else the config.log is going to be necessary to debug it further.
> 
> PS: You could also try and convince your cluster administrator to provide a 
> more recent compiler
> PPS: Do your cluster not have an MPI installation already present?
> 
> 
> 2015-02-03 17:13 GMT+00:00 Elio Physics <elio-phys...@live.com>:
> Dear all,
> 
> II am trying to configure a code  with mpi (for parallel processing)  to do 
> some calculations so basically I type:
> 
> ./configure 
> 
> and I get:
> 
> configure: error: Fortran compiler does not provide iso_c_binding module. Use 
> a more recent version or a different compiler
> 
> 
> which means that my GCC 4.1 compiler is too old to build the code (something 
> i do not have control over..It is a cluster of the Uni where I work). so I 
> tried another compiler ifort:
> 
> ./configure  --enable-mpi=yes  FC=ifort
>  but now I get another error:
> 
>  
> ==
>  === Multicore architecture support 
> ===
>  
> ==
> 
> checking whether to enable OpenMP support... no
> checking whether to build MPI code... yes
> checking whether the C compiler supports MPI... no
> checking whether the C++ compiler supports MPI... no
> checking whether the Fortran Compiler supports MPI... no
> checking whether MPI is usable... no
> configure: error: MPI support is broken - please fix your config parameters 
> and/or MPI installation
> 
> Again, I tried to give a path for the MPI compiler:
> 
>  ./configure  --enable-mpi  --with-mpi-prefix=/usr FC=ifort
> 
> WHICH APPARENTLY SOLVED THE PREVIOUS ERROR:
>  
> ==
>  === Multicore architecture startup 
> ===
>  
> ==
> 
> configure: Initializing MPI support
> configure: looking for MPI in /usr
> checking for a MPI C compiler... /usr/bin/mpicc
> checking for a MPI C++ compiler... /usr/bin/mpicxx
> checking for a MPI Fortran compiler... /usr/bin/mpif90
> configure: creating wrapper for MPI Fortran compiler
> configure: GPU support disabled from command-line
>  
> But strangely enough this got me back to the first error although I am using 
> ifort!!
> 
> checking whether the Fortran compiler provides the iso_c_binding module... 
> configure: error: Fortran compiler does not provide iso_c_

Re: [OMPI users] best function to send data

2014-12-22 Thread Brock Palen
Diego,

That is what you want. 

 This isn't really the list for these sorts of questions; I would recommend 
doing an MPI tutorial or getting one of the many books. 

The self paced class at CI Tutor at NCSA is my favorite. You should learn what 
you need to get started from this tutorial:

http://citutor.org/login.php
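
For completeness, a minimal sketch of the broadcast pattern discussed here
(written with mpi4py and numpy purely for brevity; both are assumptions, not
part of the original question; in C the call is MPI_Bcast with the same
buffer/count/root arguments):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# every rank allocates the buffer; rank 0 fills it and broadcasts to everyone
vec = np.empty(100, dtype='d')
if comm.Get_rank() == 0:
    vec[:] = np.arange(100, dtype='d')
comm.Bcast(vec, root=0)   # after this, all ranks hold rank 0's vector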


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Dec 19, 2014, at 5:56 PM, Diego Avesani <diego.aves...@gmail.com> wrote:
> 
> Dear all users,
> I am new to the MPI world.
> I would like to know what the best choice is among the different
> functions, and what they mean.
> 
> In my program I would like each process to send a vector of data to all the
> other processes. What do you suggest?
> Is it correct to use MPI_Bcast, or am I missing something?
> 
> Thanks a lot
> 
> Diego
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/26047.php



Re: [hwloc-users] Selecting real cores vs HT cores

2014-12-11 Thread Brock Palen
OK, let me expand then. I don't have control over the BIOS.

The testing I am doing is on a cloud provider, and from our testing it
appears that HT is enabled. It is ambiguous to me, though, what I see vs. how
they allocate on their hypervisor.

I want to see if this has any effect. Given the provider's advertised CPU types
vs. my bare-metal ones of the same types, everything feels 'half as fast';
thus the question about HT.

Here is the lstopo from the provider:

lstopo-no-graphics

Machine (7484MB)
  Socket L#0 + L3 L#0 (25MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#2)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
  PU L#2 (P#1)
  PU L#3 (P#3)
  HostBridge L#0
PCI 8086:7010
PCI 1013:00b8
PCI 8086:10ed
  Net L#0 "eth0"
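
For the record, a rough sketch of one way to keep a process on only the first
hardware thread of each core without touching the BIOS (Linux sysfs paths
assumed; as Jeff notes below, this is not the same thing as booting with HT
disabled):

import glob, os

def one_pu_per_core():
    # pick the first hardware thread of each physical core from sysfs
    chosen, seen = set(), set()
    for path in sorted(glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")):
        with open(path) as f:
            siblings = f.read().strip()      # e.g. "0,2" or "0-1"
        if siblings in seen:
            continue
        seen.add(siblings)
        chosen.add(int(siblings.replace("-", ",").split(",")[0]))
    return chosen

os.sched_setaffinity(0, one_pu_per_core())   # bind this process accordingly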


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Dec 11, 2014, at 4:12 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> I'm not sure you're asking a well-formed question.
> 
> When the BIOS is set to enable hyper threading, then several resources on the 
> core are split when the machine is booted up (e.g., some of the queue depths 
> for various processing units in the core are half the length that they are 
> when hyperthreading is disabled in the BIOS).
> 
> Hence, running a process on a core that only uses a single hyperthread (when 
> HT is enabled) is not quite the same thing as booting up with HT disabled and 
> running that same job on the core.
> 
> Make sense?
> 
> Meaning: if you want to test HT vs. non-HT performance, you really need to 
> change the BIOS settings and reboot, sorry.
> 
> Also, note that if you have HT enabled and you run a single-threaded app 
> bound to a core, it will only use 1 of those HTs -- the other HT will be 
> largely dormant. Meaning: don't expect that running a single-threaded app on 
> a core that has HT enabled will magically take advantage of some performance 
> benefit of aggressive automatic parallelization.  You really need multiple 
> threads in a process to get performance advantages out of HT.
> 
> 
> 
> On Dec 11, 2014, at 12:51 PM, Brock Palen <bro...@umich.edu> wrote:
> 
>> When a system has HT enabled is one core presented the real one and one the 
>> fake partner?  Or is that not the case?
>> 
>> If wanting to test behavior without messing with the bios how do I select 
>> just the 'real cores'  if this is the case?   
>> 
>> I am looking for the equivelent of 
>> 
>> hwloc-bind ALLREALCORES  my.exe
>> 
>> Doing some performance study type things.
>> 
>> Thanks,
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/hwloc-users/2014/12/1126.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2014/12/1127.php



Re: [OMPI users] How to find MPI ranks located in remote nodes?

2014-11-25 Thread Brock Palen
Are you doing this just for debugging, or do you really want to do it within the
MPI program?

orte-ps

That gives you the pid/host for each rank, but I don't think there is any standard
way to do this via an API.
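If it does need to happen inside the program, here is a sketch of the usual
MPI-2.2-level post-processing (gather the processor names and group ranks by
name; shown with mpi4py for brevity, which is an assumption here; the C calls
are MPI_Get_processor_name and MPI_Allgather):

from mpi4py import MPI

comm = MPI.COMM_WORLD
name = MPI.Get_processor_name()        # node identifier for this rank
names = comm.allgather(name)           # every rank learns every rank's node

local  = [r for r, n in enumerate(names) if n == name]
remote = [r for r, n in enumerate(names) if n != name]
print("rank %d: same node %s, remote %s" % (comm.Get_rank(), local, remote))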

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Nov 25, 2014, at 5:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Every process has a complete map of where every process in the job is located 
> - not sure if there is an MPI API for accessing it, though.
> 
> 
>> On Nov 25, 2014, at 2:32 PM, Teranishi, Keita <knte...@sandia.gov> wrote:
>> 
>> Hi,
>> 
>> I am trying  to figure out a way for each local MPI rank to identify the 
>> ranks located in physically remote nodes (just different nodes) of cluster 
>> or MPPs such as Cray.  I am using MPI_Get_processor_name to get the node ID, 
>> but it requires some processing to map MPI rank to the node ID.  Is there 
>> any better ways doing this using MPI-2.2 (or earlier) capabilities?   It 
>> will be great if I can easily get a list of MPI ranks in the same physical 
>> node.  
>> 
>> Thanks,
>> -
>> Keita Teranishi
>> Principal Member of Technical Staff
>> Scalable Modeling and Analysis Systems
>> Sandia National Laboratories
>> Livermore, CA 94551
>> +1 (925) 294-3738
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25868.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25869.php



Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
OK, I figured; I'm going to have to read some more for my own curiosity. The
reason I mention the resource manager we use, and that the hostnames given by
PBS/Torque match the 1gig-e interfaces, is that I'm curious what path it would
take to get to a peer node when the node list entries all match the 1gig
interfaces, yet data is being sent out the 10gig eoib0/ib0 interfaces.

I'll go do some measurements and see.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> Ralph is right: OMPI aggressively uses all Ethernet interfaces by default.  
> 
> This short FAQ has links to 2 other FAQs that provide detailed information 
> about reachability:
> 
>http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
> 
> The usNIC BTL uses UDP for its wire transport and actually does a much more 
> standards-conformant peer reachability determination (i.e., it actually 
> checks routing tables to see if it can reach a given peer which has all kinds 
> of caching benefits, kernel controls if you want them, etc.).  We haven't 
> back-ported this to the TCP BTL because a) most people who use TCP for MPI 
> still use a single L2 address space, and b) no one has asked for it.  :-)
> 
> As for the round robin scheduling, there's no indication from the Linux TCP 
> stack what the bandwidth is on a given IP interface.  So unless you use the 
> btl_tcp_bandwidth_ (e.g., btl_tcp_bandwidth_eth0) MCA 
> params, OMPI will round-robin across them equally.
> 
> If you have multiple IP interfaces sharing a single physical link, there will 
> likely be no benefit from having Open MPI use more than one of them.  You 
> should probably use btl_tcp_if_include / btl_tcp_if_exclude to select just 
> one.
> 
> 
> 
> 
> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
> 
>> I was doing a test on our IB based cluster, where I was diabling IB
>> 
>> --mca btl ^openib --mca mtl ^mxm
>> 
>> I was sending very large messages >1GB  and I was surppised by the speed.
>> 
>> I noticed then that of all our ethernet interfaces
>> 
>> eth0  (1gig-e)
>> ib0  (ip over ib, for lustre configuration at vendor request)
>> eoib0  (ethernet over IB interface for IB -> Ethernet gateway for some 
>> extrnal storage support at >1Gig speed
>> 
>> I saw all three were getting traffic.
>> 
>> We use torque for our Resource Manager and use TM support, the hostnames 
>> given by torque match the eth0 interfaces.
>> 
>> How does OMPI figure out that it can also talk over the others?  How does it 
>> chose to load balance?
>> 
>> BTW that is fine, but we will use if_exclude on one of the IB ones as ib0 
>> and eoib0  are the same physical device and may screw with load balancing if 
>> anyone ver falls back to TCP.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25709.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25713.php



Re: [OMPI users] OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
Right, I understand those are TCP interfaces; I was just showing that I have two
TCP interfaces over one physical interface, which is why I was asking how TCP
interfaces are selected. It will rarely, if ever, matter to us.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Nov 7, 2014, at 8:38 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Ralph,
> 
> IIRC there is load balancing accros all the btl, for example
> between vader and scif.
> So load balancing between ib0 and eoib0 is just a particular case that might 
> not necessarily be handled by the btl tcp.
> 
> Cheers,
> 
> Gilles
> 
> Ralph Castain <rhc.open...@gmail.com> wrote:
>> OMPI discovers all active interfaces and automatically considers them 
>> available for its use unless instructed otherwise via the params. I’d have 
>> to look at the TCP BTL code to see the loadbalancing algo - I thought we 
>> didn’t have that “on” by default across BTLs, but I don’t know if the TCP 
>> one automatically uses all available Ethernet interfaces by default. Sounds 
>> like it must.
>> 
>> 
>>> On Nov 7, 2014, at 11:53 AM, Brock Palen <bro...@umich.edu> wrote:
>>> 
>>> I was doing a test on our IB based cluster, where I was diabling IB
>>> 
>>> --mca btl ^openib --mca mtl ^mxm
>>> 
>>> I was sending very large messages >1GB  and I was surppised by the speed.
>>> 
>>> I noticed then that of all our ethernet interfaces
>>> 
>>> eth0  (1gig-e)
>>> ib0  (ip over ib, for lustre configuration at vendor request)
>>> eoib0  (ethernet over IB interface for IB -> Ethernet gateway for some 
>>> extrnal storage support at >1Gig speed
>>> 
>>> I saw all three were getting traffic.
>>> 
>>> We use torque for our Resource Manager and use TM support, the hostnames 
>>> given by torque match the eth0 interfaces.
>>> 
>>> How does OMPI figure out that it can also talk over the others?  How does 
>>> it chose to load balance?
>>> 
>>> BTW that is fine, but we will use if_exclude on one of the IB ones as ib0 
>>> and eoib0  are the same physical device and may screw with load balancing 
>>> if anyone ver falls back to TCP.
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/11/25709.php
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25710.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25712.php



[OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Brock Palen
I was doing a test on our IB-based cluster, where I was disabling IB

--mca btl ^openib --mca mtl ^mxm

I was sending very large messages (>1GB) and I was surprised by the speed.

I noticed then that of all our ethernet interfaces

eth0  (1gig-e)
ib0  (ip over ib, for lustre configuration at vendor request)
eoib0  (ethernet over IB interface, for an IB -> Ethernet gateway for some external 
storage support at >1Gig speed)

I saw all three were getting traffic.

We use Torque for our resource manager with TM support; the hostnames given 
by Torque match the eth0 interfaces.

How does OMPI figure out that it can also talk over the others? How does it 
choose to load balance?

BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 and 
eoib0 are the same physical device and may screw with load balancing if anyone 
ever falls back to TCP.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Brock Palen
Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 31, 2014, at 2:22 PM, Ralph Castain <rhc.open...@gmail.com> wrote:
> 
> 
>> On Oct 30, 2014, at 3:15 PM, Brock Palen <bro...@umich.edu> wrote:
>> 
>> If i'm on the node hosting mpirun for a job, and run:
>> 
>> orte-ps
>> 
>> It finds the job and shows the pids and info for all ranks.
>> 
>> If I use orte-top though it does no such default, I have to find the mpirun 
>> pid and then use it.
>> 
>> Why do the two have different behavior?  The show data from the same source 
>> don't they?
> 
> Yeah, well….no good reason, really. Just historical. I can make them 
> consistent :-)
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25648.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25657.php



[OMPI users] IB Retry Limit Errors when fabric changes

2014-10-31 Thread Brock Palen
Does anyone have issues with jobs dying with errors:

> The InfiniBand retry count between two MPI processes has been
> exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
> (section 12.7.38):

We started seeing this about a year ago. If we make changes to the IB fabric, 
this can happen. Multiple times now, just plugging line cards into switches on 
a live system has caused large swaths of jobs to die with this error.

Does anyone else have this problem? We are a Mellanox-based fabric.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





[OMPI users] orte-ps and orte-top behavior

2014-10-30 Thread Brock Palen
If i'm on the node hosting mpirun for a job, and run:

orte-ps

It finds the job and shows the pids and info for all ranks.

If I use orte-top though it does no such default, I have to find the mpirun pid 
and then use it.

Why do the two have different behavior?  The show data from the same source 
don't they?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
Thanks, this is good feedback.

I was worried that, with the dynamic nature of Yarn containers, it would be hard 
to coordinate wire-up, and you have confirmed that.

Thanks

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 27, 2014, at 11:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> 
>> On Oct 26, 2014, at 9:56 PM, Brock Palen <bro...@umich.edu> wrote:
>> 
>> We are starting to look at supporting MPI on our Hadoop/Spark YARN based 
>> cluster.
> 
> You poor soul…
> 
>>  I found a bunch of referneces to Hamster, but what I don't find is if it 
>> was ever merged into regular OpenMPI, and if so is it just another RM 
>> integration?  Or does it need more setup?
> 
> When I left Pivotal, it was based on a copy of the OMPI trunk that sat 
> somewhere in the 1.7 series, I believe. Last contact I had indicated they 
> were trying to update, but I’m not sure they were successful.
> 
>> 
>> I found this:
>> http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html
> 
> Didn’t know they had actually (finally) released it, so good to know. Just so 
> you are aware, there are major problems running MPI under Yarn as it just 
> isn’t designed for MPI support. What we did back then was add a JNI layer so 
> that ORTE could run underneath it, and then added a PMI-like service to 
> provide the wireup support (since Yarn couldn’t be used to exchange the info 
> itself). You also have the issue that Yarn doesn’t understand the need for 
> all the procs to be launched together, and so you have to modify Yarn so it 
> will ensure that the MPI procs are all running or else you’ll hang in 
> MPI_Init.
> 
>> 
>> Which appears to imply extra setup required.  Is this documented anywhere 
>> for OpenMPI?
> 
> I’m afraid you’ll just have to stick with the Pivotal-provided version as the 
> integration is rather complicated. Don’t expect much in the way of 
> performance! This was purely intended as a way for “casual” MPI users to make 
> use of “free” time on their Hadoop cluster, not for any serious technical 
> programming.
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25593.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25613.php



[OMPI users] Java FAQ Page out of date

2014-10-27 Thread Brock Palen
I think a lot of the information on this page:

http://www.open-mpi.org/faq/?category=java

is out of date with the 1.8 release. 

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





[OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
We are starting to look at supporting MPI on our Hadoop/Spark YARN-based 
cluster. I found a bunch of references to Hamster, but what I can't find is whether 
it was ever merged into regular Open MPI, and if so, is it just another RM 
integration? Or does it need more setup?

I found this:
http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html

This appears to imply extra setup is required. Is this documented anywhere for 
Open MPI?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] low CPU utilization with OpenMPI

2014-10-21 Thread Brock Palen
Doing special files on NFS can be weird; try the other temp locations:

/var/tmp/
/dev/shm  (ram disk careful!)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 21, 2014, at 10:18 PM, Vinson Leung <lwhvinson1...@gmail.com> wrote:
> 
> Because of permission reason (OpenMPI can not write temporary file to the 
> default /tmp directory), I change the TMPDIR to my local directory (export 
> TMPDIR=/home/user/tmp ) and then the MPI program can run. But the CPU 
> utilization is very low under 20% (8 MPI rank running in Intel Xeon 8-core 
> CPU). 
> 
> And I also got some message when I run with OpenMPI:
> [cn3:28072] 9 more processes have sent help message help-opal-shmem-mmap.txt 
> / mmap on nfs
> [cn3:28072] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help 
> / error messages
> 
> Any idea?
> Thanks
> 
> VIncent
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25548.php



Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-24 Thread Brock Palen
So, very heterogeneous. I did some testing and I couldn't make it happen below 32 
cores. Not sure if this is the real issue or if it requires a specific layout:

[brockp@nyx5512 ~]$ cat $PBS_NODEFILE | sort | uniq -c
  1 nyx5512
  1 nyx5515
  1 nyx5518
  1 nyx5523
  1 nyx5527
  2 nyx5537
  1 nyx5542
  1 nyx5560
  2 nyx5561
  2 nyx5562
  3 nyx5589
  1 nyx5591
  1 nyx5593
  1 nyx5617
  2 nyx5620
  1 nyx5622
  5 nyx5629
  1 nyx5630
  1 nyx5770
  1 nyx5771
  2 nyx5772
  1 nyx5780
  3 nyx5784
  2 nyx5820
 10 nyx5844
  2 nyx5847
  1 nyx5849
  1 nyx5852
  2 nyx5856
  1 nyx5870
  8 nyx5872
  1 nyx5894

This sort of layout gives me that warning if I leave -np 64 in:
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:nyx5589
   #processes:  2
   #cpus:   1

If I omit the -np ## it works and nyx5589 does get 3 processes started.

If I look at the binding of the three ranks on nyx5589 that it complains about, 
they appear correct:
[root@nyx5589 ~]# hwloc-bind --get --pid 24826
0x0080  ->  7
[root@nyx5589 ~]# hwloc-bind --get --pid 24827
0x0400 -> 10
[root@nyx5589 ~]# hwloc-bind --get --pid 24828
0x1000 -> 12

I think I found the problem though, and it's on the Torque side: while the cpuset 
sets up 7, 10, and 12, the PBS server thinks it gave out 6, 7, and 10. That is 
where the "only 2 processes" comes from.

I checked some of the other jobs, and their cpusets and the PBS server CPU lists 
are the same.

More investigation required. Still, it is strange: why would it give that message 
at all? Why would Open MPI care, and why only when -np ## is given?


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Sep 23, 2014, at 3:27 PM, Maxime Boissonneault 
<maxime.boissonnea...@calculquebec.ca> wrote:

> Do you know the topology of the cores allocated by Torque (i.e. were they all 
> on the same nodes, or 8 per node, or a heterogenous distribution for example 
> ?)
> 
> 
> Le 2014-09-23 15:05, Brock Palen a écrit :
>> Yes the request to torque was procs=64,
>> 
>> We are using cpusets.
>> 
>> the mpirun without -np 64  creates 64 spawned hostnames.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> FWIW: that warning has been removed from the upcoming 1.8.3 release
>>> 
>>> 
>>> On Sep 23, 2014, at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> 
>>>> -BEGIN PGP SIGNED MESSAGE-
>>>> Hash: SHA1
>>>> 
>>>> Am 23.09.2014 um 19:53 schrieb Brock Palen:
>>>> 
>>>>> I found a fun head scratcher, with openmpi 1.8.2  with torque 5 built 
>>>>> with TM support, on hereto core layouts  I get the fun thing:
>>>>> mpirun -report-bindings hostname< Works
>>>> And you get 64 lines of output?
>>>> 
>>>> 
>>>>> mpirun -report-bindings -np 64 hostname   <- Wat?
>>>>> --
>>>>> A request was made to bind to that would result in binding more
>>>>> processes than cpus on a resource:
>>>>> 
>>>>> Bind to: CORE
>>>>> Node:nyx5518
>>>>> #processes:  2
>>>>> #cpus:   1
>>>>> 
>>>>> You can override this protection by adding the "overload-allowed"
>>>>> option to your binding directive.
>>>>> ------
>>>> How many cores are physically installed on this machine - two as mentioned 
>>>> above?
>>>> 
>>>> - -- Reuti
>>>> 
>>>> 
>>>>> I ran with --oversubscribed and got the expected host list, which matched 
>>>>> $PBS_NODEFILE and was 64 entires long:
>>>>> 
>>>>> mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname
>>>>> 
>>>>> What did I do wrong?  I'm stumped why one works one doesn't but the one 
>>>>> that doesn't if your force it appears correct.
>>>>> 
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> CAEN Adva

Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
Yes the request to torque was procs=64,

We are using cpusets.

the mpirun without -np 64  creates 64 spawned hostnames. 

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Sep 23, 2014, at 3:02 PM, Ralph Castain <r...@open-mpi.org> wrote:

> FWIW: that warning has been removed from the upcoming 1.8.3 release
> 
> 
> On Sep 23, 2014, at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>> 
>> Am 23.09.2014 um 19:53 schrieb Brock Palen:
>> 
>>> I found a fun head scratcher, with openmpi 1.8.2  with torque 5 built with 
>>> TM support, on hereto core layouts  I get the fun thing:
>>> mpirun -report-bindings hostname< Works
>> 
>> And you get 64 lines of output?
>> 
>> 
>>> mpirun -report-bindings -np 64 hostname   <- Wat?
>>> --
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>> 
>>> Bind to: CORE
>>> Node:nyx5518
>>> #processes:  2
>>> #cpus:   1
>>> 
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>> --
>> 
>> How many cores are physically installed on this machine - two as mentioned 
>> above?
>> 
>> - -- Reuti
>> 
>> 
>>> I ran with --oversubscribed and got the expected host list, which matched 
>>> $PBS_NODEFILE and was 64 entires long:
>>> 
>>> mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname
>>> 
>>> What did I do wrong?  I'm stumped why one works one doesn't but the one 
>>> that doesn't if your force it appears correct.
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/09/25375.php
>> 
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG/MacGPG2 v2.0.20 (Darwin)
>> Comment: GPGTools - http://gpgtools.org
>> 
>> iEYEARECAAYFAlQhv7IACgkQo/GbGkBRnRr3HgCgjZoD9l9a+WThl5CDaGF1jawx
>> PWIAmwWnZwQdytNgAJgbir6V7yCyBt5D
>> =NG0H
>> -END PGP SIGNATURE-
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/09/25376.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25378.php





[OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
I found a fun head scratcher: with Open MPI 1.8.2 and Torque 5 built with TM 
support, on heterogeneous core layouts I get the fun thing:
mpirun -report-bindings hostname   <- Works
mpirun -report-bindings -np 64 hostname   <- Wat?
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:nyx5518
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--


I ran with --oversubscribe and got the expected host list, which matched 
$PBS_NODEFILE and was 64 entries long:

mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname

What did I do wrong? I'm stumped why one works and one doesn't, but the one that 
doesn't appears correct if you force it.


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985







Re: [OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
Thanks Rolf. 

Sent from my iPhone

> On Sep 5, 2014, at 6:28 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> 
> Yes, I have reproduced.  And I agree with your thoughts on configuring vs 
> runtime error.  I will look into this.
> Thanks,
> Rolf
> 
>> -Original Message-
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Brock Palen
>> Sent: Friday, September 05, 2014 5:22 PM
>> To: Open MPI Users
>> Subject: [OMPI users] enable-cuda with disable-dlopen
>> 
>> * PGP Signed by an unknown key
>> 
>> We found with 1.8.[1,2]  that is you compile with
>> 
>> --with-mxm
>> --with-cuda
>> --disable-dlopen
>> 
>> OMPI will compile install and run, but if you run disabling mxm (say to debug
>> something)
>> 
>> mpirun --mca mtl ^mxm ./a.out
>> 
>> You will get a notice saying that you cannot have enable cuda with disable
>> dlopen, and the job dies.
>> 
>> Is there any reason this shouldn't be a configure time error?  Rather than a
>> runtime error?
>> 
>> Note we have moved to --disable-mca-dso to get the result we want (less
>> iops to the server hosting the install)
>> 
>> Our full configure was:
>> 
>> PREFIX=/home/software/rhel6/openmpi-1.8.2/gcc-4.8.0
>> MXM=/opt/mellanox/mxm
>> FCA=/opt/mellanox/fca
>> KNEM=/opt/knem-1.1.1.90mlnx
>> CUDA=/home/software/rhel6/cuda/5.5/cuda
>> COMPILERS='CC=gcc CXX=g++ FC=gfortran'
>> ./configure \
>> --prefix=$PREFIX \
>> --mandir=${PREFIX}/man \
>> --disable-dlopen \
>> --enable-shared \
>> --enable-java   \
>> --enable-mpi-java \
>> --with-cuda=$CUDA \
>> --with-hwloc=internal \
>> --with-verbs \
>> --with-psm   \
>> --with-tm=/usr/local/torque \
>> --with-fca=$FCA \
>> --with-mxm=$MXM \
>> --with-knem=$KNEM \
>> --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
>>  $COMPILERS
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> 
>> * Unknown Key
>> * 0x806FCF94
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain
> confidential information.  Any unauthorized review, use, disclosure or 
> distribution
> is prohibited.  If you are not the intended recipient, please contact the 
> sender by
> reply email and destroy all copies of the original message.
> ---
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25283.php


[OMPI users] enable-cuda with disable-dlopen

2014-09-05 Thread Brock Palen
We found with 1.8.[1,2] that if you compile with 

--with-mxm
--with-cuda
--disable-dlopen

OMPI will compile, install, and run, but if you run with mxm disabled (say, to 
debug something)

mpirun --mca mtl ^mxm ./a.out

You will get a notice saying that you cannot have enable cuda with disable 
dlopen, and the job dies.

Is there any reason this shouldn't be a configure-time error rather than a 
runtime error?

Note we have moved to --disable-mca-dso to get the result we want (fewer iops to 
the server hosting the install).

Our full configure was:

PREFIX=/home/software/rhel6/openmpi-1.8.2/gcc-4.8.0
MXM=/opt/mellanox/mxm
FCA=/opt/mellanox/fca
KNEM=/opt/knem-1.1.1.90mlnx
CUDA=/home/software/rhel6/cuda/5.5/cuda
COMPILERS='CC=gcc CXX=g++ FC=gfortran'
./configure \
--prefix=$PREFIX \
--mandir=${PREFIX}/man \
--disable-dlopen \
--enable-shared \
--enable-java   \
--enable-mpi-java \
--with-cuda=$CUDA \
--with-hwloc=internal \
--with-verbs \
--with-psm   \
--with-tm=/usr/local/torque \
--with-fca=$FCA \
--with-mxm=$MXM \
--with-knem=$KNEM \
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
   $COMPILERS

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985







Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-28 Thread Brock Palen
Interesting; we are using the 3.0 that is in MOFED, and that is also what is on 
the MXM download site. Kinda confusing.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Aug 28, 2014, at 2:12 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> btw, you may want to use latest mxm v3.1 which is part of hpcx package 
> http://www.mellanox.com/products/hpcx
> 
> 
> 
> On Thu, Aug 28, 2014 at 4:10 AM, Brock Palen <bro...@umich.edu> wrote:
> Brice, et al.
> 
> Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with 
> knem and mxm 3.0,
> 
> If we have questions we will let you know.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Aug 27, 2014, at 12:44 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> > Hello Brock,
> >
> > Some people complained that giving world-wide access to a device file by 
> > default might be bad if we ever find a security leak in the kernel module. 
> > So I needed a better default. The rdma group is often used for OFED 
> > devices, and OFED and KNEM users are often the same, so it was a good 
> > compromise.
> >
> > There's no major issue with opening /dev/knem to everybody. A remote 
> > process memory is only accessible if an attacker finds the corresponding 
> > 64bit cookie. Only the memory buffer that was explicitly made readable 
> > and/or writable can be accessed read and/or write through this cookie. And 
> > recent KNEM releases also enforce by default that the attacker has the same 
> > uid as the target process.
> >
> > Brice
> >
> >
> >
> >
> > Le 27/08/2014 16:25, Brock Palen a écrit :
> >> Is there any major issues letting all users use it by setting /dev/knem to 
> >> 666 ?  It appears knem by default wants to only allow users of the rdma 
> >> group (if defined) to access knem.
> >>
> >> We are a generic provider and want everyone to be able to use it, just 
> >> feels strange to restrict it, so I am trying to understand why that is the 
> >> default.
> >>
> >> Brock Palen
> >>
> >> www.umich.edu/~brockp
> >>
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >>
> >> bro...@umich.edu
> >>
> >> (734)936-1985
> >>
> >>
> >>
> >> On Aug 27, 2014, at 10:15 AM, Alina Sklarevich
> >> <ali...@dev.mellanox.co.il>
> >>  wrote:
> >>
> >>
> >>> Hi,
> >>>
> >>> KNEM can improve the performance significantly for intra-node 
> >>> communication and that's why MXM is using it.
> >>> If you don't want to use it, you can suppress this warning by adding the 
> >>> following to your command line after mpirun:
> >>> -x MXM_LOG_LEVEL=error
> >>>
> >>> Alina.
> >>>
> >>>
> >>> On Wed, Aug 27, 2014 at 4:28 PM, Brock Palen
> >>> <bro...@umich.edu>
> >>>  wrote:
> >>> We updated our ofed and started to rebuild our MPI builds with mxm 3.0  .
> >>>
> >>> Now we get warnings bout knem
> >>>
> >>> [1409145437.578861] [flux-login1:31719:0] shm.c:65   MXM  WARN  
> >>> Could not open the KNEM device file at /dev/knem : No such file or 
> >>> directory. Won't use knem.
> >>>
> >>> I have heard about it a little.  Should we investigate adding it to our 
> >>> systems?
> >>> Is there a way to suppress this warning?
> >>>
> >>>
> >>>
> >>> Brock Palen
> >>>
> >>> www.umich.edu/~brockp
> >>>
> >>> CAEN Advanced Computing
> >>> XSEDE Campus Champion
> >>>
> >>> bro...@umich.edu
> >>>
> >>> (734)936-1985
> >>>
> >>>
> >>>
> >>>
> >>> ___
> >>> users mailing list
> >>>
> >>> us...@open-mpi.org
> >>>
> >>> Subscription:
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/users/2014/08/25166.php
> >>>
> >>>
> >>> ___
> >>> users mai

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Brice, et al.

Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with 
knem and mxm 3.0.

If we have questions we will let you know.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Aug 27, 2014, at 12:44 PM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Hello Brock,
> 
> Some people complained that giving world-wide access to a device file by 
> default might be bad if we ever find a security leak in the kernel module. So 
> I needed a better default. The rdma group is often used for OFED devices, and 
> OFED and KNEM users are often the same, so it was a good compromise.
> 
> There's no major issue with opening /dev/knem to everybody. A remote process 
> memory is only accessible if an attacker finds the corresponding 64bit 
> cookie. Only the memory buffer that was explicitly made readable and/or 
> writable can be accessed read and/or write through this cookie. And recent 
> KNEM releases also enforce by default that the attacker has the same uid as 
> the target process.
> 
> Brice
> 
> 
> 
> 
> Le 27/08/2014 16:25, Brock Palen a écrit :
>> Is there any major issues letting all users use it by setting /dev/knem to 
>> 666 ?  It appears knem by default wants to only allow users of the rdma 
>> group (if defined) to access knem.  
>> 
>> We are a generic provider and want everyone to be able to use it, just feels 
>> strange to restrict it, so I am trying to understand why that is the default.
>> 
>> Brock Palen
>> 
>> www.umich.edu/~brockp
>> 
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> 
>> bro...@umich.edu
>> 
>> (734)936-1985
>> 
>> 
>> 
>> On Aug 27, 2014, at 10:15 AM, Alina Sklarevich 
>> <ali...@dev.mellanox.co.il>
>>  wrote:
>> 
>> 
>>> Hi,
>>> 
>>> KNEM can improve the performance significantly for intra-node communication 
>>> and that's why MXM is using it.
>>> If you don't want to use it, you can suppress this warning by adding the 
>>> following to your command line after mpirun:
>>> -x MXM_LOG_LEVEL=error
>>> 
>>> Alina. 
>>> 
>>> 
>>> On Wed, Aug 27, 2014 at 4:28 PM, Brock Palen 
>>> <bro...@umich.edu>
>>>  wrote:
>>> We updated our ofed and started to rebuild our MPI builds with mxm 3.0  .
>>> 
>>> Now we get warnings bout knem
>>> 
>>> [1409145437.578861] [flux-login1:31719:0] shm.c:65   MXM  WARN  
>>> Could not open the KNEM device file at /dev/knem : No such file or 
>>> directory. Won't use knem.
>>> 
>>> I have heard about it a little.  Should we investigate adding it to our 
>>> systems?
>>> Is there a way to suppress this warning?
>>> 
>>> 
>>> 
>>> Brock Palen
>>> 
>>> www.umich.edu/~brockp
>>> 
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> 
>>> bro...@umich.edu
>>> 
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> 
>>> us...@open-mpi.org
>>> 
>>> Subscription: 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/08/25166.php
>>> 
>>> 
>>> ___
>>> users mailing list
>>> 
>>> us...@open-mpi.org
>>> 
>>> Subscription: 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/08/25169.php
>> 
>> 
>> ___
>> users mailing list
>> 
>> us...@open-mpi.org
>> 
>> Subscription: 
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/08/25170.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/25172.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Are there any major issues with letting all users use it by setting /dev/knem to
666? It appears knem by default wants to allow only users of the rdma group (if
defined) to access knem.

We are a generic provider and want everyone to be able to use it; it just feels
strange to restrict it, so I am trying to understand why that is the default.
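(For illustration, a sketch of the kind of change we have in mind; the one-off
chmod is straightforward, and the udev rule file name below is only an example:)

    chmod 666 /dev/knem

    # or persistently, e.g. in /etc/udev/rules.d/10-knem.rules:
    KERNEL=="knem", MODE="0666"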

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Aug 27, 2014, at 10:15 AM, Alina Sklarevich <ali...@dev.mellanox.co.il> 
wrote:

> Hi,
> 
> KNEM can improve the performance significantly for intra-node communication 
> and that's why MXM is using it.
> If you don't want to use it, you can suppress this warning by adding the 
> following to your command line after mpirun:
> -x MXM_LOG_LEVEL=error
> 
> Alina. 
> 
> 
> On Wed, Aug 27, 2014 at 4:28 PM, Brock Palen <bro...@umich.edu> wrote:
> We updated our ofed and started to rebuild our MPI builds with mxm 3.0  .
> 
> Now we get warnings bout knem
> 
> [1409145437.578861] [flux-login1:31719:0] shm.c:65   MXM  WARN  Could 
> not open the KNEM device file at /dev/knem : No such file or directory. Won't 
> use knem.
> 
> I have heard about it a little.  Should we investigate adding it to our 
> systems?
> Is there a way to suppress this warning?
> 
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/25166.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/25169.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
We updated our OFED and started to rebuild our MPI builds with MXM 3.0.

Now we get warnings about knem:

[1409145437.578861] [flux-login1:31719:0] shm.c:65   MXM  WARN  Could 
not open the KNEM device file at /dev/knem : No such file or directory. Won't 
use knem.

I have heard about it a little.  Should we investigate adding it to our systems?
Is there a way to suppress this warning?
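(For reference, based on the replies elsewhere in this thread, suppressing it
looks something like the following; the process count and application name are
placeholders:)

    mpirun -np 16 -x MXM_LOG_LEVEL=error ./my_app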



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] HugeTLB messages from mpi code

2014-07-01 Thread Brock Palen
We are getting the following on our RHEL6 cluster using openmpi 1.8.1 with meep
http://ab-initio.mit.edu/wiki/index.php/Meep

WARNING: at fs/hugetlbfs/inode.c:940 hugetlb_file_setup+0x227/0x250() (Tainted: 
P   ---   )
Hardware name: C6100   
Using mlock ulimits for SHM_HUGETLB deprecated
Modules linked in: rdma_ucm(U) openafs(P)(U) autofs4 mgc(U) lustre(U) lov(U) 
mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) rdma_cm(U) iw_cm(U) ib_addr(U) 
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs lockd fscache auth_rpcgss 
nfs_acl sunrpc acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 
nf_defrag_ipv4 xt_state nf_conntrack xt_multiport iptable_filter ip_tables 
ip6_tables ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) iw_nes(U) 
libcrc32c cxgb3 mdio mlx4_vnic(U) mlx4_vnic_helper(U) ib_sa(U) mlx4_ib(U) 
mlx4_en(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) mic(U) vhost_net 
macvtap macvlan tun kvm ipmi_devintf igb ptp pps_core dcdbas microcode i2c_i801 
i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core 
shpchp ext4 jbd2 mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log 
dm_mod [last unloaded: scsi_wait_scan]
Pid: 14367, comm: meep-mpi Tainted: P   ---
2.6.32-358.23.2.el6.x86_64 #1
Call Trace:
 [] ? warn_slowpath_common+0x87/0xc0
 [] ? warn_slowpath_fmt+0x46/0x50
 [] ? user_shm_lock+0x9c/0xc0
 [] ? hugetlb_file_setup+0x227/0x250
 [] ? sprintf+0x40/0x50
 [] ? newseg+0x152/0x290
 [] ? ipcget+0x61/0x200
 [] ? remove_vma+0x6e/0x90
 [] ? sys_shmget+0x59/0x60
 [] ? newseg+0x0/0x290
 [] ? shm_security+0x0/0x10
 [] ? shm_more_checks+0x0/0x20
 [] ? system_call_fastpath+0x16/0x1b
---[ end trace 375c130ede6f14a0 ]---


Some googling suggests this could be hurting our performance, but I'm not sure
what to do about it. There is nothing on the list archives except one reference
to another MPI library. Any idea what would cause this?


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Brock Palen
Is there a way to import/map memory from a process (data acquisition) such that 
an MPI program could 'take' or see that memory?

We need to do data acquisition at a rate of 0.7 TB/s and to do some
shuffles/computation on these data. Some of the nodes are directly connected
to the device, and some will do processing.

Here is the proposed flow:

* Data collector nodes runs process collecting data from device
* Those nodes somehow pass the data to an MPI job running on these nodes and a
number of other nodes (the CPU need for filtering is greater than what the 16
data nodes can provide).

One thought is to have the data collector processes be threads inside the MPI
job running across all nodes, but I was curious whether there is a way to pass
data that is still in memory (too much to hit disk) to the running MPI filter job.

Thanks! 

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-25 Thread Brock Palen
Yes:

ompi_info --all

works, but

ompi_info --param all all

gives:

[brockp@flux-login1 34241]$ ompi_info --param all all
Error getting SCIF driver version 
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.
[brockp@flux-login1 34241]$ 


ompi_info --param all all --level 9 
(gives me what I expect).

Thanks,

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 24, 2014, at 10:22 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
wrote:

> Brock --
> 
> Can you run with "ompi_info --all"?
> 
> With "--param all all", ompi_info in v1.8.x is defaulting to only showing 
> level 1 MCA params.  It's showing you all possible components and variables, 
> but only level 1.
> 
> Or you could also use "--level 9" to show all 9 levels.  Here's the relevant 
> section from the README:
> 
> -
> The following options may be helpful:
> 
> --all   Show a *lot* of information about your Open MPI
>installation. 
> --parsable  Display all the information in an easily
>grep/cut/awk/sed-able format.
> --param  
>A  of "all" and a  of "all" will
>show all parameters to all components.  Otherwise, the
>parameters of all the components in a specific framework,
>or just the parameters of a specific component can be
>displayed by using an appropriate  and/or
> name.
> --level 
>By default, ompi_info only shows "Level 1" MCA parameters
>-- parameters that can affect whether MPI processes can
>run successfully or not (e.g., determining which network
>interfaces to use).  The --level option will display all
>MCA parameters from level 1 to  (the max 
>value is 9).  Use "ompi_info --param 
> --level 9" to see *all* MCA parameters for a
>given component.  See "The Modular Component Architecture
>(MCA)" section, below, for a fuller explanation.
> 
> 
> 
> 
> 
> On Jun 24, 2014, at 5:19 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> That's odd - it shouldn't truncate the output. I'll take a look later today 
>> - we're all gathered for a developer's conference this week, so I'll be able 
>> to poke at this with Nathan.
>> 
>> 
>> 
>> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen <bro...@umich.edu> wrote:
>> Perfection, flexible, extensible, so nice.
>> 
>> BTW this doesn't happen older versions,
>> 
>> [brockp@flux-login2 34241]$ ompi_info --param all all
>> Error getting SCIF driver version
>> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>>  data source: default, level: 1 user/basic, type:
>>  string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to use for MPI communication
>>  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
>>  with btl_tcp_if_exclude.
>> MCA btl: parameter "btl_tcp_if_exclude" (current value:
>>  "127.0.0.1/8,sppp", data source: default, level: 1
>>  user/basic, type: string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to NOT use for MPI
>>  

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-23 Thread Brock Palen
Perfection: flexible, extensible, so nice.

By the way, this doesn't happen with older versions:

[brockp@flux-login2 34241]$ ompi_info --param all all
Error getting SCIF driver version 
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.


This is normally much longer. And yes, we don't have the Phi stuff installed on
all nodes; it is strange that 'all all' output is now very short. ompi_info -a
still works as expected, though.



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:48 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Put "orte_hetero_nodes=1" in your default MCA param file - uses can override 
> by setting that param to 0
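(As a sketch of what that looks like, assuming a default install layout; the
file path may differ on your system:)

    # $prefix/etc/openmpi-mca-params.conf  (system-wide defaults)
    orte_hetero_nodes = 1

    # a user can still override it per job, e.g.:
    mpirun -mca orte_hetero_nodes 0 ...
    # or: export OMPI_MCA_orte_hetero_nodes=0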
> 
> 
> On Jun 20, 2014, at 10:30 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Perfection!  That appears to do it for our standard case.
>> 
>> Now I know how to set MCA options by env var or config file.  How can I make 
>> this the default, that then a user can override?
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 20, 2014, at 1:21 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> I think I begin to grok at least part of the problem. If you are assigning 
>>> different cpus on each node, then you'll need to tell us that by setting 
>>> --hetero-nodes otherwise we won't have any way to report that back to 
>>> mpirun for its binding calculation.
>>> 
>>> Otherwise, we expect that the cpuset of the first node we launch a daemon 
>>> onto (or where mpirun is executing, if we are only launching local to 
>>> mpirun) accurately represents the cpuset on every node in the allocation.
>>> 
>>> We still might well have a bug in our binding computation - but the above 
>>> will definitely impact what you said the user did.
>>> 
>>> On Jun 20, 2014, at 10:06 AM, Brock Palen <bro...@umich.edu> wrote:
>>> 
>>>> Extra data point if I do:
>>>> 
>>>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>>>> --
>>>> A request was made to bind to that would result in binding more
>>>> processes than cpus on a resource:
>>>> 
>>>> Bind to: CORE
>>>> Node:nyx5513
>>>> #processes:  2
>>>> #cpus:  1
>>>> 
>>>> You can override this protection by adding the "overload-allowed"
>>>> option to your binding directive.
>>>> --
>>>> 
>>>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
>>>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>>>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>>>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
>>>> 0x0010
>>>> 0x1000
>>>> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
>>>> nyx5513
>>>> nyx5513
>>>> 
>>>> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
>>>> available, PBS says it gave it two, and if I force (this is all inside an 
>>>> interactive job) just on that node hwloc-bind --get I get what I expect,
>>>> 
>>>> Is there a wa

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Perfection!  That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file. How can I make
this the default, such that a user can then override it?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:21 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I think I begin to grok at least part of the problem. If you are assigning 
> different cpus on each node, then you'll need to tell us that by setting 
> --hetero-nodes otherwise we won't have any way to report that back to mpirun 
> for its binding calculation.
> 
> Otherwise, we expect that the cpuset of the first node we launch a daemon 
> onto (or where mpirun is executing, if we are only launching local to mpirun) 
> accurately represents the cpuset on every node in the allocation.
> 
> We still might well have a bug in our binding computation - but the above 
> will definitely impact what you said the user did.
> 
> On Jun 20, 2014, at 10:06 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Extra data point if I do:
>> 
>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>> --
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>> 
>>   Bind to: CORE
>>   Node:nyx5513
>>   #processes:  2
>>   #cpus:  1
>> 
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --
>> 
>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
>> 0x0010
>> 0x1000
>> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
>> nyx5513
>> nyx5513
>> 
>> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
>> available, PBS says it gave it two, and if I force (this is all inside an 
>> interactive job) just on that node hwloc-bind --get I get what I expect,
>> 
>> Is there a way to get a map of what MPI thinks it has on each host?
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 20, 2014, at 12:38 PM, Brock Palen <bro...@umich.edu> wrote:
>> 
>>> I was able to produce it in my test.
>>> 
>>> orted affinity set by cpuset:
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
>>> 0xc002
>>> 
>>> This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
>>> the batch system. 
>>> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
>>> 1,14-15
>>> 
>>> The ranks though were then all set to the same core:
>>> 
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103871
>>> 0x8000
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103872
>>> 0x8000
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103873
>>> 0x8000
>>> 
>>> Which is core 15:
>>> 
>>> report-bindings gave me:
>>> You can see how a few nodes were bound to all the same core, the last one 
>>> in each case.  I only gave you the results for the hose nyx5874.
>>> 
>>> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
>>> available processors)
>>> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
>>> available processors)
>>> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
>>> available processors)
>>> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
>>> available processors)
>>> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
>>> a

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Extra data point if I do:

[brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:nyx5513
   #processes:  2
   #cpus:  1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

[brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
[brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
0x0010
0x1000
[brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
nyx5513
nyx5513

Interesting: if I force bind-to core, MPI barfs saying there is only 1 CPU
available while PBS says it gave it two, and if I run hwloc-bind --get just on
that node (this is all inside an interactive job), I get what I expect.

Is there a way to get a map of what MPI thinks it has on each host?
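(For reference, mpirun's own reporting options can show this; a sketch:)

    mpirun --display-allocation --display-map --report-bindings hostname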

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:38 PM, Brock Palen <bro...@umich.edu> wrote:

> I was able to produce it in my test.
> 
> orted affinity set by cpuset:
> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
> 0xc002
> 
> This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
> the batch system. 
> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
> 1,14-15
> 
> The ranks though were then all set to the same core:
> 
> [root@nyx5874 ~]# hwloc-bind --get --pid 103871
> 0x8000
> [root@nyx5874 ~]# hwloc-bind --get --pid 103872
> 0x8000
> [root@nyx5874 ~]# hwloc-bind --get --pid 103873
> 0x8000
> 
> Which is core 15:
> 
> report-bindings gave me:
> You can see how a few nodes were bound to all the same core, the last one in 
> each case.  I only gave you the results for the hose nyx5874.
> 
> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
> available processors)
> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
> available processors)
> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
> available processors)
> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
> available processors)
> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
> available processors)
> [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5545.engin.umich.edu:88170] MCW rank 2 is not bound (or bound to all 
> available processors)
> [nyx5613.engin.umich.edu:25229] MCW rank 31 is not bound (or bound to all 
> available processors)
> [nyx5880.engin.umich.edu:01406] MCW rank 10 is not bound (or bound to all 
> available processors)
> [nyx5770.engin.umich.edu:86538] MCW rank 6 is not bound (or bound to all 
> available processors)
> [nyx5613.engin.umich.edu:25228] MCW rank 30 is not bound (or bound to all 
> available processors)
> [nyx5577.engin.umich.edu:65949] 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
 (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95931] MCW rank 53 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95930] MCW rank 52 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46655] MCW rank 51 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95932] MCW rank 54 is not bound (or bound to all 
available processors)
[nyx5625.engin.umich.edu:95933] MCW rank 55 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16306] MCW rank 40 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22761] MCW rank 61 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22762] MCW rank 62 is not bound (or bound to all 
available processors)
[nyx5861.engin.umich.edu:22763] MCW rank 63 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46652] MCW rank 48 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46653] MCW rank 49 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16304] MCW rank 38 is not bound (or bound to all 
available processors)
[nyx5788.engin.umich.edu:02465] MCW rank 20 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68071] MCW rank 27 is not bound (or bound to all 
available processors)
[nyx5775.engin.umich.edu:27952] MCW rank 17 is not bound (or bound to all 
available processors)
[nyx5866.engin.umich.edu:16305] MCW rank 39 is not bound (or bound to all 
available processors)
[nyx5788.engin.umich.edu:02466] MCW rank 21 is not bound (or bound to all 
available processors)
[nyx5775.engin.umich.edu:27951] MCW rank 16 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68073] MCW rank 29 is not bound (or bound to all 
available processors)
[nyx5597.engin.umich.edu:68072] MCW rank 28 is not bound (or bound to all 
available processors)
[nyx5552.engin.umich.edu:30481] MCW rank 12 is not bound (or bound to all 
available processors)
[nyx5552.engin.umich.edu:30482] MCW rank 13 is not bound (or bound to all 
available processors)


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:20 PM, Brock Palen <bro...@umich.edu> wrote:

> Got it,
> 
> I have the input from the user and am testing it out.
> 
> It probably has less todo with torque and more cpuset's, 
> 
> I'm working on producing it myself also.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 20, 2014, at 12:18 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> Thanks - I'm just trying to reproduce one problem case so I can look at it. 
>> Given that I don't have access to a Torque machine, I need to "fake" it.
>> 
>> 
>> On Jun 20, 2014, at 9:15 AM, Brock Palen <bro...@umich.edu> wrote:
>> 
>>> In this case they are a single socket, but as you can see they could be 
>>> ether/or depending on the job.
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Sorry, I should have been clearer - I was asking if cores 8-11 are all on 
>>>> one socket, or span multiple sockets
>>>> 
>>>> 
>>>> On Jun 19, 2014, at 11:36 AM, Brock Palen <bro...@umich.edu> wrote:
>>>> 
>>>>> Ralph,
>>>>> 
>>>>> It was a large job spread across.  Our system allows users to ask for 
>>>>> 'procs' which are laid out in any format. 
>>>>> 
>>>>> The list:
>>>>> 
>>>>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>>>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>>>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>>>> 
>>>>> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
>>>>> 
>>>>> They could be spread across any number of sockets configuration.  We 
>>>>> start very lax "user requests X procs" and then the user can request more 
>>>>> strict requirements from there.  We support mostly serial users, and 
>>>>> users can colocate on nodes.
>>>>> 
>>>>> That is good to know, I think we would want to turn our default to 'bind 
>>>>> to core' except for our few users who use

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Got it,

I have the input from the user and am testing it out.

It probably has less to do with Torque and more to do with cpusets.

I'm working on producing it myself also.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:18 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Thanks - I'm just trying to reproduce one problem case so I can look at it. 
> Given that I don't have access to a Torque machine, I need to "fake" it.
> 
> 
> On Jun 20, 2014, at 9:15 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> In this case they are a single socket, but as you can see they could be 
>> ether/or depending on the job.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Sorry, I should have been clearer - I was asking if cores 8-11 are all on 
>>> one socket, or span multiple sockets
>>> 
>>> 
>>> On Jun 19, 2014, at 11:36 AM, Brock Palen <bro...@umich.edu> wrote:
>>> 
>>>> Ralph,
>>>> 
>>>> It was a large job spread across.  Our system allows users to ask for 
>>>> 'procs' which are laid out in any format. 
>>>> 
>>>> The list:
>>>> 
>>>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>>> 
>>>> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
>>>> 
>>>> They could be spread across any number of sockets configuration.  We start 
>>>> very lax "user requests X procs" and then the user can request more strict 
>>>> requirements from there.  We support mostly serial users, and users can 
>>>> colocate on nodes.
>>>> 
>>>> That is good to know, I think we would want to turn our default to 'bind 
>>>> to core' except for our few users who use hybrid mode.
>>>> 
>>>> Our CPU set tells you what cores the job is assigned.  So in the problem 
>>>> case provided, the cpuset/cgroup shows only cores 8-11 are available to 
>>>> this job on this node.
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>>> On Jun 18, 2014, at 11:10 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> The default binding option depends on the number of procs - it is bind-to 
>>>>> core for np=2, and bind-to socket for np > 2. You never said, but should 
>>>>> I assume you ran 4 ranks? If so, then we should be trying to bind-to 
>>>>> socket.
>>>>> 
>>>>> I'm not sure what your cpuset is telling us - are you binding us to a 
>>>>> socket? Are some cpus in one socket, and some in another?
>>>>> 
>>>>> It could be that the cpuset + bind-to socket is resulting in some odd 
>>>>> behavior, but I'd need a little more info to narrow it down.
>>>>> 
>>>>> 
>>>>> On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>> 
>>>>>> I have started using 1.8.1 for some codes (meep in this case) and it 
>>>>>> sometimes works fine, but in a few cases I am seeing ranks being given 
>>>>>> overlapping CPU assignments, not always though.
>>>>>> 
>>>>>> Example job, default binding options (so by-core right?):
>>>>>> 
>>>>>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, 
>>>>>> and use TM to spawn.
>>>>>> 
>>>>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>>>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>>>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>>>>> 
>>>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>>>>>> 0x0200
>>>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>>>>>> 0x0800
>>>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>>>>>> 0x02

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
In this case they are on a single socket, but as you can see it could be
either, depending on the job.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 19, 2014, at 2:44 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Sorry, I should have been clearer - I was asking if cores 8-11 are all on one 
> socket, or span multiple sockets
> 
> 
> On Jun 19, 2014, at 11:36 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Ralph,
>> 
>> It was a large job spread across.  Our system allows users to ask for 
>> 'procs' which are laid out in any format. 
>> 
>> The list:
>> 
>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>> 
>> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
>> 
>> They could be spread across any number of sockets configuration.  We start 
>> very lax "user requests X procs" and then the user can request more strict 
>> requirements from there.  We support mostly serial users, and users can 
>> colocate on nodes.
>> 
>> That is good to know, I think we would want to turn our default to 'bind to 
>> core' except for our few users who use hybrid mode.
>> 
>> Our CPU set tells you what cores the job is assigned.  So in the problem 
>> case provided, the cpuset/cgroup shows only cores 8-11 are available to this 
>> job on this node.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 18, 2014, at 11:10 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> The default binding option depends on the number of procs - it is bind-to 
>>> core for np=2, and bind-to socket for np > 2. You never said, but should I 
>>> assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
>>> 
>>> I'm not sure what your cpuset is telling us - are you binding us to a 
>>> socket? Are some cpus in one socket, and some in another?
>>> 
>>> It could be that the cpuset + bind-to socket is resulting in some odd 
>>> behavior, but I'd need a little more info to narrow it down.
>>> 
>>> 
>>> On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:
>>> 
>>>> I have started using 1.8.1 for some codes (meep in this case) and it 
>>>> sometimes works fine, but in a few cases I am seeing ranks being given 
>>>> overlapping CPU assignments, not always though.
>>>> 
>>>> Example job, default binding options (so by-core right?):
>>>> 
>>>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, 
>>>> and use TM to spawn.
>>>> 
>>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>>> 
>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>>>> 0x0200
>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>>>> 0x0800
>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>>>> 0x0200
>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>>>> 0x0800
>>>> 
>>>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
>>>> 8-11
>>>> 
>>>> So torque claims the CPU set setup for the job has 4 cores, but as you can 
>>>> see the ranks were giving identical binding. 
>>>> 
>>>> I checked the pids they were part of the correct CPU set, I also checked, 
>>>> orted:
>>>> 
>>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>>>> 0x0f00
>>>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>>>> ignored unrecognized argument 16064
>>>> 
>>>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>>>> 8,9,10,11
>>>> 
>>>> Which is exactly what I would expect.
>>>> 
>>>> So ummm, i'm lost why this might happen?  What else should I check?  Like 
>>>> I said not all jobs show this behavior.
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> XSEDE Campus Champion
>>>> bro...@umich.edu
>>>> (734)936-1985
>>

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Brock Palen
Ralph,

It was a large job spread across nodes. Our system allows users to ask for
'procs', which are laid out in any format.

The list:

> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]

This shows that nyx5406 had 2 cores, nyx5427 also 2, and nyx5411 had 11.

They could be spread across any number of socket configurations. We start very
lax ("user requests X procs") and the user can request stricter requirements
from there. We support mostly serial users, and users can colocate on nodes.

That is good to know; I think we would want to turn our default to 'bind to
core', except for our few users who use hybrid mode.

Our cpuset tells you which cores the job is assigned. So in the problem case
provided, the cpuset/cgroup shows that only cores 8-11 are available to this job
on this node.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 18, 2014, at 11:10 PM, Ralph Castain <r...@open-mpi.org> wrote:

> The default binding option depends on the number of procs - it is bind-to 
> core for np=2, and bind-to socket for np > 2. You never said, but should I 
> assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
> 
> I'm not sure what your cpuset is telling us - are you binding us to a socket? 
> Are some cpus in one socket, and some in another?
> 
> It could be that the cpuset + bind-to socket is resulting in some odd 
> behavior, but I'd need a little more info to narrow it down.
> 
> 
> On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:
> 
>> I have started using 1.8.1 for some codes (meep in this case) and it 
>> sometimes works fine, but in a few cases I am seeing ranks being given 
>> overlapping CPU assignments, not always though.
>> 
>> Example job, default binding options (so by-core right?):
>> 
>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and 
>> use TM to spawn.
>> 
>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>> [nyx5409:11][nyx5411:11][nyx5412:3]
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>> 0x0800
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>> 0x0800
>> 
>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
>> 8-11
>> 
>> So torque claims the CPU set setup for the job has 4 cores, but as you can 
>> see the ranks were giving identical binding. 
>> 
>> I checked the pids they were part of the correct CPU set, I also checked, 
>> orted:
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>> 0x0f00
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>> ignored unrecognized argument 16064
>> 
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>> 8,9,10,11
>> 
>> Which is exactly what I would expect.
>> 
>> So ummm, i'm lost why this might happen?  What else should I check?  Like I 
>> said not all jobs show this behavior.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/06/24672.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24673.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-18 Thread Brock Palen
I have started using 1.8.1 for some codes (meep in this case) and it sometimes
works fine, but in a few cases I am seeing ranks being given overlapping CPU
assignments, though not always.

Example job, default binding options (so by-core right?):

Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and use 
TM to spawn.

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]

[root@nyx5398 ~]# hwloc-bind --get --pid 16065
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16066
0x0800
[root@nyx5398 ~]# hwloc-bind --get --pid 16067
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16068
0x0800
  
[root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
8-11

So Torque claims the cpuset set up for the job has 4 cores, but as you can see
the ranks were given identical bindings.

I checked the PIDs; they were part of the correct cpuset. I also checked orted:

[root@nyx5398 ~]# hwloc-bind --get --pid 16064
0x0f00
[root@nyx5398 ~]# hwloc-calc --intersect PU 16064
ignored unrecognized argument 16064

[root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
8,9,10,11

Which is exactly what I would expect.

So, umm, I am at a loss as to why this might happen. What else should I check?
Like I said, not all jobs show this behavior.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI users] Enable PMI build

2014-05-29 Thread Brock Palen
OK, I have dug into this more. Is this PMI the Slurm process manager interface?

To use Open MPI on the Phi, do we just build Open MPI for it? Does that mean I
need to add -mmic to CFLAGS and FCFLAGS?

How does one go about running MPI code across multiple Phis?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On May 16, 2014, at 5:40 PM, Hjelm, Nathan T <hje...@lanl.gov> wrote:

> PMI != phi. If you want to build for phi you will have to make two builds. 
> One for the host and one for the phi.
> 
> Take a look in contrib/platform/lanl/darwin to get an idea of how to build 
> for phi. The optimized-mic has most of what is needed to build a Phi version 
> of Open MPI.
> 
> I usually run:
> 
> mkdir build-host ; cd build-host ; ../configure --prefix=path_to_host_build 
> --with-platform=../contrib/platform/lanl/darwin/optimized ; make install
> cd ../
> mkdir build-pbi ; cd build-phi ; ../configure --prefix=path_to_phi_build 
> --with-platform=..//contrib/platform/lanl/darwin/optimized-mic ; make install
> 
> 
> I then modify the share/openmpi/mpicc-wrapper-data.txt to add a section for 
> -mmic and have it point to the phi build. This is a bit complicated but it 
> works well since mpicc -mmic with then use the phi libraries. I can give you 
> a sample modified wrapper if you like.
> 
> -Nathan Hjelm
> HPC-5, LANL
> ________
> From: users [users-boun...@open-mpi.org] on behalf of Brock Palen 
> [bro...@umich.edu]
> Sent: Friday, May 16, 2014 3:31 PM
> To: Open MPI Users
> Subject: [OMPI users] Enable PMI build
> 
> We are looking at enabling the use of OpenMPI on our Xeon Phis,
> 
> One comment, i'm not sure that most users will know that pmi means phi,
>  --with-pmi(=DIR)Build PMI support, optionally adding DIR to the
>  search path (default: no)
> 
> how about:
>  --with-pmi(=DIR)Build PMI support for the Xeon Phi/MIC, optionally 
> adding DIR to the
>  search path (default: no)
> 
> 
> Second, digging in my mpss install I am not finding pmi.h or anything like 
> that that searching the mailing list shows. We recently found that Intel made 
> a lot of changes to the MPSS stack and this Phi stuff is very infantile at 
> the moment so minimal (decent) documentation,  does anyone know what current 
> package provides PMI  for the Xeon Phi?
> 
> Thanks!
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI users] mpiifort mpiicc not found

2014-05-27 Thread Brock Palen
mpiifort and mpiicc are Intel MPI Library commands; in Open MPI and others, the
analogous commands are mpifort and mpicc.
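(For example, once the Open MPI bin directory is on your PATH, compiling looks
something like the following; the source file names are placeholders:)

    mpicc   -o hello_c hello.c
    mpifort -o hello_f hello.f90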

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On May 27, 2014, at 2:11 PM, Lorenzo Donà <lorechimic...@hotmail.it> wrote:

> Dear all
> I installed openmpi with intel compiler in this way:
> ./configure FC=ifort CC=icc CXX=icpc F77=ifort 
> --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/
> but in bin dir i did not find : mpiifort  mpiicc
> please can you help me to install openmpi with intel compiler.??
> Thanks to help me and thanks for your patience with me.
> Dearly lorenzo.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI users] pinning processes by default

2014-05-23 Thread Brock Palen
Albert,

Actually, doing affinity correctly for hybrid codes got easier in Open MPI 1.7
and newer. In the past you had to make a lot of assumptions (stride by node, etc.).

Now you can define a layout:

http://blogs.cisco.com/performance/eurompi13-cisco-slides-open-mpi-process-affinity-user-interface/
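(For reference, in the 1.7/1.8 series the binding options take a value rather
than being separate flags, so disabling the default binding would look something
like this; ./wrk is the executable from your example:)

    mpirun --bind-to none -np 2 ./wrk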

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On May 23, 2014, at 9:19 AM, Albert Solernou <albert.soler...@oerc.ox.ac.uk> 
wrote:

> Hi,
> after compiling and installing OpenMPI 1.8.1, I find that OpenMPI is pinning 
> processes onto cores. Although this may be
> desirable on some cases, it is a complete disaster when runnning hybrid 
> OpenMP-MPI applications. Therefore, I want to disable this behaviour, but 
> don't know how.
> 
> I configured OpenMPI with:
> ./configure \
> --prefix=$OPENMPI_HOME \
> --with-psm \
> --with-tm=/system/software/arcus/lib/PBS/11.3.0.121723 \
> --enable-mpirun-prefix-by-default \
> --enable-mpi-thread-multiple
> 
> and:
> ompi_info | grep paffinity
> does not report anything. However,
> mpirun -np 2 --report-bindings ./wrk
> reports bindings:
> [login3:04574] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: 
> [../BB/../../../../../..][../../../../../../../..]
> [login3:04574] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
> [BB/../../../../../../..][../../../../../../../..]
> but they cannot be disabled as:
> mpirun -np 2 --bind-to-none ./wrk
> returns:
> mpirun: Error: unknown option "--bind-to-none"
> 
> Any idea on what went wrong?
> 
> Best,
> Albert
> 
> -- 
> -
>  Dr. Albert Solernou
>  Research Associate
>  Oxford Supercomputing Centre,
>  University of Oxford
>  Tel: +44 (0)1865 610631
> -
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] Enable PMI build

2014-05-16 Thread Brock Palen
We are looking at enabling the use of Open MPI on our Xeon Phis.

One comment: I'm not sure that most users will know that PMI means Phi.
  --with-pmi(=DIR)Build PMI support, optionally adding DIR to the
  search path (default: no)

how about:
  --with-pmi(=DIR)Build PMI support for the Xeon Phi/MIC, optionally 
adding DIR to the
  search path (default: no)


Second, digging in my MPSS install, I am not finding pmi.h or anything like
what searching the mailing list turns up. We recently found that Intel made a
lot of changes to the MPSS stack, and this Phi stuff is very immature at the
moment, with minimal (decent) documentation. Does anyone know what current
package provides PMI for the Xeon Phi?

Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


[OMPI users] OpenIB Cannot Allocate Memory error

2014-02-27 Thread Brock Palen
I have some users reporting errors with OpenIB on Mellanox gear. It tends to
affect larger jobs (64-256 cores); it is not reliably reproducible, but it
happens with regularity. Example error below:

The nodes have 64GB of memory and the IB driver is set with:
options mlx4_core pfctx=0 pfcrx=0 log_num_mtt=24 log_mtts_per_seg=1

which, if I read it right, should let one register 128 GB.
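(For reference, the usual estimate is max registerable memory =
2^log_num_mtt x 2^log_mtts_per_seg x page_size; with log_num_mtt=24,
log_mtts_per_seg=1, and an assumed 4 KB page size, that is
2^24 x 2 x 4096 bytes = 128 GB, which matches the figure above.)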

We did make one change: we saw codes taking huge performance hits and
khugepaged consuming 100% CPU. We found that we could get the expected
performance if we disabled memory defrag for huge pages but left transparent
huge pages enabled:

cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
[never] 
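(For reference, that is the setting applied with something like the following,
using the RHEL6 sysfs path shown above:)

    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag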

Is this possibly related? We didn't have reports before then; has anyone seen
anything similar?


--
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:mlx4_0
  Device:openib_reg_mr
  Function:  Cannot allocate memory()
  Errno says:    [garbled binary data]

You may need to consult with your system administrator to get this
problem fixed.
--
[nyx5641.engin.umich.edu:30080] 99407 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] Set MCA parameter "orte_base_help_aggregate" to 
0 to see all help / error messages
[nyx5641.engin.umich.edu:30080] 54493 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 1 more process has sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 76831 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 76800 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 1 more process has sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 76834 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 104597 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 94309 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 96283 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 88849 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 87245 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5694.engin.umich.edu][[55235,1],50][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3707:mca_btl_openib_post_srr]
 error posting receive descriptors to shared receive queue 2 (6 from 107)
[nyx5694.engin.umich.edu][[55235,1],50][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3707:mca_btl_openib_post_srr]
 error posting receive descriptors to shared receive queue 2 (0 from 106)
[nyx5694.engin.umich.edu][[55235,1],50][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3707:mca_btl_openib_post_srr]
 error posting receive descriptors to shared receive queue 2 (0 from 105)
[nyx5641.engin.umich.edu:30080] 4868 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail
[nyx5641.engin.umich.edu:30080] 557 more processes have sent help message 
help-mpi-btl-openib.txt / mem-reg-fail



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [hwloc-users] Using hwloc to map GPU layout on system

2014-02-14 Thread Brock Palen

On Feb 7, 2014, at 9:45 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Le 06/02/2014 21:31, Brock Palen a écrit :
>> Actually that did turn out to help. The nvml# devices appear to be numbered 
>> in the way that CUDA_VISABLE_DEVICES sees them, while the cuda# devices are 
>> in the order that PBS and nvidia-smi see them.
> 
> By the way, did you have CUDA_VISIBLE_DEVICES set during the lstopo below? 
> Was it set to 2,3,0,1 ? That would explain the reordering.

It was not set, and I have double checked it just now to be sure.

> 
> I am not sure in which order you want to do things in the end. One way that 
> could help is:
> * Get the locality of each GPU by doing CUDA_VISIBLE_DEVICES=x (for x in 
> 0..number of gpus-1). Each iteration gives a single GPU in hwloc, and you can 
> retrieve the corresponding locality from the cuda0 object.
> * Once you know which GPUs you want based on the locality info, take the 
> corresponding #x and put them in CUDA_VISIBLE_DEVICES=x,y before you run your 
> program. hwloc will create cuda0 for x and cuda1 for y.

The cuda IDs match the order you see when you run nvidia-smi (which lists the
devices by PCI address).

The nvml IDs match the order in which the devices start. That is,
CUDA_VISIBLE_DEVICES=0 / cudaSetDevice(0) matches nvml0, which matches ID 2 for
CoProc cuda2 and for nvidia-smi ID 2.

This appears to be very consistent between reboots.
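(A sketch of the per-device probing Brice suggests above, assuming bash and a
node with 8 GPUs:)

    for x in 0 1 2 3 4 5 6 7; do
        echo "=== CUDA_VISIBLE_DEVICES=$x ==="
        CUDA_VISIBLE_DEVICES=$x lstopo -v   # the lone cuda0 object in this output gives that GPU's locality
    done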
> 
> If you don't set CUDA_VISIBLE_DEVICES, cuda* objects are basically 
> out-of-order. nvml objects are (a bit less likely) ordered by PCI bus is 
> (lstopo -v would confirm that).

Yes, the nvml ordering is by ascending PCI ID; nvidia-smi shows this:

[root@nyx7500 ~]# nvidia-smi | grep Tesla
|   0  Tesla K20Xm Off  | :09:00.0 Off |0 |
|   1  Tesla K20Xm Off  | :0A:00.0 Off |0 |
|   2  Tesla K20Xm Off  | :0D:00.0 Off |0 |
|   3  Tesla K20Xm Off  | :0E:00.0 Off |0 |
|   4  Tesla K20Xm Off  | :28:00.0 Off |0 |
|   5  Tesla K20Xm Off  | :2B:00.0 Off |0 |
|   6  Tesla K20Xm Off  | :30:00.0 Off |0 |
|   7  Tesla K20Xm Off  | :33:00.0 Off |0 |

[root@nyx7500 ~]# lstopo -v
Machine (P#0 total=67073288KB DMIProductName="ProLiant SL270s Gen8   " 
DMIProductVersion= DMIProductSerial="USE3267A92  " 
DMIProductUUID=36353439-3437-5553-4533-323637413932 DMIBoardVendor=HP 
DMIBoardName= DMIBoardVersion= DMIBoardSerial="USE3267A92  " 
DMIBoardAssetTag="" DMIChassisVendor=HP DMIChassisType=25 
DMIChassisVersion= DMIChassisSerial="USE3267A90  " DMIChassisAssetTag=" 
   " DMIBIOSVendor=HP DMIBIOSVersion=P75 DMIBIOSDate=09/18/2013 DMISysVendor=HP 
Backend=Linux LinuxCgroup=/ OSName=Linux OSRelease=2.6.32-358.23.2.el6.x86_64 
OSVersion="#1 SMP Sat Sep 14 05:32:37 EDT 2013" 
HostName=nyx7500.engin.umich.edu Architecture=x86_64)
  NUMANode L#0 (P#0 local=33518860KB total=33518860KB)
Socket L#0 (P#0 CPUModel="Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz" 
CPUVendor=GenuineIntel CPUModelNumber=45 CPUFamilyNumber=6)
  L3Cache L#0 (size=20480KB linesize=64 ways=20)
L2Cache L#0 (size=256KB linesize=64 ways=8)
  L1dCache L#0 (size=32KB linesize=64 ways=8)
L1iCache L#0 (size=32KB linesize=64 ways=8)
  Core L#0 (P#0)
PU L#0 (P#0)
L2Cache L#1 (size=256KB linesize=64 ways=8)
  L1dCache L#1 (size=32KB linesize=64 ways=8)
L1iCache L#1 (size=32KB linesize=64 ways=8)
  Core L#1 (P#1)
PU L#1 (P#1)
L2Cache L#2 (size=256KB linesize=64 ways=8)
  L1dCache L#2 (size=32KB linesize=64 ways=8)
L1iCache L#2 (size=32KB linesize=64 ways=8)
  Core L#2 (P#2)
PU L#2 (P#2)
L2Cache L#3 (size=256KB linesize=64 ways=8)
  L1dCache L#3 (size=32KB linesize=64 ways=8)
L1iCache L#3 (size=32KB linesize=64 ways=8)
  Core L#3 (P#3)
PU L#3 (P#3)
L2Cache L#4 (size=256KB linesize=64 ways=8)
  L1dCache L#4 (size=32KB linesize=64 ways=8)
L1iCache L#4 (size=32KB linesize=64 ways=8)
  Core L#4 (P#4)
PU L#4 (P#4)
L2Cache L#5 (size=256KB linesize=64 ways=8)
  L1dCache L#5 (size=32KB linesize=64 ways=8)
L1iCache L#5 (size=32KB linesize=64 ways=8)
  Core L#5 (P#5)
PU L#5 (P#5)
L2Cache L#6 (size=256KB linesize=64 ways=8)
  L1dCache L#6 (size=32KB linesize=64 ways=8)
L1iCache L#6 (size=32KB linesize=64 ways=8)
  Core L#6 (P#6)
PU L#6 (P#6)
 

Re: [OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-02-14 Thread Brock Palen
Mike,

We checked it over, here is what the guy who knows OFED much better than I do 
sent me:

We are running version 1.3.8.MLNX_20120424-0.1 of libibmad and version 
1.3.7.MLNX_20130110_ff06102-0.1 of libibumad. The 1.5.3-4.0.42 release notes 
indicate that these are the latest versions of the packages: 


http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-4_0_42.txt

Thanks

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jan 31, 2014, at 9:06 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> Hi,
> Can it be that libibmad/libibumad installed on your system belongs to 
> previous mofed installation?
> 
> Thanks
> M.
> 
> On Jan 31, 2014 2:02 AM, "Brock Palen" <bro...@umich.edu> wrote:
> I grabbed the latest FCA release from Mellanox's website.  We have been 
> building against FCA 2.5 for a while, but it never worked right.  Today I 
> tried to build against the latest (version number was still 2.5, but I think 
> we have updated our OFED since the last install).  We are running MOFED 
> 1.5.3-4.0.42
> 
> 1.6.5 configures fine against the old 2.5 fca library I have around (don't 
> recall what OFED it expected), but the new one, which claims to be for RHEL6.4 
> OFED 1.5.3-4.0.42,  fails in configure with:
> 
> /home/software/rhel6/fca/2.5-v2/lib/libfca.so: undefined reference to 
> `smp_mkey_set@IBMAD_1.3'
> 
> libibmad is installed, but the symbol smp_mkey_set  is not defined in it. 
> IBMAD_1.3  is though.
> 
> Any thoughts on what may cause this?  As far as I know our MOFED is from Mellanox 
> and should match up fine to their release of FCA. So this has me scratching 
> my head.
> 
> Thanks
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [hwloc-users] Using hwloc to map GPU layout on system

2014-02-06 Thread Brock Palen
Actually that did turn out to help. The nvml# devices appear to be numbered in 
the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices are in the 
order that PBS and nvidia-smi see them.

  PCIBridge
PCIBridge
  PCIBridge
PCI 10de:1021
  CoProc L#2 "cuda0"
  GPU L#3 "nvml2"
  PCIBridge
PCI 10de:1021
  CoProc L#4 "cuda1"
  GPU L#5 "nvml3"
  PCIBridge
PCIBridge
  PCIBridge
PCI 10de:1021
  CoProc L#6 "cuda2"
  GPU L#7 "nvml0"
  PCIBridge
PCI 10de:1021
  CoProc L#8 "cuda3"
  GPU L#9 "nvml1"


Right now I am trying to create a python script that will take the XML output 
of lstopo and give me just the cuda and nvml devices in order. 

I don't know if some values are deterministic though.  Could I ignore the 
CoProc lines and just use the:

  GPU L#3 "nvml2"
  GPU L#5 "nvml3"
  GPU L#7 "nvml0"
  GPU L#9 "nvml1"

Is the L# always going to be in the order I would expect?  Because then I 
already have my map. 
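
In case it is useful, a rough sketch of what I have in mind (it assumes lstopo 
prints the XML to stdout with --of xml, and that the cuda/nvml device names 
show up as "name" attributes on the XML objects):

import re
import subprocess
import xml.etree.ElementTree as ET

# Dump the topology as XML and pull out the cuda*/nvml* OS devices in
# document order.
xml_out = subprocess.run(["lstopo", "--of", "xml"],
                         capture_output=True, text=True, check=True).stdout
root = ET.fromstring(xml_out)

cuda, nvml = [], []
for obj in root.iter():
    name = obj.get("name", "")
    if re.fullmatch(r"cuda\d+", name):
        cuda.append(name)
    elif re.fullmatch(r"nvml\d+", name):
        nvml.append(name)

# Paired in document order; on the node above this prints cuda0 <-> nvml2,
# cuda1 <-> nvml3, cuda2 <-> nvml0, cuda3 <-> nvml1.
for c, n in zip(cuda, nvml):
    print(c, "<->", n)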

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Feb 5, 2014, at 1:19 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Hello Brock,
> 
> Some people reported the same issue in the past and that's why we added the 
> "nvml" objects. CUDA reorders devices by "performance". Batch-schedulers are 
> somehow supposed to use "nvml" for managing GPUs without actually using them 
> with CUDA directly. And the "nvml" order is the "normal" order.
> 
> You need "tdk" (https://developer.nvidia.com/tesla-deployment-kit) to get 
> nvml library and development headers installed. Then hwloc can build its 
> "nvml" backend. Once ready, you'll see a hwloc "cudaX" and a hwloc "nvmlY" 
> object in each NVIDIA PCI devices, and you can get their locality as usual.
> 
> Does this help?
> 
> Brice
> 
> 
> 
> Le 05/02/2014 05:25, Brock Palen a écrit :
>> We are trying to build a system to restrict users to the GPUs they were 
>> assigned by our batch system (torque).
>> 
>> The batch system sets the GPU's into thread exclusive mode when assigned to 
>> a job, so we want the GPU that the batch system assigns to be the one set in 
>> CUDA_VISIBLE_DEVICES,
>> 
>> Problem is on our nodes what the batch system sees as gpu 0  is not the same 
>> GPU that CUDA_VISIBLE_DEVICES sees as 0.   Actually 0  is 2.
>> 
>> You can see this behavior if you run 
>> 
>> nvidia-smi  and look at the PCI ID's of the devices.  You can then look at 
>> the PCI ID's outputed by deviceQuery from the SDK examples and see they are 
>> in a different order.
>> 
>> The IDs you would set in CUDA_VISIBLE_DEVICES match the order that 
>> deviceQuery sees, not the order that nvidia-smi sees.
>> 
>> Example (All values turned to decimal to match deviceQuery):
>> 
>> nvidia-smi order: 9, 10, 13, 14, 40, 43, 48, 51
>> deviceQuery order: 13, 14, 9, 10, 40, 43, 48, 51
>> 
>> 
>> Can hwloc help me with this?  Right now I am hacking a script based on the 
>> output of the two commands, and making a map, between the two and then set 
>> CUDA_VISIBLE_DEVICES
>> 
>> Any ideas would be great. Later as we currently also use CPU sets, we want 
>> to pass GPU locality information to the scheduler to make decisions to match 
>> GPU-> CPU Socket information, as performance of threads across QPI domains 
>> is very poor. 
>> 
>> Thanks
>> 
>> Machine (64GB)
>>   NUMANode L#0 (P#0 32GB)
>> Socket L#0 + L3 L#0 (20MB)
>>   L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 
>> (P#0)
>>   L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 
>> (P#1)
>>   L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 
>> (P#2)
>>   L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 
>> (P#3)
>>   L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 
>> (P#4)
>>   L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 
>> (P#5)
>>   L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 
>> (P#6)
>>   L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 
>> (P#7)
>> HostBridge L#0
>>   PCI

[OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-01-30 Thread Brock Palen
I grabbed the latest FCA release from Mellanox's website.  We have been building 
against FCA 2.5 for a while, but it never worked right.  Today I tried to build 
against the latest (version number was still 2.5, but I think we have updated 
our OFED since the last install).  We are running MOFED 1.5.3-4.0.42

1.6.5 configures fine against the old 2.5 fca library I have around (don't 
recall what OFED it expected), but the new one, which claims to be for RHEL6.4 
OFED 1.5.3-4.0.42,  fails in configure with:

/home/software/rhel6/fca/2.5-v2/lib/libfca.so: undefined reference to 
`smp_mkey_set@IBMAD_1.3'

libibmad is installed, but the symbol smp_mkey_set  is not defined in it. 
IBMAD_1.3  is though.
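
(For anyone else chasing this: a quick way to see what the runtime linker can 
actually resolve, as a rough sketch using only the Python standard library; 
find_library may return None if ldconfig does not know about libibmad.)

import ctypes, ctypes.util

path = ctypes.util.find_library("ibmad")
lib = ctypes.CDLL(path) if path else None
# hasattr() does a dlsym() lookup under the hood, so this reports whether the
# symbol FCA's libfca.so is asking for can be resolved at run time.
print("libibmad found at:", path)
print("smp_mkey_set present:", bool(lib) and hasattr(lib, "smp_mkey_set"))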

Any thoughts on what may cause this?  As far as I know our MOFED is from Mellanox 
and should match up fine with their release of FCA, so this has me scratching my 
head.

Thanks

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985







Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
BAH,

The error persisted when doing the test to /tmp/ (local disk)

I rebuilt the library with the same compiler and all is well now.

Sorry for the false alarm. Thanks for the help and ideas Jeff.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jan 17, 2014, at 4:34 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Brock and I chatted off list.
> 
> I'm unable to replicate the error, but I have icc 14.0.1, not 14.0.  I also 
> don't have Lustre, which is his base case.
> 
> So there's at least 2 variables here that need to be resolved.
> 
> 
> On Jan 9, 2014, at 11:46 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Attached you will find a small sample code that demonstrates the problem but 
>> either MPI_File_seek() or MPI_File_get_position() is screwing up on me.  This 
>> only happens with this version of the intel compiler:
>> 
>> Version 14.0.0.080 Build 20130728
>> 
>> You can compile and run the example with:
>> 
>> mpicc -g -DDEBUG mkrandfile.c -o mkrand
>> mpirun -np 2 mkrand -f data -l 1
>> 
>> 1.6.5  works with gcc 4.7.0, 
>> openmpi/1.6.5/gcc/4.7.0
>>  0: my current offset is 0
>>  1: my current offset is 8388608
>> 
>> openmpi/1.6.5/intel/14.0
>>  0: my current offset is 4294967297
>>  1: my current offset is 4294967297
>> 
>> I passed the code through ddt, and the calculations for the offset for each 
>> rank gets the correct values passed to MPI_File_seek() but what I get back 
>> from MPI_File_get_position() is the above gibberish. 
>> 
>> I also cannot produce the problem with  openmpi/1.6.4/intel/13.0.1  or with 
>> openmpi/1.6.5/pgi/13.5
>> 
>> Our builds all look like this:
>> 
>> PREFIX=/home/software/rhel6/openmpi-1.6.5/pgi-13.5
>> MXM=/home/software/rhel6/mxm/2.0
>> FCA=/home/software/rhel6/fca/2.5
>> COMPILERS='CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77'
>> ./configure \
>>   --prefix=${PREFIX} \
>>   --mandir=${PREFIX}/man \
>>   --with-tm=/usr/local/torque \
>>   --with-openib --with-psm \
>>   --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
>>   --with-mxm=$MXM \
>>   --with-fca=$FCA \
>>   --disable-dlopen --enable-shared \
>>  $COMPILERS
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
I never saw any replies on this.  Has anyone else been able to produce this 
sort of error? It is 100% reproducible for me.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jan 9, 2014, at 11:46 AM, Brock Palen <bro...@umich.edu> wrote:

> Attached you will find a small sample code that demonstrates the problem but 
> either MPI_File_seek() or MPI_File_get_position() is screwing up on me.  This 
> only happens with this version of the intel compiler:
> 
> Version 14.0.0.080 Build 20130728
> 
> You can compile and run the example with:
> 
> mpicc -g -DDEBUG mkrandfile.c -o mkrand
> mpirun -np 2 mkrand -f data -l 1
> 
> 1.6.5  works with gcc 4.7.0, 
> openmpi/1.6.5/gcc/4.7.0
>  0: my current offset is 0
>  1: my current offset is 8388608
> 
> openmpi/1.6.5/intel/14.0
>  0: my current offset is 4294967297
>  1: my current offset is 4294967297
> 
> I passed the code through ddt, and the calculations for the offset for each 
> rank gets the correct values passed to MPI_File_seek() but what I get back 
> from MPI_File_get_position() is the above gibberish. 
> 
> I also cannot produce the problem with  openmpi/1.6.4/intel/13.0.1  or with 
> openmpi/1.6.5/pgi/13.5
> 
> Our builds all look like this:
> 
> PREFIX=/home/software/rhel6/openmpi-1.6.5/pgi-13.5
> MXM=/home/software/rhel6/mxm/2.0
> FCA=/home/software/rhel6/fca/2.5
> COMPILERS='CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77'
> ./configure \
>   --prefix=${PREFIX} \
>   --mandir=${PREFIX}/man \
>   --with-tm=/usr/local/torque \
>   --with-openib --with-psm \
>   --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
>   --with-mxm=$MXM \
>   --with-fca=$FCA \
>   --disable-dlopen --enable-shared \
>  $COMPILERS
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 





[OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-09 Thread Brock Palen
Attached you will find a small sample code that demonstrates the problem: 
either MPI_File_seek() or MPI_File_get_position() is screwing up on me.  This 
only happens with this version of the Intel compiler:

Version 14.0.0.080 Build 20130728

You can compile and run the example with:

mpicc -g -DDEBUG mkrandfile.c -o mkrand
mpirun -np 2 mkrand -f data -l 1

1.6.5  works with gcc 4.7.0, 
openmpi/1.6.5/gcc/4.7.0
  0: my current offset is 0
  1: my current offset is 8388608

openmpi/1.6.5/intel/14.0
  0: my current offset is 4294967297
  1: my current offset is 4294967297

I passed the code through ddt, and the calculations for the offset for each 
rank get the correct values passed to MPI_File_seek(), but what I get back from 
MPI_File_get_position() is the above gibberish. 
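
For reference, the pattern being exercised boils down to the following (a rough 
mpi4py rendering of the same seek/get_position check, not the attached 
mkrandfile.c; with a correct build each rank should print back exactly the 
offset it seeked to):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

fh = MPI.File.Open(comm, "data", MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * 8 * 1024 * 1024          # 8 MB per rank, as in the gcc output
fh.Seek(offset, MPI.SEEK_SET)
pos = fh.Get_position()                  # should equal offset on every rank
print(f"{rank}: my current offset is {pos}")
fh.Close()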

I also cannot produce the problem with  openmpi/1.6.4/intel/13.0.1  or with 
openmpi/1.6.5/pgi/13.5

Our builds all look like this:

PREFIX=/home/software/rhel6/openmpi-1.6.5/pgi-13.5
MXM=/home/software/rhel6/mxm/2.0
FCA=/home/software/rhel6/fca/2.5
COMPILERS='CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77'
./configure \
   --prefix=${PREFIX} \
   --mandir=${PREFIX}/man \
   --with-tm=/usr/local/torque \
   --with-openib --with-psm \
   --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
   --with-mxm=$MXM \
   --with-fca=$FCA \
   --disable-dlopen --enable-shared \
  $COMPILERS


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985




mkrandfiles.c
Description: Binary data





Re: [OMPI users] FCA collectives disabled by default

2013-04-03 Thread Brock Palen
That would do it. 

Thanks!

Now to make even the normal ones work

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 3, 2013, at 10:31 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Looking at the source code, it is because those other collectives aren't 
> implemented yet :-)
> 
> 
> On Apr 2, 2013, at 12:07 PM, Brock Palen <bro...@umich.edu> wrote:
> 
>> We are starting to play with FCA on our Mellanox-based IB fabric.
>> 
>> I noticed from ompi_info that FCA support for a lot of collectives is 
>> disabled by default:
>> 
>> Any idea why only barrier/bcast/reduce  are on by default and all the more 
>> complex values are disabled?
>> 
>>   MCA coll: parameter "coll_fca_enable_barrier" (current value: 
>> <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_bcast" (current value: 
>> <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_reduce" (current value: 
>> <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_reduce_scatter" (current 
>> value: <0>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_allreduce" (current 
>> value: <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_allgather" (current 
>> value: <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_allgatherv" (current 
>> value: <1>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_gather" (current value: 
>> <0>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_gatherv" (current value: 
>> <0>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_alltoall" (current value: 
>> <0>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_alltoallv" (current 
>> value: <0>, data source: default value)
>>   MCA coll: parameter "coll_fca_enable_alltoallw" (current 
>> value: <0>, data source: default value)
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] FCA collectives disabled by default

2013-04-02 Thread Brock Palen
We are starting to play with FCA on our Mellanox-based IB fabric.

I noticed from ompi_info that FCA support for a lot of collectives is disabled 
by default:

Any idea why only barrier/bcast/reduce  are on by default and all the more 
complex values are disabled?

MCA coll: parameter "coll_fca_enable_barrier" (current value: 
<1>, data source: default value)
MCA coll: parameter "coll_fca_enable_bcast" (current value: 
<1>, data source: default value)
MCA coll: parameter "coll_fca_enable_reduce" (current value: 
<1>, data source: default value)
MCA coll: parameter "coll_fca_enable_reduce_scatter" (current 
value: <0>, data source: default value)
MCA coll: parameter "coll_fca_enable_allreduce" (current value: 
<1>, data source: default value)
MCA coll: parameter "coll_fca_enable_allgather" (current value: 
<1>, data source: default value)
MCA coll: parameter "coll_fca_enable_allgatherv" (current 
value: <1>, data source: default value)
MCA coll: parameter "coll_fca_enable_gather" (current value: 
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_gatherv" (current value: 
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoall" (current value: 
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallv" (current value: 
<0>, data source: default value)
MCA coll: parameter "coll_fca_enable_alltoallw" (current value: 
<0>, data source: default value)

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified

2013-01-24 Thread Brock Palen
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> or do i just need to compile two versions, one with IB and one without?

You should not need to; we have OMPI compiled for openib/psm and run that same 
install on psm/tcp and verbs (openib) based gear.

Do all the nodes assigned to your job have QLogic IB adaptors? Do they also have 
libpsm_infinipath installed on all of them?  This will be required.

Also, did you build your openmpi with tm?  --with-tm=/usr/local/torque/  (or 
wherever the path to lib/libtorque.so is.)

With TM support, mpirun from OMPI will know how to find the CPUs assigned to 
your job by torque.  This is the better way; in a pinch you can also use 
mpirun -machinefile $PBS_NODEFILE -np 8 

But really tm is better.

Here is our build line for OMPI:

./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 
--mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man 
--with-tm=/usr/local/torque --with-openib --with-psm 
--with-mxm=/home/software/rhel6/mxm/1.5 
--with-io-romio-flags=--with-file-system=testfs+ufs+lustre --disable-dlopen 
--enable-shared CC=icc CXX=icpc FC=ifort F77=ifort

We run torque with OMPI.

> 
> On Thu, Jan 24, 2013 at 9:09 AM, Sabuj Pattanayek  wrote:
>> ahha, with --display-allocation I'm getting :
>> 
>> mca: base: component_find: unable to open
>> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
>> libpsm_infinipath.so.1: cannot open shared object file: No such file
>> or directory (ignored)
>> 
>> I think the system I compiled it on has different ib libs than the
>> nodes. I'll need to recompile and then see if it runs, but is there
>> anyway to get it to ignore IB and just use gigE? Not all of our nodes
>> have IB and I just want to use any node.
>> 
>> On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain  wrote:
>>> How did you configure OMPI? If you add --display-allocation to your cmd 
>>> line, does it show all the nodes?
>>> 
>>> On Jan 24, 2013, at 6:34 AM, Sabuj Pattanayek  wrote:
>>> 
 Hi,
 
 I'm submitting a job through torque/PBS, the head node also runs the
 Moab scheduler, the .pbs file has this in the resources line :
 
 #PBS -l nodes=2:ppn=4
 
 I've also tried something like :
 
 #PBS -l procs=56
 
 and at the end of script I'm running :
 
 mpirun -np 8 cat /dev/urandom > /dev/null
 
 or
 
 mpirun -np 56 cat /dev/urandom > /dev/null
 
 ...depending on how many processors I requested. The job starts,
 $PBS_NODEFILE has the nodes that the job was assigned listed, but all
 the cat's are piled onto the first node. Any idea how I can get this
 to submit jobs across multiple nodes? Note, I have OSU mpiexec working
 without problems with mvapich and mpich2 on our cluster to launch jobs
 across multiple nodes.
 
 Thanks,
 Sabuj
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] IBV_EVENT_QP_ACCESS_ERR

2013-01-23 Thread Brock Palen
I have a user whose code at scale dies reliably with the errors (new hosts each 
time):

We have been using for this code:
-mca btl_openib_receive_queues X,4096,128:X,12288,128:X,65536,12

Without that option it dies with an out of memory message reliably. 

Note this code runs fine at the same scale on Pleiades (NASA SGI box) using 
MPT.

Are we running out of QP?  Is that possible?

--
The OpenFabrics stack has reported a network error event.  Open MPI
will try to continue, but your job may end up failing.

  Local host:nyx5608.engin.umich.edu
  MPI process PID:   42036
  Error number:  3 (IBV_EVENT_QP_ACCESS_ERR)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.
--
[[9462,1],3][../../../../../openmpi-1.6/ompi/mca/btl/openib/btl_openib_component.c:3394:handle_wc]
 from nyx5608.engin.umich.edu to: nyx5022 error polling LP CQ with status 
INVALID REQUEST ERROR status number 9 for wr_id 14d6d00 opcode 0  vendor error 
138 qp_idx 0
--
The OpenFabrics stack has reported a network error event.  Open MPI
will try to continue, but your job may end up failing.

  Local host:(null)
  MPI process PID:   42038
  Error number:  3 (IBV_EVENT_QP_ACCESS_ERR)

This error may indicate connectivity problems within the fabric;
please contact your system administrator.


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen

>> You sound like our vendors, "what is your app"  
> 
> ;-) I used to be one.
> 
> Ideally OMPI should do the switch between MXM/RC/XRC internally in the 
> transport layer. Unfortunately,
> we don't have such smart selection logic. Hopefully IB vendors will fix some 
> day. 

I actually looked in the openib-hca.ini (working from memory) to try and find 
what the default queues were, and I actually couldn't figure it out. The 
ConnectX entry doesn't have a default, and the 'default default'  also doesn't 
have an entry. 

I need to dig into ompi_info, got distracted by an intel compiler bug, ADD for 
admin/user support folks.

> 
>> 
>> Note most of our users run just fine with the standard Peer-Peer queues, 
>> default out the box OpenMPI.
> 
> The P2P queue is fine, but most likely using XRC your users will observe better 
> performance. This is not just scalability.

Cool, thanks for all the input. I wonder why peer-to-peer is the default; I know 
XRC requires hardware support.

> 
> - Pasha
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
No there would be no overlap.

We run a large legacy condo, with several islands of Infiniband of different 
ages and types. Users run within their condo/ib island.  So PSM users only run 
on PSM nodes they own, and there is no overlap.

Our jobs range from 4 cores to 1000 cores. Looking at the FAQ page, it states 
that MXM was used in the past only for >128 ranks, but in 1.6 it is used for 
rank counts of any size.

I think we will do some testing; we had never even heard of MXM before.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Jan 22, 2013, at 2:58 PM, Shamis, Pavel wrote:

>> We just learned about MXM, and given most of our cards are Mellanox ConnectX 
>> cards (though not all; we have islands of pre-ConnectX and QLogic gear 
>> supported in the same OpenMPI environment), 
>> 
>> Will MXM correctly fall through to PSM if on QLogic gear and fall through to 
>> OpenIB if on pre-ConnectX cards?
> 
> Do you want to run MXM and PSM in the same MPI session ? You can't do it. MXM 
> and PSM use different network protocols.
> If you want to use MXM in your MPI job, all nodes should be configured to use 
> MXM.
> 
> On the other hand, OpenIB btl should support mixed environments out of the 
> box.
> 
> - Pasha
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
w00t :-)

Thanks

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Dec 20, 2012, at 10:46 AM, Ralph Castain wrote:

> Hmmm... I'll see what I can do about the error message. I don't think there 
> is much in 1.6 I can do, but in 1.7 I could generate an appropriate error 
> message as we have a way to check the topologies.
> 
> On Dec 20, 2012, at 7:11 AM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Ralph,
>> 
>> Thanks for the info, 
>> That said I found the problem, one of the new nodes, had Hyperthreading on, 
>> and the rest didn't so all the nodes didn't match.  A quick 
>> 
>> pdsh lstopo | dshbak -c 
>> 
>> Uncovered the one different node.  The error just didn't give me a clue to 
>> that being the cause, which was very odd:
>> 
>> Correct node:
>> [brockp@nyx0930 ~]$ lstopo 
>> Machine (64GB)
>> NUMANode L#0 (P#0 32GB) + Socket L#0 + L3 L#0 (20MB)
>>   L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>   L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>   L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>   L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>   L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>>   L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>>   L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>>   L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>> NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
>>   L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>>   L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>>   L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>>   L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>>   L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>>   L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>>   L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>>   L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>> 
>> 
>> Bad node:
>> [brockp@nyx0936 ~]$ lstopo
>> Machine (64GB)
>> NUMANode L#0 (P#0 32GB) + Socket L#0 + L3 L#0 (20MB)
>>   L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>> PU L#0 (P#0)
>> PU L#1 (P#16)
>>   L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>> PU L#2 (P#1)
>> PU L#3 (P#17)
>>   L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
>> PU L#4 (P#2)
>> PU L#5 (P#18)
>>   L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
>> PU L#6 (P#3)
>> PU L#7 (P#19)
>>   L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
>> PU L#8 (P#4)
>> PU L#9 (P#20)
>>   L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
>> PU L#10 (P#5)
>> PU L#11 (P#21)
>>   L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
>> PU L#12 (P#6)
>> PU L#13 (P#22)
>>   L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
>> PU L#14 (P#7)
>> PU L#15 (P#23)
>> NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
>>   L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8
>> PU L#16 (P#8)
>> PU L#17 (P#24)
>>   L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9
>> PU L#18 (P#9)
>> PU L#19 (P#25)
>>   L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10
>> PU L#20 (P#10)
>> PU L#21 (P#26)
>>   L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11
>> PU L#22 (P#11)
>> PU L#23 (P#27)
>>   L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12
>> PU L#24 (P#12)
>> PU L#25 (P#28)
>>   L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13
>> PU L#26 (P#13)
>> PU L#27 (P#29)
>>   L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14
>> PU L#28 (P#14)
>> PU L#29 (P#30)
>>   L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15
>> PU L#30 (P#15)
>> PU L#31 (P#31)
>> 
>> Once I removed that node from the pool the error went away, and using 
>> bind-to-core and cpus-per-rank worked. 
>> 
>> I don't see how an error message of the sort given would ever lead me to 
>> find a node with 'more' cores, even if fake, I was looking for a node that 
>> had a bad socket or wrong part.
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote:
>> 
>>> I'm afraid these are both known problems in the 1.6.2 release. I believe we 
>>> fixed npersocket in 1.6.3, though you

Re: [OMPI users] 1.6.2 affinity failures

2012-12-20 Thread Brock Palen
Ralph,

Thanks for the info, 
That said, I found the problem: one of the new nodes had Hyperthreading on, and 
the rest didn't, so the nodes didn't all match.  A quick 

pdsh lstopo | dshbak -c 

Uncovered the one different node.  The error just didn't give me a clue to that 
being the cause, which was very odd:

Correct node:
[brockp@nyx0930 ~]$ lstopo 
Machine (64GB)
  NUMANode L#0 (P#0 32GB) + Socket L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
  NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#11)
L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12 + PU L#12 (P#12)
L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13 + PU L#13 (P#13)
L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14 + PU L#14 (P#14)
L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15 + PU L#15 (P#15)


Bad node:
[brockp@nyx0936 ~]$ lstopo
Machine (64GB)
  NUMANode L#0 (P#0 32GB) + Socket L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#16)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
  PU L#2 (P#1)
  PU L#3 (P#17)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
  PU L#4 (P#2)
  PU L#5 (P#18)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
  PU L#6 (P#3)
  PU L#7 (P#19)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
  PU L#8 (P#4)
  PU L#9 (P#20)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
  PU L#10 (P#5)
  PU L#11 (P#21)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
  PU L#12 (P#6)
  PU L#13 (P#22)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
  PU L#14 (P#7)
  PU L#15 (P#23)
  NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (20MB)
L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8
  PU L#16 (P#8)
  PU L#17 (P#24)
L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9
  PU L#18 (P#9)
  PU L#19 (P#25)
L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10
  PU L#20 (P#10)
  PU L#21 (P#26)
L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11
  PU L#22 (P#11)
  PU L#23 (P#27)
L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12
  PU L#24 (P#12)
  PU L#25 (P#28)
L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13
  PU L#26 (P#13)
  PU L#27 (P#29)
L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14
  PU L#28 (P#14)
  PU L#29 (P#30)
L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15
  PU L#30 (P#15)
  PU L#31 (P#31)

Once I removed that node from the pool the error went away, and using 
bind-to-core and cpus-per-rank worked. 

I don't see how an error message of the sort given would ever lead me to find a 
node with 'more' cores, even if fake; I was looking for a node that had a bad 
socket or the wrong part.


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Dec 19, 2012, at 9:08 PM, Ralph Castain wrote:

> I'm afraid these are both known problems in the 1.6.2 release. I believe we 
> fixed npersocket in 1.6.3, though you might check to be sure. On the 
> large-scale issue, cpus-per-rank well might fail under those conditions. The 
> algorithm in the 1.6 series hasn't seen much use, especially at scale.
> 
> In fact, cpus-per-rank has somewhat fallen by the wayside recently due to 
> apparent lack of interest. I'm restoring it for the 1.7 series over the 
> holiday (currently doesn't work in 1.7 or trunk).
> 
> 
> On Dec 19, 2012, at 4:34 PM, Brock Palen <bro...@umich.edu> wrote:
> 
>> Using openmpi 1.6.2 with intel 13.0  though the problem not specific to the 
>> compiler.
>> 
>> Using two 12 core 2 socket nodes, 
>> 
>> mpirun -np 4 -npersocket 2 uptime
>> --
>> Your job has requested a conflicting number of processes for the
>> application:
>> 
>> App: uptime
>> number of procs:  4
>> 
>> This is more processes than we can launch under the following
>> additional directives and conditions:
>> 
>> number of sockets:   0
>> npersocket:   2
>> 
>> 
>> Any idea why this wouldn't work?  
>> 
>> Another problem the following does what I expect,  two 2 socket 8 core 
>> sockets. 16 total cores/node.
>> 
>> mpi

[OMPI users] 1.6.2 affinity failures

2012-12-19 Thread Brock Palen
Using openmpi 1.6.2 with intel 13.0, though the problem is not specific to the 
compiler.

Using two 12 core 2 socket nodes, 

mpirun -np 4 -npersocket 2 uptime
--
Your job has requested a conflicting number of processes for the
application:

App: uptime
number of procs:  4

This is more processes than we can launch under the following
additional directives and conditions:

number of sockets:   0
npersocket:   2


Any idea why this wouldn't work?  

Another problem: the following does what I expect on two 2-socket, 
8-cores-per-socket nodes (16 total cores/node).

mpirun -np 8 -npernode 4 -bind-to-core -cpus-per-rank 4 hwloc-bind --get
0x000f
0x000f
0x00f0
0x00f0
0x0f00
0x0f00
0xf000
0xf000

But fails at large scale:

mpirun -np 276 -npernode 4 -bind-to-core -cpus-per-rank 4 hwloc-bind --get

--
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M).  Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.
You job will now abort.
--



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] Romio and OpenMPI builds

2012-12-07 Thread Brock Palen
Thanks!

So it looks like most OpenMPI builds out there are running with ROMIOs that 
are oblivious to any optimizations for what they are running on.  

I have added this to our build notes so we get it in next time.  Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Dec 7, 2012, at 11:44 PM, Eric Chamberland wrote:

> Hi Brock,
> 
> Le 12/06/2012 05:10 PM, Brock Palen a écrit :
>> Eric,
>> 
>> You are correct, our builds do not show lustre support:
>> 
>>  MCA io: information "io_romio_user_configure_params" (value:, data 
>> source: default value)
> surprise! ;-)
>> 
>> So to enable this, when I build OpenMPI I should pass:
>> 
>> --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'
> exactly.
>> 
>> We have Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients. So 
>> that list should be good for our site.
>> 
>> Would this be a good recommendation for us to include in all our MPI builds?
> I think yes, it is in the right direction, but I am not an "expert"...  some 
> expert advice should be welcome.
> 
> Eric
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Dec 3, 2012, at 7:12 PM, Eric Chamberland wrote:
>> 
>>> Le 12/03/2012 05:37 PM, Brock Palen a écrit :
>>>> I was trying to use hints with ROMIO and lustre prompted by another post 
>>>> on this list.
>>>> 
>>>> I have a simple MPI-IO code and I cannot using the notes I find set the 
>>>> lustre striping using the config file and setting ROMIO_HINTS.
>>>> 
>>>> Question:
>>>> 
>>>> How can I check which ADIO drivers ROMIO in OpenMPI was built with when I 
>>>> built it?
>>>> Can I make ROMIO go into 'verbose' mode and have it print what it is 
>>>> setting all its values to?
>>> Try "ompi_info -a" and check for lustre in the output:
>>> 
>>> ompi_info -a | grep -i romio
>>> ...
>>>  MCA io: information "io_romio_user_configure_params" 
>>> (value:<--with-file-system=testfs+ufs+nfs+lustre>, data source: default 
>>> value)
>>>  User-specified command line parameters passed to 
>>> ROMIO's configure script
>>>  MCA io: information "io_romio_complete_configure_params" 
>>> (value:<--with-file-system=testfs+ufs+nfs+lustre  CFLAGS='-DNDEBUG -O3 
>>> -xHOST -Wall -finline-functions -fno-strict-aliasing -restrict -pthread' 
>>> CPPFLAGS='  
>>> -I/clumeq/src/Open-MPI/1.6.3/intel/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
>>>  
>>> -I/clumeq/src/Open-MPI/1.6.3/intel/Build/opal/mca/hwloc/hwloc132/hwloc/include
>>>-I/usr/include/infiniband -I/usr/include/infiniband' FFLAGS='' LDFLAGS=' 
>>> ' --enable-shared --enable-static --with-file-system=testfs+ufs+nfs+lustre  
>>> --prefix=/software/MPI/openmpi/1.6.3_intel --with-mpi=open_mpi 
>>> --disable-aio>, data source: default value)
>>>  Complete set of command line parameters passed to 
>>> ROMIO's configure script
>>> 
>>> Eric
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Romio and OpenMPI builds

2012-12-06 Thread Brock Palen
Eric,

You are correct, our builds do not show lustre support:

 MCA io: information "io_romio_user_configure_params" (value: , data 
source: default value)

So to enable this, when I build OpenMPI I should pass:

--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'

We have Lustre, local filesystems (ufs), and NFSv3 and NFSv4 clients. So that 
list should be good for our site.

Would this be a good recommendation for us to include in all our MPI builds?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Dec 3, 2012, at 7:12 PM, Eric Chamberland wrote:

> Le 12/03/2012 05:37 PM, Brock Palen a écrit :
>> I was trying to use hints with ROMIO and lustre prompted by another post on 
>> this list.
>> 
>> I have a simple MPI-IO code and I cannot using the notes I find set the 
>> lustre striping using the config file and setting ROMIO_HINTS.
>> 
>> Question:
>> 
>> How can I check which ADIO drivers ROMIO in OpenMPI was built with when I 
>> built it?
>> Can I make ROMIO go into 'verbose' mode and have it print what it is setting 
>> all its values to?
> Try "ompi_info -a" and check for lustre in the output:
> 
> ompi_info -a | grep -i romio
> ...
>  MCA io: information "io_romio_user_configure_params" (value: 
> <--with-file-system=testfs+ufs+nfs+lustre>, data source: default value)
>  User-specified command line parameters passed to 
> ROMIO's configure script
>  MCA io: information "io_romio_complete_configure_params" 
> (value: <--with-file-system=testfs+ufs+nfs+lustre  CFLAGS='-DNDEBUG -O3 
> -xHOST -Wall -finline-functions -fno-strict-aliasing -restrict -pthread' 
> CPPFLAGS='  
> -I/clumeq/src/Open-MPI/1.6.3/intel/openmpi-1.6.3/opal/mca/hwloc/hwloc132/hwloc/include
>  
> -I/clumeq/src/Open-MPI/1.6.3/intel/Build/opal/mca/hwloc/hwloc132/hwloc/include
>-I/usr/include/infiniband -I/usr/include/infiniband' FFLAGS='' LDFLAGS=' ' 
> --enable-shared --enable-static --with-file-system=testfs+ufs+nfs+lustre  
> --prefix=/software/MPI/openmpi/1.6.3_intel --with-mpi=open_mpi 
> --disable-aio>, data source: default value)
>  Complete set of command line parameters passed to 
> ROMIO's configure script
> 
> Eric
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Romio and OpenMPI builds

2012-12-03 Thread Brock Palen
I was trying to use hints with ROMIO and lustre prompted by another post on 
this list. 

I have a simple MPI-IO code, and using the notes I can find I cannot set the 
Lustre striping via the config file and the ROMIO_HINTS variable.

Question:

How can I check which ADIO drivers ROMIO in OpenMPI was built with when I built 
it?
Can I make ROMIO go into 'verbose' mode and have it print what it is setting 
all its values to?
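
(For comparison, the same hints can also be passed programmatically through an 
MPI_Info object; a rough mpi4py sketch, using the usual ROMIO Lustre hint names 
striping_factor/striping_unit and a placeholder path, which also echoes back 
whichever hints ROMIO actually kept:)

from mpi4py import MPI

comm = MPI.COMM_WORLD
info = MPI.Info.Create()
info.Set("striping_factor", "4")         # number of OSTs to stripe across
info.Set("striping_unit", "4194304")     # stripe size in bytes

# Hints only take effect at file-creation time, and only if ROMIO's Lustre
# ADIO driver was compiled in.  The path below is just a placeholder.
fh = MPI.File.Open(comm, "/lustre/scratch/testfile",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY, info)

kept = fh.Get_info()
for i in range(kept.Get_nkeys()):
    key = kept.Get_nthkey(i)
    print(comm.Get_rank(), key, kept.Get(key))
fh.Close()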

Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






[OMPI users] Java MPI Bindings in 1.6.x

2012-11-28 Thread Brock Palen
I know java MPI bindings are in the development tree, and the FAQ states they 
are derived from the HLRS bindings (which I can't seem to find online).

Is it possible to take the bindings from the dev tree and build them against 
the 1.6 stable?  If not what mpiJava bindings do you recommend? 

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






[hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brock Palen
This isn't a hwloc problem exactly, but maybe you can shed some insight.

We have some 4 socket 10 core = 40 core nodes, HT off:

depth 0:1 Machine (type #1)
 depth 1:   4 NUMANodes (type #2)
  depth 2:  4 Sockets (type #3)
   depth 3: 4 Caches (type #4)
depth 4:40 Caches (type #4)
 depth 5:   40 Caches (type #4)
  depth 6:  40 Cores (type #5)
   depth 7: 40 PUs (type #6)


We run rhel 6.3  we use torque to create cgroups for jobs.  I get the following 
cgroup for this job  all 12 cores for the job are on one node:
cat /dev/cpuset/torque/8845236.nyx.engin.umich.edu/cpus 
0-1,4-5,8,12,16,20,24,28,32,36

Not all nicely spaced, but 12 cores

I then start a code, even a simple serial code with openmpi 1.6.0 on all 12 
cores:
mpirun ./stream

45521 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.72 stream
 
45522 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.08 stream
 
45525 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.72 stream
 
45526 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.07 stream
 
45527 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.71 stream
 
45528 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.71 stream
 
45532 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.05 stream
 
45529 brockp20   0 1885m 1.8g  456 R 99.2  0.2   4:02.70 stream 
 
45530 brockp20   0 1885m 1.8g  456 R 99.2  0.2   4:02.70 stream 
 
45531 brockp20   0 1885m 1.8g  456 R 33.6  0.2   1:20.89 stream 
 
45523 brockp20   0 1885m 1.8g  456 R 32.8  0.2   1:20.90 stream 
 
45524 brockp20   0 1885m 1.8g  456 R 32.8  0.2   1:20.89 stream   

Note the processes that are not running at 100% cpu, 

hwloc-bind  --get --pid 45523
0x0011,0x1133


hwloc-calc 0x0011,0x1133 --intersect PU
0,1,2,3,4,5,6,7,8,9,10,11

So all ranks in the job should see all 12 cores.  The same cgroup is reported 
by /proc/<pid>/cgroup
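
(The same thing can be checked without hwloc by reading the mask straight from 
the kernel; a small sketch using a few of the PIDs from the top output above:)

# Confirm each stream PID really has the full 12-core cpuset.
pids = [45521, 45523, 45524, 45531]      # substitute the PIDs top shows

for pid in pids:
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Cpus_allowed_list"):
                print(pid, line.strip())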

Not only that I can make things work by forcing binding in the mpi launcher:
mpirun -bind-to-core ./stream

46886 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.49 stream 
 
46887 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.49 stream 
 
46888 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.48 stream 
 
46889 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.49 stream 
 
46890 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.48 stream 
 
46891 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.48 stream 
 
46892 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.47 stream 
 
46893 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.47 stream 
 
46894 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.47 stream 
 
46895 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.47 stream 
 
46896 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.46 stream 
 
46897 brockp20   0 1885m 1.8g  456 R 99.8  0.2   0:15.46 stream 

Things are now working as expected, and I should stress this is inside the same 
torque job and cgroup that I started with.

A multi threaded version of the code does use close to 12 cores as expected.

If I circumvent our batch system and the cgroups, a normal mpirun ./stream does 
start 12 processes that each consume a full 100% core. 

Thoughts?  This is really odd linux scheduler behavior.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-26 Thread Brock Palen
Thanks and super cool.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote:

> On Aug 24, 2012, at 10:45 AM, Brock Palen wrote:
> 
>>> Right now we should be just warning if we can't register 3/4 of your 
>>> physical memory (we can't really test for anything more than that).  But it 
>>> doesn't abort.
>> Ok
>> 
>>> We could add a tunable that makes it abort in this case, if you think that 
>>> would be useful.
>> I think so, in my case that would mean a node is misconfigured, and rather 
>> than running slowly I want it brought to my attention, 
> 
> 
> Ok, this is easy enough to add.  Due to a PGI compilation issue, it looks 
> like we're going to probably roll a 1.6.2 in the immediate future; we can 
> include this in there.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
On Aug 24, 2012, at 10:38 AM, Jeff Squyres wrote:

> On Aug 24, 2012, at 10:28 AM, Brock Palen wrote:
> 
>> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with 
>> 1.6.0 with low registered memory.  From reading the release notes rather 
>> than hang I would expect:
>> 
>> * lower performance/fall back to send/receive.
>> * a notice of failed to allocate registered memory
>> 
>> In my case I still get a hang, is this expected?  
> 
> It can still happen, yes.  The short version is that there are cases that 
> can't easily be fixed in the 1.6 series that involve lazy creation of QPs.  
> Do you see errors about OMPI failing to create CQ's or QP'?

No; IMB (my simple test case) just hangs on an Alltoall indefinitely.

> 
>> This is running with default registered memory limits and I do appreciate  
>> the message that I only have 4GB of registered memory of my 48.  We will 
>> also be fixing our load to raise this value, which should make this issue 
>> moot.
> 
> Did you get a warning about being able to register too little memory?

Correct I do and I like the warning at startup.

> 
>> Honestly I think what I would want is for MPI to blow up saying "can't 
>> allocate registered memory, fatal, contact your admin", rather than fall 
>> back to send/receive and just be slower.
> 
> Right now we should be just warning if we can't register 3/4 of your physical 
> memory (we can't really test for anything more than that).  But it doesn't 
> abort.
Ok

> 
> We could add a tunable that makes it abort in this case, if you think that 
> would be useful.
I think so, in my case that would mean a node is misconfigured, and rather 
than running slowly I want it brought to my attention, 

> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] openmpi 1.6.1 Questions

2012-08-24 Thread Brock Palen
I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with 1.6.0 
with low registered memory.  From reading the release notes rather than hang I 
would expect:

* lower performance/fall back to send/receive.
* a notice of failed to allocate registered memory

In my case I still get a hang, is this expected?  This is running with default 
registered memory limits and I do appreciate  the message that I only have 4GB 
of registered memory of my 48.  We will also be fixing our load to raise this 
value, which should make this issue moot.

Honestly I think what I would want is for MPI to blow up saying "can't allocate 
registered memory, fatal, contact your admin", rather than fall back to 
send/receive and just be slower.

Am I reading the release notes correctly?  Is there a tunable setting to blow 
up rather than fallback?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
Yep very odd,

Looks like torque wrote a wrapper then for some hwloc functions.

BTW working with cgroups/cpusets in our resource manager  hwloc-info --pid  is 
_wonderful_  

I think I am good to go.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Aug 10, 2012, at 5:14 PM, Jeff Squyres wrote:

> I don't know why Google is pointing you there...
> 
> I went back as far as as hwloc 1.3 and I cannot find a function named 
> hwloc_bitmap_displaylist() -- that's probably why you can't find any 
> reference to it in the docs.  :-)
> 
> 
> 
> On Aug 10, 2012, at 4:55 PM, Brock Palen wrote:
> 
>> Google is giving me this url:
>> www.open-mpi.org/projects/hwloc//doc/v1.5/a2.php
>> 
>> When i searched for hwloc_bitmap_displaylist()   (for which I can find 
>> nothing nor a manpage :-) )
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Aug 10, 2012, at 4:26 PM, Jeff Squyres wrote:
>> 
>>> Try looking here:
>>> 
>>> http://www.open-mpi.org/projects/hwloc/doc/
>>> 
>>> You have an extra "projects" in your URL.  How did you get to that URL?  Do 
>>> we have a bug in our web pages somewhere?
>>> 
>>> 
>>> On Aug 10, 2012, at 3:56 PM, Brock Palen wrote:
>>> 
>>>> http://www.open-mpi.org/projects/projects/hwloc/doc/
>>>> 
>>>> Oh noooss!!!
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>>> ___
>>>> hwloc-users mailing list
>>>> hwloc-us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> hwloc-users mailing list
>>> hwloc-us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> 
>> 
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users




Re: [hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
Google is giving me this url:
www.open-mpi.org/projects/hwloc//doc/v1.5/a2.php

When I searched for hwloc_bitmap_displaylist()   (for which I can find nothing 
nor a manpage :-) )

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Aug 10, 2012, at 4:26 PM, Jeff Squyres wrote:

> Try looking here:
> 
>  http://www.open-mpi.org/projects/hwloc/doc/
> 
> You have an extra "projects" in your URL.  How did you get to that URL?  Do 
> we have a bug in our web pages somewhere?
> 
> 
> On Aug 10, 2012, at 3:56 PM, Brock Palen wrote:
> 
>> http://www.open-mpi.org/projects/projects/hwloc/doc/
>> 
>> Oh noooss!!!
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users




[hwloc-users] HWLoc Documentation pages 404's

2012-08-10 Thread Brock Palen
http://www.open-mpi.org/projects/projects/hwloc/doc/

Oh noooss!!!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985





Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
I think so, sorry if I gave you the impression that Rmpi changed, 

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Jul 26, 2012, at 7:30 PM, Ralph Castain wrote:

> Guess I'm confused - your original note indicated that something had changed 
> in Rmpi that broke things. Are you now saying it was something in OMPI?
> 
> On Jul 26, 2012, at 4:22 PM, Brock Palen wrote:
> 
>> Ok will see, Rmpi we had working with 1.4 and has not been updated after 
>> 2010,  this this kinda stinks.
>> 
>> I will keep digging into it thanks for the help.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote:
>> 
>>> Crud - afraid you'll have to ask them, then :-(
>>> 
>>> 
>>> On Jul 26, 2012, at 3:50 PM, Brock Palen wrote:
>>> 
>>>> Ralph,
>>>> 
>>>> Rmpi wraps everything up, so I tried setting them with
>>>> 
>>>> export OMPI_plm_base_verbose=5
>>>> export OMPI_dpm_base_verbose=5
>>>> 
>>>> and I get no extra messages even on helloworld example simple MPI-1.0 
>>>> code. 
>>>> 
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> CAEN Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>>> On Jul 26, 2012, at 6:42 PM, Ralph Castain wrote:
>>>> 
>>>>> Well, it looks like comm_spawn is working on 1.6. Afraid I don't know 
>>>>> enough about Rmpi/snow to advise on what changed, but you could add some 
>>>>> debug params to get an idea of where the problem is occurring:
>>>>> 
>>>>> -mca plm_base_verbose 5 -mca dpm_base_verbose 5
>>>>> 
>>>>> should tell you from an OMPI perspective. I can try to help debug that 
>>>>> end, at least.
>>>>> 
>>>>> 
>>>>> On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:
>>>>> 
>>>>>> Weird - looks like it has done a comm_spawn and having trouble 
>>>>>> connecting between the jobs. I can check the basic code and make sure it 
>>>>>> is working - I seem to recall someone else recently talking about Rmpi 
>>>>>> changes causing problems (different ones than this, IIRC), so you might 
>>>>>> want to search our user archives for rmpi to see what they ran into. Not 
>>>>>> sure what rmpi changed, or why.
>>>>>> 
>>>>>> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>>>>>> 
>>>>>>> I have ran into a problem using Rmpi with OpenMPI (trying to get snow 
>>>>>>> running).
>>>>>>> 
>>>>>>> I built OpenMPI following another post where I built static:
>>>>>>> 
>>>>>>> ./configure --prefix=$INSTALL/gcc-4.4.6-static 
>>>>>>> --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ 
>>>>>>> --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran 
>>>>>>> F77=gfortran
>>>>>>> 
>>>>>>> Rmpi/snow work fine when I run on a single node.  When I span more than 
>>>>>>> one node I get nasty errors (pasted below).
>>>>>>> 
>>>>>>> I tested this mpi install with a simple hello world and that works.  
>>>>>>> Any thoughts what is different about Rmpi/snow that could cause this?
>>>>>>> 
>>>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found 
>>>>>>> in file routed_binomial.c at line 386
>>>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried 
>>>>>>> routing message from [[48116,2],16] to [[48116,1],0]:16, can't find 
>>>>>>> route
>>>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found 
>>>>>>> in file routed_binomial.c at line 386
>>>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried 
>>>>>>> routing message from [[48116,2],32] to [[48116,1],0]:16, can't find 
>>>>>>> route
>>>>>>> [0] 
>>>>>>> func:/home/soft

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
Ok, will see. We had Rmpi working with 1.4 and it has not been updated since 2010,
so this kinda stinks.

I will keep digging into it; thanks for the help.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Jul 26, 2012, at 7:16 PM, Ralph Castain wrote:

> Crud - afraid you'll have to ask them, then :-(
> 
> 
> On Jul 26, 2012, at 3:50 PM, Brock Palen wrote:
> 
>> Ralph,
>> 
>> Rmpi wraps everything up, so I tried setting them with
>> 
>> export OMPI_plm_base_verbose=5
>> export OMPI_dpm_base_verbose=5
>> 
>> and I get no extra messages even on helloworld example simple MPI-1.0 code. 
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jul 26, 2012, at 6:42 PM, Ralph Castain wrote:
>> 
>>> Well, it looks like comm_spawn is working on 1.6. Afraid I don't know 
>>> enough about Rmpi/snow to advise on what changed, but you could add some 
>>> debug params to get an idea of where the problem is occurring:
>>> 
>>> -mca plm_base_verbose 5 -mca dpm_base_verbose 5
>>> 
>>> should tell you from an OMPI perspective. I can try to help debug that end, 
>>> at least.
>>> 
>>> 
>>> On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:
>>> 
>>>> Weird - looks like it has done a comm_spawn and having trouble connecting 
>>>> between the jobs. I can check the basic code and make sure it is working - 
>>>> I seem to recall someone else recently talking about Rmpi changes causing 
>>>> problems (different ones than this, IIRC), so you might want to search our 
>>>> user archives for rmpi to see what they ran into. Not sure what rmpi 
>>>> changed, or why.
>>>> 
>>>> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>>>> 
>>>>> I have ran into a problem using Rmpi with OpenMPI (trying to get snow 
>>>>> running).
>>>>> 
>>>>> I built OpenMPI following another post where I built static:
>>>>> 
>>>>> ./configure --prefix=$INSTALL/gcc-4.4.6-static 
>>>>> --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ 
>>>>> --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran 
>>>>> F77=gfortran
>>>>> 
>>>>> Rmpi/snow work fine when I run on a single node.  When I span more than 
>>>>> one node I get nasty errors (pasted below).
>>>>> 
>>>>> I tested this mpi install with a simple hello world and that works.  Any 
>>>>> thoughts what is different about Rmpi/snow that could cause this?
>>>>> 
>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found 
>>>>> in file routed_binomial.c at line 386
>>>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried 
>>>>> routing message from [[48116,2],16] to [[48116,1],0]:16, can't find route
>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found 
>>>>> in file routed_binomial.c at line 386
>>>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried 
>>>>> routing message from [[48116,2],32] to [[48116,1],0]:16, can't find route
>>>>> [0] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
>>>>>  [0x2b7e9209e0df]
>>>>> [1] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a)
>>>>>  [0x2b7e9206577a]
>>>>> [2] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f)
>>>>>  [0x2b7e920404af]
>>>>> [3] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2)
>>>>>  [0x2b7e92041ed2]
>>>>> [4] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238)
>>>>>  [0x2b7e92087e38]
>>>>> [5] 
>>>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8)
>>>>>  [0x2b7e92016768]
>>>>> [6] func:orted(main+0x66) [0x400966]
>>>>> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecd

Re: [OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
Ralph,

Rmpi wraps everything up, so I tried setting them with

export OMPI_plm_base_verbose=5
export OMPI_dpm_base_verbose=5

and I get no extra messages, even on a simple hello-world MPI-1.0 example.
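
For reference, my understanding (worth double-checking) is that MCA parameters
set through the environment need the OMPI_MCA_ prefix, which may be why I saw
nothing; the equivalents of the flags Ralph suggested would look roughly like
the lines below, where helloworld is just a stand-in for the test binary:

# environment form: Open MPI scans for variables prefixed with OMPI_MCA_
export OMPI_MCA_plm_base_verbose=5
export OMPI_MCA_dpm_base_verbose=5

# or pass them straight on the mpirun command line
mpirun -mca plm_base_verbose 5 -mca dpm_base_verbose 5 ./helloworld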


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Jul 26, 2012, at 6:42 PM, Ralph Castain wrote:

> Well, it looks like comm_spawn is working on 1.6. Afraid I don't know enough 
> about Rmpi/snow to advise on what changed, but you could add some debug 
> params to get an idea of where the problem is occurring:
> 
> -mca plm_base_verbose 5 -mca dpm_base_verbose 5
> 
> should tell you from an OMPI perspective. I can try to help debug that end, 
> at least.
> 
> 
> On Jul 26, 2012, at 3:02 PM, Ralph Castain wrote:
> 
>> Weird - looks like it has done a comm_spawn and having trouble connecting 
>> between the jobs. I can check the basic code and make sure it is working - I 
>> seem to recall someone else recently talking about Rmpi changes causing 
>> problems (different ones than this, IIRC), so you might want to search our 
>> user archives for rmpi to see what they ran into. Not sure what rmpi 
>> changed, or why.
>> 
>> On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:
>> 
>>> I have ran into a problem using Rmpi with OpenMPI (trying to get snow 
>>> running).
>>> 
>>> I built OpenMPI following another post where I built static:
>>> 
>>> ./configure --prefix=$INSTALL/gcc-4.4.6-static 
>>> --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ 
>>> --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran 
>>> F77=gfortran
>>> 
>>> Rmpi/snow work fine when I run on a single node.  When I span more than one 
>>> node I get nasty errors (pasted below).
>>> 
>>> I tested this mpi install with a simple hello world and that works.  Any 
>>> thoughts what is different about Rmpi/snow that could cause this?
>>> 
>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in 
>>> file routed_binomial.c at line 386
>>> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing 
>>> message from [[48116,2],16] to [[48116,1],0]:16, can't find route
>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in 
>>> file routed_binomial.c at line 386
>>> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing 
>>> message from [[48116,2],32] to [[48116,1],0]:16, can't find route
>>> [0] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
>>>  [0x2b7e9209e0df]
>>> [1] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a)
>>>  [0x2b7e9206577a]
>>> [2] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f)
>>>  [0x2b7e920404af]
>>> [3] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2)
>>>  [0x2b7e92041ed2]
>>> [4] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238)
>>>  [0x2b7e92087e38]
>>> [5] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8)
>>>  [0x2b7e92016768]
>>> [6] func:orted(main+0x66) [0x400966]
>>> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
>>> [8] func:orted() [0x400839]
>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in 
>>> file routed_binomial.c at line 386
>>> [nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing 
>>> message from [[48116,2],7] to [[48116,1],0]:16, can't find route
>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in 
>>> file routed_binomial.c at line 386
>>> [nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing 
>>> message from [[48116,2],23] to [[48116,1],0]:16, can't find route
>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in 
>>> file routed_binomial.c at line 386
>>> [nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing 
>>> message from [[48116,2],39] to [[48116,1],0]:16, can't find route
>>> [0] 
>>> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
>>>  [0x2ae2ad17d0df]
>>> 
>>> 
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] OpenMPI and Rmpi/snow

2012-07-26 Thread Brock Palen
I have run into a problem using Rmpi with OpenMPI (trying to get snow running).

I built OpenMPI following another post where I built static:

./configure --prefix=$INSTALL/gcc-4.4.6-static 
--mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ 
--with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran F77=gfortran

Rmpi/snow work fine when I run on a single node.  When I span more than one 
node I get nasty errors (pasted below).

I tested this MPI install with a simple hello world and that works.  Any
thoughts on what is different about Rmpi/snow that could cause this?

[nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in file 
routed_binomial.c at line 386
[nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing 
message from [[48116,2],16] to [[48116,1],0]:16, can't find route
[nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in file 
routed_binomial.c at line 386
[nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing 
message from [[48116,2],32] to [[48116,1],0]:16, can't find route
[0] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
 [0x2b7e9209e0df]
[1] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a)
 [0x2b7e9206577a]
[2] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f)
 [0x2b7e920404af]
[3] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2)
 [0x2b7e92041ed2]
[4] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238)
 [0x2b7e92087e38]
[5] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8)
 [0x2b7e92016768]
[6] func:orted(main+0x66) [0x400966]
[7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
[8] func:orted() [0x400839]
[nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in file 
routed_binomial.c at line 386
[nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing 
message from [[48116,2],7] to [[48116,1],0]:16, can't find route
[nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in file 
routed_binomial.c at line 386
[nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing 
message from [[48116,2],23] to [[48116,1],0]:16, can't find route
[nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in file 
routed_binomial.c at line 386
[nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing 
message from [[48116,2],39] to [[48116,1],0]:16, can't find route
[0] 
func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
 [0x2ae2ad17d0df]




Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Brock Palen
To throw in my $0.02, though it is worth less.

Were you running this on verbs-based InfiniBand?

We see a problem, only on IB, that we have a workaround for even with the newest
1.4.5; we can reproduce it with IMB.  You can find an old thread from me about
it.  Your problem might not be the same.
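
In case it helps, the workaround from that old thread was along these lines;
this is only a sketch (your_app is a placeholder), and your hang may well be a
different problem:

# use the rdmacm connection manager instead of the default oob CPC
mpirun -mca btl_openib_cpc_include rdmacm ./your_app

# or restrict the openib BTL to a send-based protocol
mpirun -mca btl_openib_flags 305 ./your_app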

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 24, 2012, at 3:09 PM, Jeffrey Squyres wrote:

> Could you repeat your tests with 1.4.5 and/or 1.5.5?
> 
> 
> On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote:
> 
>> Hi,
>> 
>> I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3).
>> An strace of one of the processes shows:
>> 
>> Process 10925 attached with 3 threads - interrupt to quit
>> [pid 10927] poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN}], 2, -1 <unfinished ...>
>> [pid 10926] select(15, [8 14], [], NULL, NULL <unfinished ...>
>> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
>> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
>> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
>> [pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
>> ...
>> 
>> The program is a Fortran program using 64bit integers (compiled with -i8)
>> and I correspondingly compiled openmpi (version 1.4.3) with -i8 for
>> the Fortran compiler as well.
>> 
>> The program is somewhat difficult to debug since it takes 3 days to reach
>> the point where it hangs. This is what I found so far:
>> 
>> MPI_Allreduce is called as
>> 
>> call MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE_PRECISION, &
>>  MPI_SUM, MPI_COMM_WORLD, mpierr)
>> 
>> with count = 455295488. Since the Fortran interface just calls the
>> C routines in OpenMPI and count variables are 32bit integers in C I started
>> to wonder what is the largest integer "count" for which a MPI_Allreduce
>> succeeds. E.g., in MPICH (it has been a while that I looked into this, i.e.,
>> this may or may not be correct anymore) all send/recv were converted
>> into send/recv of MPI_BYTE, thus the largest count for doubles was
>> (2^31-1)/8 = 268435455. Thus, I started to wrap the MPI_Allreduce call
>> with a myMPI_Allreduce routine that repeatedly calls MPI_Allreduce when
>> the count is larger than some value maxallreduce (the myMPI_Allreduce.f90
>> is attached). I have tested the routine with a trivial program that
>> just fills an array with numbers and calls myMPI_Allreduce and this
>> test succeeds.
>> However, with the real program the situations is very strange:
>> When I set maxallreduce = 268435456, the program hangs at the first call
>> (iallreduce = 1) to MPI_Allreduce in the do loop
>> 
>>do iallreduce = 1, nallreduce - 1
>>   idx = (iallreduce - 1)*length + 1
>>   call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
>>  datatype, op, comm, mpierr)
>>   if (mpierr /= MPI_SUCCESS) return
>>end do
>> 
>> With maxallreduce = 134217728 the first call succeeds, the second hangs. 
>> For maxallreduce = 67108864, the first two calls to MPI_Allreduce complete, 
>> but the third (iallreduce = 3) hangs. For maxallreduce = 8388608 the
>> 17th call hangs, for 1048576 the 138th call hangs; here is a table 
>> (values from gdb attached to process 0 when the program hangs):
>> 
>> maxallreduce  iallreduce        idx     length
>>    268435456           1          1  227647744
>>    134217728           2  113823873  113823872
>>     67108864           3  130084427   65042213
>>      8388608          17  137447697    8590481
>>      1048576         138  143392010    1046657
>> 
>> As if there is (are) some element(s) in the middle of the array with 
>> idx >= 143392010 that cannot be sent or recv'd.
>> 
>> Has anybody seen this kind of behaviour?
>> Has anybody an idea what could be causing this?
>> Ideas how to get around this?
>> Anything that could help would be appreciated ... I already spent a
>> huge amount of time on this and I am running out of ideas.
>> 
>>

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
Will do,

Right now I have asked the user to try rebuilding with the newest openmpi just 
to be safe.

Interesting behavior: on rank 0 the IB counters (using collectl) never show a
packet in, only packets out.
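
For anyone who wants to watch the same thing: besides collectl, the raw
per-port counters can be read directly.  The device and port names below are
just examples, so adjust for your hardware:

# packets received / sent on port 1 of the first mlx4 HCA
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_packets

# perfquery (from infiniband-diags) reports the same counters
perfquery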

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Mar 21, 2012, at 11:37 AM, Jeffrey Squyres wrote:

> On Mar 21, 2012, at 11:34 AM, Brock Palen wrote:
> 
>> tcp with this code?
> 
> Does it matter enough for debugging runs?
> 
>> Can we disable the psm mtl and use the verbs emulation on qlogic?  While the 
>> qlogic verbs isn't that great it is still much faster in my tests than tcp.
>> 
>> Is there a particular reason to pick tcp?
> 
> Not really.  My only thought was that verbs over qlogic devices isn't the 
> most stable stack around (they spend all their effort on PSM, not verbs).
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
tcp with this code?

Can we disable the psm MTL and use the verbs emulation on QLogic?  While the
QLogic verbs support isn't that great, it is still much faster in my tests than TCP.

Is there a particular reason to pick tcp?
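
For what it's worth, my rough notes on forcing the different paths (parameter
names as I understand them, so verify with ompi_info; ./app is a placeholder):

# force the ob1 PML with the verbs (openib) BTL, bypassing the PSM MTL
mpirun -mca pml ob1 -mca btl openib,sm,self ./app

# same thing over TCP for comparison
mpirun -mca pml ob1 -mca btl tcp,sm,self ./app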

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Mar 21, 2012, at 11:22 AM, Jeffrey Squyres wrote:

> We unfortunately don't have much visibility into the PSM device (meaning: 
> Open MPI is a thin shim on top of the underlying libpsm, which handles all 
> the MPI point-to-point semantics itself).  So we can't even ask you to run 
> padb to look at the message queues, because we don't have access to them.  :-\
> 
> Can you try running with TCP and see if that also deadlocks?  If it does, you 
> can at least run padb to have a look at the message queues.
> 
> 
> On Mar 21, 2012, at 11:15 AM, Brock Palen wrote:
> 
>> Forgotten stack as promised, it keeps changing at the lower level 
>> opal_progress, but never moves above that.
>> 
>> [yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all 
>> Stack trace(s) for thread: 1
>> -
>> [0-63] (64 processes)
>> -
>> main() at ?:?
>> Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, 
>> std::basic_string<char, std::char_traits, std::allocator > 
>> const&)() at ?:?
>>   Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>> Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>   Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>> Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
>>   Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at 
>> ?:?
>> Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() 
>> at ?:?
>>   Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() 
>> at ?:?
>> Loci::execute_list::execute(Loci::fact_db&, 
>> Loci::sched_db&)() at ?:?
>>   Loci::execute_rule::execute(Loci::fact_db&, 
>> Loci::sched_db&)() at ?:?
>> streamUns::HypreSolveUnit::compute(Loci::sequence 
>> const&)() at ?:?
>>   hypre_BoomerAMGSetup() at ?:?
>> hypre_BoomerAMGBuildInterp() at ?:?
>>   -
>>   [0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
>>   -
>>   hypre_ParCSRMatrixExtractBExt() at ?:?
>> hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
>>   hypre_ParCSRCommHandleDestroy() at ?:?
>> PMPI_Waitall() at ?:?
>>   -
>>   [0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 
>> processes)
>>   -
>>   ompi_request_default_wait_all() at ?:?
>> opal_progress() at ?:?
>>   -
>>   [6] (1 processes)
>>   -
>>   ompi_mtl_psm_progress() at ?:?
>>   -
>>   [1,4,17,20,25-26,35] (7 processes)
>>   -
>>       hypre_ParCSRCommHandleDestroy() at ?:?
>> PMPI_Waitall() at ?:?
>>   ompi_request_default_wait_all() at ?:?
>> opal_progress() at ?:?
>> Stack trace(s) for thread: 2
>> -
>> [0-63] (64 processes)
>> -
>> start_thread() at ?:?
>> ips_ptl_pollintr() at ptl_rcvthread.c:324
>>   poll() at ?:?
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:
>> 
>>> I have a users code that appears to be hanging some times on MPI_Waitall(), 
>>>  stack trace from padb below.  It is on qlogic IB using the psm mtl.
>>> Without knowing what requests go to which rank, how can I check that this 
>>> code didn't just get its self into a dead

Re: [OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
Here is the forgotten stack trace, as promised; it keeps changing at the lower
level in opal_progress, but never moves above that.

[yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all 
Stack trace(s) for thread: 1
-
[0-63] (64 processes)
-
main() at ?:?
  Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
  Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
  Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at 
?:?
  Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at 
?:?
Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() 
at ?:?
  Loci::execute_list::execute(Loci::fact_db&, 
Loci::sched_db&)() at ?:?
Loci::execute_rule::execute(Loci::fact_db&, 
Loci::sched_db&)() at ?:?
  streamUns::HypreSolveUnit::compute(Loci::sequence 
const&)() at ?:?
hypre_BoomerAMGSetup() at ?:?
  hypre_BoomerAMGBuildInterp() at ?:?
-
[0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
-
hypre_ParCSRMatrixExtractBExt() at ?:?
  hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
hypre_ParCSRCommHandleDestroy() at ?:?
  PMPI_Waitall() at ?:?
-
[0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 
processes)
-
ompi_request_default_wait_all() at ?:?
  opal_progress() at ?:?
-
[6] (1 processes)
-
ompi_mtl_psm_progress() at ?:?
-
[1,4,17,20,25-26,35] (7 processes)
-
hypre_ParCSRCommHandleDestroy() at ?:?
  PMPI_Waitall() at ?:?
ompi_request_default_wait_all() at ?:?
  opal_progress() at ?:?
Stack trace(s) for thread: 2
-
[0-63] (64 processes)
-----
start_thread() at ?:?
  ips_ptl_pollintr() at ptl_rcvthread.c:324
poll() at ?:?


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:

> I have a users code that appears to be hanging some times on MPI_Waitall(),  
> stack trace from padb below.  It is on qlogic IB using the psm mtl.
> Without knowing what requests go to which rank, how can I check that this 
> code didn't just get its self into a deadlock?  Is there a way to get a 
> reable list of every ranks posted sends?  And then query an wiating 
> MPI_Waitall() of a running job to get what rends/recvs it is waiting on?
> 
> Thanks!
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 




[OMPI users] MPI_Waitall hangs and querying

2012-03-21 Thread Brock Palen
I have a user's code that appears to be hanging sometimes on MPI_Waitall(); a
stack trace from padb is below.  It is on QLogic IB using the psm MTL.
Without knowing which requests go to which rank, how can I check that this code
didn't just get itself into a deadlock?  Is there a way to get a readable list
of every rank's posted sends, and then query a waiting MPI_Waitall() of a
running job to see what sends/recvs it is waiting on?
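
For the record, the padb invocation I am using for stack traces, plus what I
believe is its message-queue mode -- the flag name is from memory, so check
padb --help; and as noted elsewhere it may not see anything useful on PSM:

# stack traces across all ranks, merged into a tree
padb -Ormgr=orte --all --stack-trace --tree

# MPI message queues (posted sends/recvs), if the MPI stack exposes them
padb -Ormgr=orte --all --mpi-queue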

Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
This should be fixed; there was a bad upload and the server had a different copy
than my machine.  The fixed version is in place.  Feel free to grab it again.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Feb 20, 2012, at 4:43 PM, Jeffrey Squyres wrote:

> Yes, something is borked here.  I just listened to what I got in iTunes and 
> it's both longer than 33 mins (i.e., it keeps playing after the timer reaches 
> 0:00), and then it cuts off in the middle of one of Rajeev's answers.  Doh.  
> :-(
> 
> Brock is checking into it…
> 
> 
> On Feb 20, 2012, at 4:37 PM, Rayson Ho wrote:
> 
>> Hi Jeff,
>> 
>> I use wget to download the file - and I use VideoLAN to play the mp3.
>> VideoLAN shows that the file only has 26:34.
>> 
>> I just quickly tried to use Chrome to play the file, and it showed
>> that the file was over 33 mins. *However*, the podcast still ended at
>> 26:34, after the ROMIO guys say "there are many MPI implementations,
>> SGI, and HP, and what..." - so am I the only one who gets a corrupted
>> file??
>> 
>> Rayson
>> 
>> =
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>> 
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>> 
>> 
>> 
>> On Mon, Feb 20, 2012 at 3:45 PM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
>>> Little known secret: we edit before these things go to air.  :-)
>>> 
>>> The recordings almost always take about an hour, but we snip some things 
>>> out.  IIRC, we had some tech problems which wasted some time in this 
>>> recording, and some off-recording kibitzing.  :-)
>>> 
>>> Also, it looks like Brock had a problem with the XML so that iTunes/RSS 
>>> readers said the episode was 26:34.  But when you download it, the MP3 is 
>>> actually over 33 mins.  I think Brock just updated the RSS, so we'll see 
>>> when iTunes updates.
>>> 
>>> 
>>> 
>>> On Feb 20, 2012, at 3:25 PM, Rayson Ho wrote:
>>> 
>>>> Brock,
>>>> 
>>>> I listened to the podcast on Saturday, and I just downloaded it again
>>>> 10 mins ago.
>>>> 
>>>> Did the interview really end at 26:34?? And if I recall correctly, you
>>>> & Jeff did not get a chance to ask them the "which source control
>>>> system do you guys use" question :-D
>>>> 
>>>> Rayson
>>>> 
>>>> =====
>>>> Open Grid Scheduler / Grid Engine
>>>> http://gridscheduler.sourceforge.net/
>>>> 
>>>> Scalable Grid Engine Support Program
>>>> http://www.scalablelogic.com/
>>>> 
>>>> 
>>>> On Mon, Feb 20, 2012 at 3:05 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>> For those interested in MPI-IO, and ROMIO Jeff and I did an interview 
>>>>> Rajeev and Rob:
>>>>> 
>>>>> http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> CAEN Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> 
>> -- 
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] ROMIO Podcast

2012-02-20 Thread Brock Palen
For those interested in MPI-IO and ROMIO, Jeff and I did an interview with
Rajeev and Rob:

http://www.rce-cast.com/Podcast/rce-66-romio-mpi-io.html

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985






[OMPI users] numactl with torque cpusets

2011-11-09 Thread Brock Palen
Question:
If we are using Torque with TM and cpusets enabled for pinning, should we not
enable numactl?  Would they conflict with each other?
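
A rough way to see what the Torque cpuset is already doing to a rank before
layering numactl on top -- just a sketch, run from inside a job:

# which cores is this process allowed to run on?
grep Cpus_allowed_list /proc/self/status

# hwloc's view of the current binding
hwloc-bind --get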

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-07-27 Thread Brock Palen
Sorry to bring this back up.
We recently had an outage, updated the firmware on our GD4700, and installed a
new Mellanox-provided OFED stack, and the problem has returned.
Specifically, I am able to reproduce the problem with IMB on 4 twelve-core nodes
when it tries to go to 16 cores.  I have verified that setting different
openib_flags (313) fixes the issue, albeit with lower bandwidth for some message sizes.

Has there been any progress on this issue?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 18, 2011, at 10:25 AM, Brock Palen wrote:

> Well I have a new wrench into this situation.
> We have a power failure at our datacenter took down our entire system 
> nodes,switch,sm.  
> Now I am unable to produce the error with oob default ibflags etc.
> 
> Does this shed any light on the issue?  It also makes it hard to now debug 
> the issue without being able to reproduce it.
> 
> Any thoughts?  Am I overlooking something? 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On May 17, 2011, at 2:18 PM, Brock Palen wrote:
> 
>> Sorry typo 314 not 313, 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On May 17, 2011, at 2:02 PM, Brock Palen wrote:
>> 
>>> Thanks, I though of looking at ompi_info after I sent that note sigh.
>>> 
>>> SEND_INPLACE appears to help performance of larger messages in my synthetic 
>>> benchmarks over regular SEND.  Also it appears that SEND_INPLACE still 
>>> allows our code to run.
>>> 
>>> We working on getting devs access to our system and code. 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On May 16, 2011, at 11:49 AM, George Bosilca wrote:
>>> 
>>>> Here is the output of the "ompi_info --param btl openib":
>>>> 
>>>>  MCA btl: parameter "btl_openib_flags" (current value: <306>, 
>>>> data
>>>>   source: default value)
>>>>   BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>>>>   SEND_INPLACE=8, RDMA_MATCHED=64, 
>>>> HETEROGENEOUS_RDMA=256; flags
>>>>   only used by the "dr" PML (ignored by others): 
>>>> ACK=16,
>>>>   CHECKSUM=32, RDMA_COMPLETION=128; flags only used by 
>>>> the "bfo"
>>>>   PML (ignored by others): FAILOVER_SUPPORT=512)
>>>> 
>>>> So the 305 flags means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most 
>>>> of these flags are totally useless in the current version of Open MPI (DR 
>>>> is not supported), so the only value that really matter is SEND | 
>>>> HETEROGENEOUS_RDMA.
>>>> 
>>>> If you want to enable the send protocol try first with SEND | SEND_INPLACE 
>>>> (9), if not downgrade to SEND (1)
>>>> 
>>>> george.
>>>> 
>>>> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>>>> 
>>>>> 
>>>>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Just out of curiosity - what happens when you add the following MCA 
>>>>>>> option to your openib runs?
>>>>>>> 
>>>>>>> -mca btl_openib_flags 305
>>>>>> 
>>>>>> You Sir found the magic combination.
>>>>> 
>>>>> :-)  - cool.
>>>>> 
>>>>> Developers - does this smell like a registered memory availability hang?
>>>>> 
>>>>>> I verified this lets IMB and CRASH progress pass their lockup points,
>>>>>> I will have a user test this, 
>>>>> 
>>>>> Please let us know what you find.
>>>>> 
>>>>>> Is this an ok option to put in our environment?  What does 305 mean?
>>>>> 
>>>>> There may be a performance hit associated with this configuration,

[OMPI users] openmpi-1.4.3 and pgi-11.6 segfault

2011-06-21 Thread Brock Palen
Has anyone else had issues building 1.4.3 with PGI 11.6?  When I do, I get a
segfault:

./configure --prefix=/home/software/rhel5/openmpi-1.4.3/pgi-11.6 
--mandir=/home/software/rhel5/openmpi-1.4.3/pgi-11.6/man 
--with-tm=/usr/local/torque/ --with-openib --with-psm CC=pgcc CXX=pgCC FC=pgf90 
F77=pgf90


make[7]: Entering directory 
`/tmp/openmpi-1.4.3/ompi/contrib/vt/vt/tools/opari/tool'
source='handler.cc' object='handler.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
handler.o handler.cc
source='ompragma.cc' object='ompragma.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
ompragma.o ompragma.cc
source='ompragma_c.cc' object='ompragma_c.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
ompragma_c.o ompragma_c.cc
source='ompragma_f.cc' object='ompragma_f.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
ompragma_f.o ompragma_f.cc
source='ompregion.cc' object='ompregion.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
ompregion.o ompregion.cc
source='opari.cc' object='opari.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
opari.o opari.cc
source='process_c.cc' object='process_c.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
process_c.o process_c.cc
source='process_f.cc' object='process_f.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \
pgCC -DHAVE_CONFIG_H -I. -I../../..   -D_REENTRANT  -O -DNDEBUG   -c -o 
process_f.o process_f.cc
pgCC-Fatal-/afs/engin.umich.edu/caen/rhel_5/pgi-11.6/linux86-64/11.6/bin/pgcpp1 
TERMINATED by signal 11
Arguments to 
/afs/engin.umich.edu/caen/rhel_5/pgi-11.6/linux86-64/11.6/bin/pgcpp1
/afs/engin.umich.edu/caen/rhel_5/pgi-11.6/linux86-64/11.6/bin/pgcpp1 --llalign 
-Dunix -D__unix -D__unix__ -Dlinux -D__linux -D__linux__ -D__NO_MATH_INLINES 
-D__x86_64__ -D__LONG_MAX__=9223372036854775807L '-D__SIZE_TYPE__=unsigned long 
int' '-D__PTRDIFF_TYPE__=long int' -D__THROW= -D__extension__= -D__amd64__ 
-D__SSE__ -D__MMX__ -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__PGI -I. -I../../.. 
-DHAVE_CONFIG_H -D_REENTRANT -DNDEBUG 
-I/afs/engin.umich.edu/caen/rhel_5/pgi-11.6/linux86-64/11.6/include/CC 
-I/afs/engin.umich.edu/caen/rhel_5/pgi-11.6/linux86-64/11.6/include 
-I/usr/local/include -I/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include 
-I/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include -I/usr/include --zc_eh 
--gnu_version=40102 -D__pgnu_vsn=40102 -q -o /tmp/pgCCD_kbx1IQBbml.il 
process_f.cc
make[7]: *** [process_f.o] Error 127
make[7]: Leaving directory 
`/tmp/openmpi-1.4.3/ompi/contrib/vt/vt/tools/opari/tool'
make[6]: *** [all-recursive] Error 1
make[6]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt/vt/tools/opari'
make[5]: *** [all-recursive] Error 1
make[5]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt/vt/tools'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/tmp/openmpi-1.4.3/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/openmpi-1.4.3/ompi'
make: *** [all-recursive] Error 1
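
In case it is the VampirTrace contrib that pgCC is choking on, my plan B is to
configure with VT skipped -- untested with 11.6, so treat it as a sketch:

# same configure as above, but do not build the VT (VampirTrace) contrib
./configure --prefix=/home/software/rhel5/openmpi-1.4.3/pgi-11.6 \
  --mandir=/home/software/rhel5/openmpi-1.4.3/pgi-11.6/man \
  --with-tm=/usr/local/torque/ --with-openib --with-psm \
  --enable-contrib-no-build=vt \
  CC=pgcc CXX=pgCC FC=pgf90 F77=pgf90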


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985






Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-18 Thread Brock Palen
Well, I have a new wrench in this situation.
We had a power failure at our datacenter that took down our entire system
(nodes, switch, SM).
Now I am unable to reproduce the error with oob, default ibflags, etc.

Does this shed any light on the issue?  It also makes it hard to debug the
issue now, without being able to reproduce it.

Any thoughts?  Am I overlooking something?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 17, 2011, at 2:18 PM, Brock Palen wrote:

> Sorry typo 314 not 313, 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On May 17, 2011, at 2:02 PM, Brock Palen wrote:
> 
>> Thanks, I though of looking at ompi_info after I sent that note sigh.
>> 
>> SEND_INPLACE appears to help performance of larger messages in my synthetic 
>> benchmarks over regular SEND.  Also it appears that SEND_INPLACE still 
>> allows our code to run.
>> 
>> We working on getting devs access to our system and code. 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On May 16, 2011, at 11:49 AM, George Bosilca wrote:
>> 
>>> Here is the output of the "ompi_info --param btl openib":
>>> 
>>>   MCA btl: parameter "btl_openib_flags" (current value: <306>, 
>>> data
>>>source: default value)
>>>BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>>>SEND_INPLACE=8, RDMA_MATCHED=64, 
>>> HETEROGENEOUS_RDMA=256; flags
>>>only used by the "dr" PML (ignored by others): 
>>> ACK=16,
>>>CHECKSUM=32, RDMA_COMPLETION=128; flags only used by 
>>> the "bfo"
>>>PML (ignored by others): FAILOVER_SUPPORT=512)
>>> 
>>> So the 305 flags means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
>>> these flags are totally useless in the current version of Open MPI (DR is 
>>> not supported), so the only value that really matter is SEND | 
>>> HETEROGENEOUS_RDMA.
>>> 
>>> If you want to enable the send protocol try first with SEND | SEND_INPLACE 
>>> (9), if not downgrade to SEND (1)
>>> 
>>> george.
>>> 
>>> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>>> 
>>>> 
>>>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Just out of curiosity - what happens when you add the following MCA 
>>>>>> option to your openib runs?
>>>>>> 
>>>>>> -mca btl_openib_flags 305
>>>>> 
>>>>> You Sir found the magic combination.
>>>> 
>>>> :-)  - cool.
>>>> 
>>>> Developers - does this smell like a registered memory availability hang?
>>>> 
>>>>> I verified this lets IMB and CRASH progress pass their lockup points,
>>>>> I will have a user test this, 
>>>> 
>>>> Please let us know what you find.
>>>> 
>>>>> Is this an ok option to put in our environment?  What does 305 mean?
>>>> 
>>>> There may be a performance hit associated with this configuration, but if 
>>>> it lets your users run, then I don't see a problem with adding it to your 
>>>> environment.
>>>> 
>>>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>>>> SEND.
>>>> 
>>>> OpenFabrics gurus - please correct me if I'm wrong :-).
>>>> 
>>>> Samuel Gutierrez
>>>> Los Alamos National Laboratory
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Samuel Gutierrez
>>>>>> Los Alamos National Laboratory
>>>>>> 
>>>>>> On May 13, 2011, at 2:3

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Sorry, typo: 314, not 313.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 17, 2011, at 2:02 PM, Brock Palen wrote:

> Thanks, I though of looking at ompi_info after I sent that note sigh.
> 
> SEND_INPLACE appears to help performance of larger messages in my synthetic 
> benchmarks over regular SEND.  Also it appears that SEND_INPLACE still allows 
> our code to run.
> 
> We working on getting devs access to our system and code. 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On May 16, 2011, at 11:49 AM, George Bosilca wrote:
> 
>> Here is the output of the "ompi_info --param btl openib":
>> 
>>MCA btl: parameter "btl_openib_flags" (current value: <306>, 
>> data
>> source: default value)
>> BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>> SEND_INPLACE=8, RDMA_MATCHED=64, 
>> HETEROGENEOUS_RDMA=256; flags
>> only used by the "dr" PML (ignored by others): 
>> ACK=16,
>> CHECKSUM=32, RDMA_COMPLETION=128; flags only used by 
>> the "bfo"
>> PML (ignored by others): FAILOVER_SUPPORT=512)
>> 
>> So the 305 flags means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
>> these flags are totally useless in the current version of Open MPI (DR is 
>> not supported), so the only value that really matter is SEND | 
>> HETEROGENEOUS_RDMA.
>> 
>> If you want to enable the send protocol try first with SEND | SEND_INPLACE 
>> (9), if not downgrade to SEND (1)
>> 
>> george.
>> 
>> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>> 
>>> 
>>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>> 
>>>> 
>>>> 
>>>> 
>>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Just out of curiosity - what happens when you add the following MCA 
>>>>> option to your openib runs?
>>>>> 
>>>>> -mca btl_openib_flags 305
>>>> 
>>>> You Sir found the magic combination.
>>> 
>>> :-)  - cool.
>>> 
>>> Developers - does this smell like a registered memory availability hang?
>>> 
>>>> I verified this lets IMB and CRASH progress pass their lockup points,
>>>> I will have a user test this, 
>>> 
>>> Please let us know what you find.
>>> 
>>>> Is this an ok option to put in our environment?  What does 305 mean?
>>> 
>>> There may be a performance hit associated with this configuration, but if 
>>> it lets your users run, then I don't see a problem with adding it to your 
>>> environment.
>>> 
>>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>>> SEND.
>>> 
>>> OpenFabrics gurus - please correct me if I'm wrong :-).
>>> 
>>> Samuel Gutierrez
>>> Los Alamos National Laboratory
>>> 
>>> 
>>>> 
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Samuel Gutierrez
>>>>> Los Alamos National Laboratory
>>>>> 
>>>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>>> 
>>>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>>> 
>>>>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>>>> 
>>>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>>> 
>>>>>>>>> We can reproduce it with IMB.  We could provide access, but we'd have 
>>>>>>>>> to
>>>>>>>>> negotiate with the owners of the relevant nodes to give you 
>>>>>>>>> interactive
>>>>>>>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>>>>>>>> contact me, I may not be able to respond for a few days.)
>>>>>>>&

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-17 Thread Brock Palen
Thanks, I thought of looking at ompi_info after I sent that note, sigh.

SEND_INPLACE appears to help performance of larger messages in my synthetic
benchmarks over regular SEND.  Also, it appears that SEND_INPLACE still allows
our code to run.

We are working on getting devs access to our system and code.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 16, 2011, at 11:49 AM, George Bosilca wrote:

> Here is the output of the "ompi_info --param btl openib":
> 
> MCA btl: parameter "btl_openib_flags" (current value: <306>, 
> data
>  source: default value)
>  BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>  SEND_INPLACE=8, RDMA_MATCHED=64, 
> HETEROGENEOUS_RDMA=256; flags
>  only used by the "dr" PML (ignored by others): 
> ACK=16,
>  CHECKSUM=32, RDMA_COMPLETION=128; flags only used by 
> the "bfo"
>  PML (ignored by others): FAILOVER_SUPPORT=512)
> 
> So the 305 flags means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
> these flags are totally useless in the current version of Open MPI (DR is not 
> supported), so the only value that really matter is SEND | HETEROGENEOUS_RDMA.
> 
> If you want to enable the send protocol try first with SEND | SEND_INPLACE 
> (9), if not downgrade to SEND (1)
> 
>  george.
> 
> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
> 
>> 
>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>> 
>>> 
>>> 
>>> 
>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Just out of curiosity - what happens when you add the following MCA option 
>>>> to your openib runs?
>>>> 
>>>> -mca btl_openib_flags 305
>>> 
>>> You Sir found the magic combination.
>> 
>> :-)  - cool.
>> 
>> Developers - does this smell like a registered memory availability hang?
>> 
>>> I verified this lets IMB and CRASH progress pass their lockup points,
>>> I will have a user test this, 
>> 
>> Please let us know what you find.
>> 
>>> Is this an ok option to put in our environment?  What does 305 mean?
>> 
>> There may be a performance hit associated with this configuration, but if it 
>> lets your users run, then I don't see a problem with adding it to your 
>> environment.
>> 
>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>> SEND.
>> 
>> OpenFabrics gurus - please correct me if I'm wrong :-).
>> 
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>> 
>> 
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Samuel Gutierrez
>>>> Los Alamos National Laboratory
>>>> 
>>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>> 
>>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>> 
>>>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>>> 
>>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>> 
>>>>>>>> We can reproduce it with IMB.  We could provide access, but we'd have 
>>>>>>>> to
>>>>>>>> negotiate with the owners of the relevant nodes to give you interactive
>>>>>>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>>>>>>> contact me, I may not be able to respond for a few days.)
>>>>>>> 
>>>>>>> Brock has replied off-list that he, too, is able to reliably reproduce 
>>>>>>> the issue with IMB, and is working to get access for us.  Many thanks 
>>>>>>> for your offer; let's see where Brock's access takes us.
>>>>>> 
>>>>>> Good.  Let me know if we could be useful
>>>>>> 
>>>>>>>>> -- we have not closed this issue,
>>>>>>>> 
>>>>>>>> Which issue?   I couldn't find a relevant-looking one.
>>>>>>> 
>>>>>>> https://svn.open-mpi.org/trac/omp

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-16 Thread Brock Palen



On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:

> Hi,
> 
> Just out of curiosity - what happens when you add the following MCA option to 
> your openib runs?
> 
> -mca btl_openib_flags 305

You, sir, found the magic combination.
I verified this lets IMB and CRASH progress past their lockup points;
I will have a user test this.
Is this an OK option to put in our environment?  What does 305 mean?
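
Working out the arithmetic for my own notes, against the btl_openib_flags list
that ompi_info prints: 305 = 256 + 32 + 16 + 1, i.e. HETEROGENEOUS_RDMA +
CHECKSUM + ACK + SEND.  If we do put it in our environment, I assume it would
look roughly like one of these (./app is a placeholder):

# per job, on the mpirun command line
mpirun -mca btl_openib_flags 305 ./app

# system-wide, as a line in $prefix/etc/openmpi-mca-params.conf
btl_openib_flags = 305

# or per user, via the environment
export OMPI_MCA_btl_openib_flags=305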


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

> 
> Thanks,
> 
> Samuel Gutierrez
> Los Alamos National Laboratory
> 
> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
> 
>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>> 
>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>> 
>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>> 
>>>>> We can reproduce it with IMB.  We could provide access, but we'd have to
>>>>> negotiate with the owners of the relevant nodes to give you interactive
>>>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>>>> contact me, I may not be able to respond for a few days.)
>>>> 
>>>> Brock has replied off-list that he, too, is able to reliably reproduce the 
>>>> issue with IMB, and is working to get access for us.  Many thanks for your 
>>>> offer; let's see where Brock's access takes us.
>>> 
>>> Good.  Let me know if we could be useful
>>> 
>>>>>> -- we have not closed this issue,
>>>>> 
>>>>> Which issue?   I couldn't find a relevant-looking one.
>>>> 
>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>> 
>>> Thanks.  In csse it's useful info, it hangs for me with 1.5.3 & np=32 on
>>> connectx with more than one collective I can't recall.
>> 
>> Extra data point, that ticket said it ran with mpi_preconnect_mpi 1,  well 
>> that doesn't help here, both my production code (crash) and IMB still hang.
>> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>>> 
>>> -- 
>>> Excuse the typping -- I have a broken wrist
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-13 Thread Brock Palen
On May 13, 2011, at 4:09 PM, Dave Love wrote:

> Jeff Squyres <jsquy...@cisco.com> writes:
> 
>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>> 
>>> We can reproduce it with IMB.  We could provide access, but we'd have to
>>> negotiate with the owners of the relevant nodes to give you interactive
>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>> contact me, I may not be able to respond for a few days.)
>> 
>> Brock has replied off-list that he, too, is able to reliably reproduce the 
>> issue with IMB, and is working to get access for us.  Many thanks for your 
>> offer; let's see where Brock's access takes us.
> 
> Good.  Let me know if we could be useful
> 
>>>> -- we have not closed this issue,
>>> 
>>> Which issue?   I couldn't find a relevant-looking one.
>> 
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
> 
> Thanks.  In csse it's useful info, it hangs for me with 1.5.3 & np=32 on
> connectx with more than one collective I can't recall.

Extra data point: that ticket said it ran with mpi_preconnect_mpi 1, but that
doesn't help here; both my production code (CRASH) and IMB still hang.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

> 
> -- 
> Excuse the typping -- I have a broken wrist
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
I am pretty sure MTLs and BTLs are very different, but just as a note:
this user's code (CRASH) hangs at MPI_Allreduce() on

openib

But runs on:
tcp
psm (an MTL, different hardware)

Putting it out there in case it has any bearing.  Otherwise ignore.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 12, 2011, at 10:20 AM, Brock Palen wrote:

> On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
> 
>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>> 
>>> We can reproduce it with IMB.  We could provide access, but we'd have to
>>> negotiate with the owners of the relevant nodes to give you interactive
>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>> contact me, I may not be able to respond for a few days.)
>> 
>> Brock has replied off-list that he, too, is able to reliably reproduce the 
>> issue with IMB, and is working to get access for us.  Many thanks for your 
>> offer; let's see where Brock's access takes us.
> 
> I should also note that as far as I know I have three codes (CRASH, Namd 
> (some cases), and another user code.  That lockup on a collective on OpenIB 
> but run with the same library on Gig-e.
> 
> So I am not sure it is limited to IMB, or I could be crossing errors, 
> normally I would assume unmatched eager recvs for this sort of problem. 
> 
>> 
>>>> -- we have not closed this issue,
>>> 
>>> Which issue?   I couldn't find a relevant-looking one.
>> 
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Brock Palen
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:

> On May 11, 2011, at 3:21 PM, Dave Love wrote:
> 
>> We can reproduce it with IMB.  We could provide access, but we'd have to
>> negotiate with the owners of the relevant nodes to give you interactive
>> access to them.  Maybe Brock's would be more accessible?  (If you
>> contact me, I may not be able to respond for a few days.)
> 
> Brock has replied off-list that he, too, is able to reliably reproduce the 
> issue with IMB, and is working to get access for us.  Many thanks for your 
> offer; let's see where Brock's access takes us.

I should also note that, as far as I know, I have three codes (CRASH, NAMD in
some cases, and another user code) that lock up on a collective on OpenIB but
run with the same library on Gig-E.

So I am not sure it is limited to IMB, or I could be crossing errors; normally
I would assume unmatched eager recvs for this sort of problem.

> 
>>> -- we have not closed this issue,
>> 
>> Which issue?   I couldn't find a relevant-looking one.
> 
> https://svn.open-mpi.org/trac/ompi/ticket/2714
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Brock Palen
On May 9, 2011, at 9:31 AM, Jeff Squyres wrote:

> On May 3, 2011, at 6:42 AM, Dave Love wrote:
> 
>>> We managed to have another user hit the bug that causes collectives (this 
>>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>>> 
>>> btl_openib_cpc_include rdmacm
>> 
>> Could someone explain this?  We also have problems with collective hangs
>> with openib/mlx4 (specifically in IMB), but not with psm, and I couldn't
>> see any relevant issues filed.  However, rdmacm isn't an available value
>> for that parameter with our 1.4.3 or 1.5.3 installations, only oob (not
>> that I understand what these things are...).
> 
> Sorry for the delay -- perhaps an IB vendor can reply here with more detail...
> 
> We had a user-reported issue of some hangs that the IB vendors have been 
> unable to replicate in their respective labs.  We *suspect* that it may be an 
> issue with the oob openib CPC, but that code is pretty old and pretty mature, 
> so all of us would be at least somewhat surprised if that were the case.  If 
> anyone can reliably reproduce this error, please let us know and/or give us 
> access to your machines -- we have not closed this issue, but are unable to 
> move forward because the customers who reported this issue switched to rdmacm 
> and moved on (i.e., we don't have access to their machines to test any more).

An update: we set all our ib0 interfaces to have IPs on a 172. network. This 
allowed rdmacm to work and gave the latencies we would expect. That said, we 
are still getting hangs; I can very reliably reproduce one using IMB with a 
specific core count on a specific test case. 
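
For reference, a minimal sketch of the sort of check and forced-rdmacm run I 
mean (the host names, the BTL list, and the assumption that the IPoIB 
interface is named ib0 are placeholders, not details from this thread):

# confirm the IPoIB interface carries an address (rdmacm resolves peers over IPoIB)
ip addr show ib0                       # expect an inet line on the 172. network
# force rdmacm for a quick two-node IMB run
mpirun --host nodeA,nodeB \
   --mca btl openib,sm,self \
   --mca btl_openib_cpc_include rdmacm \
   IMB-MPI1 PingPong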

Just an update. Has anyone else had any luck fixing the lockup issues on the 
openib BTL for collectives in some cases? Thanks!


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-05 Thread Brock Palen
Yeah, we have run into more issues, with rdmacm not being available on all of 
our hosts. So it would be nice to know what we can do to test whether a host 
supports rdmacm.

Example:

--
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   nyx5067.engin.umich.edu
  Local device: mlx4_0
  Local port:   1
  CPCs attempted:   rdmacm
--

This is one of our QDR hosts on which rdmacm generally works, and this code 
(CRASH) requires rdmacm to avoid a collective hang in MPI_Allreduce(). 

I looked on this host and found:
[root@nyx5067 ~]# rpm -qa | grep rdma
librdmacm-1.0.11-1
librdmacm-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-devel-1.0.11-1
librdmacm-utils-1.0.11-1

So all the libraries are installed (I think); is there a way to verify this? 
Or to have OpenMPI be more verbose about what caused rdmacm to fail as a CPC 
option?
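
A sketch of the checks I have in mind (the device-node path and kernel module 
names are assumptions about a standard OFED install, not verified on these 
nodes):

rpm -ql librdmacm | grep '\.so'        # confirm the library files are really on disk
lsmod | grep rdma                      # rdmacm also needs the rdma_cm / rdma_ucm modules
ls -l /dev/infiniband/rdma_cm          # the character device librdmacm opens
# on the Open MPI side, ask the openib BTL to explain its CPC selection:
mpirun --mca btl_base_verbose 100 --mca btl_openib_cpc_include rdmacm IMB-MPI1 PingPong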


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 3, 2011, at 9:42 AM, Dave Love wrote:

> Brock Palen <bro...@umich.edu> writes:
> 
>> We managed to have another user hit the bug that causes collectives (this 
>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>> 
>> btl_openib_cpc_include rdmacm
> 
> Could someone explain this?  We also have problems with collective hangs
> with openib/mlx4 (specifically in IMB), but not with psm, and I couldn't
> see any relevant issues filed.  However, rdmacm isn't an available value
> for that parameter with our 1.4.3 or 1.5.3 installations, only oob (not
> that I understand what these things are...).
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-28 Thread Brock Palen
: init of component sm returned success
[nyx0665.engin.umich.edu:06399] select: initializing btl component tcp
[nyx0665.engin.umich.edu:06399] select: init of component tcp returned success
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm IP address not found on port
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm CPC unavailable for use on 
mthca0:1; skipped
[nyx0666.engin.umich.edu:07210] select: init of component openib returned 
failure
[nyx0666.engin.umich.edu:07210] select: module openib unloaded
[nyx0666.engin.umich.edu:07210] select: initializing btl component self
[nyx0666.engin.umich.edu:07210] select: init of component self returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component sm
[nyx0666.engin.umich.edu:07210] select: init of component sm returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component tcp
[nyx0666.engin.umich.edu:07210] select: init of component tcp returned success
0: nyx0665
1: nyx0666
[nyx0666.engin.umich.edu:07210] btl: tcp: attempting to connect() to address 
10.164.2.153 on port 516
[nyx0665.engin.umich.edu:06399] btl: tcp: attempting to connect() to address 
10.164.2.154 on port 4
Now starting the main loop
  0:   1 bytes   1948 times -->  0.14 Mbps in  53.29 usec
  1:   2 bytes   1876 times -->  0.29 Mbps in  52.74 usec
  2:   3 bytes   1896 times -->  0.43 Mbps in  53.04 usec
  3:   4 bytes   1256 times -->  0.57 Mbps in  53.55 usec
  4:   6 bytes   1400 times -->  0.85 Mbps in  54.03 usec
  5:   8 bytes    925 times --> mpirun: killing job...

--
mpirun noticed that process rank 0 with PID 6399 on node 
nyx0665.engin.umich.edu exited on signal 0 (Unknown signal 0).
--
mpirun: clean termination accomplished

[nyx0665.engin.umich.edu:06398] 1 more process has sent help message 
help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[nyx0665.engin.umich.edu:06398] Set MCA parameter "orte_base_help_aggregate" to 
0 to see all help / error messages
2 total processes killed (some possibly by mpirun during cleanup)


We were being bitten by a number of codes hanging in collectives, which was 
resolved by using rdmacm. As a workaround, we tried setting this as the default 
until the two bugs in Bugzilla are resolved. Then we hit this problem on our 
DDR/SDR gear.
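
One way to set such a default (a sketch; whether to use the system-wide params 
file or an environment variable is a site choice, and the install prefix shown 
is a placeholder):

# in <openmpi-prefix>/etc/openmpi-mca-params.conf
btl_openib_cpc_include = rdmacm
# or, keeping fallback CPCs for ports where rdmacm is unavailable:
btl_openib_cpc_include = rdmacm,oob,xoob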

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 28, 2011, at 8:07 AM, Jeff Squyres wrote:

> On Apr 27, 2011, at 10:02 AM, Brock Palen wrote:
> 
>> Argh, our messed-up environment with three generations of InfiniBand bit 
>> us: setting openib_cpc_include to rdmacm causes IB to not be used on our old 
>> DDR IB on some of our hosts.  Note that jobs will never run across both our 
>> old DDR IB and our new QDR stuff, where rdmacm does work.
> 
> Hmm -- odd.  I use RDMACM on some old DDR (and SDR!) IB hardware and it seems 
> to work fine.
> 
> Do you have any indication as to why OMPI is refusing to use rdmacm on your 
> older hardware, other than "No OF connection schemes reported..."?  Try 
> running with --mca btl_base_verbose 100 (beware: it will be a truckload of 
> output).  Make sure that you have rdmacm support available on those machines, 
> both in OMPI and in OFED/the OS.
> 
>> I am doing some testing with:
>> export OMPI_MCA_btl_openib_cpc_include=rdmacm,oob,xoob
>> 
>> What I want to know is there a way to tell mpirun to 'dump all resolved mca 
>> settings'  Or something similar. 
> 
> I'm not quite sure what you're asking here -- do you want to override MCA 
> params on specific hosts?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 



Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-27 Thread Brock Palen
Argh, our messed-up environment with three generations of InfiniBand bit us: 
setting openib_cpc_include to rdmacm causes IB to not be used on our old DDR IB 
on some of our hosts.  Note that jobs will never run across both our old DDR IB 
and our new QDR stuff, where rdmacm does work.

I am doing some testing with:
export OMPI_MCA_btl_openib_cpc_include=rdmacm,oob,xoob

What I want to know is: is there a way to tell mpirun to 'dump all resolved 
MCA settings', or something similar?
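
Two things I know of that come close to that (a sketch; I have not confirmed 
the exact spellings against this particular 1.4.x build, so treat them as 
something to verify with ompi_info):

ompi_info --param all all | grep cpc_include            # values as resolved from param files and environment
mpirun --mca mpi_show_mca_params all IMB-MPI1 PingPong  # have the job print the MCA params it picked up at init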

The error we get, which I think is expected when we set only rdmacm, is:
--
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   nyx0665.engin.umich.edu
  Local device: mthca0
  Local port:   1
  CPCs attempted:   rdmacm
--

Again I think this is expected on this older hardware. 

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 22, 2011, at 10:23 AM, Brock Palen wrote:

> On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:
> 
>> 
>> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
>> 
>>> Given that part of our cluster is TCP only, openib wouldn't even startup on 
>>> those hosts
>> 
>> That is correct - it would have no impact on those hosts
>> 
>>> and this would be ignored on hosts with IB adaptors?  
>> 
>> Ummm...not sure I understand this one. The param -will- be used on hosts 
>> with IB adaptors because that is what it is controlling.
>> 
>> However, it -won't- have any impact on hosts without IB adaptors, which is 
>> what I suspect you meant to ask?
> 
> Correct, that was a typo. I am going to add the environment variable to our 
> OpenMPI modules so rdmacm is our default for now. Thanks!
> 
>> 
>> 
>>> 
>>> Just checking thanks!
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
>>> 
>>>> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
>>>> slower to establish QP's, but I don't think that matters much.
>>>> 
>>>> Over iWARP, rdmacm can cause connection storms as you scale to thousands 
>>>> of MPI processes.
>>>> 
>>>> 
>>>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>>>> 
>>>>> We managed to have another user hit the bug that causes collectives (this 
>>>>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>>>>> 
>>>>> btl_openib_cpc_include rdmacm
>>>>> 
>>>>> My question is if we set this to the default on our system with an 
>>>>> environment variable does it introduce any performance or other issues we 
>>>>> should be aware of?
>>>>> 
>>>>> Is there a reason we should not use rdmacm?
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ___
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>>> 
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-22 Thread Brock Palen
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:

> 
> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
> 
>> Given that part of our cluster is TCP only, openib wouldn't even startup on 
>> those hosts
> 
> That is correct - it would have no impact on those hosts
> 
>> and this would be ignored on hosts with IB adaptors?  
> 
> Ummm...not sure I understand this one. The param -will- be used on hosts with 
> IB adaptors because that is what it is controlling.
> 
> However, it -won't- have any impact on hosts without IB adaptors, which is 
> what I suspect you meant to ask?

Correct, that was a typo. I am going to add the environment variable to our 
OpenMPI modules so rdmacm is our default for now. Thanks!
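
What that module change amounts to, roughly (a sketch; the module simply 
exports the variable shown elsewhere in this thread, here with the single-CPC 
value):

export OMPI_MCA_btl_openib_cpc_include=rdmacm
# quick sanity check that launched processes actually inherit it:
mpirun -np 1 env | grep OMPI_MCA_btl_openib_cpc_include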

> 
> 
>> 
>> Just checking thanks!
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
>> 
>>> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
>>> slower to establish QP's, but I don't think that matters much.
>>> 
>>> Over iWARP, rdmacm can cause connection storms as you scale to thousands of 
>>> MPI processes.
>>> 
>>> 
>>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>>> 
>>>> We managed to have another user hit the bug that causes collectives (this 
>>>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>>>> 
>>>> btl_openib_cpc_include rdmacm
>>>> 
>>>> My question is if we set this to the default on our system with an 
>>>> environment variable does it introduce any performance or other issues we 
>>>> should be aware of?
>>>> 
>>>> Is there a reason we should not use rdmacm?
>>>> 
>>>> Thanks!
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Brock Palen
Given that part of our cluster is TCP only, openib wouldn't even start up on 
those hosts and this would be ignored on hosts with IB adaptors?  

Just checking thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:

> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
> slower to establish QP's, but I don't think that matters much.
> 
> Over iWARP, rdmacm can cause connection storms as you scale to thousands of 
> MPI processes.
> 
> 
> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
> 
>> We managed to have another user hit the bug that causes collectives (this 
>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>> 
>> btl_openib_cpc_include rdmacm
>> 
>> My question is if we set this to the default on our system with an 
>> environment variable does it introduce any performance or other issues we 
>> should be aware of?
>> 
>> Is there a reason we should not use rdmacm?
>> 
>> Thanks!
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 



