Re: [OMPI users] openib segfaults with Torque

2014-06-05 Thread Ralph Castain
Hmmm...I'm not sure how that is going to run with only one proc (I don't know if the program is protected against that scenario). If you run with -np 2 -mca btl openib,sm,self, is it happy? On Jun 5, 2014, at 2:16 PM, Fischer, Greg A. wrote: > Here’s the command
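Spelled out in full, the suggested test would look like the line below; the ring_c binary name is carried over from the earlier messages in this thread rather than stated in this reply:

    $ mpirun -np 2 -mca btl openib,sm,self ring_c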

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-05 Thread Ralph Castain
On Jun 5, 2014, at 2:13 PM, Dan Dietz wrote: > Hello all, > > I'd like to bind 8 cores to a single MPI rank for hybrid MPI/OpenMP > codes. In OMPI 1.6.3, I can do: > > $ mpirun -np 2 -cpus-per-rank 8 -machinefile ./nodes ./hello > > I get one rank bound to procs 0-7 and
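The reply is truncated here, but in the 1.8 series the usual replacement for -cpus-per-rank is the pe (processing elements) modifier to --map-by; a minimal sketch under that assumption:

    $ mpirun -np 2 --map-by slot:pe=8 -machinefile ./nodes ./hello

With 16 cores per node this should reproduce the 1.6.3 behavior quoted in the message: one rank bound to cores 0-7 and the other to 8-15.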

Re: [OMPI users] OPENIB unknown transport errors

2014-06-05 Thread Tim Miller
Hi Josh, Thanks for attempting to sort this out. In answer to your questions: 1. Node allocation is done by TORQUE, however we don't use the TM API to launch jobs (long story). Instead, we just pass a hostfile to mpirun, and mpirun uses the ssh launcher to actually communicate and launch the
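A minimal sketch of the launch method described, with hypothetical hostnames; setting the plm MCA parameter to rsh forces the ssh launcher even when TM support is compiled in:

    $ cat hostfile
    node001 slots=8
    node002 slots=8
    $ mpirun -np 16 --hostfile hostfile --mca plm rsh ./app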

Re: [OMPI users] openib segfaults with Torque

2014-06-05 Thread Fischer, Greg A.
Here's the command I'm invoking and the terminal output. (Some of this information doesn't appear to be captured in the backtrace.) [binf316:fischega] $ mpirun -np 1 -mca btl openib,self ring_c ring_c: ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734:
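When the assertion aborts the process, a core file (if enabled) typically captures more context than the one-line message; a generic sketch, with the core file name hypothetical:

    $ ulimit -c unlimited
    $ mpirun -np 1 -mca btl openib,self ring_c
    $ gdb ring_c core.<pid>
    (gdb) bt full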

[OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-05 Thread Dan Dietz
Hello all, I'd like to bind 8 cores to a single MPI rank for hybrid MPI/OpenMP codes. In OMPI 1.6.3, I can do: $ mpirun -np 2 -cpus-per-rank 8 -machinefile ./nodes ./hello I get one rank bound to procs 0-7 and the other bound to 8-15. Great! But I'm having some difficulties doing this with
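For reference, adding --report-bindings to the command line (supported in both the 1.6 and 1.8 series) makes mpirun print the cores each rank was actually bound to, which is an easy way to verify the mapping:

    $ mpirun -np 2 -cpus-per-rank 8 -machinefile ./nodes --report-bindings ./hello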

[OMPI users] openib segfaults with Torque

2014-06-05 Thread Fischer, Greg A.
OpenMPI Users, After encountering difficulty with the Intel compilers (see the "intermittent segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib BTL in a typical BASH environment. Everything appeared
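Since the subject indicates the failure surfaces only under Torque, a quick way to reproduce it is from an interactive job rather than a login shell; the resource request below is a hypothetical example:

    $ qsub -I -l nodes=1:ppn=2
    $ mpirun -np 1 -mca btl openib,self ring_c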

Re: [OMPI users] OPENIB unknown transport errors

2014-06-05 Thread Joshua Ladd
Strange indeed. This info (remote adapter info) is passed around in the modex and the struct is locally populated during add procs. 1. How do you launch jobs? Mpirun, srun, or something else? 2. How many active ports do you have on each HCA? Are they all configured to use IB? 3. Do you explicitly
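For question 2, the port count, state, and link type of each HCA can be read directly from the verbs layer; a sketch using the standard OFED utility:

    $ ibv_devinfo | grep -E 'hca_id|port:|state|link_layer'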

Re: [OMPI users] OpenMPI Compilation Error

2014-06-05 Thread George Bosilca
A fix has been pushed to the trunk (r31955). Once reviewed, it will make it into the next release, 1.8.2. George. On Thu, Jun 5, 2014 at 11:29 AM, Jeff Squyres (jsquyres) wrote: > George and I are together at the MPI Forum this week -- we just looked at > this in more

Re: [OMPI users] spml_ikrit_np random values

2014-06-05 Thread Mike Dubman
It seems oshmem_info uses an uninitialized value. We will check it; thanks for the report. On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov wrote: > Hello! > > I am using Open MPI v1.8.1. > > $ oshmem_info -a --parsable | grep spml_ikrit_np
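A minimal C illustration of the bug class being described (not the actual oshmem code): an int read before it is assigned can hold a different garbage value on every run, which matches the ever-changing spml_ikrit_np default shown in the report below.

    #include <stdio.h>

    int main(void)
    {
        int np;  /* deliberately left uninitialized */
        /* undefined behavior: may print a different value each run */
        printf("spml_ikrit_np = %d\n", np);
        return 0;
    }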

Re: [OMPI users] OpenMPI Compilation Error

2014-06-05 Thread Jeff Squyres (jsquyres)
George and I are together at the MPI Forum this week -- we just looked at this in more detail; it looks like this is a more pervasive problem. Let us look at this a bit more... On Jun 5, 2014, at 10:37 AM, George Bosilca wrote: > Alan, > > I think we forgot to cleanup

Re: [OMPI users] Compiling OpenMPI 1.8.1 for Cray XC30

2014-06-05 Thread Ralph Castain
I know Nathan has it running on the XC30, but I don't see a platform file specifically for it in the repo. Did you try the cray_xe6 platform files? I think he may have just augmented those to handle the XC30 case. Look in contrib/platform/lanl/cray_xe6. On Jun 5, 2014, at 9:00 AM, Hammond,
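Platform files are passed to configure with --with-platform; a sketch under the assumption that the cray_xe6 files also cover the XC30, as suggested above (the exact file name under that directory may differ):

    $ ./configure --with-platform=contrib/platform/lanl/cray_xe6 \
          --prefix=$HOME/openmpi-1.8.1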

[OMPI users] Compiling OpenMPI 1.8.1 for Cray XC30

2014-06-05 Thread Hammond, Simon David (-EXP)
Hi OpenMPI developers/users, Does anyone have a working configure line for OpenMPI 1.8.1 on a Cray XC30? When we compile the code, ALPS is located, but when we run the compiled binaries using aprun we get n separate 1-rank jobs rather than one job of n ranks. Thank you. S. -- Simon Hammond Scalable Computer
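Getting n separate 1-rank jobs usually means the processes were never wired up through the Cray launcher, so one thing to verify is that ALPS support was actually requested at configure time; a hedged sketch:

    $ ./configure --prefix=$HOME/openmpi-1.8.1 --with-alps
    $ make all install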

[OMPI users] spml_ikrit_np random values

2014-06-05 Thread Timur Ismagilov
Hello! I am using Open MPI v1.8.1.

$ oshmem_info -a --parsable | grep spml_ikrit_np
mca:spml:ikrit:param:spml_ikrit_np:value:1620524368  (always a new value)
mca:spml:ikrit:param:spml_ikrit_np:source:default
mca:spml:ikrit:param:spml_ikrit_np:status:writeable

Re: [OMPI users] OpenMPI Compilation Error

2014-06-05 Thread George Bosilca
Alan, I think we forgot to clean up after a merge, and as a result we have c_destweights and c_sourceweights defined twice. Please try the following patch and let us know if this fixes your issue. Index: ompi/mpi/fortran/mpif-h/dist_graph_create_adjacent_f.c
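A minimal C illustration of the error class the patch addresses (not the actual OMPI source): declaring the same variable twice in one scope is a compile-time redefinition error, which is exactly what an unclean merge that duplicates declarations produces.

    int f(void)
    {
        int c_destweights = 0;
        int c_destweights = 1;  /* error: redefinition of 'c_destweights' */
        return c_destweights;
    }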

Re: [OMPI users] [warn] Epoll ADD(1) on fd 0 failed

2014-06-05 Thread Ralph Castain
FWIW: support for the --resv-ports option was deprecated and removed on the OMPI side a long time ago. I'm not familiar enough with "oshrun" to know if it is doing anything unusual - I believe it is just a renaming of our usual "mpirun". I suspect this is some interaction with sbatch, but I'll

[OMPI users] Bug in OpenMPI-1.8.1: missing routines mpi_win_allocate_shared, mpi_win_shared_query called from Ftn95-code

2014-06-05 Thread Michael.Rachner
Dear developers of OpenMPI, I found that when building an executable from a Fortran 95 code on a Linux cluster with OpenMPI-1.8.1 (and the Intel 14.0.2 Fortran compiler), the following two MPI-3 routines do not exist: /dat/KERNEL/mpi3_sharedmem.f90:176: undefined reference to `mpi_win_allocate_shared_'
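For reference, these are the C bindings of the two MPI-3 routines whose Fortran symbols the linker cannot resolve; a minimal sketch (it assumes all ranks share one node, as the standard requires for shared-memory windows):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Win  win;
        double  *base;
        MPI_Aint size;
        int      disp_unit;

        MPI_Init(&argc, &argv);

        /* allocate one double per rank in a node-shared window */
        MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                                MPI_INFO_NULL, MPI_COMM_WORLD,
                                &base, &win);

        /* query rank 0's segment of the shared window */
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &base);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }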

[OMPI users] OpenMPI Compilation Error

2014-06-05 Thread Alan Sang Loon
Hello, I have downloaded OpenMPI-1.8.1 and compiled it using Intel Compilers (Intel Composer XE Suites 2013), and the command used is as follows: [Code] ./configure --prefix=/opt/openmpi-1.8.1 CC=icc CXX=icpc F77=ifort FC=ifort make all install [/code] Everything works just fine, except I realized