Re: [OMPI devel] 1.7rc8 is posted

2013-03-01 Thread Peter Kjellström
On Wednesday 27 February 2013 17:52:42 Jeff Squyres wrote:
> The goal is to release 1.7 (final) by the end of this week.  New rc posted

Built on CentOS-6.3 + MLNXOFED-1.5.3-310 using intel-13.1.0: OK
Built with cma: OK
Built with mxm (1.5.0eb2be5): OK
Built with slurm: ok

Launch correctness: OK
IMB on 32 and 128 ranks all combos of mxm, verbs, cma, no cma: OK(*)

(*) of course there are lots of spots-o-performance-sucking

Good luck with 1.7 final,
 Peter

> with fairly small changes:
> 
> http://www.open-mpi.org/software/ompi/v1.7/
> 
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun " fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
> bindings
> - Fix --disable-mpi-io with the F08 bindings




Re: [OMPI devel] 1.7rc8 is posted

2013-02-28 Thread Pavel Mezentsev
Regarding Open64: the odd thing is that I need two different versions of
the compiler, one for the Opteron family 15h and one for the general x86-64
architecture. This makes things quite complicated, since my head node
doesn't have an Opteron family 15h processor. You can have a look at this
topic:
http://devgurus.amd.com/thread/160180

I've tried building the same version on a node with Opteron 6380 processors.
Configuration was successful, but make failed in the following way:
libtool: compile:  opencc -DHAVE_CONFIG_H -I. -DLTDLOPEN=libltdlc
"-DLT_CONFIG_H=" -DLTDL -I. -I. -Ilibltdl -I./libltdl -I./libltdl
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/hwloc/hwloc151/hwloc/include
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent
-I/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/mca/event/libevent2019/libevent/include
-I/usr/include/infiniband -I/usr/include/infiniband
-I/usr/include/infiniband -I/usr/include/infiniband
-I/usr/include/infiniband -I/usr/include/infiniband -O3 -DNDEBUG
-fvisibility=hidden -MT libltdlc_la-ltdl.lo -MD -MP -MF
.deps/libltdlc_la-ltdl.Tpo -c ltdl.c  -fPIC -DPIC -o
.libs/libltdlc_la-ltdl.o
/tmp/ccspin#.cVv00f.s: Assembler messages:
/tmp/ccspin#.cVv00f.s:860: Error: no such instruction: `bextr $257,%esi,%esi'
/tmp/ccspin#.cVv00f.s:868: Error: no such instruction: `bextr $258,%edi,%edi'
/tmp/ccspin#.cVv00f.s:876: Error: no such instruction: `bextr $259,%eax,%eax'
make[3]: *** [libltdlc_la-ltdl.lo] Error 1
make[3]: Leaving directory
`/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp26482/openmpi-1.7rc8/opal'
make: *** [all-recursive] Error 1

I guess that this issue has more to do with the compiler than with OpenMPI.
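
Editorial note: bextr belongs to the BMI1/TBM instruction-set extensions, which Opteron family 15h processors such as the 6380 provide. A binary built for that family may therefore hit "illegal instruction" on a head node without them, and an assembler that predates the extensions rejects the mnemonic as above. The following is a minimal Python sketch, not part of Open MPI or Open64, for checking whether a Linux node advertises these extensions. The helper names `cpu_flags` and `missing_flags` are hypothetical, and the flag names `bmi1` and `tbm` are assumed to be the ones Linux reports in /proc/cpuinfo.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("bmi1", "tbm")):
    """List the wanted flags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]


if __name__ == "__main__":
    # Only meaningful on Linux; skipped silently elsewhere.
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("missing:", missing_flags(path.read_text()))
```

Running this on both the head node and a compute node would show whether the two differ in the extensions a family-15h build might rely on. It does not diagnose the assembler itself.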


Let me know if you need me to run any additional tests.

Regards, Pavel Mezentsev.





Re: [OMPI devel] 1.7rc8 is posted

2013-02-28 Thread Jeff Squyres (jsquyres)
On Feb 28, 2013, at 12:04 PM, Pavel Mezentsev  wrote:

> Do you mean the logs from failed attempts? They are enclosed. If you need the 
> successful logs I'll need to make them again since the files from successful 
> builds are deleted.

You guessed right; I need the logs from the failed builds.

It looks like your openf95 compiler is generating borked executables:

-
configure:31019: checking KIND value of Fortran C_SIGNED_CHAR
configure:31046: openf95 -o conftest conftest.f90  >&5
configure:31046: $? = 0
configure:31046: ./conftest
./configure: line 4343:  1234 Illegal instruction (core dumped) 
./conftest$ac_exeext
configure:31046: $? = 132
configure: program exited with status 132
configure: failed program was:
|   program main
| 
| use, intrinsic :: ISO_C_BINDING
| open(unit = 7, file = "conftest.out")
| write(7, *) C_SIGNED_CHAR
| close(7)
| 
|   end
-

There's no reason the above Fortran program should fail with "illegal 
instruction".
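
Editorial note: the status 132 in the log follows the usual shell convention of 128 plus the number of the signal that killed the process. 132 - 128 = 4, which is SIGILL on Linux and matches the "Illegal instruction" message. Below is a minimal Python sketch of that decoding. The helper name `describe_exit_status` is hypothetical and is not part of configure or Open MPI.

```python
import signal


def describe_exit_status(status):
    """Decode a shell-style exit status into a signal name or an exit code."""
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # No signal with this number on the current platform.
            return f"unknown signal {signum}"
    return f"exit code {status}"


print(describe_exit_status(132))
```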

> I am not using  MXM. The results with the option you suggested are the same 
> as before:

We're investigating the latency issue.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] 1.7rc8 is posted

2013-02-28 Thread Pavel Mezentsev
Do you mean the logs from failed attempts? They are enclosed. If you need
the successful logs I'll need to make them again since the files from
successful builds are deleted.

I am not using  MXM. The results with the option you suggested are the same
as before:
#---
# Benchmarking PingPong
# #processes = 2
#---
   #bytes #repetitions      t[usec]   Mbytes/sec
        0         1000         1.49         0.00
        1         1000         1.58         0.61
        2         1000         1.12         1.71
        4         1000         1.10         3.48
        8         1000         1.11         6.90
       16         1000         1.11        13.69
       32         1000         1.12        27.21
       64         1000         1.16        52.52
      128         1000         1.72        70.83
      256         1000         1.84       132.72
      512         1000         1.99       245.74
     1024         1000         2.25       433.92
     2048         1000         2.87       680.54
     4096         1000         3.52      1109.13
     8192         1000         4.68      1670.60
    16384         1000         9.66      1617.91
    32768         1000        14.30      2185.24
    65536          640        23.45      2665.33
   131072          320        35.99      3473.15
   262144          160        58.05      4306.65
   524288           80       101.94      4904.69
  1048576           40       188.65      5300.86
  2097152           20       526.05      3801.94
  4194304           10      1096.09      3649.32

#---
# Benchmarking PingPing
# #processes = 2
#---
   #bytes #repetitions      t[usec]   Mbytes/sec
        0         1000         1.10         0.00
        1         1000         1.24         0.77
        2         1000         1.23         1.55
        4         1000         1.23         3.10
        8         1000         1.25         6.09
       16         1000         1.14        13.41
       32         1000         1.11        27.40
       64         1000         1.16        52.75
      128         1000         1.71        71.34
      256         1000         1.84       132.33
      512         1000         1.98       246.63
     1024         1000         2.27       429.26
     2048         1000         2.91       672.30
     4096         1000         3.52      1109.43
     8192         1000         4.80      1627.25
    16384         1000         9.98      1565.64
    32768         1000        14.70      2125.14
    65536          640        24.18      2584.97
   131072          320        37.33      3348.95
   262144          160        60.59      4125.82
   524288           80       105.83      4724.78
  1048576           40       197.82      5055.05
  2097152           20       791.35      2527.34
  4194304           10      1820.30      2197.44
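
Editorial note: the Mbytes/sec column in these tables is consistent with message size divided by time, with one Mbyte taken as 2^20 bytes. A minimal Python sketch of that relation follows. The function name `imb_mbytes_per_sec` is hypothetical, and the 2^20 convention is inferred from the numbers above rather than taken from the IMB documentation.

```python
def imb_mbytes_per_sec(nbytes, t_usec):
    """Throughput in Mbytes/sec (1 Mbyte = 2**20 bytes) for one message."""
    if t_usec <= 0:
        raise ValueError("t_usec must be positive")
    seconds = t_usec * 1e-6
    return nbytes / seconds / 2**20


# Rows taken from the PingPong and PingPing tables in this message.
for nbytes, t_usec in [(1048576, 188.65), (4194304, 1820.30)]:
    print(nbytes, round(imb_mbytes_per_sec(nbytes, t_usec), 2))
```

Small differences in the last digit against the reported values are expected, since the t[usec] column is itself rounded to two decimals.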

Regards, Pavel Mezentsev.



Re: [OMPI devel] 1.7rc8 is posted

2013-02-27 Thread Jeff Squyres (jsquyres)
On Feb 27, 2013, at 7:36 PM, Pavel Mezentsev  wrote:

> I've tried the new rc. Here is what I got:

Thanks for testing.

> 1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I've failed 
> while using open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems are in 
> the fortran part. In each case I've used the following configuration line:
> CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix 
> --with-knem=$knem_path
> Open64 failed during configuration with the following:
> *** Fortran compiler
> checking whether we are using the GNU Fortran compiler... yes
> checking whether openf95 accepts -g... yes
> configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment 
> variables; only the FC and FCFLAGS environment variables are used.
> checking whether ln -s works... yes
> checking if Fortran compiler works... yes
> checking for extra arguments to build a shared library... none needed
> checking for Fortran flag to compile .f files... none
> checking for Fortran flag to compile .f90 files... none
> checking to see if Fortran compilers need additional linker flags... none
> checking  external symbol convention... double underscore
> checking if C and Fortran are link compatible... yes
> checking to see if Fortran compiler likes the C++ exception flags... skipped 
> (no C++ exceptions flags)
> checking to see if mpifort compiler needs additional linker flags... none
> checking if Fortran compiler supports CHARACTER... yes
> checking size of Fortran CHARACTER... 1
> checking for C type corresponding to CHARACTER... char
> checking alignment of Fortran CHARACTER... 1
> checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
> checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
> checking Fortran value of selected_int_kind(4)... no
> configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
> configure: WARNING: See config.log for more details
> configure: error: Cannot continue

Please send the full configure output as well as the config.log file (please 
compress).

> Ekopath failed during make with the following error:
>  PPFC mpi-f08-sizeof.lo
>   PPFC mpi-f08.lo
> In file included from mpi-f08.F90:37:
> mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif directive
> mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif directive
> In file included from mpi-f08.F90:38:
> pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif 
> directive
> pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif 
> directive
> pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File = 
> mpi-f-interfaces-bind.h, Line = 955, Column = 29 
>   Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()

I've pinged Pathscale about this.

> 2) I've run a couple of tests (IMB) with the new version. I ran this on a 
> system consisting of 10 nodes with Intel SB processors and FDR ConnectX3 
> infiniband adapters.
> First I've tried the following parameters:
> mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core 
> -npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL
> This combination complained about mpi_leave_pinned. The same line works for 
> 1.6.3. Has something changed in the new release that I've missed?
> --
> A process attempted to use the "leave pinned" MPI feature, but no
> memory registration hooks were found on the system at run time.  This
> may be the result of running on a system that does not support memory
> hooks or having some other software subvert Open MPI's use of the
> memory hooks.  You can disable Open MPI's use of memory hooks by
> setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
> parameters to 0.

This should not be, and might explain your lower performance on the IMB results.

Nathan -- you reported that you saw something like this before, but were then 
unable to reproduce.  Any ideas what's going on here?  Mellanox?

(although the short message latency is troubling...)

Can you ensure that you aren't using MXM in 1.7?  I understand that its short 
message latency is worse than RC verbs.  You'll need to add "--mca pml ob1" in 
your command line.
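
Editorial note: a minimal Python sketch of the earlier IMB command line with the suggested "--mca pml ob1" added. Every option is copied from the command quoted above. The function name `build_imb_command`, its parameters, and the sample values 32 and PingPong are placeholders for illustration only.

```python
import shlex


def build_imb_command(np, collective, hostfile="hosts", force_ob1=True):
    """Assemble the mpirun argument list for an IMB run."""
    cmd = ["mpirun", "-np", str(np), "-hostfile", hostfile,
           "--mca", "btl", "openib,sm,self"]
    if force_ob1:
        # Select the ob1 PML so that MXM is not used.
        cmd += ["--mca", "pml", "ob1"]
    cmd += ["--bind-to-core", "-npernode", "16",
            "--mca", "mpi_leave_pinned", "1",
            "./IMB-MPI1", "-npmin", str(np), "-mem", "4G", collective]
    return cmd


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_imb_command(32, "PingPong")))
```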

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] 1.7rc8 is posted

2013-02-27 Thread Jeff Squyres (jsquyres)
Sweet -- thanks for the quick test!

On Feb 27, 2013, at 7:54 PM, marco atzeri  wrote:

> On 2/27/2013 6:52 PM, Jeff Squyres (jsquyres) wrote:
>> The goal is to release 1.7 (final) by the end of this week.  New rc posted 
>> with fairly small changes:
>> 
>> http://www.open-mpi.org/software/ompi/v1.7/
>> 
>> - Fix wrong header file / compilation error in bcol
>> - Support MXM STREAM for isend and irecv
>> - Make sure "mpirun " fails with $status!=0
>> - Bunches of cygwin minor fixes
>> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08 
>> bindings
>> - Fix --disable-mpi-io with the F08 bindings
>> 
> 
> builds and passes tests on cygwin
> 
> Regards
> Marco
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] 1.7rc8 is posted

2013-02-27 Thread marco atzeri

On 2/27/2013 6:52 PM, Jeff Squyres (jsquyres) wrote:

The goal is to release 1.7 (final) by the end of this week.  New rc posted with 
fairly small changes:

 http://www.open-mpi.org/software/ompi/v1.7/

- Fix wrong header file / compilation error in bcol
- Support MXM STREAM for isend and irecv
- Make sure "mpirun " fails with $status!=0
- Bunches of cygwin minor fixes
- Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08 
bindings
- Fix --disable-mpi-io with the F08 bindings



builds and passes tests on cygwin

Regards
Marco


Re: [OMPI devel] 1.7rc8 is posted

2013-02-27 Thread Pavel Mezentsev
I've tried the new rc. Here is what I got:

1) I've successfully built it with intel-13.1 and gcc-4.7.2. But I've
failed while using open64-4.5.2 and ekopath-5.0.0 (pathscale). The problems
are in the fortran part. In each case I've used the following configuration
line:
CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix
--with-knem=$knem_path
Open64 failed during configuration with the following:
*** Fortran compiler
checking whether we are using the GNU Fortran compiler... yes
checking whether openf95 accepts -g... yes
configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment
variables; only the FC and FCFLAGS environment variables are used.
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... none needed
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking to see if Fortran compilers need additional linker flags... none
checking  external symbol convention... double underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags...
skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... none
checking if Fortran compiler supports CHARACTER... yes
checking size of Fortran CHARACTER... 1
checking for C type corresponding to CHARACTER... char
checking alignment of Fortran CHARACTER... 1
checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
checking Fortran value of selected_int_kind(4)... no
configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
configure: WARNING: See config.log for more details
configure: error: Cannot continue

Ekopath failed during make with the following error:
 PPFC mpi-f08-sizeof.lo
  PPFC mpi-f08.lo
In file included from mpi-f08.F90:37:
mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif
directive
mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif
directive
In file included from mpi-f08.F90:38:
pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif
directive
pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif
directive
pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File =
mpi-f-interfaces-bind.h, Line = 955, Column = 29
  Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()
make[2]: *** [mpi-f08.lo] Error 1
make[2]: Leaving directory
`/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi/mpi/fortran/use-mpi-f08'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi'
make: *** [all-recursive] Error 1

It seems to be different from the error I got last time with rc7. And again,
I don't know Fortran well enough to understand this error. I've used the
following version of the compiler:
http://c591116.r16.cf2.rackcdn.com/ekopath/nightly/Linux/ekopath-2013-02-26-installer.run

2) I've run a couple of tests (IMB) with the new version. I ran this on a
system consisting of 10 nodes with Intel SB processors and FDR ConnectX3
infiniband adapters.
First I've tried the following parameters:
mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core
-npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL
This combination complained about mpi_leave_pinned. The same line works for
1.6.3. Has something changed in the new release that I've missed?
--
A process attempted to use the "leave pinned" MPI feature, but no
memory registration hooks were found on the system at run time.  This
may be the result of running on a system that does not support memory
hooks or having some other software subvert Open MPI's use of the
memory hooks.  You can disable Open MPI's use of memory hooks by
setting both the mpi_leave_pinned and mpi_leave_pinned_pipeline MCA
parameters to 0.

Open MPI will disable any transports that are attempting to use the
leave pinned functionality; your job may still run, but may fall back
to a slower network transport (such as TCP).

  Mpool name: grdma
  Process:[[13305,1],1]
  Local host: b23
--
--
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them).  This
is most certainly not what you wanted.  Check your cables, subnet
manager configuration, etc.  The openib BTL will be ignored for this
job.

  Local host: b23
--
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  

[OMPI devel] 1.7rc8 is posted

2013-02-27 Thread Jeff Squyres (jsquyres)
The goal is to release 1.7 (final) by the end of this week.  New rc posted with 
fairly small changes:

http://www.open-mpi.org/software/ompi/v1.7/

- Fix wrong header file / compilation error in bcol
- Support MXM STREAM for isend and irecv
- Make sure "mpirun " fails with $status!=0
- Bunches of cygwin minor fixes
- Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08 
bindings
- Fix --disable-mpi-io with the F08 bindings

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/