Re: [OMPI users] cannot build 32-bit openmpi-1.7 on Linux

2013-04-05 Thread Paul Kapinos
I believe, with 99% probability, that this is not an Open MPI issue but an issue of the 
Fortran compiler itself (the one invoked in the PPFC step).


You can verify this by going to the build subdir ('Entering directory...') and 
trying to find out _what command was called_. If your compiler crashes again, 
build a reproducer and send it to the compiler developer team :o)
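
For example, something along these lines should reveal the exact command (the directory 
is taken from the error message below; V=1 switches off the short 'PPFC ...' output of 
the silent build rules):

$ cd <builddir>/ompi/mpi/fortran/use-mpi-f08
$ make V=1 mpi-f08.lo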


Best
Paul Kapinos

On 04/05/13 17:56, Siegmar Gross wrote:

   PPFC mpi-f08.lo
"../../../../../openmpi-1.7/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90", Line = 1,
Column = 1: INTERNAL: Interrupt: Segmentation fault



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] OMPI v1.7.1 fails to build on RHEL 5 and RHEL 6

2013-04-18 Thread Paul Kapinos

On 04/17/13 23:37, Ralph Castain wrote:

Try adding --disable-openib-connectx-xrc to your configure line



Does that mean the XRC issue is still not fixed, even though this is in the 1.7.1 announcement?

> - Fixed XRC compile issue in Open Fabrics support.
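
(For reference, applied to the configure line quoted below, Ralph's suggestion would look 
like this - paths kept as Tim's placeholders:)

$ ./configure --prefix=~/openmpi-1.7.1 --with-tm=~/torque-2.5.11/ --with-verbs --disable-openib-connectx-xrc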







On Apr 17, 2013, at 2:27 PM, Timothy Dwight Dunn <timothy.d...@colorado.edu> 
wrote:


I have been trying to get the new v1.7.1 to build on a few different systems, and I get 
the same error on every build attempted. While the builds are on 3 different clusters, 
they are all using RHEL 5 or RHEL 6 (6.3, not 6.4, as OFED is still not working for it yet).


I get this, too:

gmake[2]: Entering directory 
`/tmp/pk224850/linuxc2_10777/openmpi-1.7.1_linux64_intel/ompi/mca/common/ofacm'

  CC   common_ofacm_xoob.lo
common_ofacm_xoob.c(158): error: identifier "ompi_jobid_t" is undefined
  static int xoob_ib_address_init(ofacm_ib_address_t *ib_addr, uint16_t lid, 
uint64_t s_id, ompi_jobid_t ep_jobid)


 ^

common_ofacm_xoob.c(873): warning #188: enumerated type mixed with another type
  enum ibv_mtu mtu = (context->attr[0].path_mtu < 
context->remote_info.rem_mtu) ?

 ^

common_ofacm_xoob.c(953): warning #188: enumerated type mixed with another type
  enum ibv_mtu mtu = (context->attr[0].path_mtu < remote_info->rem_mtu) ?
 ^

compilation aborted for common_ofacm_xoob.c (code 2)
gmake[2]: *** [common_ofacm_xoob.lo] Error 1










While I have complex configs, even when I go down to a simple config using 
either GNU or Intel compilers, such as:
export CC=icc
export CXX=icpc
export F77=ifort
export FC=ifort

./configure --prefix=~/openmpi-1.7.1 --with-tm=~/torque-2.5.11/ --with-verbs

(Note: the ~ is just covering up my actual paths; otherwise all is well)

So this configures without problems, but when I go to build with make all -j 8 I 
get the following error:


make[2]: Entering directory `~openmpi-1.7.1/ompi/mpi/fortran/mpiext'
  PPFC mpi-ext-module.lo
  PPFC mpi-f08-ext-module.lo
  FCLD libforce_usempi_module_to_be_built.la
  FCLD libforce_usempif08_module_to_be_built.la
make[2]: Leaving directory `~openmpi-1.7.1/ompi/mpi/fortran/mpiext'
Making all in mca/common/ofacm
make[2]: Entering directory `~openmpi-1.7.1/ompi/mca/common/ofacm'
  CC   libmca_common_ofacm_la-common_ofacm_oob.lo
  CC   libmca_common_ofacm_la-common_ofacm_base.lo
if test -z "libmca_common_ofacm.la"; then \
  rm -f "libmca_common_ofacm.la"; \
  ln -s "libmca_common_ofacm_noinst.la" "libmca_common_ofacm.la"; \
fi
  CC   libmca_common_ofacm_la-common_ofacm_empty.lo
  CC   libmca_common_ofacm_la-common_ofacm_xoob.lo
common_ofacm_xoob.c(158): error: identifier "ompi_jobid_t" is undefined
  static int xoob_ib_address_init(ofacm_ib_address_t *ib_addr, uint16_t lid, 
uint64_t s_id, ompi_jobid_t ep_jobid)

^

compilation aborted for common_ofacm_xoob.c (code 2)
make[2]: *** [libmca_common_ofacm_la-common_ofacm_xoob.lo] Error 1



Note that I get this even if I try to build without IB verbs. Googling for help on 
this has turned up nothing, literally nothing.

Any suggestions?

Thanks
Tim Dunn






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] Building Open MPI with LSF

2013-05-07 Thread Paul Kapinos

On 05/07/13 17:55, Ralph Castain wrote:


*1.* *Open MPI support in 1.6 seems to be broken, and was maybe fixed in 1.7?*
http://www.open-mpi.org/community/lists/users/2013/03/21640.php


It is indeed fixed in 1.7 - we will look at backporting a fix to 1.6


well, we're using 1.6.4 with tight integration to LSF 8.0 now =)

For the future: if you need a testbed, I can grant you user access...

best

Paul



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] basic questions about compiling OpenMPI

2013-05-22 Thread Paul Kapinos

On 05/22/13 17:08, Blosch, Edwin L wrote:

Apologies for not exploring the FAQ first.


No comments =)




If I want to use Intel or PGI compilers but link against the OpenMPI that ships 
with RedHat Enterprise Linux 6 (compiled with g++ I presume), are there any 
issues to watch out for, during linking?


At least the Fortran 90 bindings ("use mpi") won't work at all (they're 
compiler-dependent).


So our way is to compile a version of Open MPI with each compiler; I think this 
is the recommended approach.
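
For example, a minimal sketch of such per-compiler builds (prefixes are placeholders; add 
whatever other configure flags you need):

$ mkdir build-intel && cd build-intel
$ ../configure CC=icc CXX=icpc FC=ifort F77=ifort --prefix=/opt/MPI/openmpi-X.Y/intel
$ make -j 8 all install
(and the same again in a second build dir with CC=gcc CXX=g++ FC=gfortran F77=gfortran 
and its own --prefix)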


Note also that the version of Open MPI shipped with a Linux distribution is usually a bit 
dusty.




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] 1.7.1 Hang with MPI_THREAD_MULTIPLE set

2013-06-03 Thread Paul Kapinos

Hello,

It is more or less well known that MPI_THREAD_MULTIPLE disables the OpenFabrics / 
InfiniBand networking in Open MPI:


http://www.open-mpi.org/faq/?category=supported-systems#thread-support
http://www.open-mpi.org/community/lists/users/2010/03/12345.php

On our system, not only is the 'openib' BTL off, but IPoIB also refuses to 
work, leading to an error.


But I was able to run your program error-free when completely avoiding InfiniBand: either 
run both processes on the same node (using shared memory), or pass the 
"-mca btl ^openib -mca btl_tcp_if_exclude ib0,lo" parameters to 'mpiexec' in 
order to disable InfiniBand and IPoIB.
Well, this is disappointing due to the roughly 20x loss of performance when using Gigabit 
Ethernet compared to the actual InfiniBand...
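
(For example, a complete call in that spirit - the host names and the executable name are 
just placeholders:)

$ mpiexec -np 2 --host node1,node2 -mca btl ^openib -mca btl_tcp_if_exclude ib0,lo ./hang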


Note: Intel MPI supports MPI_THREAD_MULTIPLE when linked with -mt_mpi (Intel and 
GCC compilers) or with -lmpi_mt instead of -lmpi (other compilers). However, Intel 
MPI is not free.


Best,

Paul Kapinos



Also, I recommend _always_ checking what kind of threading level you requested 
and what you actually got, e.g.:

  print *, 'hello, world!', MPI_THREAD_MULTIPLE, provided







On 05/31/13 06:12, W Spector wrote:

Dear OpenMPI group,

The following trivial program hangs on the mpi_barrier call with 1.7.1.  I am
using gfortran/gcc 4.6.3 on Ubuntu linux.  OpenMPI was built with
--enable-mpi-thread-multiple support and no other options (other than --prefix).

Are there additional options we should be telling configure about?  Or have we
done something very silly?  Mpich2 works just fine...

Walter Spector


program hang
   use mpi
   implicit none

   integer :: me, npes
   integer :: mpierr, provided
   logical :: iampe0

   call mpi_init_thread (  &
   MPI_THREAD_MULTIPLE,  &
   provided,  &
   mpierr)
   print *, 'hello, world!'

! Hangs here with MPI_THREAD_MULTIPLE set...
   call mpi_barrier (MPI_COMM_WORLD, mpierr)

   call mpi_comm_rank (MPI_COMM_WORLD, me, mpierr)
   iampe0 = me == 0
   call mpi_comm_size (MPI_COMM_WORLD, npes, mpierr)
   print *, 'pe:', me, ', total comm size:', npes
   print *, 'I am ', trim (merge ('PE 0    ', 'not PE 0', iampe0))  ! both MERGE arguments must have the same length

   call mpi_finalize (mpierr)

end program





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] knem/openmpi performance?

2013-07-15 Thread Paul Kapinos

On 07/12/13 12:55, Jeff Squyres (jsquyres) wrote:

FWIW: a long time ago (read: many Open MPI / knem versions ago),
I did a few benchmarks with knem vs. no knem Open MPI installations.
IIRC, I used the typical suspects like NetPIPE, the NPBs, etc.
There was a modest performance improvement (I don't remember the numbers 
offhand);
it was a smaller improvement than I had hoped for
-- particularly in point-to-point message passing latency (e.g., via NetPIPE).


Jeff, I would turn the question the other way around:

- are there any penalties when using KNEM?

We have a couple of Really Big Nodes (128 cores) with non-huge memory bandwidth 
(because they are coupled from 4 standalone nodes with 4 sockets each). So cutting the 
bandwidth demand in half on these nodes sounds like a Very Good Thing.


But otherwise we have 1500+ nodes with only 2 sockets and 24 GB of memory, and we do 
not want to disturb production on these nodes (and different MPI builds for 
different nodes are awkward).


Best

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





[OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed

2013-07-30 Thread Paul Kapinos

Dear Open MPI experts,

A user on our cluster has a problem running a rather big job:
(- the job using 3024 processes (12 per node, 252 nodes) runs fine)
- the job using 4032 processes (12 per node, 336 nodes) produces the error 
attached below.


Well, the http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages FAQ entry is a 
well-known one; both recommended tweakables (user limits and registered memory 
size) are at their maximum now; nevertheless, some queue pair could not be created.


Our blind guess is that the number of completion queues is exhausted.

What happens when raising the value from the default to the maximum?
What is the maximum size of Open MPI jobs that has been seen at all?
What is the maximum size of Open MPI jobs *using MPI_Alltoallv* that has been seen at all?
Is there a way to manage the size/number of the queue pairs? (XRC is not available)
Is there a way to tell MPI_Alltoallv to use fewer queue pairs, even if this 
could lead to a slow-down?


There is a suspicious parameter in the mlx4_core module:
$ modinfo mlx4_core | grep log_num_cq
parm:   log_num_cq:log maximum number of CQs per HCA  (int)

Is this the tweakable parameter?
What is the default, and max value?
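
(For what it's worth, the currently active value can presumably be inspected via sysfs - 
an assumption about where mlx4_core exposes its parameters, not verified here:)

$ cat /sys/module/mlx4_core/parameters/log_num_cq
# and, should raising it turn out to be the answer, something like this plus a driver reload:
$ echo "options mlx4_core log_num_cq=<N>" >> /etc/modprobe.d/mlx4_core.conf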

Any help would be welcome...

Best,

Paul Kapinos

P.S. There should be no connection problem between the nodes; a test 
job with one process on each node was run successfully just before starting 
the actual job, which also ran OK for a while - until calling MPI_Alltoallv.







--
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: linuxbmc1156.rz.RWTH-Aachen.DE
Local device:   mlx4_0
Queue pair type:Reliable connected (RC)
--
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4021][connect/btl_openib_connect_oob.c:867:rml_recv_cb] 
error in endpoint reply start connect

[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** An error occurred in MPI_Alltoallv
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** on communicator MPI_COMM_WORLD
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERR_OTHER: known error not in list
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERRORS_ARE_FATAL: your MPI job 
will now abort
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4024][connect/btl_openib_connect_oob.c:867:rml_recv_cb] 
error in endpoint reply start connect
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4027][connect/btl_openib_connect_oob.c:867:rml_recv_cb] 
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],10][connect/btl_openib_connect_oob.c:867:rml_recv_cb] 
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],1][connect/btl_openib_connect_oob.c:867:rml_recv_cb] 
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],10] 
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],8] 
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],9] 
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],1] 
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] 9 more processes have sent help message 
help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] Set MCA parameter 
"orte_base_help_aggregate" to 0 to see all help / error messages
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] 3 more processes have sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed

2013-08-01 Thread Paul Kapinos
Vanilla Linux OFED from the RPMs for Scientific Linux release 6.4 (Carbon) (= RHEL 
6.4).

No ofed_info available :-(

On 07/31/13 16:59, Mike Dubman wrote:

Hi,
What OFED vendor and version do you use?
Regards
M


On Tue, Jul 30, 2013 at 8:42 PM, Paul Kapinos <kapi...@rz.rwth-aachen.de
<mailto:kapi...@rz.rwth-aachen.de>> wrote:

Dear Open MPI experts,

A user on our cluster has a problem running a rather big job:
(- the job using 3024 processes (12 per node, 252 nodes) runs fine)
- the job using 4032 processes (12 per node, 336 nodes) produces the error
attached below.

Well, the
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages FAQ entry is a
well-known one; both recommended tweakables (user limits and registered
memory size) are at their maximum now; nevertheless, some queue pair could not be
created.

Our blind guess is that the number of completion queues is exhausted.

What happens when raising the value from the default to the maximum?
What is the maximum size of Open MPI jobs that has been seen at all?
What is the maximum size of Open MPI jobs *using MPI_Alltoallv* that has been seen at all?
Is there a way to manage the size/number of the queue pairs? (XRC is not
available)
Is there a way to tell MPI_Alltoallv to use fewer queue pairs, even if this
could lead to a slow-down?

There is a suspicious parameter in the mlx4_core module:
$ modinfo mlx4_core | grep log_num_cq
parm:   log_num_cq:log maximum number of CQs per HCA  (int)

Is this the tweakable parameter?
What is the default, and max value?

Any help would be welcome...

    Best,

Paul Kapinos

P.S. There should be no connection problem between the nodes; a
test job with one process on each node was run successfully just before
starting the actual job, which also ran OK for a while - until calling
MPI_Alltoallv.







--
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: linuxbmc1156.rz.RWTH-Aachen.DE
Local device:   mlx4_0
Queue pair type:Reliable connected (RC)

--
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4021][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** An error occurred in MPI_Alltoallv
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** on communicator MPI_COMM_WORLD
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERR_OTHER: known error not in list
[linuxbmc1156.rz.RWTH-Aachen.DE:9632] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4024][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc1156.rz.RWTH-Aachen.DE][[3703,1],4027][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],10][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE][[3703,1],1][connect/btl_openib_connect_oob.c:867:rml_recv_cb]
error in endpoint reply start connect
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],10]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696] [[3703,0],0]-[[3703,1],8]
mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
[linuxbmc0840.rz.RWTH-Aachen.DE:17696

Re: [OMPI users] MPI_Init_thread hangs in OpenMPI 1.7.1 when using --enable-mpi-thread-multiple

2013-10-23 Thread Paul Kapinos
et 
--enable-mpi-thread-multiple. So maybe it hangs in 1.7.1 on any computer as 
long as you use MPI_THREAD_MULTIPLE. At least I have not seen it work anywhere.

Do you agree that this is a bug, or am I doing something wrong?

Best regards,
Elias



--

Dr. Hans Ekkehard Plesser, Associate Professor
Head, Basic Science Section

Dept. of Mathematical Sciences and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway

Phone +47 6496 5467
Fax   +47 6496 5401
Email hans.ekkehard.ples...@umb.no
Home  http://arken.umb.no/~plesser






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





[OMPI users] SIGSEGV in opal_hwlock152_hwlock_bitmap_or.A // Bug in 'hwlock" ?

2013-10-31 Thread Paul Kapinos

Hello all,

using 1.7.x (1.7.2 and 1.7.3 tested), we get a SIGSEGV from somewhere deep inside the 
'hwloc' library - see the attached screenshot.


Because the error is tied to just one single node, which in turn is a rather special one 
(see the attached output of 'lstopo -'), it smells like an error in the 'hwloc' library.


Is there a way to disable hwloc, or to debug it somehow?
(besides building a debug version of hwloc and Open MPI)
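
(For completeness: the topology of that node can also be captured standalone, outside of 
Open MPI - assuming the hwloc command-line utilities are installed next to lstopo; the 
output file names are arbitrary:)

$ lstopo broken-node.xml                   # export the topology as XML (format follows the file suffix)
$ hwloc-gather-topology /tmp/broken-node   # gather the raw /sys data for the hwloc developers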

Best

Paul







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
System(252GB)
  Misc1
Misc0
  Node#0(31GB) + Socket#0 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#0
  P#64
L2(256KB) + L1(32KB) + Core#1
  P#1
  P#65
L2(256KB) + L1(32KB) + Core#2
  P#2
  P#66
L2(256KB) + L1(32KB) + Core#3
  P#3
  P#67
L2(256KB) + L1(32KB) + Core#8
  P#4
  P#68
L2(256KB) + L1(32KB) + Core#9
  P#5
  P#69
L2(256KB) + L1(32KB) + Core#10
  P#6
  P#70
L2(256KB) + L1(32KB) + Core#11
  P#7
  P#71
  Node#1(32GB) + Socket#1 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#8
  P#72
L2(256KB) + L1(32KB) + Core#1
  P#9
  P#73
L2(256KB) + L1(32KB) + Core#2
  P#10
  P#74
L2(256KB) + L1(32KB) + Core#3
  P#11
  P#75
L2(256KB) + L1(32KB) + Core#8
  P#12
  P#76
L2(256KB) + L1(32KB) + Core#9
  P#13
  P#77
L2(256KB) + L1(32KB) + Core#10
  P#14
  P#78
L2(256KB) + L1(32KB) + Core#11
  P#15
  P#79
Misc0
  Node#2(32GB) + Socket#2 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#16
  P#80
L2(256KB) + L1(32KB) + Core#1
  P#17
  P#81
L2(256KB) + L1(32KB) + Core#2
  P#18
  P#82
L2(256KB) + L1(32KB) + Core#3
  P#19
  P#83
L2(256KB) + L1(32KB) + Core#8
  P#20
  P#84
L2(256KB) + L1(32KB) + Core#9
  P#21
  P#85
L2(256KB) + L1(32KB) + Core#10
  P#22
  P#86
L2(256KB) + L1(32KB) + Core#11
  P#23
  P#87
  Node#3(32GB) + Socket#3 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#24
  P#88
L2(256KB) + L1(32KB) + Core#1
  P#25
  P#89
L2(256KB) + L1(32KB) + Core#2
  P#26
  P#90
L2(256KB) + L1(32KB) + Core#3
  P#27
  P#91
L2(256KB) + L1(32KB) + Core#8
  P#28
  P#92
L2(256KB) + L1(32KB) + Core#9
  P#29
  P#93
L2(256KB) + L1(32KB) + Core#10
  P#30
  P#94
L2(256KB) + L1(32KB) + Core#11
  P#31
  P#95
  Misc1
Misc0
  Node#4(32GB) + Socket#4 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#32
  P#96
L2(256KB) + L1(32KB) + Core#1
  P#33
  P#97
L2(256KB) + L1(32KB) + Core#2
  P#34
  P#98
L2(256KB) + L1(32KB) + Core#3
  P#35
  P#99
L2(256KB) + L1(32KB) + Core#8
  P#36
  P#100
L2(256KB) + L1(32KB) + Core#9
  P#37
  P#101
L2(256KB) + L1(32KB) + Core#10
  P#38
  P#102
L2(256KB) + L1(32KB) + Core#11
  P#39
  P#103
  Node#5(32GB) + Socket#5 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#40
  P#104
L2(256KB) + L1(32KB) + Core#1
  P#41
  P#105
L2(256KB) + L1(32KB) + Core#2
  P#42
  P#106
L2(256KB) + L1(32KB) + Core#3
  P#43
  P#107
L2(256KB) + L1(32KB) + Core#8
  P#44
  P#108
L2(256KB) + L1(32KB) + Core#9
  P#45
  P#109
L2(256KB) + L1(32KB) + Core#10
  P#46
  P#110
L2(256KB) + L1(32KB) + Core#11
  P#47
  P#111
Misc0
  Node#6(32GB) + Socket#6 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#48
  P#112
L2(256KB) + L1(32KB) + Core#1
  P#49
  P#113
L2(256KB) + L1(32KB) + Core#2
  P#50
  P#114
L2(256KB) + L1(32KB) + Core#3
  P#51
  P#115
L2(256KB) + L1(32KB) + Core#8
  P#52
  P#116
L2(256KB) + L1(32KB) + Core#9
  P#53
  P#117
L2(256KB) + L1(32KB) + Core#10
  P#54
  P#118
L2(256KB) + L1(32KB) + Core#11
  P#55
  P#119
  Node#7(32GB) + Socket#7 + L3(18MB)
L2(256KB) + L1(32KB) + Core#0
  P#56
  P#120
L2(256KB) + L1(32KB) + Core#1

[OMPI users] is there a way to bring to light _all_ configure options in a ready installation?

2010-08-24 Thread Paul Kapinos

Hello OpenMPI developers,

I am searching for a way to discover _all_ configure options of an 
OpenMPI installation.


Background: in an existing installation, the ompi_info program helps to 
find out a lot of information about the installation. For example, "ompi_info 
-c" shows *some* configuration options like CFLAGS, FFLAGS, et cetera. 
Compilation directories often do not survive for a long time (or are not 
shipped at all, e.g. with Sun MPI).


But what about, for example, --enable-mpi-threads or --enable-contrib-no-build=vt 
(and all other possible flags of "configure") - how can I see 
whether these flags were set or not?


In other words: is it possible to get _all_ configure flags from a 
"ready" installation, without having the compilation dirs (with 
the configure logs) any more?


Many thanks

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] is there a way to bring to light _all_ configure options in a ready installation?

2010-08-24 Thread Paul Kapinos





You should be able to run "./configure --help" and see a lengthy help message 
that includes all the command line options to configure.
Is that what you're looking for?

No, he wants to know what configure options were used with some binaries.



Yes, Terry - I want to know what configure options were used for a given 
installation! "./configure --help" helps, but guessing which of all the 
options were used in a build is a hard job...
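
(A rough idea of what one could grep for - whether the full configure command line is 
recorded at all seems to depend on the Open MPI version, so this is just something to try:)

$ ompi_info --all | grep -i configure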





--td

On Aug 24, 2010, at 7:40 AM, Paul Kapinos wrote:

  

Hello OpenMPI developers,

I am searching for a way to discover _all_ configure options of an OpenMPI 
installation.

Background: in an existing installation, the ompi_info program helps to find out a lot of 
information about the installation. For example, "ompi_info -c" shows *some* 
configuration options like CFLAGS, FFLAGS, et cetera. Compilation directories often do 
not survive for a long time (or are not shipped at all, e.g. with Sun MPI).

But what about, for example, --enable-mpi-threads or --enable-contrib-no-build=vt (and all 
other possible flags of "configure") - how can I see whether these flags were set or 
not?

In other words: is it possible to get _all_ configure flags from a "ready" 
installation, without having the compilation dirs (with the configure logs) any more?

Many thanks

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] a question about [MPI]IO on systems without network filesystem

2010-09-29 Thread Paul Kapinos

Dear OpenMPI developer,

We have a question about the possibility of using MPI I/O (and possibly 
regular I/O) on clusters which do *not* have a common (network) filesystem 
on all nodes.


A common filesystem is generally NOT a hard precondition for using Open MPI:
http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem


Say we have a (diskless? equipped with very small disks?) cluster on 
which only one node has access to a filesystem.


Is it possible to configure/run Open MPI in such a way that only _one_ 
process (e.g. the master) performs real disk I/O, and the other processes send 
their data to the master, which acts as an agent?


Of course this would impact performance, because all data must be 
sent over the network, and the master may become a bottleneck. But is such 
a scenario - the I/O of all processes funneled through one process - practicable at all?



Best wishes
Paul



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] v1.5.1 build failed with PGI compiler

2011-01-04 Thread Paul Kapinos
soname -Wl,libopen-pal.so.1 -o .libs/libopen-pal.so.1.0.0




Best wishes,

Paul






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] v1.5.1: configuration failed if compiling on CentOS 5.5 with defauld GCC

2011-01-04 Thread Paul Kapinos

Dear OpenMPI folks,

I tried to compile the OpenMPI version 1.5.1 on a CentOS  5.5 computer 
with the default GCC shipped with the distribution, which is


gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)

The configuration failed:


configure:156412: checking location of libltdl
configure:156425: result: internal copy
configure:156709: WARNING: Failed to build GNU libltdl.  This usually 
means that something
configure:156711: WARNING: is incorrectly setup with your environment. 
There may be useful information in
configure:156713: WARNING: opal/libltdl/config.log.  You can also 
disable GNU libltdl, which will disable
configure:156715: WARNING: dynamic shared object loading, by configuring 
with --disable-dlopen.

configure:156717: error: Cannot continue


The configuration line was was follows:

$ ./configure --with-openib --with-devel-headers 
--enable-contrib-no-build=vt --enable-mpi-threads CFLAGS="-O3 -ffast-math 
-mtune=opteron -m32"  CXXFLAGS="-O3 -ffast-math -mtune=opteron -m32" 
FFLAGS="-O3 -ffast-math -mtune=opteron -m32"  FCFLAGS="-O3 -ffast-math 
-mtune=opteron -m32"  F77=gfortran LDFLAGS="-O3 -ffast-math -mtune=opteron 
-m32"  --prefix=/../MPI/openmpi-1.5.1mt/linux32/gcc



With a newer version of GCC, version 4.2.4 (and also GCC version 
4.5.1), the configuration completed fine.


Is there an error in my way of configuring, or is there a problem in 
configure itself? I think the inability to configure and build 
Open MPI with the default compiler on CentOS 5.5 is still a problem, as 
other versions of GCC seem not to have the same issue.


Best wishes,

Paul



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Paul Kapinos

Dear OpenMPI developers,

We tried to build Open MPI 1.5.3 including support for Platform LSF, using 
Sun Studio (now Oracle Solaris Studio) 12.2, and the configure 
stage failed.


1. Used flags:

./configure --with-lsf --with-openib --with-devel-headers 
--enable-contrib-no-build=vt --enable-mpi-threads CFLAGS="-fast 
-xtarget=nehalem -m64"   CXXFLAGS="-fast -xtarget=nehalem -m64" 
FFLAGS="-fast -xtarget=nehalem" -m64   FCFLAGS="-fast -xtarget=nehalem 
-m64"   F77=f95 LDFLAGS="-fast -xtarget=nehalem -m64" 
--prefix=//openmpi-1.5.3mt/linux64/studio


(note the support for LSF enabled by --with-lsf). The compiler envvars 
are set as follows:

$ echo $CC $FC $CXX
cc f95 CC

The compiler info: (cc -V, CC -V)
cc: Sun C 5.11 Linux_i386 2010/08/13
CC: Sun C++ 5.11 Linux_i386 2010/08/13


2. The configure error was:
##
checking for lsb_launch in -lbat... no
configure: WARNING: LSF support requested (via --with-lsf) but not found.
configure: error: Aborting.
##


3. In the config.log (see config.log.error) there is more info about 
the problem; the crucial info is:

##
/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/lib/libbat.so: undefined reference 
to `ceil'

##

4. Googling for `ceil' leads e.g. to 
http://www.cplusplus.com/reference/clibrary/cmath/ceil/


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the Studio C 
compiler).

$ CC ceil.c
$ cc ceil.c


5. Looking into configure.log and searching for `ceil': there was 
a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
compiler, which is *wrong*, cf. (4).


So, is there an error in the configure stage? Or do the checks in 
config.log.ceil not really test the availability of the `ceil' function 
with the C compiler?


Best wishes,
Paul Kapinos






P.S. Note that in the past we built many older versions of Open MPI without 
support for LSF and had no such problems.






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
configure:84213: cc -o conftest -DNDEBUG  -fast -xtarget=nehalem -m64   -mt 
-I/home/pk224850/OpenMPI/openmpi-1.5.3_linux64_studio/opal/mca/paffinity/hwloc/hwloc/include
   -I/opt/lsf/8.0/include  -fast -xtarget=nehalem -m64
-L/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/lib conftest.c -lbat -llsf -lnsl  
-lutil  >&5
cc: Warning: -xchip=native detection failed, falling back to -xchip=generic
"conftest.c", line 568: warning: statement not reached
/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/lib/libbat.so: undefined reference to 
`ceil'
configure:84213: $? = 2
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "Open MPI"
| #define PACKAGE_TARNAME "openmpi"
| #define PACKAGE_VERSION "1.5.3"
| #define PACKAGE_STRING "Open MPI 1.5.3"
| #define PACKAGE_BUGREPORT "http://www.open-mpi.org/community/help/"
| #define PACKAGE_URL ""
| #define OPAL_ARCH "x86_64-unknown-linux-gnu"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define __EXTENSIONS__ 1
| #define _ALL_SOURCE 1
| #define _GNU_SOURCE 1
| #define _POSIX_PTHREAD_SEMANTICS 1
| #define _TANDEM_SOURCE 1
| #define OMPI_MAJOR_VERSION 1
| #define OMPI_MINOR_VERSION 5
| #define OMPI_RELEASE_VERSION 3
| #define OMPI_GREEK_VERSION ""
| #define OMPI_VERSION "3"
| #define OMPI_RELEASE_DATE "Mar 16, 2011"
| #define ORTE_MAJOR_VERSION 1
| #define ORTE_MINOR_VERSION 5
| #define ORTE_RELEASE_VERSION 3
| #define ORTE_GREEK_VERSION ""
| #define ORTE_VERSION "3"
| #define ORTE_RELEASE_DATE "Mar 16, 2011"
| #define OPAL_MAJOR_VERSION 1
| #define OPAL_MINOR_VERSION 5
| #define OPAL_RELEASE_VERSION 3
| #define OPAL_GREEK_VERSION ""
| #define OPAL_VERSION "3"
| #define OPAL_RELEASE_DATE "Mar 16, 2011"
| #define OPAL_ENABLE_MEM_DEBUG 0
| #define OPAL_ENABLE_MEM_PROFILE 0
| #define OPAL_ENABLE_DEBUG 0
| #define OPAL_WANT_PRETTY_PRINT_STACKTRACE 1
| #define OPAL_ENABLE_PTY_SUPPORT 1
| #define OPAL_ENABLE_HETEROGENEOUS_SUPPORT 0
| #define OPAL_ENABLE_TRACE 0
| #define OPAL_ENABLE_FT 0
| #define OPAL_ENABLE_FT_CR 0
| #define OPAL_WANT_HOME_CONFIG_FILES 1
| #define OPAL_ENABLE_IPV6 0
| #define OPAL_PACKAGE_STRING "Open MPI pk224...@cluster.rz.rwth-aachen.d

Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Paul Kapinos

Hi Terry,


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the Studio 
C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your test 
program worked for me and that actually is the first test that the 
configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


Thanks for the tip! Yes, when using -lm the Studio C compiler "cc" 
also works fine for ceil.c:


$ cc ceil.c -lm



So, is there an error in the configure stage? Or either the checks in 
config.log.ceil does not rely on the avilability of the `ceil' funcion 
in the C compiler?
It looks to me like the lbat configure test is not linking in the math 
lib. 


Yes, there is no -lm in the configure:84213 line.

Note the checks for ceil again in config.log.ceil. As far as I understood 
these logs, the checks for ceil and for the need of -lm deliver wrong 
results:



configure:55000: checking if we need -lm for ceil

configure:55104: result: no

configure:55115: checking for ceil

configure:55115: result: yes


So, configure assumes "ceil" is available for the "cc" compiler without 
the need for the -lm flag - and this is *wrong*, "cc" needs -lm.


It seems to me to be a configure issue.

Greetings

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] --enable-progress-threads broken in 1.5.3?

2011-04-28 Thread Paul Kapinos

Hi OpenMPI folks,

I've tried to install the 1.5.3 version with activated progress threads 
(just to try it out), in addition to --enable-mpi-threads. The 
installation went fine, and I could also build binaries, but every mpiexec 
call hangs forever, silently. With the very same configuration options 
but without --enable-progress-threads, everything runs fine.


So I wonder whether --enable-progress-threads is broken, or maybe I 
did something wrong?



The configuration line was:

./configure --with-openib --with-lsf --with-devel-headers 
--enable-contrib-no-build=vt --enable-mpi-threads 
--enable-progress-threads --enable-heterogeneous --enable-cxx-exceptions 
--enable-orterun-prefix-by-default <>


where <> contains the prefix and some compiler-specific stuff.

All versions compiled (GCC, Intel, PGI, and Sun Studio compilers, 32-bit and 
64-bit) behave the very same way.



Best wishes,

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] How to use a wrapper for ssh?

2011-07-12 Thread Paul Kapinos

Hi OpenMPI folks,

Using version 1.4.3 of Open MPI, I want to wrap the 'ssh' calls 
made by Open MPI's 'mpiexec'. For this purpose, at least 
two ways seem possible to me:


1. Let the wrapper have the name 'ssh' and put the path where it lives 
into the PATH envvar *before* the path to the real ssh.


Q1: Would this work?
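
(For illustration, the kind of wrapper I have in mind - a minimal sketch; the path to the 
real ssh and the log file are assumptions:)

#!/bin/sh
# WrapPer: log the arguments, then hand over to the real ssh
echo "ssh wrapper called with: $@" >> /tmp/ssh_wrapper.log
exec /usr/bin/ssh "$@"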

2. use MCA parameters described in
http://www.open-mpi.org/faq/?category=rsh#rsh-not-ssh
to bend the call to my wrapper, e.g.
export OMPI_MCA_plm_rsh_agent=WrapPer
export OMPI_MCA_orte_rsh_agent=WrapPer

The odd thing is that the OMPI_MCA_orte_rsh_agent envvar seems not to 
have any effect, whereas OMPI_MCA_plm_rsh_agent works.

Why I believe so?

Because "strace -f mpiexec ..." says still trying for opening 'ssh' if 
OMPI_MCA_orte_rsh_agent is set, and correctly trying to open the 
'WrapPer' iff OMPI_MCA_plm_rsh_agent is set.


Q2: Is the apparent non-functionality of OMPI_MCA_orte_rsh_agent a bug, 
or have I just misunderstood something?


Best wishes,
Paul

P.S. reproducing: just set the envvars and do 'strace -f mpiexec ...'

example:

export OMPI_MCA_plm_rsh_agent=WrapPer
---> look'n for 'WrapPer';
stat64("/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/bin/WrapPer", 0x8324) 
= -1 ENOENT (No such file or directory)


export OMPI_MCA_orte_rsh_agent=WrapPer
(do not forget to unset OMPI_MCA_plm_rsh_agent :o)
---> still looking for 'ssh'
stat64("/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/bin/ssh", 0x8324) = -1 
ENOENT (No such file or directory)


===> OMPI_MCA_orte_rsh_agent does not work?!

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] How to use a wrapper for ssh?

2011-07-13 Thread Paul Kapinos

Hi Ralph,

2. use MCA parameters described in
http://www.open-mpi.org/faq/?category=rsh#rsh-not-ssh
to bend the call to my wrapper, e.g.
export OMPI_MCA_plm_rsh_agent=WrapPer
export OMPI_MCA_orte_rsh_agent=WrapPer

The odd thing is that the OMPI_MCA_orte_rsh_agent envvar seems not to have 
any effect, whereas OMPI_MCA_plm_rsh_agent works.
Why do I believe so?


orte_rsh_agent doesn't exist in the 1.4 series :-)
Only plm_rsh_agent is available in 1.4. "ompi_info --param orte all" and "ompi_info 
--param plm rsh" will confirm that fact.


If so, then the Wiki is not correct. Maybe someone can correct it? This 
would save some time for people like me...


Best wishes
Paul Kapinos




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-15 Thread Paul Kapinos

Hi Open MPI folks (and Oracle/Sun experts),

we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our 
cluster. In the part of the cluster where LDAP is activated, the mpiexec 
does not try to spawn tasks on remote nodes at all, but exits with an 
error message like the one below. If you 'strace -f' the mpiexec, no exec of "ssh" 
can be found at all. Strangely, mpiexec tries to look into /etc/passwd 
(where the user is not listed, because LDAP is used!).

On the old part of the cluster, where NIS is used as the authentication 
method, Sun MPI runs just fine.


So, is Sun's MPI compatible with the LDAP authentication method at all?

Best wishes,

Paul


P.S. In both parts of the cluster, I (login marked as x here) can 
log in to any node by ssh without needing to type a password.




--
The user (x) is unknown to the system (i.e. there is no corresponding
entry in the password file). Please contact your system administrator
for a fix.
--
[cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: 
Fatal in file plm_rsh_module.c at line 1058

--


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-20 Thread Paul Kapinos

Hi Terry, Reuti,

good news: we've solved/worked around the problem with CT 8.2.1c :o)

the "fix" was easy: we used the 64bit version of the 'mpiexec' instead 
of [previously-used as default] 32bit version. The 64bit version version 
works now with both NIS and LDAP autentification modi. The32bit version 
works with the NIS-autentificated part of our cluster, only.


Thanks for your help!

Best wishes
Paul Kapinos



Reuti wrote:

Hi,

On 15.07.2011 at 21:14, Terry Dontje wrote:


On 7/15/2011 1:46 PM, Paul Kapinos wrote:
Hi Open MPI folks (and Oracle/Sun experts),

we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our cluster. In the part of the cluster where LDAP is activated, the mpiexec does not try to spawn tasks on remote nodes at all, but exits with an error message like the one below. If you 'strace -f' the mpiexec, no exec of "ssh" can be found at all. Strangely, mpiexec tries to look into /etc/passwd (where the user is not listed, because LDAP is used!). 

Note this is an area that should be no different from stock Open MPI. 


"should not" but it is :o)
However, I am comparing CT 8.2.1c with a self-compiled Open MPI 1.4.3, which are 
quite different releases. And they definitely behave in different ways: in the 
self-compiled Open MPI, both the 32-bit and 64-bit mpiexecs work with NIS and 
with LDAP, whereas the CT 8.2.1c 32-bit mpiexec works with NIS only.





I would suspect that the message might be coming from ssh.  I wouldn't suspect 
mpiexec would be looking into /etc/passwd at all, why would it need to.


the output you listed is titled "[unknown-user]". Maybe referring to the 
password file is a wrong simplification. The test is also done on the master node of the 
parallel job, by a usual `getpwuid`. Is the /etc/nsswitch.conf fine on the `mpiexec` 
machine?

Is the user known on this node too? Can they log in because they have no 
passphrase or because they have an agent running, or did you set up host-based 
authentication?


my user is known on each node and is allowed to log in (without 
password) from any node to any node. In /etc/passwd there is no password for 
my user; all authentication things are done by NIS or LDAP. (Sorry, I cannot tell 
you more because this is admin stuff, but as said: "ssh" works from any node to 
any node without a password.)
/etc/nsswitch.conf seems to be fine (it works now with the 64-bit version 
of mpiexec :o)







 It should just be using ssh.  Can you manually ssh to the same node?
On the old part of the cluster, where NIS is used as the authentication method, Sun MPI runs just fine. 

So, is Sun's MPI compatible with the LDAP authentication method at all? 


In as far as whatever launcher you use is compatible with LDAP.
Best wishes, 

Paul 



P.S. In both parts of the cluster, I (login marked as x here) can log in to any node by ssh without needing to type a password. 



From the headnode of the cluster to a node or also between nodes?


-- Reuti





-- 
The user (x) is unknown to the system (i.e. there is no corresponding 
entry in the password file). Please contact your system administrator 
for a fix. 
-- 
[cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: Fatal in file plm_rsh_module.c at line 1058 
------ 


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] Cofigure(?) problem building /1.5.3 on ScientificLinux6.0

2011-07-22 Thread Paul Kapinos

Dear Open MPI folks,
currently I have a problem building version 1.5.3 of Open MPI on
Scientific Linux 6.0 systems, which seems to me to be a configuration
problem.

After the configure run (which seems to terminate without an error code),
the "gmake all" stage produces errors and exits.

Typical output is shown below.

Curiously: the 1.4.3 version on the same computer can be built with no special
trouble. Both the 1.4.3 and 1.5.3 versions can be built on another
computer running CentOS 5.6.

In each case I build 16 versions in total (4 compilers * 32-bit/64-bit *
support for multithreading ON/OFF). The same error arises in all 16 versions.

Can someone give a hint about how to avoid this issue? Thanks!

Best wishes,

Paul


Some logs and configure are downloadable here:
https://gigamove.rz.rwth-aachen.de/d/id/2jM6MEa2nveJJD

  The configure line is in RUNME.sh, the
logs of configure and build stage in log_* files; I also attached the
config.log file and the configure itself (which is the standard from the
1.5.3 release).


##


CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh
/tmp/pk224850/linuxc2_11254/openmpi-1.5.3mt_linux64_gcc/config/missing
--run aclocal-1.11 -I config
sh: config/ompi_get_version.sh: No such file or directory
/usr/bin/m4: esyscmd subprocess failed



configure.ac:953: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not
m4_defun'd
config/ompi_mca.m4:37: OMPI_MCA is expanded from...
configure.ac:953: the top level
configure.ac:953: warning: AC_COMPILE_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS
../../lib/autoconf/specific.m4:386: AC_USE_SYSTEM_EXTENSIONS is expanded
from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:152:
HWLOC_SETUP_CORE_AFTER_C99 is expanded from...
../../lib/m4sugar/m4sh.m4:505: AS_IF is expanded from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:22: HWLOC_SETUP_CORE is
expanded from...
opal/mca/paffinity/hwloc/configure.m4:40: MCA_paffinity_hwloc_CONFIG is
expanded from...
config/ompi_mca.m4:540: MCA_CONFIGURE_M4_CONFIG_COMPONENT is expanded
from...
config/ompi_mca.m4:326: MCA_CONFIGURE_FRAMEWORK is expanded from...
config/ompi_mca.m4:247: MCA_CONFIGURE_PROJECT is expanded from...
configure.ac:953: warning: AC_RUN_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915






[OMPI users] Usage of PGI compilers (Libtool or OpenMPI issue?)

2011-07-22 Thread Paul Kapinos

Hi,

just found out: the --instantiation_dir, --one_instantiation_per_object, 
and --template_info_file flags are deprecated in the newer versions of 
the PGI compilers, cf. http://www.pgroup.com/support/release_tprs_2010.htm


But, compiling Open MPI 1.4.3 with the 11.7 PGI compilers, I see these warnings:

pgCC-Warning-prelink_objects switch is deprecated
pgCC-Warning-instantiation_dir switch is deprecated

coming from the below-noted call.

I do not know whether this is a Libtool issue or a libtool-usage (= Open MPI) 
issue, but I do not want to keep this secret...


Best wishes
Paul Kapinos






libtool: link:  pgCC --prelink_objects --instantiation_dir Template.dir 
  .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o 
.libs/win.o .libs/file.o   -Wl,--rpath 
-Wl,/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/ompi/.libs 
-Wl,--rpath 
-Wl,/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/orte/.libs 
-Wl,--rpath 
-Wl,/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/opal/.libs 
-Wl,--rpath -Wl,/opt/MPI/openmpi-1.4.3/linux/pgi/lib/lib32 
-L/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/orte/.libs 
-L/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/opal/.libs 
-L/opt/lsf/8.0/linux2.6-glibc2.3-x86/lib ../../../ompi/.libs/libmpi.so 
/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/orte/.libs/libopen-rte.so 
/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux32_pgi/opal/.libs/libopen-pal.so 
-ldl -lnsl -lutil



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] and the next one (3th today!) PGI+OpenMPI issue

2011-07-22 Thread Paul Kapinos
... just doing almost the same thing: trying to install Open MPI 1.4.3 using 
the 11.7 PGI compiler on Scientific Linux 6.0. The same place, but a different 
error message:

--
/usr/lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
gmake[2]: *** [libmpi_cxx.la] Error 2
gmake[2]: Leaving directory 
`/tmp/pk224850/linuxc2_11254/openmpi-1.4.3_linux64_pgi/ompi/mpi/cxx'

--

and then the compilation aborted. The configure string is below. With the 
Intel, GCC, and Studio compilers, the very same installations went through 
happily.


Maybe someone can give me a hint whether this is an issue with Open MPI, PGI, 
or something else...


Best wishes,

Paul

P.S.

again, more logs downloadable:
https://gigamove.rz.rwth-aachen.de/d/id/WNk69nPr4w7svT


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] Cofigure(?) problem building /1.5.3 on ScientificLinux6.0

2011-07-22 Thread Paul Kapinos

Hi Ralph,

Higher rev levels of the autotools are required for the 1.5 series - are you at 
the right ones? See
http://www.open-mpi.org/svn/building.php


Many thanks for the link.
A short test, and it's settled: the autoconf version in our distribution is too old. We 
have 2.63, and 2.65 is needed.


I will trigger our admins...

Best wishes,

Paul




m4 (GNU M4) 1.4.13 (OK)
autoconf (GNU Autoconf) 2.63 (Need: 2.65, NOK)
automake (GNU automake) 1.11.1 (OK)
ltmain.sh (GNU libtool) 2.2.6b (OK)
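
(For anyone wanting to repeat the check, something along these lines - whether 
'libtool --version' reports the ltmain.sh line in exactly this form may vary:)

$ m4 --version; autoconf --version; automake --version; libtool --version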





On Jul 22, 2011, at 9:12 AM, Paul Kapinos wrote:


Dear Open MPI folks,
currently I have a problem building version 1.5.3 of Open MPI on
Scientific Linux 6.0 systems, which seems to me to be a configuration
problem.

After the configure run (which seems to terminate without an error code),
the "gmake all" stage produces errors and exits.

Typical output is shown below.

Curiously: the 1.4.3 version on the same computer can be built with no special
trouble. Both the 1.4.3 and 1.5.3 versions can be built on another
computer running CentOS 5.6.

In each case I build 16 versions in total (4 compilers * 32-bit/64-bit *
support for multithreading ON/OFF). The same error arises in all 16 versions.

Can someone give a hint about how to avoid this issue? Thanks!

Best wishes,

Paul


Some logs and configure are downloadable here:
https://gigamove.rz.rwth-aachen.de/d/id/2jM6MEa2nveJJD

 The configure line is in RUNME.sh, the
logs of configure and build stage in log_* files; I also attached the
config.log file and the configure itself (which is the standard from the
1.5.3 release).


##


CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh
/tmp/pk224850/linuxc2_11254/openmpi-1.5.3mt_linux64_gcc/config/missing
--run aclocal-1.11 -I config
sh: config/ompi_get_version.sh: No such file or directory
/usr/bin/m4: esyscmd subprocess failed



configure.ac:953: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not
m4_defun'd
config/ompi_mca.m4:37: OMPI_MCA is expanded from...
configure.ac:953: the top level
configure.ac:953: warning: AC_COMPILE_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS
../../lib/autoconf/specific.m4:386: AC_USE_SYSTEM_EXTENSIONS is expanded
from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:152:
HWLOC_SETUP_CORE_AFTER_C99 is expanded from...
../../lib/m4sugar/m4sh.m4:505: AS_IF is expanded from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:22: HWLOC_SETUP_CORE is
expanded from...
opal/mca/paffinity/hwloc/configure.m4:40: MCA_paffinity_hwloc_CONFIG is
expanded from...
config/ompi_mca.m4:540: MCA_CONFIGURE_M4_CONFIG_COMPONENT is expanded
from...
config/ompi_mca.m4:326: MCA_CONFIGURE_FRAMEWORK is expanded from...
config/ompi_mca.m4:247: MCA_CONFIGURE_PROJECT is expanded from...
configure.ac:953: warning: AC_RUN_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




[OMPI users] problems with Intel 12.x compilers and OpenMPI (1.4.3)

2011-09-23 Thread Paul Kapinos

Hi Open MPI folks,

we see some quite strange effects with our installations of Open MPI 
1.4.3 with the Intel 12.x compilers, which leave us puzzled: different 
programs reproducibly deadlock or die with errors like the ones listed 
below.


Some of the errors look like programming issues at first glance (well, a 
deadlock *is* usually a programming error), but we do not believe that is 
the case here: the errors arise in many well-tested codes including HPL (*), 
only with a specific compiler + Open MPI combination (Intel 12.x compilers + 
Open MPI 1.4.3), and only with specific numbers of processes (usually high 
ones). For example, HPL reproducibly deadlocks with 72 procs and dies 
with error message #2 with 384 processes.


All these errors seem to be somehow related to MPI communicators; 
1.4.4rc3, 1.5.3 and 1.5.4 seem not to have this problem. Also, 
1.4.3 used together with the Intel 11.x compiler series seems to be 
unproblematic. So probably this:


(1.4.4 release notes:)
- Fixed a segv in MPI_Comm_create when called with GROUP_EMPTY.
  Thanks to Dominik Goeddeke for finding this.

is also the fix for our issues? Or maybe not, because 1.5.3 is _older_ than 
this fix?


Since we worked around the problem by switching our production to 
1.5.3, this issue is not a "burning" one; but I still decided to post 
it because any issue in such fundamental things may be interesting for 
the developers.


Best wishes,
Paul Kapinos


(*) http://www.netlib.org/benchmark/hpl/


Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(111): MPI_Comm_size(comm=0x0, size=0x6f4a90) failed
MPI_Comm_size(69).: Invalid communicator


[linuxbdc05.rz.RWTH-Aachen.DE:23219] *** An error occurred in MPI_Comm_split
[linuxbdc05.rz.RWTH-Aachen.DE:23219] *** on communicator MPI 
COMMUNICATOR 3 SPLIT FROM 0
[linuxbdc05.rz.RWTH-Aachen.DE:23219] *** MPI_ERR_IN_STATUS: error code 
in status
[linuxbdc05.rz.RWTH-Aachen.DE:23219] *** MPI_ERRORS_ARE_FATAL (your MPI 
job will now abort)



forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
libmpi.so.0 2D9EDF52 Unknown Unknown Unknown
libmpi.so.0 2D9EE45D Unknown Unknown Unknown
libmpi.so.0 2D9C3375 Unknown Unknown Unknown
libmpi_f77.so.0 2D75C37A Unknown Unknown Unknown
vasp_mpi_gamma 0057E010 Unknown Unknown Unknown
vasp_mpi_gamma 0059F636 Unknown Unknown Unknown
vasp_mpi_gamma 00416C5A Unknown Unknown Unknown
vasp_mpi_gamma 00A62BEE Unknown Unknown Unknown
libc.so.6 003EEB61EC5D Unknown Unknown Unknown
vasp_mpi_gamma 00416A29 Unknown Unknown Unknown


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] How are the Open MPI processes spawned?

2011-11-21 Thread Paul Kapinos

Hello Open MPI folks,

We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, 
and we see some strange hang-ups when starting Open MPI processes.


The nodes are named linuxbsc001, linuxbsc002, ... (with some gaps due to 
offline nodes). Each node is accessible from every other over SSH 
(without password), and MPI programs between any two nodes have been checked 
to run.



So far, I tried to start a bigger number of processes, one process 
per node:

$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list 
on which mpiexec reproducibly hangs forever; and, more surprisingly, another 
*permutation* of the *same* node names may run without any errors!


Example: the command in laueft.txt runs OK, the command in haengt.txt 
hangs. Note: the only difference is that the node linuxbsc025 is put at 
the end of the host list. Amazed, too?


Looking at the particular nodes while the above mpiexec hangs, we found 
the orted daemons started on *each* node and the binary on all but one 
node (orted.txt, MPI_FastTest.txt).
Again amazing: the node with no user process started (leading, I believe, 
to the hang-up of all processes in MPI_Init) was always the same, 
linuxbsc005, which is NOT the permuted item linuxbsc025...


This behaviour is reproducible. The hang-up only occurs if the started 
application is an MPI application ("hostname" does not hang).



Any idea what is going on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
linuxbsc001: STDOUT: 24323 ?SLl0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ?SLl0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ?SLl0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ?SLl0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ?SLl0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ?SLl0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ?SLl0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ?SLl0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ?SLl0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ?SLl0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ?SLl0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ?SLl0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ?SLl0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ?SLl0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ?SLl0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ?SLl0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ?SLl0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ?SLl0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ?SLl0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ?SLl0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ?SLl0:00 MPI_FastTest.exe
linuxbsc028: STDOUT: 52397 ?SLl0:00 MPI_FastTest.exe
linuxbsc029: STDOUT: 52780 ?SLl0:00 MPI_FastTest.exe
linuxbsc030: STDOUT: 47537 ?SLl0:00 MPI_FastTest.exe
linuxbsc031: STDOUT: 54609 ?SLl0:00 MPI_FastTest.exe
linuxbsc032: STDOUT: 52833 ?SLl0:00 MPI_FastTest.exe
$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  --host 
linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc025,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032
 
MPI_FastTest.exe
$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27  --host 
linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032,linuxbsc025
 
MPI_FastTest.exe
linuxbsc001: STDOUT: 24322 ?Ss 0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 28 
--hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc002: STDOUT:  2141 ?Ss 0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 28 
--hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc003: STDOUT: 69265 ?Ss 0:00 
/opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca 
orte_ess_jobid 751435776 -mca orte_ess_v

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-22 Thread Paul Kapinos

Hello Ralph, hello all.


No real ideas, I'm afraid. We regularly launch much larger jobs than that using 
ssh without problem,
I was also able to run a 288-node job yesterday - the size alone is not 
the problem...




so it is likely something about the local setup of that node that is causing the problem. 
Offhand, it sounds like either the mapper isn't getting things right, or for some reason 
the daemon on 005 isn't properly getting or processing the launch command.


What you could try is adding --display-map to see if the map is being correctly 
generated.
> If that works, then (using a debug build) try adding 
--leave-session-attached and see if

> any daemons are outputting an error.


You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd line. 

> You'll see debug output from each daemon as it receives and processes

the launch command.  See if the daemon on 005 is behaving differently than the 
others.


I've tried the options.
The map seems to be correctly built, and the output of the daemons seems 
to be the same (see helloworld.txt).



You should also try putting that long list of nodes in a hostfile - see if that 
makes a difference.
> It will process the nodes thru a different code path, so if there is 
some problem in --host,

this will tell us.


No, with a host file instead of the host list on the command line the 
behaviour is the same.


But I just found out that 1.4.3 does *not* hang on this 
constellation. The next thing I will try is the installation of 
1.5.4 :o)


Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile 
hostfile-mini -mca odls_base_verbose 5 --leave-session-attached 
--display-map  helloworld 2>&1 | tee helloworld.txt







On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:


Hello Open MPI volks,

We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we 
have some strange hangups if starting OpenMPI processes.

The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of  
offline nodes). Each node is accessible from each other over SSH (without 
password), also MPI programs between any two nodes are checked to run.


So long, I tried to start some bigger number of processes, one process per node:
$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list on 
which mpiexec reproducible hangs forever; and more surprising: other 
*permutation* of the *same* node names may run without any errors!

Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. 
Note: the only difference is that the node linuxbsc025 is put on the end of the 
host list. Amazed, too?

Looking on the particular nodes during the above mpiexec hangs, we found the 
orted daemons started on *each* node and the binary on all but one node 
(orted.txt, MPI_FastTest.txt).
Again amazing that the node with no user process started (leading to hangup in 
MPI_Init of all processes and thus to hangup, I believe) was always the same, 
linuxbsc005, which is NOT the permuted item linuxbsc025...

This behaviour is reproducible. The hang-on only occure if the started application is a 
MPI application ("hostname" does not hang).


Any Idea what is gonna on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
linuxbsc001: STDOUT: 24323 ?SLl0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ?SLl0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ?SLl0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ?SLl0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ?SLl0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ?SLl0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ?SLl0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ?SLl0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ?SLl0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ?SLl0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ?SLl0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ?SLl0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ?SLl0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ?SLl0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ?SLl0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ?SLl0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ?SLl0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ?SLl0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ?SLl0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ?SLl0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ?SLl0:00 MPI_FastTest.exe
linuxbsc028: STDOU

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread Paul Kapinos

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe we found out *why* it hangs.

The bad: it seems to me this is a bug, or at least an undocumented feature, 
of Open MPI 1.5.x.


In detail:
As said, we see mysterious hang-ups when starting on some nodes using some 
permutations of hostnames. Usually removing "some bad" nodes helps; 
sometimes a permutation of the node names in the hostfile is enough(!). The 
behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is very 
slim.


*ib0* is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 network; 
our idea was not to use eth0 for MPI at all.


All machines are reachable from any other over ib0 (they are in one network).

But on eth0 there are at least two different networks; in particular the 
computer linuxbsc025 is in a different network than the others and is not 
reachable from the other nodes over eth0! (But it is reachable over ib0; the name 
used in the hostfile is resolved to the IP of ib0.)


So I believe that Open MPI 1.5.x tries to communicate over eth0, cannot 
do it, and hangs. 1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 ...

...but this does not help. All right, the above command should disable 
the usage of eth0 for the MPI communication itself, but it hangs before 
MPI is even started, doesn't it? (Because one process is missing, MPI_INIT 
cannot be passed.)


Now a question: is there a way to forbid mpiexec to use certain 
interfaces at all?


Best wishes,

Paul Kapinos

P.S. Of course we know about the good idea to bring all nodes into the 
same net on eth0, but at this point it is impossible due to technical 
reason[s]...


P.S.2 I'm not sure that the issue is really rooted in the above-mentioned 
misconfiguration of eth0, but I have no better idea at this 
point...




The map seem to be correctly build, also the output if the daemons seem to be 
the same (see helloworld.txt)


Unfortunately, it appears that OMPI was not built with --enable-debug as there 
is no debug info in the output. Without a debug installation of OMPI, the 
ability to determine the problem is pretty limited.


Well, this will be the next option we will activate. We also have 
another issue here, on (not) using uDAPL...







You should also try putting that long list of nodes in a hostfile - see if that 
makes a difference.
It will process the nodes thru a different code path, so if there is some 
problem in --host,
this will tell us.

No, with the host file instead of host list on command line the behaviour is 
the same.

But, I just found out that the 1.4.3 does *not* hang on this constellation. The 
next thing I will try will be the installation of 1.5.4 :o)

Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca 
odls_base_verbose 5 --leave-session-attached --display-map  helloworld 2>&1 | 
tee helloworld.txt




On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:

Hello Open MPI volks,

We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we 
have some strange hangups if starting OpenMPI processes.

The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of  
offline nodes). Each node is accessible from each other over SSH (without 
password), also MPI programs between any two nodes are checked to run.


So long, I tried to start some bigger number of processes, one process per node:
$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the host list on 
which mpiexec reproducible hangs forever; and more surprising: other 
*permutation* of the *same* node names may run without any errors!

Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. 
Note: the only difference is that the node linuxbsc025 is put on the end of the 
host list. Amazed, too?

Looking on the particular nodes during the above mpiexec hangs, we found the 
orted daemons started on *each* node and the binary on all but one node 
(orted.txt, MPI_FastTest.txt).
Again amazing that the node with no user process started (leading to hangup in 
MPI_Init of all processes and thus to hangup, I believe) was always the same, 
linuxbsc005, which is NOT the permuted item linuxbsc025...

This behaviour is reproducible. The hang-on only occure if the started application is a 
MPI application ("hostname" does not hang).


Any Idea what is gonna on?


Best,

Paul Kapinos


P.S: no alias names used, all names are real ones







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-24 Thread Paul Kapinos

Hello Ralph, Terry, all!

again, two news: the good one and the second one.

Ralph Castain wrote:
Yes, that would indeed break things. The 1.5 series isn't correctly 
checking connections across multiple interfaces until it finds one that 
works - it just uses the first one it sees. :-(


Yahhh!!
This behaviour - catching a random interface and hanging forever if something 
is wrong with it - is somewhat less than perfect.


From my perspective - the user's one - Open MPI should try to use either 
*all* available networks (as 1.4 does...), starting with the high 
performance ones, or *only* those interfaces to which the hostnames from 
the hostfile are bound.


Also, there should be timeouts (if you cannot connect to a node within a 
minute, you probably will never ever be connected...).


If some connection runs into a timeout, a warning would be great (and a 
hint to exclude the interface via oob_tcp_if_exclude, btl_tcp_if_exclude).


Should it not?
Maybe you can file it as a "call for enhancement"...



The solution is to specify -mca oob_tcp_if_include ib0. This will direct 
the run-time wireup across the IP over IB interface.


You will also need the -mca btl_tcp_if_include ib0 as well so the MPI 
comm goes exclusively over that network. 


YES! This works. Adding
-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0
to the command line of mpiexec helps me to run the 1.5.x programs, so I 
believe this is the workaround.
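
(For the record, the complete call now looks roughly like this - just a 
sketch, with the hostfile and test binary from the earlier mails:)

$ mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 \
    --hostfile hostfile-mini MPI_FastTest.exe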


Many thanks for this hint, Ralph! My fault for not finding it in the FAQ 
(I was so close :o) http://www.open-mpi.org/faq/?category=tcp#tcp-selection


But then I ran into yet another issue. In 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params

the way to define MCA parameters via environment variables is described.

I tried it:
$ export OMPI_MCA_oob_tcp_if_include=ib0
$ export OMPI_MCA_btl_tcp_if_include=ib0


I checked it:
$ ompi_info --param all all | grep oob_tcp_if_include
 MCA oob: parameter "oob_tcp_if_include" (current 
value: , data source: environment or cmdline)

$ ompi_info --param all all | grep btl_tcp_if_include
 MCA btl: parameter "btl_tcp_if_include" (current 
value: , data source: environment or cmdline)



But then I got the hang-up issue again!

==> It seems mpiexec does not understand these environment variables and 
only honours the command line options. This should not be so, should it?


(I also tried to explicitly provide the envvars via -x 
OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing 
changed. Well, they are OMPI_* variables and should be provided in any case.)



Best wishes and many thanks for all,

Paul Kapinos




Specifying both include and 
exclude should generate an error as those are mutually exclusive options 
- I think this was also missed in early 1.5 releases and was recently 
patched.


HTH
Ralph


On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:


On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe to find out *why* it hangs

The bad: it seem for me, this is a bug or at least undocumented 
feature of Open MPI /1.5.x.


In detail:
As said, we see mystery hang-ups if starting on some nodes using some 
permutation of hostnames. Usually removing "some bad" nodes helps, 
sometimes a permutation of node names in the hostfile is enough(!). 
The behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is 
very slim


*ib0* - is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 
network; our idea was not to use eth0 for MPI at all.


all machines are available from any over ib0 (are in one network).

But on eth0 there are at least two different networks; especially the 
computer linuxbsc025 is in different network than the others and is 
not reachable from other nodes over eth0! (but reachable over ib0. 
The name used in the hostfile is resolved to the IP of ib0 ).


So I believe that Open MPI /1.5.x tries to communicate over eth0 and 
cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include 
ib0 ...


I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
specify the exclude parameter.
...but this does not help. All right, the above command should 
disable the usage of eth0 for MPI communication itself, but it hangs 
just before the MPI is started, isn't it? (because one process lacks, 
the MPI_INIT cannot be passed)


By "just before the MPI is started" do you mean while orte is 
launching the processes.
I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but 
I think that may depe

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-25 Thread Paul Kapinos

Hello again,


Ralph Castain wrote:

Yes, that would indeed break things. The 1.5 series isn't correctly checking 
connections across multiple interfaces until it finds one that works - it just 
uses the first one it sees. :-(

Yahhh!!
This behaviour - catch a random interface and hang forever if something is 
wrong with it - is somewhat less than perfect.

From my perspective - the users one - OpenMPI should try to use eitcher *all* 
available networks (as 1.4 it does...), starting with the high performance 
ones, or *only* those interfaces on which the hostnames from the hostfile are 
bound to.


It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 
series.


Thanks for the clarification. I was not sure whether this is a bug or a 
feature :-)





Also, there should be timeouts (if you cannot connect to a node within a minute 
you probably will never ever be connected...)


We have debated about this for some time - there is a timeout mca param one can 
set, but we'll consider again making it default.


If some connection runs into a timeout a warning would be great (and a hint to 
take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude).

Should it not?
Maybe you can file it as a "call for enhancement"...


Probably the right approach at this time.


Ahhh.. sorry, I did not understand what you meant.
Did you file it, or someone else, or should I do it in some way? Or 
should I not?








But then I ran into yet another one issue. In 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
the way to define MCA parameters over environment variables is described.

I tried it:
$ export OMPI_MCA_oob_tcp_if_include=ib0
$ export OMPI_MCA_btl_tcp_if_include=ib0


I checked it:
$ ompi_info --param all all | grep oob_tcp_if_include
MCA oob: parameter "oob_tcp_if_include" (current value: , 
data source: environment or cmdline)
$ ompi_info --param all all | grep btl_tcp_if_include
MCA btl: parameter "btl_tcp_if_include" (current value: , 
data source: environment or cmdline)


But then I get again the hang-up issue!

==> seem, mpiexec does not understand these environment variables! and only get 
the command line options. This should not be so?


No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. 
This environment does not forward environmental variables. Because of limits on 
cmd line length, we don't automatically forward MCA params from the 
environment, but only from the cmd line. It is an annoying limitation, but one 
outside our control.


We know about "ssh does not forward environmental variables." But in 
this case, are these parameters not the parameters of mpiexec itself, too?


The crucial thing is that setting the parameters works via the 
command line but *does not work* via the envvar way (as described in 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). 
This looks like a bug to me!








Put those envars in the default mca param file and the problem will be resolved.


You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of 
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
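
(Such a file just contains one parameter per line - a minimal sketch, 
assuming the ib0 settings from above:)

# $prefix/etc/openmpi-mca-params.conf
oob_tcp_if_include = ib0
btl_tcp_if_include = ib0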


Well, this is possible, but not flexible enough for us (because there 
are some machines which can only run if the parameters are *not* set - 
on those, ssh goes over just these eth0 devices).


For now we use the command line parameters and hope the envvar way will 
work someday.




(I also tried to advise to provide the envvars by -x 
OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed.


I'm surprised by that - they should be picked up and forwarded. Could be a bug


Well, I also think this is a bug, but as said not in providing the values 
of the envvars but in detecting these parameters at all. Or maybe in both.






Well, they are OMPI_ variables and should be provided in any case).


No, they aren't - they are not treated differently than any other envar.


[after performing some RTFM...]
at least the man page of mpiexec says that the OMPI_* environment variables 
are always provided and thus treated *differently* than other envvars:


$ man mpiexec

 Exported Environment Variables
   All environment variables that are named in the form OMPI_* will 
 automatically  be  exported to new processes on the local and remote 
nodes.


So, does the man page lie, is this a removed feature, or something 
else?



Best wishes,

Paul Kapinos






Specifying both include and exclude should generate an error as those are 
mutually exclusive options - I think this was also missed in early 1.5 releases 
and was recently patched.
HTH
Ralph
On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote:

On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two news, as usual a good and a bad one.

The good: we believe to find out *why* it hangs

The bad: it seem for me, this is a bug or at lea

[OMPI users] Open MPI and DAPL 2.0.34 are incompatible?

2011-12-02 Thread Paul Kapinos

Dear Open MPI developer,

OFED 1.5.4 will contain DAPL 2.0.34.

I tried to compile the newest release of Open MPI (1.5.4) with this DAPL 
release and I was not successful.


Configuring with --with-udapl=/path/to/2.0.34/dapl
got the error "/path/to/2.0.34/dapl/include/dat/udat.h not found"
Looking into the include dir: there is no 'dat' subdir, but a 'dat2'.

Just for fun I also tried to rename 'dat2' back to 'dat' (a dirty hack, I 
know :-) - the configure stage was then successful but the compilation 
failed. The headers seem to have really changed, not just moved.


The question: are the Open MPI developers aware of these changes, and when 
will a version of Open MPI be available with support for DAPL 2.0.34?


(Background: we have some trouble with Intel MPI and the current DAPL which 
we do not have with DAPL 2.0.34, so our dream is to update as soon as 
possible.)


Best wishes and a nice weekend,

Paul






http://www.openfabrics.org/downloads/OFED/release_notes/OFED_1.5.4_release_notes




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] How are the Open MPI processes spawned?

2011-12-06 Thread Paul Kapinos

Hello Jeff, Ralph, all!


Meaning that per my output from above, what Paul was trying should have worked, no?  
I.e., setenv'ing OMPI_, and those env vars should magically show up 
in the launched process.

In the -launched process- yes. However, his problem was that they do not show 
up for the -orteds-, and thus the orteds don't wireup correctly.


Sorry for the latency, too many issues in too many areas needing improvement :-/
Well, just to clarify the long story about what I have seen:

1. Got a strange start-up problem (based on a bogus configuration of eth0 
+ a known (to you experts :o) bug in 1.5.x).


2. Got a workaround for (1.) by setting '-mca oob_tcp_if_include ib0 
-mca btl_tcp_if_include ib0' on the command line of mpiexec => WORKS! 
Many thanks, guys!


3. Remembered that any MCA parameters can also be defined via 
OMPI_MCA_... envvars, tried to set them => does NOT work, the hang-ups were 
back again. Checked with ompi_info how the MCA parameters are set - all 
clear, but it doesn't work. My blind guess was that mpiexec does not understand 
these envvars in this case.

See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php

Thus this issue is not about forwarding some or any OMPI_* envvars to 
the _processes_, but about a step _before_ that (the processes were not 
started correctly at all in my problem case), as Ralph wrote.


The difference in behaviour between setting the parameters on the command line 
and via OMPI_* envvars matters!



Ralph Castain wrote:
>> Did you filed it, or someone else, or should I do it in some way?
> I'll take care of it, and copy you on the ticket so you can see
> what happens. I'll also do the same for the connection bug
> - sorry for the problem :-(

Ralph, many thanks for this!

Best wishes and a nice evening/day/whatever time you have!

Paul Kapinos




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] wiki and "man mpirun" odds, and a question

2011-12-06 Thread Paul Kapinos

Hello,


I don't see what you're referring to.  I see:

-
• -x : The name of an environment variable to export to the 
parallel application. The -x option can be specified multiple times to export 
multiple environment variables to the parallel application.
-

(ok, I might have just changed it :-) )


nice joke! :o)

> Queuing systems can forward the submitters environment if desired.
> For example, in SGE, the -V switch forwards all the environment
> variables to the job's environment, so if there's one you can use
> to launch your job, you might want to check it's documentation.

This is known and not an option for us. There are too many variables in 
the interactive environment which should not be forwarded...


What I asked for is something which could replace

mpiexec -x FOO -x BAR -x FOBA -x BAFO -x RR -x ZZ ..

(which is quite tedious to type and error-prone for the users) by 
setting some dreamlike value, e.g.


export OMPI_PROVIDE_THIS_VARIABLES="FOO BAR FOBA BAFO RR ZZ"

At least some envvar whose content would simply be added to the command 
line could help:

export OMPI_ADD_2_COMMLINE="-x FOO -x BAR -x FOBA -x BAFO -x RR -x ZZ"

Well, these are my users' dreams; but maybe this gives some inspiration to 
the Open MPI programmers. As said, the situation where a [long] list of 
envvars has to be provided is quite common, and typing everything on the 
command line is tedious and error-prone.
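
(In the meantime, a shell-side workaround sketch we can give to our users - 
the variable names are just the examples from above, and a.out is a placeholder:)

MYVARS="FOO BAR FOBA BAFO RR ZZ"
XOPTS=""
for v in $MYVARS; do XOPTS="$XOPTS -x $v"; done
mpiexec $XOPTS -np 4 ./a.out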


Best wishes [and sorry for the noise],

Paul Kapinos

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Open MPI and DAPL 2.0.34 are incompatible?

2011-12-06 Thread Paul Kapinos

Good morning,

We've never recommended the use of dapl on Linux.  
I think it might have worked at one time, but I don't think anyone bothered to maintain it.  


On Linux, you should probably use native verbs support, instead.


Well, we have used 'Open MPI + openib' for some years now (we started with 
Sun's ClusterTools and Open MPI 1.2.x; now we have self-built 1.4.x and 
1.5.x Open MPI).


The problem is that on our new, big, sexy cluster (some 1700 nodes 
connected to a common QDR InfiniBand fabric), running MPI over DAPL seems 
to be quite a bit faster than running over native IB. Yes, it is puzzling.


But it is reproducible:
Intel MPI (over DAPL)  => 100%
Open MPI (over openib) => 90% on about 4/5 of the machines (Westmere dual-socket)
Open MPI (over openib) => 45% on about 1/5 of the machines (Nehalem quad-socket)
Intel MPI (over ofa)   => the same values as Open MPI!

(Bandwidth in a PingPong test, e.g. the Intel MPI Benchmarks, and two other 
PingPongs)


The question WHY native IB is slower than DAPL is a very good one 
(do you have any ideas?). As said, it is reproducible: switching from 
dapl to ofa in Intel MPI also switches the PingPong performance.


(You may say "your test is wrong" but we tried out three different 
PingPong tests, producing very similar values).


The second question is how to teach Open MPI to use DAPL.

Meanwhile, I compiled lots of versions (1.4.3, 1.4.4, 1.5.3, 1.5.4) 
using at least two DAPL versions and the option --with-udapl. The versions 
build well, but on every start the initialisation of DAPL fails 
(message below) and the communication falls back to openib as usual.


Although the error message says this "may be an invalid entry in the uDAPL 
Registry" in the dat.conf file, that seems very unlikely: with the same 
dat.conf, Intel MPI can use DAPL. (And yes, Open MPI really uses the same 
dat.conf as Intel MPI, set via DAT_OVERRIDE - checked and double-checked.)


--
WARNING: Failed to open "ofa-v2-mlx4_0-1u" 
[DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].

This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--
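
(For reference, the dat.conf entry for that provider looks roughly like this - 
a sketch; the exact provider library name and version depend on the DAPL 
installation:)

ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mlx4_0 1" ""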

Because of the anticipated performance gain we would be very keen on 
using DAPL with Open MPI. Does somebody have an idea what could be 
wrong and what to check?







On Dec 2, 2011, at 1:21 PM, Paul Kapinos wrote:


Dear Open MPI developer,

OFED 1.5.4 will contain DAPL 2.0.34.

I tried to compile the newest release of Open MPI (1.5.4) with this DAPL 
release and I was not successful.

Configuring with --with-udapl=/path/to/2.0.34/dapl
got the error "/path/to/2.0.34/dapl/include/dat/udat.h not found"
Looking into include dir: there is no 'dat' subdir but 'dat2'.

Just for fun I also tried to move 'dat2' to 'dat' back (dirty hack I know :-) - 
the configure stage was then successful but the compilation failed. The header 
seem to be really changed, not just moved.

The question: are the Open MPI developer aware of this changes, and when a 
version of Open MPI will be available with support for DAPL 2.0.34?

(Background: we have some trouble with Intel MPI and current DAPL which we do 
not have with DAPL 2.0.34, so our dream is to update as soon as possible)

Best wishes and an nice weekend,

Paul






http://www.openfabrics.org/downloads/OFED/release_notes/OFED_1.5.4_release_notes




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Cofigure(?) problem building /1.5.3 on ScientificLinux6.0

2011-12-09 Thread Paul Kapinos

Hello Gus, Ralph, Jeff

a very late answer for this - just found it in my mailbox.


Would "cp -rp" help?
(To preserve time stamps, instead of "cp -r".)


Yes, the root of the evil was the time stamps. 'cp -a' is the magic 
wand. Many thanks for your help, and I should wear sackcloth and 
ashes... :-/


Best,

Paul





Anyway, since 1.2.8 here I build 5, sometimes more versions,
all from the same tarball, but on separate build directories,
as Jeff suggests.
[VPATH] Works for me.

My two cents.
Gus Correa

Jeff Squyres wrote:
Ah -- Ralph pointed out the relevant line to me in your first mail 
that I initially missed:



In each case I build 16 versions at all (4 compiler * 32bit/64bit *
support for multithreading ON/OFF). The same error arise in all 16 
versions.


Perhaps you should just expand the tarball once and then do VPATH 
builds...?


Something like this:

tar xf openmpi-1.5.3.tar.bz2
cd openmpi-1.5.3

mkdir build-gcc
cd build-gcc
../configure blah..
make -j 4
make install
cd ..

mkdir build-icc
cd build-icc
../configure CC=icc CXX=icpc FC=ifort F77=ifort ...blah...
make -j 4
make install
cd ..
etc.

This allows you to have one set of source and have N different builds 
from it.  Open MPI uses the GNU Autotools correctly to support this 
kind of build pattern.





On Jul 22, 2011, at 2:37 PM, Jeff Squyres wrote:

Your RUNME script is a *very* strange way to build Open MPI.  It 
starts with a massive copy:


cp -r /home/pk224850/OpenMPI/openmpi-1.5.3/AUTHORS 
/home/pk224850/OpenMPI/openmpi-1.5.3/CMakeLists.txt <...much 
snipped...> .


Why are you doing this kind of copy?  I suspect that the GNU 
autotools' timestamps are getting all out of whack when you do this 
kind of copy, and therefore when you run "configure", it tries to 
re-autogen itself.


To be clear: when you expand OMPI from a tarball, you shouldn't need 
the GNU Autotools installed at all -- the tarball is pre-bootstrapped 
exactly to avoid you needing to use the Autotools (much less any 
specific version of the Autotools).


I suspect that if you do this:

-
tar xf openmpi-1.5.3.tar.bz2
cd openmpi-1.5.3
./configure etc.
-

everything will work just fine.


On Jul 22, 2011, at 11:12 AM, Paul Kapinos wrote:


Dear OpenMPI volks,
currently I have a problem by building the version 1.5.3 of OpenMPI on
Scientific Linux 6.0 systems, which seem vor me to be a configuration
problem.

After the configure run (which seem to terminate without error code),
the "gmake all" stage produces errors and exits.

Typical is the output below.

Fancy: the 1.4.3 version on same computer can be build with no special
trouble. Both the 1.4.3 and 1.5.3 versions can be build on other
computer running CentOS 5.6.

In each case I build 16 versions at all (4 compiler * 32bit/64bit *
support for multithreading ON/OFF). The same error arise in all 16 
versions.


Can someone give a hint about how to avoid this issue? Thanks!

Best wishes,

Paul


Some logs and configure are downloadable here:
https://gigamove.rz.rwth-aachen.de/d/id/2jM6MEa2nveJJD

The configure line is in RUNME.sh, the
logs of configure and build stage in log_* files; I also attached the
config.log file and the configure itself (which is the standard from 
the

1.5.3 release).


##


CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh
/tmp/pk224850/linuxc2_11254/openmpi-1.5.3mt_linux64_gcc/config/missing
--run aclocal-1.11 -I config
sh: config/ompi_get_version.sh: No such file or directory
/usr/bin/m4: esyscmd subprocess failed



configure.ac:953: warning: OMPI_CONFIGURE_SETUP is m4_require'd but not
m4_defun'd
config/ompi_mca.m4:37: OMPI_MCA is expanded from...
configure.ac:953: the top level
configure.ac:953: warning: AC_COMPILE_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS
../../lib/autoconf/specific.m4:386: AC_USE_SYSTEM_EXTENSIONS is 
expanded

from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:152:
HWLOC_SETUP_CORE_AFTER_C99 is expanded from...
../../lib/m4sugar/m4sh.m4:505: AS_IF is expanded from...
opal/mca/paffinity/hwloc/hwloc/config/hwloc.m4:22: HWLOC_SETUP_CORE is
expanded from...
opal/mca/paffinity/hwloc/configure.m4:40: MCA_paffinity_hwloc_CONFIG is
expanded from...
config/ompi_mca.m4:540: MCA_CONFIGURE_M4_CONFIG_COMPONENT is expanded
from...
config/ompi_mca.m4:326: MCA_CONFIGURE_FRAMEWORK is expanded from...
config/ompi_mca.m4:247: MCA_CONFIGURE_PROJECT is expanded from...
configure.ac:953: warning: AC_RUN_IFELSE was called before
AC_USE_SYSTEM_EXTENSIONS




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@c

Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Paul Kapinos

A blind guess: did you use the Intel compiler?
If so, there is/was an error leading to a SIGSEGV _in Open MPI itself_.

http://www.open-mpi.org/community/lists/users/2012/01/18091.php

If the SIGSEGV arises not in Open MPI but in the application itself, it may be 
a programming issue. In any case, a more precise answer is impossible 
without seeing any code snippets and/or logs.


Best,
Paul


Anas Al-Trad wrote:
Dear people, 
   In my application, I have the segmentation fault of 
Integer Divide-by-zero when calling MPI_cart_sub routine. My program is 
as follows, I have 128 ranks, I make a new communicator of the first 96 
ranks via MPI_Comm_creat. Then I create a grid of 8X12 by calling 
MPI_Cart_create. After creating the grid if I call MPI_Cart_sub then I 
have that error.


This error happens also when I use a communicator of 24 ranks and create 
a grid of 4X6. Can you please help me in solving this?


Regards,
Anas






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] rankfiles on really big nodes broken?

2012-01-20 Thread Paul Kapinos

Hello, Open MPI developer!

Now we have a really nice toy: 2 TB RAM, 16 sockets, 128 cores
(4 of the smaller Bull S6010 boxes coupled by BCS chips into a single-image machine).

On such a big box, process pinning is vital.

So we tried to use the Open MPI capabilities to pin the processes. But it 
seems that the rankfile infrastructure does not work properly: we always 
get an "Error: Invalid argument" message on the 128-core node, even if the 
rankfile is OK.
On a smaller node (up to 32 cores / 64 threads) the very same rankfile 
(with the node name changed, of course) works well.


I believe this computer dimension is a bit too big for the pinning 
infrastructure right now. A bug?


Best wishes,

Paul Kapinos

P.S. see the attached .tgz for some logzz

--
   Rankfiles
   Rankfiles provide a means for specifying detailed information 
about how process ranks should  be  mapped  to nodes and how they should 
be bound.  Consider the following:
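
(The man page example is cut off here; the syntax we use looks like this - 
a minimal sketch, host name and slot numbers are placeholders:)

rank 0=linuxbsc269 slot=0:0
rank 1=linuxbsc269 slot=1:0
rank 2=linuxbsc269 slot=2:0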


--
Open RTE: 1.5.3
   Open RTE SVN revision: r24532
   Open RTE release date: Mar 16, 2011
OPAL: 1.5.3
   OPAL SVN revision: r24532
   OPAL release date: Mar 16, 2011
Ident string: 1.5.3



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


rankfiles128.tgz
Description: application/compressed-tar


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] rankfiles on really big nodes broken?

2012-01-23 Thread Paul Kapinos

Hello Ralph,
Yes, the rankfiles in rankfiles128.tgz are the rankfiles which were used, 
and the linuxbsc*.txt files contain the output produced.


It would surprise me if rankfile3 were incorrect - the very same files 
(except for the node name, of course), rankfile1 and rankfile2, worked on smaller 
machines; cf. runme.sh, the rankfile* files and the output files.


The behaviour "it works on small box but does not work on thick box" was 
the quell of mu assumption that there is a error somewhere..


For the complete error message on the thick node, see the linuxbsc269.txt file.

Updating to a newer 1.5.x is a good idea, but it is always a bit 
tedious... Will 1.5.5 arrive soon?


Best wishes,
Paul Kapinos


Ralph Castain wrote:

I don't see anything in the code that limits the number of procs in a rankfile.

> Are the attached rankfiles the ones you are trying to use?
I'm wondering if there is a syntax error that is causing the problem. 
It would help if you could provide the complete error message output.


At one time, there was a limit on the number of procs on a node - 

> nothing to do with rankfile. That was fixed, though, and there

is no real limit any more. I don't recall the precise release number
where it changed in the 1.5 series - you might try updating 
to 1.5.4 as I'm sure it doesn't exist there.





On Jan 20, 2012, at 12:43 PM, Paul Kapinos wrote:


Hello, Open MPI developer!

Now, we have a really nice toy: 2 Tb RAM, 16 sockets, 128 cores.
(4x smaller Bull S6010 coupled by BCS chips to a single image machine)

On a such big box, process pinning is vital.

So we tried to use the Open MPI capabilities to pin te processes. But it seem that the 
rankfile infrastructure does not work properly: we always get "Error: Invalid 
argument" message on the 128-core node, also if the rankfile was OK.
On a smaller node (up to 32 cores/ 64 threads) the very same rankfile (with 
changed node name of course) works well.

I believe, this computer dimension is a bit too big for the pinning 
infrasructure now. A bug?

Best wishes,

Paul Kapinos

P.S. see the attached .tgz for some logzz

--
  Rankfiles
  Rankfiles provide a means for specifying detailed information about how 
process ranks should  be  mapped  to nodes and how they should be bound.  
Consider the following:

--
   Open RTE: 1.5.3
  Open RTE SVN revision: r24532
  Open RTE release date: Mar 16, 2011
   OPAL: 1.5.3
  OPAL SVN revision: r24532
  OPAL release date: Mar 16, 2011
   Ident string: 1.5.3



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Mpirun: How to print STDOUT of just one process?

2012-02-01 Thread Paul Kapinos

Try out the attached wrapper:
$ mpiexec -np 2 masterstdout <program>


mpirun -n 2 



Is there a way to have mpirun just merge STDOUT of one process into its
STDOUT stream?





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
#!/bin/sh
# Wrapper: run the given command; only MPI rank 0 keeps its stdout/stderr.
if [ "$OMPI_COMM_WORLD_RANK" = "0" ]
then
  exec "$@"
else
  exec "$@" >/dev/null 2>&1
fi
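
(Usage sketch, with a hypothetical binary name:)

$ chmod +x masterstdout
$ mpiexec -np 4 ./masterstdout ./a.out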


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Environment variables [documentation]

2012-02-27 Thread Paul Kapinos

Dear Open MPI developer,
here:
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
four envvars are listed which Open MPI sets for every process. We use them for some 
scripting, and thank you for providing them.


But simple "mpiexec -np 1 env | grep OMPI" brings lotz more envvars. These are 
interesting for us:


1) OMPI_COMM_WORLD_LOCAL_SIZE - seems to contain the number of processes 
running on the specific node, see also

http://www.open-mpi.org/community/lists/users/2008/07/6054.php

Is this envvar also "stable", as OMPI_COMM_WORLD_LOCAL_RANK is? (This would make 
sense, as it looks like the OMPI_COMM_WORLD_SIZE / OMPI_COMM_WORLD_RANK pair.)


If yes, maybe it also should be documented in the Wiki page.



2) OMPI_COMM_WORLD_NODE_RANK - is that just a duplicate of 
OMPI_COMM_WORLD_LOCAL_RANK?
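
(For context, a minimal sketch of the kind of wrapper scripting we do with 
these variables - everything except the variable names is illustrative:)

#!/bin/sh
# per-rank scratch directory plus a quick sanity print, then run the program
export MY_SCRATCH=/tmp/job_rank${OMPI_COMM_WORLD_RANK}
mkdir -p "$MY_SCRATCH"
echo "rank ${OMPI_COMM_WORLD_RANK}/${OMPI_COMM_WORLD_SIZE}," \
     "local rank ${OMPI_COMM_WORLD_LOCAL_RANK}/${OMPI_COMM_WORLD_LOCAL_SIZE}"
exec "$@"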


Best wishes,
Paul Kapinos



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Problem running over IB with huge data set

2012-02-27 Thread Paul Kapinos

Hello Jeff, Ralph, All Open MPI folks,

We had an off-list discussion about an error in the Serpent program. Ralph said:

>We already have several tickets for that problem, each relating to a different 
scenario:

>https://svn.open-mpi.org/trac/ompi/ticket/2155
>https://svn.open-mpi.org/trac/ompi/ticket/2157
>https://svn.open-mpi.org/trac/ompi/ticket/2295

I've built a quite small reproducer for the original issue (with a huge memory 
footprint) and have sent it to you.


The other week, another user got problems when using huge data sets.

A program which runs without any problems with smaller data sets (on the order of 
24 GB of data in total and smaller) gets problems with huge data sets (on the order of 
100 GB of data in total and more),

_if running over InfiniBand or IPoIB_.

The program essentially hangs, mostly blocking the transport used. In some 
scenarios it crashes.
The same program and data set run fine over Ethernet or shared memory (yes, 
we have computers with hundreds of GB of memory). The behaviour is reproducible.


Diverse errors are produced, some of which are listed below. Another thing is 
that in most cases, if the program hangs, it also blocks the transport, so 
other programs cannot run over the same interface (just as reported earlier).

More fun: we also found some '#procs x #nodes' combinations where the program 
runs fine.


I.e.:
30 and 60 processes over 6 nodes run through fine,
6 procs over 6 nodes - killed with an error message (see below),
12, 18, 36, 61, 62, 64, 66 procs over 6 nodes - hang and block the interface.

Well, we cannot give any warranty that this isn't a bug in the program itself, 
because it is just in development now. However, since the program works well for 
smaller data sets and over TCP and over shared memory, it smells like an MPI 
library error, hence this mail.


Or maybe the puzzling behaviour is a follow-up of some bug in the program 
itself? If yes, what could it be and how could we try to find it?


I did not attach a reproducer to this mail because the user does not want to 
spread the code all over the world, but I can send it to you if you are interested 
in reproducing the issue. [The code is about transposing huge distributed matrices and 
essentially calls MPI_Alltoallv; it is written as 'nice, well-structured' C++ 
code (nothing stays unwrapped) but is pretty small and readable.]


Ralph, Jeff, anybody - any interest in reproducing this issue?

Best wishes,
Paul Kapinos


P.S. Open MPI 1.5.3 used - still waiting for 1.5.5 ;-)








Some error messages:

with 6 procs over 6 Nodes:
--
mlx4: local QP operation err (QPN 7c0063, WQE index 0, vendor syndrome 6f, 
opcode = 5e)
[[8771,1],5][btl_openib_component.c:3316:handle_wc] from 
linuxbdc07.rz.RWTH-Aachen.DE to: linuxbdc04 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
mlx4: local QP operation err (QPN 18005f, WQE index 0, vendor syndrome 6f, 
opcode = 5e)
[[8771,1],2][btl_openib_component.c:3316:handle_wc] from 
linuxbdc03.rz.RWTH-Aachen.DE to: linuxbdc02 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
[[8771,1],1][btl_openib_component.c:3316:handle_wc] from 
linuxbdc02.rz.RWTH-Aachen.DE to: linuxbdc01 error polling LP CQ with status 
LOCAL QP OPERATION ERROR status number 2 for wr_id 6afb70 opcode 0  vendor error 
111 qp_idx 3
mlx4: local QP operation err (QPN 340057, WQE index 0, vendor syndrome 6f, 
opcode = 5e)

--


with 61 processes using IPoIB:
mpiexec -mca btl ^openib -np 61 -host 1,2,3,4,5,6 a.out < dim100G.in
--
[linuxbdc02.rz.RWTH-Aachen.DE][[21403,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.202 failed: Connection timed out (110)
[linuxbdc01.rz.RWTH-Aachen.DE][[21403,1],18][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.203 failed: Connection timed out (110)
[linuxbdc01.rz.RWTH-Aachen.DE][[21403,1],18][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] 
connect() to 134.61.208.203 failed: Connection timed out (110)

--


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Hybrid OpenMPI / OpenMP programming

2012-03-02 Thread Paul Kapinos
port
KMP_BLOCKTIME=0 ... The latest finally leads to an interesting reduction
of computing time but worsens the second problem we have to face (see
bellow).

b) We managed to have a "correct" (?) implementation of our MPI-processes
on our sockets by using: mpirun -bind-to-socket -bysocket -np 4n 
However if OpenMP threads initially seem to scatter on each socket (one

thread by core) they slowly migrate to the same core as their 'Master MPI 
process' or gather on one or two cores by socket
We play around with the environment variable KMP_AFFINITY but the best we could 
obtain was a pinning of the OpenMP threads to their own core... disorganizing 
at the same time the implementation of the 4n Level-2 MPI processes. When 
added, neither the specification of a rankfile nor the mpirun option -x 
IPATH_NO_CPUAFFINITY=1 seem to change significantly the situation.
This comportment looks rather inefficient but so far we did not manage to 
prevent the migration of the 4 threads to at most a couple of cores !

Is there something wrong in our "Hybrid" implementation?
Do you have any advices?
Thanks for your help,
Francis

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



_______
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Still bothered / cannot run an application

2012-07-12 Thread Paul Kapinos

(cross-post to 'users' and 'devel' mailing lists)

Dear Open MPI developer,
a long time ago I reported an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php

Well, in 1.6 the behaviour has changed: the test case doesn't hang forever and 
block an InfiniBand interface, but seems to run through, and now this error 
message is printed:

--
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:
  Device:        mlx4_0
  Function:      openib_reg_mr()
  Errno says:    Cannot allocate memory

You may need to consult with your system administrator to get this
problem fixed.
--



Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives us no hint about what is wrong. The locked memory is unlimited:
--
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
#- memlock - max locked-in-memory address space (KB)
*   hardmemlock unlimited
*   softmemlock unlimited
--
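
(A quick way to double-check the limit as the MPI processes actually see it - 
just a sketch, the node name is taken from the prompt above:)

$ ulimit -l
$ mpiexec -np 1 --host linuxbdc02 sh -c 'ulimit -l'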


Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S: The same test with Intel MPI cannot run using DAPL, but runs very fine over 
'ofa' (= native verbs, as Open MPI uses it). So I believe the problem is rooted in 
the communication pattern of the program; it sends very LARGE messages to a lot 
of / all other processes. (The program performs a matrix transposition of a 
distributed matrix.)


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Re :Re: OpenMP and OpenMPI Issue

2012-07-23 Thread Paul Kapinos

Jack,
note that support for THREAD_MULTIPLE is available in [newer] versions of Open 
MPI, but disabled by default. You have to enable it at configure time; in 1.6:


  --enable-mpi-thread-multiple
  Enable MPI_THREAD_MULTIPLE support (default:
  disabled)

You may check the available threading support level by using the attached
program.


On 07/20/12 19:33, Jack Galloway wrote:

This is an old thread, and I'm curious if there is support now for this?  I have
a large code that I'm running, a hybrid MPI/OpenMP code, that is having trouble
over our infiniband network.  I'm running a fairly large problem (uses about
18GB), and part way in, I get the following errors:


You say "big footprint"? I hear a bell ringing...
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
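
(On Mellanox mlx4 hardware the currently loaded MTT settings can usually be
inspected via sysfs - paths may differ depending on the driver/OFED version:)

$ cat /sys/module/mlx4_core/parameters/log_num_mtt
$ cat /sys/module/mlx4_core/parameters/log_mtts_per_seg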








--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
  PROGRAM tthr
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER  REQUIRED, PROVIDED, IERROR
  REQUIRED = MPI_THREAD_MULTIPLE
  PROVIDED = -1
  CALL MPI_INIT_THREAD(REQUIRED, PROVIDED, IERROR)
  WRITE (*,*) MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
 & MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE
  WRITE (*,*) REQUIRED, PROVIDED, IERROR
  CALL MPI_FINALIZE(IERROR)
  END


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-28 Thread Paul Kapinos

Randolph,
after reading this:

On 08/28/12 04:26, Randolph Pullen wrote:

- On occasions it seems to stall indefinately, waiting on a single receive.


... I would make a blind guess: are you aware of the IB card parameters for
registered memory?

http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

"Waiting forever" for a single operation is one of symptoms of the problem 
especially in 1.5.3.



best,
Paul

P.S. The lower performance with 'big' chunks is a known phenomenon, cf.
http://www.scl.ameslab.gov/netpipe/
(image at the bottom of the page). But a chunk size of 64k is fairly small.




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time

2012-09-05 Thread Paul Kapinos

Yevgeny,
we at RZ Aachen also see problems very similar to those described in the initial
posting of Yong Qin, on VASP with Open MPI 1.5.3.


We're currently looking for a data set able to reproduce this. I'll write an
email once we have got one.


Best,
Paul


On 09/05/12 13:52, Yevgeny Kliteynik wrote:

I'm checking it with OFED folks, but I doubt that there are some dedicated
tests for THP.

So do you see it only with a specific application and only on a specific
data set? Wonder if I can somehow reproduce it in-house...



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] too much stack size: _silently_ failback to IPoIB

2012-10-05 Thread Paul Kapinos

Dear Open MPI developer,
there are often problems with the user limit for the stack size (ulimit -s) on
Linux when running Fortran and/or OpenMP (= hybrid) programs.


In one case we have seen that a user had, by accident, set the stack size in his
environment far too high - to about one terabyte (on nodes with less than 100 GB RAM).


It turned out that Open MPI (1.6.1) cannot use InfiniBand in this environment
(it cannot activate the IB card / register memory / something else because of a lack of
virtual memory - all memory reserved for the virtual stack?). The job seems to
fall back and run over IPoIB, judging by the achieved bandwidth.


The problem was that not a single word of caution was printed out, whereas
Open MPI usually warns the user if a seemingly available high-performance
network cannot be used, AFAIK. Thus the user's problem - a 15x bandwidth and
performance loss - stayed hidden for many weeks and was found only by accident.


So, what's going wrong [if any]?

Reproducing: try to set 'ulimit -s' in your environment to an astronomical
value, or use the attached wrapper.


$MPI_ROOT/bin/mpiexec  -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 
-np 2 -H linuxbdc01,linuxbdc02 /home/pk224850/bin/ulimit_high.sh  MPI_FastTest.exe
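
(A hypothetical sketch of such a wrapper - the real attached ulimit_high.sh may
differ, and the fallback value is just an illustration; raising the soft limit
above the hard limit fails for non-root users:)

#!/bin/sh
# raise the stack size limit as far as possible, then start the real program
ulimit -s unlimited 2>/dev/null || ulimit -s 2000000000
exec "$@"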




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


ulimit_high.sh
Description: application/shellscript


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Performance/stability impact of thread support

2012-10-30 Thread Paul Kapinos

At least, be aware that the usage of InfiniBand is silently disabled if the
'multiple' threading level is activated:

http://www.open-mpi.org/community/lists/devel/2012/10/11584.php




On 10/29/12 19:14, Daniel Mitchell wrote:

Hi everyone,

I've asked my linux distribution to repackage Open MPI with thread support 
(meaning configure with --enable-thread-multiple). They are willing to do this 
if it won't have any performance/stability hit for Open MPI users who don't 
need thread support (meaning everyone but me, apparently). Does enabling thread 
support impact performance/stability?

Daniel
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Multirail + Open MPI 1.6.1 = very big latency for the first communication

2012-10-31 Thread Paul Kapinos

Hello all,

Open MPI is clever and by default uses multiple IB adapters, if available.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup

Open MPI is lazy and establishes connections only if needed.

Both are good.

We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4 IB cards. 
Multirail works!


The crucial thing is that, starting with v1.6.1, the latency of the very first
PingPong sample between two nodes takes really a lot of time - some 100x - 200x
of the usual latency. You cannot see this with the usual latency benchmarks(*)
because they tend to omit the first samples as a "warm-up phase", but we use a
kind of self-written parallel test which clearly shows this (and left me musing
for some days).
If multirail is forbidden (-mca btl_openib_max_btls 1), or if v1.5.3 is used, or
if the MPI processes are preconnected
(http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there are no such
huge latency outliers for the first sample.
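
(For reference, the two workarounds mentioned above translate into command lines
roughly like the following; host names and the benchmark binary are placeholders,
and the preconnect parameter name is the one given in the FAQ entry - check
ompi_info on your installation, as names differ between releases:)

$ mpiexec -mca btl_openib_max_btls 1 -np 2 -H node1,node2 ./pingpong
$ mpiexec -mca mpi_preconnect_mpi 1  -np 2 -H node1,node2 ./pingpong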


Well, we know about the warm-up and lazy connections.

But 200x ?!

Any comments about that is OK so?

Best,

Paul Kapinos

(*) E.g. HPCC explicitely say in http://icl.cs.utk.edu/hpcc/faq/index.html#132
> Additional startup latencies are masked out by starting the measurement after
> one non-measured ping-pong.

P.S. Sorry for cross-posting to both Users and Developers, but my last questions 
to Users have no reply until yet, so trying to broadcast...



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Paul Kapinos
Do you *really* want to dig into the code just in order to switch a default
communication algorithm?


Note there are several ways to set the parameters; --mca on command line is just 
one of them (suitable for quick online tests).


http://www.open-mpi.org/faq/?category=tuning#setting-mca-params

We 'tune' our Open MPI by setting environment variables

Best
Paul Kapinos



On 12/19/12 11:44, Number Cruncher wrote:

Having run some more benchmarks, the new default is *really* bad for our
application (2-10x slower), so I've been looking at the source to try and figure
out why.

It seems that the biggest difference will occur when the all_to_all is actually
sparse (e.g. our application); if most N-M process exchanges are zero in size
the 1.6 ompi_coll_tuned_alltoallv_intra_basic_linear algorithm will actually
only post irecv/isend for non-zero exchanges; any zero-size exchanges are
skipped. It then waits once for all requests to complete. In contrast, the new
ompi_coll_tuned_alltoallv_intra_pairwise will post the zero-size exchanges for
*every* N-M pair, and wait for each pairwise exchange. This is O(comm_size)
waits, many of which are zero. I'm not clear what optimizations there are for
zero-size isend/irecv, but surely there's a great deal more latency if each
pairwise exchange has to be confirmed complete before executing the next?

Relatedly, how would I direct OpenMPI to use the older algorithm
programmatically? I don't want the user to have to use "--mca" in their
"mpiexec". Is there a C API?

Thanks,
Simon


On 16/11/12 10:15, Iliev, Hristo wrote:

Hi, Simon,

The pairwise algorithm passes messages in a synchronised ring-like fashion
with increasing stride, so it works best when independent communication
paths could be established between several ports of the network
switch/router. Some 1 Gbps Ethernet equipment is not capable of doing so,
some is - it depends (usually on the price). This said, not all algorithms
perform the same given a specific type of network interconnect. For example,
on our fat-tree InfiniBand network the pairwise algorithm performs better.

You can switch back to the basic linear algorithm by providing the following
MCA parameters:

mpiexec --mca coll_tuned_use_dynamic_rules 1 --mca
coll_tuned_alltoallv_algorithm 1 ...

Algorithm 1 is the basic linear, which used to be the default. Algorithm 2
is the pairwise one.
You can also set these values as exported environment variables:

export OMPI_MCA_coll_tuned_use_dynamic_rules=1
export OMPI_MCA_coll_tuned_alltoallv_algorithm=1
mpiexec ...

You can also put this in $HOME/.openmpi/mcaparams.conf or (to make it have
global effect) in $OPAL_PREFIX/etc/openmpi-mca-params.conf:

coll_tuned_use_dynamic_rules=1
coll_tuned_alltoallv_algorithm=1

A gratuitous hint: dual-Opteron systems are NUMAs so it makes sense to
activate process binding with --bind-to-core if you haven't already done so.
It prevents MPI processes from being migrated to other NUMA nodes while
running.

Kind regards,
Hristo
--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
On Behalf Of Number Cruncher
Sent: Thursday, November 15, 2012 5:37 PM
To: Open MPI Users
Subject: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

I've noticed a very significant (100%) slow down for MPI_Alltoallv calls as of
version 1.6.1.
* This is most noticeable for high-frequency exchanges over 1Gb ethernet
where process-to-process message sizes are fairly small (e.g. 100kbyte) and
much of the exchange matrix is sparse.
* 1.6.1 release notes mention "Switch the MPI_ALLTOALLV default algorithm
to a pairwise exchange", but I'm not clear what this means or how to switch
back to the old "non-default algorithm".

I attach a test program which illustrates the sort of usage in our MPI
application. I have run this as 32 processes on four nodes, over 1Gb
ethernet, each node with 2x Opteron 4180 (dual hex-core); rank 0,4,8,..
on node 1, rank 1,5,9, ... on node 2, etc.

It constructs an array of integers and a nProcess x nProcess exchange typical
of part of our application. This is then exchanged several thousand times.
Output from "mpicc -O3" runs shown below.

My guess is that 1.6.1 is hitting additional latency not present in 1.6.0. I also
attach a plot showing network throughput on our actual mesh generation
application. Nodes cfsc01-04 are running 1.6.0 and finish within 35 minutes.
Nodes cfsc05-08 are running 1.6.2 (started 10 minutes later) and take over an
hour to run. There seems to be a much greater network demand in the 1.6.1
version, despite the user-code and input data being identical.

Thanks for any help you can give,
Simon



___

Re: [OMPI users] Initializing OMPI with invoking the array constructor on Fortran derived types causes the executable to crash

2013-01-11 Thread Paul Kapinos

This is hardly an Open MPI issue:

Replace the calls to MPI_Init and MPI_Finalize with
WRITE(*,*) "f"
comment out 'USE mpi', and you see your error (SIGSEGV) again, now without any
MPI part in the program.
So my suspicion is that this is a bug in your GCC version, especially because there
is no SIGSEGV using GCC 4.7.2 (whereas it crashes using 4.4.6).

==> Update your compilers!


On 01/11/13 14:01, Stefan Mauerberger wrote:

Hi There!

First of all, this is my first post here. In case I am doing something
inappropriate please be gentle with me. On top of that I am not quite sure
whether this issue is related to Open MPI or GCC.

Regarding my problem: Well, it is a little bulky, see below. I could
figure out that the actual crash is caused by invoking Fortran's array
constructor [ xx, yy ] on the derived data types xx and yy. The one key
factor is that those types have allocatable member variables.
Well, that fact points towards blaming gfortran. However, the crash
does not occur if MPI_Init is not called beforehand. Compiled as a
serial program everything works perfectly fine. I am pretty sure the
lines I wrote are valid F2003 code.

Here is a minimal working example:
PROGRAM main
 USE mpi

 IMPLICIT NONE

 INTEGER :: ierr

 TYPE :: test_typ
 REAL, ALLOCATABLE :: a(:)
 END TYPE

 TYPE(test_typ) :: xx, yy
 TYPE(test_typ), ALLOCATABLE :: conc(:)

 CALL mpi_init( ierr )

 conc = [ xx, yy ]

 CALL mpi_finalize( ierr )

END PROGRAM main
Just compile with mpif90 ... and execute leads to:

*** glibc detected *** ./a.out: free(): invalid pointer: 0x7fefd2a147f8 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fefd26dab96]
./a.out[0x400fdb]
./a.out(main+0x34)[0x401132]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fefd267d76d]
./a.out[0x400ad9]

With 'CALL MPI_Init' and 'MPI_Finalize' commented out, everything seems to be
fine.

What do you think: Is this an OMPI or a GCC related bug?

Cheers,
Stefan


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] openmpi, 1.6.3, mlx4_core, log_num_mtt and Debian/vanilla kernel

2013-02-21 Thread Paul Kapinos
The MTT parameter mess is well known, and the good solution is to set the MTT
parameters high. Otherwise you never know what you will get - your
application may hang, block the IB interface, run a bit slower, run very slowly...

http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
http://www.open-mpi.org/community/lists/devel/2012/08/11417.php
http://montecarlo.vtt.fi/mtg/2012_Madrid/Hans_Hammer2.pdf
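
(For illustration only - the actual values are site-specific and must be derived
from the amount of RAM as described in the first FAQ link above, and the mlx4
driver has to be reloaded after changing them:)

# e.g. in /etc/modprobe.d/mlx4_core.conf
options mlx4_core log_num_mtt=24 log_mtts_per_seg=3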

On 02/21/13 11:53, Stefan Friedel wrote:

Is there a way to tell openmpi-1.6.3 to use the ofed-module from vanilla
kernel and not to rely on log_num_mtt for
"do-we-have-enough-registred-mem" computation for Mellanox HCAs? Any
other idea/hint?



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] bug in mpif90? OMPI_FC envvar does not work with 'use mpi'

2013-03-13 Thread Paul Kapinos


AFAIK the GNU people change the Fortran module format every time they get a
chance to do so :-(

So Open MPI compiled with gfortran 4.4.6 (the system default for RHEL 6.x)
definitely does not work with the 4.5, 4.6 or 4.7 versions of gfortran.


The Intel 'ifort' compiler builds modules which are compatible from the 11.x
through 13.x versions.


So, the recommended solution is to build a separate Open MPI installation for each
compiler you use, for example as sketched below.
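
(A hypothetical example - the compiler names and the installation prefix are
placeholders:)

$ ./configure CC=gcc-4.7 CXX=g++-4.7 F77=gfortran-4.7 FC=gfortran-4.7 \
      --prefix=/opt/MPI/openmpi-1.6.4/linux/gcc47
$ make all install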


Greetings,
Paul


P.S. As Hristo said, changing the Fortran compiler vendor and reusing the
precompiled Fortran modules would never work: the format of these .mod files is
not standardised at all.


On 03/13/13 11:05, Iliev, Hristo wrote:

However, it works if for example you configure Open MPI with the system supplied
version of gfortran and then specify a later gfortran version, e.g.
OMPI_FC=gfortran-4.7 (unless the module format has changed in the meantime).



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] OpenMPI 1.6.4, MPI I/O on Lustre, 32bit: bug?

2013-03-25 Thread Paul Kapinos

Hello,
we observe the following divide-by-zero error:

[linuxscc005:31416] *** Process received signal ***
[linuxscc005:31416] Signal: Floating point exception (8)
[linuxscc005:31416] Signal code: Integer divide-by-zero (1)
[linuxscc005:31416] Failing at address: 0x2282db
[linuxscc005:31416] [ 0] [0x3a9410]
[linuxscc005:31416] [ 1] /lib/libgcc_s.so.1(__divdi3+0x8b) [0x2282db]
[linuxscc005:31416] [ 2] 
/opt/MPI/openmpi-1.6.4/linux/intel/lib/lib32/libmpi.so.1(ADIOI_LUSTRE_WriteStrided+0x1c36) 
[0x8c8206]
[linuxscc005:31416] [ 3] 
/opt/MPI/openmpi-1.6.4/linux/intel/lib/lib32/libmpi.so.1(MPIOI_File_write+0x1f2) 
[0x8ed752]
[linuxscc005:31416] [ 4] 
/opt/MPI/openmpi-1.6.4/linux/intel/lib/lib32/libmpi.so.1(mca_io_romio_dist_MPI_File_write+0x33) 
[0x8ed553]
[linuxscc005:31416] [ 5] 
/opt/MPI/openmpi-1.6.4/linux/intel/lib/lib32/libmpi.so.1(mca_io_romio_file_write+0x2e) 
[0x8a46fe]
[linuxscc005:31416] [ 6] 
/opt/MPI/openmpi-1.6.4/linux/intel/lib/lib32/libmpi.so.1(MPI_File_write+0x45) 
[0x846c25]
[linuxscc005:31416] [ 7] 
/rwthfs/rz/cluster/home/pk224850/SVN/rz_cluster_utils/test_suite/trunk/tests/mpi/mpiIO/mpiIOC32.exe() 
[0x804a1ac]

[linuxscc005:31416] [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x6fccce6]
[linuxscc005:31416] [ 9] 
/rwthfs/rz/cluster/home/pk224850/SVN/rz_cluster_utils/test_suite/trunk/tests/mpi/mpiIO/mpiIOC32.exe() 
[0x8049d91]

[linuxscc005:31416] *** End of error message ***

... if we're using Open MPI 1.6.4 for compiling a 'C' test program(*)
(attached), which performs some MPI I/O on Lustre.


0.) The error only occurs if the binary is compiled in 32 bit.
1.) The error does not correlate with the compiler used to build the MPI library
(all 4 we have - GCC, Sun/Oracle Studio, Intel, PGI - result in the same behaviour).
2.) The error did not occur with our Open MPI 1.6.1 installation (however I'm not
really sure the configure options used are the same).
3.) The error only occurs if the file to be written is located on the Lustre
file system (no error on local disc or on an NFS share).

4.) The Fortran version (also attached) does not have the issue.
5.) The error only occurs when using 2 or more processes.

On the basis of the error message I believe the error is located somewhere
deep in the Open MPI/ROMIO implementation...
Well, is somebody interested in further investigation of this issue? If yes, we
can feed you with information. Otherwise we will probably ignore it...


Best
Paul Kapinos

(*) we have a kind of internal test suite in order to check our MPIs...

P.S. $ mpicc -O0 -m32 -o ./mpiIOC32.exe ctest.c -lm

P.S.2 an example configure line:

./configure --with-openib --with-lsf --with-devel-headers 
--enable-contrib-no-build=vt --enable-heterogeneous --enable-cxx-exceptions 
--enable-orterun-prefix-by-default --disable-dlopen --disable-mca-dso 
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' 
--enable-mpi-ext CFLAGS="$FLAGS_FAST $FLAGS_ARCH32 " CXXFLAGS="$FLAGS_FAST 
$FLAGS_ARCH32 " FFLAGS="$FLAGS_FAST $FLAGS_ARCH32 " FCFLAGS="$FLAGS_FAST 
$FLAGS_ARCH32 " LDFLAGS="$FLAGS_FAST $FLAGS_ARCH32 
-L/opt/lsf/8.0/linux2.6-glibc2.3-x86/lib" 
--prefix=/opt/MPI/openmpi-1.6.4/linux/gcc 
--mandir=/opt/MPI/openmpi-1.6.4/linux/gcc/man 
--bindir=/opt/MPI/openmpi-1.6.4/linux/gcc/bin/32 
--libdir=/opt/MPI/openmpi-1.6.4/linux/gcc/lib/lib32 
--includedir=/opt/MPI/openmpi-1.6.4/linux/gcc/include/32 
--datarootdir=/opt/MPI/openmpi-1.6.4/linux/gcc/share/32 2>&1 | tee log_01_conf.txt


I _believe_ the part
--with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre'
is new in our 1.6.4 installation compared with 1.6.1. Could this be the root of 
evil?


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
program test
  ! In production the f77 method is used to include MPI; for development
  ! try to use the f90 method ('USE MPI').
  ! Hope for fewer errors this way; for example, mix-ups such as MPI_INT vs.
  ! MPI_INTEGER can then be caught by the compiler.
  USE MPI
  IMPLICIT NONE
  !include "mpif.h"
  !
  !
  integer :: wrank, wsize, ierr, status(MPI_STATUS_SIZE), fh, i8_type, mpi_info, my_data_wsize(1), my_disp(1)
  integer(8), allocatable :: id_list(:), quelle(:)
  character(len=1024) :: filename
  CHARACTER(len=1024) :: pfad

  integer :: anzahl = 12000
  integer(KIND=MPI_OFFSET_KIND) :: offset = 0 !5776
  INTEGER(8) :: my_current_offset, my_offset
  INTEGER :: i, laenge, intsize

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,wsize,ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,wrank,ierr)
  call getarg(1, pfad) ! this is the path + file name
  !filename = TRIM(pfad) // "blupp"

  !Create the binary file (serially, master only)
!  CALL MPI_File_seek(fh, my_offset, MPI_SEEK_SET);
!  CALL MPI_File_get_position(fh, _current_offset);

  IF (wrank .EQ. 0) THEN
my_offset = 

[OMPI users] an MPI process using about 12 file descriptors per neighbour processes - isn't it a bit too much?

2009-08-14 Thread Paul Kapinos

Hi OpenMPI folks,

We use Sun MPI (Cluster Tools 8.2) and also native Open MPI 1.3.3, and we
wonder about the way Open MPI devours file descriptors: on our
computers, ulimit -n is currently set to 1024, and we found out that we
may run at most 84 MPI processes per box; if we try to run 85 (or
more) processes, we get an error message like this:


--
Error: system limit exceeded on number of network connections that can 
be open

.
--

Simple arithmetic tells us that 1024/85 is about 12. This leads us to believe
that each Open MPI process needs 12 file descriptors per peer MPI process.


By now, we have only one box with more than 100 CPUs on which it may be
meaningful to run more than 85 processes. But in the quite near future,
many-core boxes are arriving (we have also ordered 128-way Nehalems), so it
may be disadvantageous to consume a lot of file descriptors per MPI
process.



We see a possibility to avoid this problem by setting the ulimit for file
descriptors to a higher value. This is not easy under Linux: you need
either to recompile the kernel (which is not a choice for us), or to set up
a root process somewhere which will raise the ulimit
(which is a security risk and not easy to implement).


We also tried to set opal_set_max_sys_limits to 1, as the help says
(by adding "-mca opal_set_max_sys_limits 1" to the command line), but
we do not see any change of behaviour.


What is your opinion?

Best regards,
Paul Kapinos
RZ RWTH Aachen



#
 /opt/SUNWhpc/HPC8.2/intel/bin/mpiexec -mca opal_set_max_sys_limits 1 
-np 86   a.out
<>

smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] an environment variable with same meaning than the -x option of mpiexec

2009-11-06 Thread Paul Kapinos

Dear OpenMPI developer,

with the -x option of mpiexec there is a way to distribute environment
variables:


 -x   Export  the  specified  environment  variables  to the remote
 nodes before executing the  program.


Is there an environment variable ( OMPI_) with the same meaning? Writing
environment variables on the command line is ugly and tedious...


I've searched for this info on the Open MPI web pages for about an hour and
didn't find the answer :-/



Thanking you in anticipation,

Paul




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] an environment variable with same meaning than the -x option of mpiexec

2009-11-10 Thread Paul Kapinos

Hi Ralph,



Not at the moment - though I imagine we could create one. It is a tad 
tricky in that we allow multiple -x options on the cmd line, but we 
obviously can't do that with an envar.


why not?

export OMPI_Magic_Variable="-x LD_LIBRARY_PATH -x PATH"
could be possible, or not?




I can add it to the "to-do" list for a rainy day :-)

That would be great :-)

Thanks for your help!

Paul Kapinos




with the -x option of mpiexec there is a way to distribute environmnet 
variables:


-x   Export  the  specified  environment  variables  to the remote
nodes before executing the  program.


Is there an environment variable ( OMPI_) with the same meaning? 
The writing of environmnet variables on the command line is ugly and 
tedious...


I've searched for this info on OpenMPI web pages for about an hour and 
didn't find the ansver :-/



Thanking you in anticipation,

Paul




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] an environment variable with same meaning than the-x option of mpiexec

2009-11-10 Thread Paul Kapinos

Hi Jeff,

FWIW, environment variables prefixed with "OMPI_" will automatically be 
distributed out to processes.  


Of course, but sadly the variable(s) we want to distribute aren't
"OMPI_" variables.






Depending on your environment and launcher, your entire environment may 
be copied out to all the processes, anyway (rsh does not, but 
environments like SLURM do), making the OMPI_* and -x mechanisms 
somewhat redundant.


Does this help?


By now I have set the $MPIEXEC variable to "mpiexec -x BLABLABLA" and
advise the users to use this. This is a bit ugly, but a working
workaround. What I wanted to achieve with my mail was a less ugly
solution :o)
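
(The workaround looks roughly like this - the variable names after -x are of
course just examples:)

$ export MPIEXEC="mpiexec -x LD_LIBRARY_PATH -x PATH"
$ $MPIEXEC -np 4 ./a.out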


Thanks for your help,

Paul Kapinos









Not at the moment - though I imagine we could create one. It is a tad
tricky in that we allow multiple -x options on the cmd line, but we
obviously can't do that with an envar.

The most likely solution would be to specify multiple "-x" equivalents
by separating them with a comma in the envar. It would take some
parsing to make it all work, but not impossible.

I can add it to the "to-do" list for a rainy day :-)


On Nov 6, 2009, at 7:59 AM, Paul Kapinos wrote:

> Dear OpenMPI developer,
>
> with the -x option of mpiexec there is a way to distribute
> environmnet variables:
>
> -x   Export  the  specified  environment  variables  to the
> remote
> nodes before executing the  program.
>
>
> Is there an environment variable ( OMPI_) with the same meaning?
> The writing of environmnet variables on the command line is ugly and
> tedious...
>
> I've searched for this info on OpenMPI web pages for about an hour
> and didn't find the ansver :-/
>
>
> Thanking you in anticipation,
>
> Paul
>
>
>
>
> --
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

_______
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users







--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] exceedingly virtual memory consumption of MPI environment if higher-setting "ulimit -s"

2009-11-19 Thread Paul Kapinos

Hi volks,

we see an exceedingly high *virtual* memory consumption by MPI processes
if "ulimit -s" (stack size) is set higher in the profile configuration.

Furthermore we believe that every MPI process started wastes about
double the `ulimit -s` value that would be set in a fresh console
(that is, the value configured in e.g.  .zshenv, *not* the value
actually set in the console from which mpiexec runs).


Sun MPI 8.2.1, an empty MPI hello-world program
(the same also happens if running both processes on the same host):

.zshenv: ulimit -s 10240   --> VmPeak:180072 kB
.zshenv: ulimit -s 102400  --> VmPeak:364392 kB
.zshenv: ulimit -s 1024000 --> VmPeak:2207592 kB
.zshenv: ulimit -s 2024000 --> VmPeak:4207592 kB
.zshenv: ulimit -s 2024 --> VmPeak:   39.7 GB
(see the attached files; the a.out binary is an MPI hello-world program
running a never-ending loop).
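
(Such VmPeak numbers can be reproduced e.g. by looking at /proc; the PID below
is a placeholder for one of the running MPI ranks:)

$ ulimit -s 1024000
$ mpiexec -np 2 ./a.out &
$ grep VmPeak /proc/<pid of one a.out>/status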




Normally, the stack size ulimit is set to some 10 MB by us, but we see a
lot of codes which need *a lot* of stack space, e.g. Fortran codes,
OpenMP codes (and especially Fortran OpenMP codes). Users tend to
hard-code a higher stack size ulimit.


Normally, using a lot of virtual memory is no problem, because
there is a lot of this thing :-) But... if more than one person is
allowed to work on a computer, you have to divide the resources in such
a way that nobody can crash the box. We do not know how to limit the
real RAM used, so we need to divide the RAM by means of setting a virtual
memory ulimit (in our batch system, e.g.). That is, for us

"virtual memory consumption" = "real memory consumption".
And real memory is not as cheap as virtual memory.


So, why consume *twice* the stack size for each process?

And why consume the virtual memory at all? We guess this virtual
memory is allocated for the stack (why else would it be related to the
stack size ulimit). But is such an allocation really needed? Is there a
way to avoid this waste of virtual memory?


best regards,
Paul Kapinos











--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
! Paul Kapinos 22.09.2009 - 
! RZ RWTH Aachen, www.rz.rwth-aachen.de
!
! MPI-Hello-World
!
PROGRAM PK_MPI_Test
USE MPI
IMPLICIT NONE
!
INTEGER :: my_MPI_Rank, laenge, ierr
CHARACTER*(MPI_MAX_PROCESSOR_NAME) my_Host
!
!WRITE (*,*) "Jetz penn ich mal 30"
!CALL Sleep(30)

CALL MPI_INIT (ierr)
!
!WRITE (*,*) "Nach MPI_INIT"
!CALL Sleep(30)
CALL MPI_COMM_RANK( MPI_COMM_WORLD, my_MPI_Rank, ierr )
!WRITE (*,*) "Nach MPI_COMM_RANK"
CALL MPI_GET_PROCESSOR_NAME(my_Host, laenge, ierr)
WRITE (*,*) "Prozessor ", my_MPI_Rank, "on Host: ", my_Host(1:laenge)

! sleeping or spinning - the same behaviour
!CALL Sleep(3)
DO WHILE (.TRUE.)
ENDDO

!CALL Sleep(3)

CALL MPI_FINALIZE(ierr)
!
WRITE (*,*) "Daswars"
!
END PROGRAM PK_MPI_Test


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] exceedingly virtual memory consumption of MPI, environment if higher-setting "ulimit -s"

2009-12-03 Thread Paul Kapinos

Hi Jeff, hi all,


I can't think of what OMPI would be doing related to the predefined 
stack size -- I am not aware of anywhere in the code where we look up 
the predefine stack size and then do something with it.


I do not know the OMPI code at all - but what I see is virtual memory
consumption corresponding to twice the stack size default of a fresh login...





That being said, I don't know what the OS and resource consumption 
effects are of setting 1GB+ stack size on *any* application...  


we definitely have applications which *need* a stack size of 500+ MB.

Users who use such codes may tend to hard-code a *huge* stack size in
their profile (you do not want to lose a day or two of computing time
just by forgetting to set a ulimit, right?). (Currently, I see *one*
such user, but who knows how many there are...)


Nevertheless, even if the users do not use a huge stack size, the
default stack size is some 20 MB. That's not much, but is this
allocate-and-never-use of twice the stack size really needed?



Best wishes,
PK




Have you
tried non-MPI examples, potentially with applications as large as MPI 
applications but without the complexity of MPI?



On Nov 19, 2009, at 3:13 PM, David Singleton wrote:



Depending on the setup, threads often get allocated a thread local
stack with size equal to the stacksize rlimit.  Two threads maybe?

David

Terry Dontje wrote:
> A couple things to note.  First Sun MPI 8.2.1 is effectively OMPI
> 1.3.4.  I also reproduced the below issue using a C code so I think this
> is a general issue with OMPI and not Fortran based.
>
> I did a pmap of a process and there were two anon spaces equal to the
> stack space set by ulimit.
>
> In one case (setting 102400) the anon spaces were next to each other
> prior to all the loadable libraries.  In another case (setting 1024000)
> one anon space was locate in the same area as the first case but the
> second space was deep into some memory used by ompi.
>
> Is any of this possibly related to the predefined handles?  Though I am
> not sure why it would expand based on stack size?.
>
> --td
>> Date: Thu, 19 Nov 2009 19:21:46 +0100
>> From: Paul Kapinos <kapi...@rz.rwth-aachen.de>
>> Subject: [OMPI users] exceedingly virtual memory consumption of MPI
>> environment if higher-setting "ulimit -s"
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <4b058cba.3000...@rz.rwth-aachen.de>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> Hi volks,
>>
>> we see an exeedingly *virtual* memory consumtion through MPI processes
>> if "ulimit -s" (stack size)in profile configuration was setted higher.
>>
>> Furthermore we believe, every mpi process started, wastes about the
>> double size of `ulimit -s` value which will be set in a fresh console
>> (that is, the value is configurated in e.g.  .zshenv, *not* the value
>> actually setted in the console from which the mpiexec runs).
>>
>> Sun MPI 8.2.1, an empty mpi-HelloWorld program
>> ! either if running both processes on the same host..
>>
>> .zshenv: ulimit -s 10240   --> VmPeak:180072 kB
>> .zshenv: ulimit -s 102400  --> VmPeak:364392 kB
>> .zshenv: ulimit -s 1024000 --> VmPeak:2207592 kB
>> .zshenv: ulimit -s 2024000 --> VmPeak:4207592 kB
>> .zshenv: ulimit -s 2024 --> VmPeak:   39.7 GB
>> (see the attached files; the a.out binary is a mpi helloworld program
>> running an never ending loop).
>>
>>
>>
>> Normally, the stack size ulimit is set to some 10 MB by us, but we see
>> a lot of codes which needs *a lot* of stack space, e.g. Fortran codes,
>> OpenMP codes (and especially fortran OpenMP codes). Users tends to
>> hard-code the setting-up the higher value for stack size ulimit.
>>
>> Normally, the using of a lot of virtual memory is no problem, because
>> there is a lot of this thing :-) But... If more than one person is
>> allowed to work on a computer, you have to divide the ressources in
>> such a way that nobody can crash the box. We do not know how to limit
>> the real RAM used so we need to divide the RAM by means of setting
>> virtual memory ulimit (in our batch system e.g.. That is, for us
>> "virtual memory consumption" = "real memory consumption".
>> And real memory is not that way cheap than virtual memory.
>>
>>
>> So, why consuming the *twice* amount of stack size for each process?
>>
>> And, why consuming the virtual memory at all? We guess this virtual
>> memory is allocated for the stack (why else it will be related to the
>> stack size ulimit). But, is such 

[OMPI users] MPI_Comm_set_errhandler: error in Fortran90 Interface mpi.mod

2010-05-03 Thread Paul Kapinos

Hello OpenMPI / Sun/Oracle MPI folks,

we believe that Open MPI and Sun MPI (Cluster Tools) have an error in
the Fortran 90 (f90) bindings of the MPI_Comm_set_errhandler routine.


Tested MPI versions: OpenMPI/1.3.3 and Cluster Tools 8.2.1

Consider the attached example. This file uses "USE MPI" to bind the
MPI routines f90-style. The f77-style "include 'mpif.h'" is commented out.


If using Intel MPI, the attached example runs error-free (with both
bindings).


If trying to compile with Open MPI using the f90 bindings, all compilers
tested (Intel/11.1, Sun Studio/12.1, gcc/4.1) say the code cannot be
built because a constant (MPI_COMM_WORLD) is used as input.


For example, the output of the Intel compiler:
-
MPI_Comm_set_errhandler.f90(12): error #6638: An actual argument is an 
expression or constant; this is not valid since the associated dummy 
argument has the explicit INTENT(OUT) or INTENT(INOUT) attribute.   [0]
call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! 
MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem...

--^
compilation aborted for MPI_Comm_set_errhandler.f90 (code 1)
-
With the f77 bindings, the attached program compiles and runs fine.

The older (deprecated) routine MPI_Errhandler_set which is defined to 
have the same functionality works fine with both bindings and all MPI's.


So, we believe the Open MPI implementation of the MPI standard erroneously
sets the INTENT(OUT) or INTENT(INOUT) attribute on the communicator
argument. Defining an error handler for MPI_COMM_WORLD should
be possible, which it currently is not.


Best wishes,
Paul Kapinos





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
PROGRAM sunerr
USE MPI   ! f90: Error on MPI_Comm_set_errhandler if using this with OpenMPI / Sun MPI
!include 'mpif.h'  ! f77: Works fine with all MPI's tested
IMPLICIT NONE
!
integer :: data = 1, errhandler, ierr
external AbortWithMessage
!
call MPI_Init(ierr)
call MPI_Comm_create_errhandler (AbortWithMessage, errhandler, ierr)  ! Creating a handle: no problem

call MPI_Comm_set_errhandler (MPI_COMM_WORLD, errhandler, ierr)  ! MPI_COMM_WORLD in MPI_Comm_set_errhandler is the problem... in f90
!call MPI_Errhandler_set (MPI_COMM_WORLD, errhandler, ierr)! and this one deprecated function works fine both for f77 and f90


! ... a errornous MPI routine ... 
call MPI_Send (data, 1, MPI_INTEGER, 1, -12, MPI_COMM_WORLD, ierr)
call MPI_Finalize( ierr )

END PROGRAM sunerr



subroutine AbortWithMessage (comm, errorcode)
  use mpi
  implicit none
  integer :: comm, errorcode
  character(LEN=MPI_MAX_ERROR_STRING) :: errstr
  integer :: stringlength, ierr
  call MPI_Error_string (errorcode, errstr, stringlength, ierr)
  write (*,*) 'Error:  =+=>  ', errstr, ' =+=> Aborting'
  call MPI_Abort (comm, errorcode, ierr)
end subroutine AbortWithMessage



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Fortran derived types

2010-05-06 Thread Paul Kapinos

Hi,

In general, even in your serial fortran code, you're already 
taking a performance hit using a derived type. 


That is not generally true. The right statement is: "it depends".

Yes, sometimes derived data types and object orientation and so on can
lead to some performance hit; but current compilers can usually optimise
a lot.


E.g. consider http://www.terboven.com/download/OAbstractionsLA.pdf 
(especially p.19).



So, I would not recommend disturbing a working program in order to bring it
back to good old f77 style. And let us not start a flame war about
"assembler is faster but OO is easier"! :-)


Best wishes
Paul





-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Prentice Bisbal
Sent: Wednesday, May 05, 2010 11:51 AM
To: Open MPI Users
Subject: Re: [OMPI users] Fortran derived types

Vedran Coralic wrote:

Hello,

In my Fortran 90 code I use several custom defined derived types.
Amongst them is a vector of arrays, i.e. v(:)%f(:,:,:). I am wondering 
what the proper way of sending this data structure from one processor 
to another is. Is the best way to just restructure the data by copying 
it into a vector and sending that or is there a simpler way possible 
by defining an MPI derived type that can handle it?


I looked into the latter myself but so far, I have only found the 
solution for a scalar fortran derived type and the methodology that 
was suggested in that case did not seem naturally extensible to the vector case.




It depends on how your data is distributed in memory. If the arrays are evenly 
distributed, like what would happen in a multidimensional-array, the derived 
datatypes will work fine. If you can't guarantee the spacing between the arrays 
that make up the vector, then using MPI_Pack/MPI_Unpack (or whatever the 
Fortran equivalents are) is the best way to go.

I'm not an expert MPI programmer, but I wrote a small program earlier this year 
that created a dynamically created array of dynamically created arrays. After 
doing some research into this same problem, it looked like packing/unpacking 
was the only way to go.

Using Pack/Unpack is easy, but there is a performance hit since the data needs 
to be copied into the packed buffer before sending, and then copied out of the 
buffer after the receive.


--
Prentice
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Why compilig in global paths (only) for configuretion files?

2008-09-08 Thread Paul Kapinos

Hi all!

We are using Open MPI on a variety of machines (running Linux,
Solaris/SPARC and Solaris/Opteron) using a couple of compilers (GCC, Sun Studio,
Intel, PGI, 32 and 64 bit...), so we have at least 15 versions of each
release of Open MPI (Sun Cluster Tools not included).


This means that we have to support a complete petting zoo of
Open MPIs. Sometimes we may need to move things around.



When Open MPI is configured, the install path may be provided using
the --prefix keyword, like so:


./configure --prefix=/my/love/path/for/openmpi/tmp1

After "gmake all install" in ...tmp1 an installation of OpenMPI may be 
found.


Then, say, we need to *move* this Version to an another path, say 
/my/love/path/for/openmpi/blupp


Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we can
do that ;-)


And if we try to use Open MPI from the new location, we get an error message like

$ ./mpicc
Cannot open configuration file 
/my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt

Error parsing data file mpicc: Not found

(note the old installation path used)

It looks to me as if the install path provided with --prefix at
configuration time is compiled into the opal_wrapper executable, and
opal_wrapper only works if the set of configuration files is in this path.
But after moving the Open MPI installation directory, the configuration
files aren't there...


A side effect of this behaviour is that binary
distributions of Open MPI (RPMs) are not relocatable. That's
uncomfortable. (Actually, this mail was triggered by the fact that the Sun
ClusterTools RPMs are not relocatable.)



So, does this behaviour have a deeper sense I cannot recognise, or is
the hard-coding of global paths maybe not needed at all?


What I mean is that the paths for the configuration files which
opal_wrapper needs could be set relative, like ../share/openmpi/***,
without affecting the integrity of Open MPI. Maybe there are more
places where the usage of relative paths would be needed to allow a movable
(relocatable) Open MPI.


What do you think about this?

Best regards
Paul Kapinos



<>

smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Need help resolving No route to host error with OpenMPI 1.1.2

2008-09-09 Thread Paul Kapinos

Hi,

First, consider updating to a newer Open MPI.

Second, look at your environment on the box where you start Open MPI (i.e.
where you run mpirun ...).


Type
ulimit -n
to see how many file descriptors your environment has (ulimit -a
for all limits). Note that on older versions of Open MPI (up to and
including 1.2.6), every process needs its own file descriptor for each process
started, IMHO. Maybe that is your problem? Does your HelloWorld run OK with
some 500 processes?


best regards
PK



Prasanna Ranganathan wrote:

Hi,

I am trying to run a test mpiHelloWorld program that simply initializes 
the MPI environment on all the nodes, prints the hostname and rank of 
each node in the MPI process group and exits.


I am using MPI 1.1.2 and am running 997 processes on 499 nodes (Nodes 
have 2 dual core CPUs).


I get the following error messages when I run my program as follows: 
mpirun -np 997 -bynode -hostfile nodelist /main/mpiHelloWorld

.
.
.
[0,1,380][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,142][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,140][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,390][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
connect() failed with errno=113connect() failed with errno=113connect() 
failed with 
errno=113[0,1,138][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 

connect() failed with 
errno=113[0,1,384][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
[0,1,144][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
[0,1,388][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with 
errno=113[0,1,386][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113
[0,1,139][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] 
connect() failed with errno=113

connect() failed with errno=113
.
.

*The main thing is that I get these error messages around 3-4 times out 
of 10 attempts with the rest all completing successfully. I have looked 
into the FAQs in detail and also checked the tcp btl settings but am not 
able to figure it out.

*
All the 499 nodes have only eth0 active and I get the error even when I 
run the following: mpirun -np 997 -bynode –hostfile nodelist --mca 
btl_tcp_if_include eth0 /main/mpiHelloWorld


I have attached the output of ompi_info —all.

The following is the output of /sbin/ifconfig on the node where I start 
the mpi process (it is one of the 499 nodes)


eth0  Link encap:Ethernet  HWaddr 00:03:25:44:8F:D6  
  inet addr:10.12.1.11  Bcast:10.12.255.255  Mask:255.255.0.0

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1978724556 errors:17 dropped:0 overruns:0 frame:17
  TX packets:1767028063 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:580938897359 (554026.5 Mb)  TX bytes:689318600552 
(657385.4 Mb)

  Interrupt:22 Base address:0xc000

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0

  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:70560 errors:0 dropped:0 overruns:0 frame:0
  TX packets:70560 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:339687635 (323.9 Mb)  TX bytes:339687635 (323.9 Mb)


Kindly help.

Regards,

Prasanna.




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


<>

smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Why compilig in global paths (only) for configuretion files?

2008-09-17 Thread Paul Kapinos

Hi Jeff again!

But setting the environment variable OPAL_PREFIX to an
appropriate value (assuming PATH and LD_LIBRARY_PATH are set too)
is not enough to let Open MPI work from the new location.


Hmm.  It should be.


(Update) It works with "plain" Open MPI, but it does *not* work with Sun
Cluster Tools 8.0 (which is also an Open MPI). So it seems to be a Sun
problem and not a general problem of Open MPI. Sorry for mis-attributing the
problem.



The only trouble we have now are the error messages like

--
Sorry!  You were supposed to get help about:
no hca params found
from the file:
help-mpi-btl-openib.txt
But I couldn't find any file matching that name.  Sorry!
--

(the job still runs without problems! :o)

when running Open MPI from the new location with the old location
removed (if the old location still exists there is no error, so it seems
to be an attempt to access a file in the old path).


Maybe we have to explicitly pass the OPAL_PREFIX environment variable to 
all processes?
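
(For reference, a relocation attempt could look roughly like this - the path and
host names are placeholders; mpiexec's -x forwards the variable to the remote
processes:)

$ export OPAL_PREFIX=/new/location/of/openmpi
$ export PATH=$OPAL_PREFIX/bin:$PATH
$ export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
$ mpiexec -x OPAL_PREFIX -np 2 -H host1,host2 ./a.out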




Because of the fact, that all the files containing settings for 
opal_wrapper, which are located in share/openmpi/ and called e.g. 
mpif77-wrapper-data.txt, contain (defined by installation with 
--prefix) hard-coded paths, too.


Hmm; they should not.  In my 1.2.7 install, I see the following:

-
[11:14] svbu-mpi:/home/jsquyres/bogus/share/openmpi % cat 
mpif77-wrapper-data.txt

# There can be multiple blocks of configuration data, chosen by
# compiler flags (using the compiler_args key to chose which block
# should be activated.  This can be useful for multilib builds.  See the
# multilib page at:
#https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264
# for more information.

project=Open MPI
project_short=OMPI
version=1.2.7rc6r19546
language=Fortran 77
compiler_env=F77
compiler_flags_env=FFLAGS
compiler=gfortran
extra_includes=
preprocessor_flags=
compiler_flags=
linker_flags=
libs=-lmpi_f77 -lmpi -lopen-rte -lopen-pal   -ldl   -Wl,--export-dynamic 
-lnsl -lutil -lm -ldl

required_file=not supported
includedir=${includedir}
libdir=${libdir}
[11:14] svbu-mpi:/home/jsquyres/bogus/share/openmpi %
-

Note the "includedir" and "libdir" lines -- they're expressed in terms 
of ${foo}, which we can replace when OPAL_PREFIX (or related) is used.


What version of OMPI are you using?



Note one of the configuration files contained in Sun ClusterTools 8.0 (see
attached file). The paths are really hard-coded instead of using
variables; this makes the package not relocatable without parsing
the configuration files.


Do you (or anyone reading this message) have any contact with the Sun
developers to point out this circumstance? *Why* do they use hard-coded
paths? :o)


best regards,

Paul Kapinos
#
# Default word-size (used when -m flag is supplied to wrapper compiler)
#
compiler_args=

project=Open MPI
project_short=OMPI
version=r19400-ct8.0-b31c-r29

language=Fortran 90
compiler_env=FC
compiler_flags_env=FCFLAGS
compiler=f90
module_option=-M
extra_includes=
preprocessor_flags=
compiler_flags=
libs=-lmpi -lopen-rte -lopen-pal -lnsl -lrt -lm -ldl -lutil -lpthread -lmpi_f77 
-lmpi_f90
linker_flags=-R/opt/mx/lib/lib64 -R/opt/SUNWhpc/HPC8.0/lib/lib64 
required_file=
includedir=/opt/SUNWhpc/HPC8.0/include/64
libdir=/opt/SUNWhpc/HPC8.0/lib/lib64

#
# Alternative word-size (used when -m flag is not supplied to wrapper compiler)
#
compiler_args=-m32

project=Open MPI
project_short=OMPI
version=r19400-ct8.0-b31c-r29

language=Fortran 90
compiler_env=FC
compiler_flags_env=FCFLAGS
compiler=f90
module_option=-M
extra_includes=
preprocessor_flags=
compiler_flags=-m32
libs=-lmpi -lopen-rte -lopen-pal -lnsl -lrt -lm -ldl -lutil -lpthread -lmpi_f77 
-lmpi_f90
linker_flags=-R/opt/mx/lib -R/opt/SUNWhpc/HPC8.0/lib 
required_file=
includedir=/opt/SUNWhpc/HPC8.0/include
libdir=/opt/SUNWhpc/HPC8.0/lib
<>

smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] Why compilig in global paths (only) for configuretion files?

2008-09-17 Thread Paul Kapinos

Hi Rolf,

Rolf vandeVaart wrote:

I don't know -- this sounds like an issue with the Sun CT 8 build 
process.  It could also be a by-product of using the combined 32/64 
feature...?  I haven't used that in forever and I don't remember the 
restrictions.  Terry/Rolf -- can you comment?



I will write a separate email to ct-feedb...@sun.com


Hi Paul:
Yes, there are Sun people on this list!  We originally put those 
hardcoded paths in to make everything work correctly out of the box and 
our install process ensured that everything would be at 
/opt/SUNWhpc/HPC8.0.  However, let us take a look at everything that was 
just discussed here and see what we can do.  We will get back to you 
shortly.




I've just sent an eMail to ct-feedb...@sun.com with some explanation of 
our troubles...


The main trouble: we want to have *both* versions of CT 8.0 (for the Studio
and for the GNU compiler) installed on the same systems. The RPMs are not
relocatable, have the same name and install everything into the same
directories... yes, it works out of the box, but only if just *one* version is
installed. So I started to move installations around, asking on this
mailing list, setting envvars, and parsing configuration files


I think installing everything to hard-coded paths is somewhat inflexible.
Maybe you could provide relocatable RPMs sometime in the future?


But as mentioned above, our main goal is to have both versions of CT working on
the same system.


Best regards,

Paul Kapinos
<>

smime.p7s
Description: S/MIME Cryptographic Signature


[OMPI users] Errors compiling OpenMPI 1.2.8 with SUN Studio express (2008/07/10) in 32bit modus

2008-10-16 Thread Paul Kapinos

Hi all,

We tried to install Open MPI 1.2.8 on Linux in a couple of versions here
(compilers from Intel, PGI, Studio, GCC - all 64 bit and 32 bit).

If we use Sun Studio Express (2008/07/10) and configure it to produce a
32-bit library, we get the following errors (for the full log see the file
my_makelog_sun32.txt):


..
gmake[2]: Entering directory 
`/rwthfs/rz/cluster/home/pk224850/OpenMPI/openmpi-1.2.8_studio32/ompi/mca/btl/openib'
source='btl_openib_component.c' object='btl_openib_component.lo' 
libtool=yes \

DEPDIR=.deps depmode=none /bin/sh ../../../../config/depcomp \
	/bin/sh ../../../../libtool --tag=CC   --mode=compile cc 
-DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-DPKGDATADIR=\"/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio/share/openmpi\" 
-I../../../..-DNDEBUG -O2 -m32  -c -o btl_openib_component.lo 
btl_openib_component.c
libtool: compile:  cc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-DPKGDATADIR=\"/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio/share/openmpi\" 
-I../../../.. -DNDEBUG -O2 -m32 -c btl_openib_component.c  -KPIC -DPIC 
-o .libs/btl_openib_component.o
"../../../../opal/include/opal/sys/ia32/atomic.h", line 167: warning: 
impossible constraint for "%1" asm operand
"../../../../opal/include/opal/sys/ia32/atomic.h", line 167: warning: 
parameter in inline asm statement unused: %2
"../../../../opal/include/opal/sys/ia32/atomic.h", line 184: warning: 
impossible constraint for "%1" asm operand
"../../../../opal/include/opal/sys/ia32/atomic.h", line 184: warning: 
parameter in inline asm statement unused: %2
"/usr/include/infiniband/kern-abi.h", line 103: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 109: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 124: syntax error before or 
at: __u64
"/usr/include/infiniband/kern-abi.h", line 135: syntax error before or 
at: __u64

...


This seems to us to be an error in the Linux headers, in the file kern-abi.h,
which includes linux/types.h, which contains this:



#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __u64   uint64_t;
typedef __u64   u_int64_t;
typedef __s64   int64_t;
#endif


So it looks to us as if, when building Open MPI 1.2.8, the SUN Studio
compiler cannot compile some Linux headers because they are written in
"GNU C" instead of ANSI C.


If so, this is a Linux issue and not Open MPI's - but then *why* did you not
see these problems during release preparation? That is, maybe we have made
some mistakes? Maybe the devel headers and/or static libs are the problem?
(I will try to disable them, but we want to report this problem anyway.)
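
For illustration, a minimal reproducer sketch (our assumption, not verified as a
standalone test): a trivial C file that does nothing but include the offending
InfiniBand kernel ABI header. Compiled as 'cc -m32 -c repro.c' with SUN Studio it
should fail with the same __u64 syntax errors as above, while 'gcc -m32 -c repro.c'
should compile cleanly, since gcc defines __GNUC__ and thus gets the typedefs.

/* repro.c - minimal sketch; assumes kern-abi.h is includable on its own
 * and pulls in linux/types.h, as the error messages above suggest.     */
#include <infiniband/kern-abi.h>

int main(void)
{
    return 0;   /* we only care whether this translation unit compiles */
}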






We use Scientific Linux 5.1, which is a Red Hat Enterprise Linux 5 derivative.

$ uname -a
Linux linuxhtc01.rz.RWTH-Aachen.DE 2.6.18-53.1.14.el5_lustre.1.6.5custom 
#1 SMP Wed Jun 25 12:17:09 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux



configured with:


 ./configure --enable-static --with-devel-headers CFLAGS="-O2 -m32" 
CXXFLAGS="-O2 -m32" FFLAGS="-O2 -m32" FCFLAGS="-O2 -m32" LDFLAGS="-m32" 
--prefix=/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio



Best regards,

Paul Kapinos
HPC Group
RZ RWTH Aachen











This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.2.8, which was
generated by GNU Autoconf 2.61.  Invocation command line was

  $ ./configure --enable-static --with-devel-headers CFLAGS=-O2 -m32 CXXFLAGS=-O2 -m32 FFLAGS=-O2 -m32 FCFLAGS=-O2 -m32 LDFLAGS=-m32 --prefix=/rwthfs/rz/SW/MPI/openmpi-1.2.8/linux32/studio CC=cc CXX=CC FC=f95 --enable-ltdl-convenience --no-create --no-recursion

## --------- ##
## Platform. ##
## --------- ##

hostname = linuxhtc01.rz.RWTH-Aachen.DE
uname -m = x86_64
uname -r = 2.6.18-53.1.14.el5_lustre.1.6.5custom
uname -s = Linux
uname -v = #1 SMP Wed Jun 25 12:17:09 CEST 2008

/usr/bin/uname -p = x86_64
/bin/uname -X = unknown

/bin/arch  = x86_64
/usr/bin/arch -k   = x86_64
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo  = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /rwthfs/rz/SW/UTIL/StudioExpress20080724/SUNWspro/bin
PATH: /home/pk224850/bin
PATH: /usr/local_host/sbin
PATH: /usr/local_host/bin
PATH: /usr/local_rwth/sbin
PATH: /usr/local_rwth/bin
PATH: /usr/bin
PATH: /usr/sbin
PATH: /sbin
PATH: /usr/dt/bin
PATH: /usr/bin/X11
PATH: /usr/java/bin
PATH: /usr/local/bin
PATH: /usr/local/sbin
PATH: /opt/csw/bin
PATH: .


## ----------- ##
## Core tests. ##
## ----------- ##

configure:2986: checking for a BS

[OMPI users] OMPIO correctnes issues

2015-12-09 Thread Paul Kapinos

Dear Open MPI developers,
has OMPIO (1) reached a 'usable-stable' state?

As we reported in (2), we had some trouble building Open MPI with ROMIO, a fact
which was hidden by the OMPIO implementation stepping into the MPI_IO breach.
The fact that ROMIO wasn't available was only detected after users complained
'MPI_IO doesn't work as expected with version XYZ of Open MPI' and further
investigation.


Take a look at the attached example. It delivers different results with ROMIO
and OMPIO, even with 1 MPI rank on a local hard disk, cf. (3). We've seen more
examples of divergent behaviour, but this one is quite handy.


Is that a bug in OMPIO or did we miss something?

Best
Paul Kapinos


1) http://www.open-mpi.org/faq/?category=ompio

2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

3) (ROMIO is default; on local hard drive at node 'cluster')
$ ompi_info  | grep  romio
  MCA io: romio (MCA v2.0.0, API v2.0.0, Component v1.10.1)
$ ompi_info  | grep  ompio
  MCA io: ompio (MCA v2.0.0, API v2.0.0, Component v1.10.1)
$ mpif90 main.f90

$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
 fileOffset, fileSize          10          10
 fileOffset, fileSize          26          26
 ierr            0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128

$ export OMPI_MCA_io=ompio
$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
 fileOffset, fileSize           0          10
 fileOffset, fileSize           0          16
 ierr            0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
program example
  use mpi
  integer                       :: ierr
  integer(KIND=MPI_OFFSET_KIND) :: fileOffset
  integer(KIND=MPI_OFFSET_KIND) :: fileSize
  real                          :: outData(10)
  integer                       :: resUnit = 565

  call MPI_INIT(ierr)
  call MPI_file_open(MPI_COMM_WORLD, 'out.txt', MPI_MODE_WRONLY + &
                     MPI_MODE_APPEND, MPI_INFO_NULL, resUnit, ierr)

  call MPI_FILE_GET_SIZE(resUnit, fileSize, ierr)
  call MPI_file_get_position(resUnit, fileOffset, ierr)
  print *, 'fileOffset, fileSize', fileOffset, fileSize

  call MPI_file_seek(resUnit, fileOffset, MPI_SEEK_SET, ierr)
  call MPI_file_write(resUnit, outData, 2, &
                      MPI_DOUBLE, MPI_STATUS_IGNORE, ierr)

  call MPI_file_get_position(resUnit, fileOffset, ierr)
  call MPI_FILE_GET_SIZE(resUnit, fileSize, ierr)
  print *, 'fileOffset, fileSize', fileOffset, fileSize

  print *, 'ierr ', ierr
  print *, 'MPI_MODE_WRONLY,  MPI_MODE_APPEND ', MPI_MODE_WRONLY, &
           MPI_MODE_APPEND

  call MPI_file_close(resUnit, ierr)
  call MPI_FINALIZE(ierr)
end


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Paul Kapinos

Sorry, forgot to mention: 1.10.1


Open MPI: 1.10.1
  Open MPI repo revision: v1.10.0-178-gb80f802
   Open MPI release date: Nov 03, 2015
Open RTE: 1.10.1
  Open RTE repo revision: v1.10.0-178-gb80f802
   Open RTE release date: Nov 03, 2015
OPAL: 1.10.1
  OPAL repo revision: v1.10.0-178-gb80f802
   OPAL release date: Nov 03, 2015
 MPI API: 3.0.0
Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

Dear Open MPI developers,
did OMPIO (1) reached 'usable-stable' state?

As we reported in (2) we had some trouble in building Open MPI with ROMIO,
which fact was hidden by OMPIO implementation stepping into the MPI_IO
breach. The fact 'ROMIO isn't AVBL' was detected after users complained
'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
investigations.

Take a look at the attached example. It deliver different result in case of
using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
We've seen more examples of divergent behaviour but this one is quite handy.

Is that a bug in OMPIO or did we miss something?

    Best
Paul Kapinos


1) http://www.open-mpi.org/faq/?category=ompio

2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

3) (ROMIO is default; on local hard drive at node 'cluster')
$ ompi_info  | grep  romio
   MCA io: romio (MCA v2.0.0, API v2.0.0, Component v1.10.1)
$ ompi_info  | grep  ompio
   MCA io: ompio (MCA v2.0.0, API v2.0.0, Component v1.10.1)
$ mpif90 main.f90

$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
  fileOffset, fileSize          10          10
  fileOffset, fileSize          26          26
  ierr            0
  MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128

$ export OMPI_MCA_io=ompio
$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
  fileOffset, fileSize           0          10
  fileOffset, fileSize           0          16
  ierr            0
  MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/12/28145.php




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Paul Kapinos

Dear Edgar,


On 12/09/15 16:16, Edgar Gabriel wrote:

I tested your code in master and v1.10 ( on my local machine), and I get for
both version of ompio exactly the same (correct) output that you had with romio.


I've tested it on a local hard disk...

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[529]$ df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   1.1T   16G  1.1T   2% /w0

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[530]$ echo hell-o > out.txt; 
./a.out

 fileOffset, fileSize           7           7
 fileOffset, fileSize          23          23
 ierr            0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[531]$ export OMPI_MCA_io=ompio

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[532]$ echo hell-o > out.txt; 
./a.out

 fileOffset, fileSize           0           7
 fileOffset, fileSize           0          16
 ierr            0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128



However, I also noticed that in the ompio version that is in the v1.10 branch,
the MPI_File_get_size function is not implemented on lustre.


Yes, we have Lustre in the cluster.
I believe that was one of the 'other' issues mentioned - yes, some users tend to
use Lustre as an HPC file system =)







Thanks
Edgar

On 12/9/2015 8:06 AM, Edgar Gabriel wrote:

I will look at your test case and see what is going on in ompio. That
being said, the vast number of fixes and improvements that went into
ompio over the last two years were not back ported to the 1.8 (and thus
1.10) series, since it would have required changes to the interfaces of
the frameworks involved (and thus would have violated one of rules of
Open MPI release series) . Anyway, if there is a simple fix for your
test case for the 1.10 series, I am happy to provide a patch. It might
take me a day or two however.

Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


   Open MPI: 1.10.1
 Open MPI repo revision: v1.10.0-178-gb80f802
  Open MPI release date: Nov 03, 2015
   Open RTE: 1.10.1
 Open RTE repo revision: v1.10.0-178-gb80f802
  Open RTE release date: Nov 03, 2015
   OPAL: 1.10.1
 OPAL repo revision: v1.10.0-178-gb80f802
  OPAL release date: Nov 03, 2015
MPI API: 3.0.0
   Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier
from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

  Dear Open MPI developers,
  did OMPIO (1) reached 'usable-stable' state?

  As we reported in (2) we had some trouble in building Open MPI with
ROMIO,
  which fact was hidden by OMPIO implementation stepping into the MPI_IO
  breach. The fact 'ROMIO isn't AVBL' was detected after users complained
  'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
  investigations.

  Take a look at the attached example. It deliver different result in
case of
  using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
  We've seen more examples of divergent behaviour but this one is quite
handy.

  Is that a bug in OMPIO or did we miss something?

      Best
  Paul Kapinos


  1) http://www.open-mpi.org/faq/?category=ompio

  2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

  3) (ROMIO is default; on local hard drive at node 'cluster')
  $ ompi_info  | grep  romio
 MCA io: romio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
  $ ompi_info  | grep  ompio
 MCA io: ompio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
  $ mpif90 main.f90

  $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
fileOffset, fileSize          10          10
fileOffset, fileSize          26          26
ierr            0
MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128

  $ export OMPI_MCA_io=ompio
  $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
fileOffset, fileSize           0          10
fileOffset, fileSize           0          16
ierr            0
MPI_MODE_WRONLY,  MPI_MODE_APPEND            4          128


  --
  Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
  RWTH Aachen University, IT Center
  Seffenter Weg 23,  D 52074  Aachen (Germany)
  Tel: +49 241/80-24915



_

[OMPI users] funny SIGSEGV in 'ompi_info'

2016-11-14 Thread Paul Kapinos

Dear developers,
although the following issue is definitely caused by a misconfiguration of Open
MPI, a SIGSEGV in 'ompi_info' isn't a good thing, hence this mail.


Just call:
$ export OMPI_MCA_mtl="^tcp,^ib"
$ ompi_info --param all all --level 9
... and take a look at the resulting core dump of 'ompi_info', like the one below.

(yes we know that "^tcp,^ib" is a bad idea).

Have a nice day,

Paul Kapinos

P.S. Open MPI: 1.10.4 and 2.0.1 have the same behaviour

--
[lnm001:39957] *** Process received signal ***
[lnm001:39957] Signal: Segmentation fault (11)
[lnm001:39957] Signal code: Address not mapped (1)
[lnm001:39957] Failing at address: (nil)
[lnm001:39957] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2b30f1a79100]
[lnm001:39957] [ 1] 
/opt/MPI/openmpi-1.10.4/linux/intel_16.0.2.181/lib/libopen-pal.so.13(+0x2f11f)[0x2b30f084911f]
[lnm001:39957] [ 2] 
/opt/MPI/openmpi-1.10.4/linux/intel_16.0.2.181/lib/libopen-pal.so.13(+0x2f265)[0x2b30f0849265]
[lnm001:39957] [ 3] 
/opt/MPI/openmpi-1.10.4/linux/intel_16.0.2.181/lib/libopen-pal.so.13(opal_info_show_mca_params+0x91)[0x2b30f0849031]
[lnm001:39957] [ 4] 
/opt/MPI/openmpi-1.10.4/linux/intel_16.0.2.181/lib/libopen-pal.so.13(opal_info_do_params+0x1f4)[0x2b30f0848e84]

[lnm001:39957] [ 5] ompi_info[0x402643]
[lnm001:39957] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b30f1ca7b15]
[lnm001:39957] [ 7] ompi_info[0x4022a9]
[lnm001:39957] *** End of error message ***
zsh: segmentation fault (core dumped)  ompi_info --param all all --level 9
--



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-14 Thread Paul Kapinos

Hello all,
we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI 
1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works 
fine when compiled with 16.0.2.181.


It seems to be a compiler issue (more exactly: a library issue in the libs
delivered with the 16.0.4.258 and 16.0.3.210 versions). Changing the loaded
compiler version back to 16.0.2.181 (=> change of dynamically loaded libs) lets
the previously-failing binary (compiled with the newer compilers) work properly.


Compiling with -O0 does not help. As the issue is likely in the Intel libs (as
said, swapping these out resolves/raises the issue), we will fall back to the
16.0.2.181 compiler version. We will try to open a case with Intel - let's see...


Have a nice day,

Paul Kapinos



On 05/06/16 14:10, Jeff Squyres (jsquyres) wrote:

Ok, good.

I asked that question because typically when we see errors like this, it is 
usually either a busted compiler installation or inadvertently mixing the 
run-times of multiple different compilers in some kind of incompatible way.  
Specifically, the mpifort (aka mpif90) application is a fairly simple program 
-- there's no reason it should segv, especially with a stack trace that you 
sent that implies that it's dying early in startup, potentially even before it 
has hit any Open MPI code (i.e., it could even be pre-main).

BTW, you might be able to get a more complete stack trace from the debugger 
that comes with the Intel compiler (idb?  I don't remember offhand).

Since you are able to run simple programs compiled by this compiler, it sounds 
like the compiler is working fine.  Good!

The next thing to check is to see if somehow the compiler and/or run-time 
environments are getting mixed up.  E.g., the apps were compiled for one 
compiler/run-time but are being used with another.  Also ensure that any 
compiler/linker flags that you are passing to Open MPI's configure script are 
native and correct for the platform for which you're compiling (e.g., don't 
pass in flags that optimize for a different platform; that may result in 
generating machine code instructions that are invalid for your platform).

Try recompiling/re-installing Open MPI from scratch, and if it still doesn't 
work, then send all the information listed here:

https://www.open-mpi.org/community/help/



On May 6, 2016, at 3:45 AM, Giacomo Rossi <giacom...@gmail.com> wrote:

Yes, I've tried three simple "Hello world" programs in fortan, C and C++ and 
the compile and run with intel 16.0.3. The problem is with the openmpi compiled from 
source.

Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza" 
University of Rome
p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com

Member of Fortran-FOSS-programmers


2016-05-05 11:15 GMT+02:00 Giacomo Rossi <giacom...@gmail.com>:
 gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90...(no 
debugging symbols found)...done.
(gdb) r -v
Starting program: /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v

Program received signal SIGSEGV, Segmentation fault.
0x76858f38 in ?? ()
(gdb) bt
#0  0x76858f38 in ?? ()
#1  0x77de5828 in _dl_relocate_object () from 
/lib64/ld-linux-x86-64.so.2
#2  0x77ddcfa3 in dl_main () from /lib64/ld-linux-x86-64.so.2
#3  0x77df029c in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2
#4  0x774a in _dl_start () from /lib64/ld-linux-x86-64.so.2
#5  0x77dd9d98 in _start () from /lib64/ld-linux-x86-64.so.2
#6  0x0002 in ?? ()
#7  0x7fffaa8a in ?? ()
#8  0x7fffaab6 in ?? ()
#9  0x in ?? ()

Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza" 
University of Rome
p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com

Member of Fortran-FOSS-programmers


2016-05-05 10:44 GMT+02:00 Giacomo Rossi <giacom...@gmail.com>:
Here the result of ldd command:
'ldd /opt/openmpi/1.10.2/intel/16.0.3/b

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-12-23 Thread Paul Kapinos

Hi all,

we discussed this issue with Intel compiler support and it looks like they now
know what the issue is and how to guard against it. It is a known issue
resulting from a backwards incompatibility in an OS/glibc update, cf.
https://sourceware.org/bugzilla/show_bug.cgi?id=20019


Affected versions of the Intel compilers: 16.0.3, 16.0.4
Not affected versions: 16.0.2, 17.0

So, simply do not use the affected versions (and hope for a bugfix update in the
16.x series if you cannot immediately upgrade to 17.x, like us, even though the
upgrade is the option Intel favours).


Have a nice Christmas time!

Paul Kapinos

On 12/14/16 13:29, Paul Kapinos wrote:

Hello all,
we seem to run into the same issue: 'mpif90' sigsegvs immediately for Open MPI
1.10.4 compiled using Intel compilers 16.0.4.258 and 16.0.3.210, while it works
fine when compiled with 16.0.2.181.

It seems to be a compiler issue (more exactly: library issue on libs delivered
with 16.0.4.258 and 16.0.3.210 versions). Changing the version of compiler
loaded back to 16.0.2.181 (=> change of dynamically loaded libs) let the
prevously-failing binary (compiled with newer compilers) to work propperly.

Compiling with -O0 does not help. As the issue is likely in the Intel libs (as
said changing out these solves/raises the issue) we will do a failback to
16.0.2.181 compiler version. We will try to open a case by Intel - let's see...

Have a nice day,

Paul Kapinos



On 05/06/16 14:10, Jeff Squyres (jsquyres) wrote:

Ok, good.

I asked that question because typically when we see errors like this, it is
usually either a busted compiler installation or inadvertently mixing the
run-times of multiple different compilers in some kind of incompatible way.
Specifically, the mpifort (aka mpif90) application is a fairly simple program
-- there's no reason it should segv, especially with a stack trace that you
sent that implies that it's dying early in startup, potentially even before it
has hit any Open MPI code (i.e., it could even be pre-main).

BTW, you might be able to get a more complete stack trace from the debugger
that comes with the Intel compiler (idb?  I don't remember offhand).

Since you are able to run simple programs compiled by this compiler, it sounds
like the compiler is working fine.  Good!

The next thing to check is to see if somehow the compiler and/or run-time
environments are getting mixed up.  E.g., the apps were compiled for one
compiler/run-time but are being used with another.  Also ensure that any
compiler/linker flags that you are passing to Open MPI's configure script are
native and correct for the platform for which you're compiling (e.g., don't
pass in flags that optimize for a different platform; that may result in
generating machine code instructions that are invalid for your platform).

Try recompiling/re-installing Open MPI from scratch, and if it still doesn't
work, then send all the information listed here:

https://www.open-mpi.org/community/help/



On May 6, 2016, at 3:45 AM, Giacomo Rossi <giacom...@gmail.com> wrote:

Yes, I've tried three simple "Hello world" programs in fortan, C and C++ and
the compile and run with intel 16.0.3. The problem is with the openmpi
compiled from source.

Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
University of Rome
p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com

Member of Fortran-FOSS-programmers


2016-05-05 11:15 GMT+02:00 Giacomo Rossi <giacom...@gmail.com>:
 gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90...(no
debugging symbols found)...done.
(gdb) r -v
Starting program: /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v

Program received signal SIGSEGV, Segmentation fault.
0x76858f38 in ?? ()
(gdb) bt
#0  0x76858f38 in ?? ()
#1  0x77de5828 in _dl_relocate_object () from
/lib64/ld-linux-x86-64.so.2
#2  0x77ddcfa3 in dl_main () from /lib64/ld-linux-x86-64.so.2
#3  0x77df029c in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2
#4  0x774a in _dl_start () from /lib64/l

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-16 Thread Paul Kapinos

Jeff, I confirm: your patch did it.

(tried on 1.10.6 - we did not even need to rebuild the cp2k.popt binary, just
load another Open MPI version compiled with Jeff's patch)


(On Intel OmniPath: the same speed as with --mca btl ^tcp,openib)


On 03/16/17 01:03, Jeff Squyres (jsquyres) wrote:

It looks like there were 3 separate threads on this CP2K issue, but I think we 
developers got sidetracked because there was a bunch of talk in the other 
threads about PSM, non-IB(verbs) networks, etc.

So: the real issue is an app is experiencing a lot of slowdown when calling 
MPI_ALLOC_MEM/MPI_FREE_MEM when the openib BTL is involved.

The MPI_*_MEM calls are "slow" when used with the openib BTL because we're 
registering the memory every time you call MPI_ALLOC_MEM and deregistering the memory 
every time you call MPI_FREE_MEM.  This was intended as an optimization such that the 
memory is already registered when you invoke an MPI communications function with that 
buffer.  I guess we didn't really anticipate the case where *every* allocation goes 
through ALLOC_MEM...

Meaning: if the app is aggressive in using MPI_*_MEM *everywhere* -- even for 
buffers that aren't used for MPI communication -- I guess you could end up with 
lots of useless registration/deregistration.  If the app does it a lot, that 
could be the source of quite a lot of needless overhead.

We don't have a run-time bypass of this behavior (i.e., we assumed that if 
you're calling MPI_*_MEM, you mean to do so).  But let's try an experiment -- 
can you try applying this patch and see if it removes the slowness?  This patch 
basically removes the registration / deregistration with ALLOC/FREE_MEM (and 
instead handles it lazily / upon demand when buffers are passed to MPI 
functions):

```patch
diff --git a/ompi/mpi/c/alloc_mem.c b/ompi/mpi/c/alloc_mem.c
index 8c8fb8cd54..c62c8ff706 100644
--- a/ompi/mpi/c/alloc_mem.c
+++ b/ompi/mpi/c/alloc_mem.c
@@ -74,6 +74,7 @@ int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)

 OPAL_CR_ENTER_LIBRARY();

+#if 0
 if (MPI_INFO_NULL != info) {
 int flag;
 (void) ompi_info_get (info, "mpool_hints", MPI_MAX_INFO_VAL, info_value, 

@@ -84,6 +85,9 @@ int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)

 *((void **) baseptr) = mca_mpool_base_alloc ((size_t) size, (struct 
opal_info_t
  mpool_hints);
+#else
+*((void **) baseptr) = malloc(size);
+#endif
 OPAL_CR_EXIT_LIBRARY();
 if (NULL == *((void **) baseptr)) {
 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_NO_MEM,
diff --git a/ompi/mpi/c/free_mem.c b/ompi/mpi/c/free_mem.c
index 4498fc8bb1..4c65ea2339 100644
--- a/ompi/mpi/c/free_mem.c
+++ b/ompi/mpi/c/free_mem.c
@@ -50,10 +50,16 @@ int MPI_Free_mem(void *baseptr)

If you call MPI_ALLOC_MEM with a size of 0, you get NULL
back.  So don't consider a NULL==baseptr an error. */
+#if 0
 if (NULL != baseptr && OMPI_SUCCESS != mca_mpool_base_free(baseptr)) {
 OPAL_CR_EXIT_LIBRARY();
 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_NO_MEM, 
FUNC_NAME);
 }
+#else
+if (NULL != baseptr) {
+free(baseptr);
+}
+#endif

 OPAL_CR_EXIT_LIBRARY();
 return MPI_SUCCESS;
```

This will at least tell us if the innards of our ALLOC_MEM/FREE_MEM (i.e., 
likely the registration/deregistration) are causing the issue.





On Mar 15, 2017, at 1:27 PM, Dave Love <dave.l...@manchester.ac.uk> wrote:

Paul Kapinos <kapi...@itc.rwth-aachen.de> writes:


Nathan,
unfortunately '--mca memory_linux_disable 1' does not help on this
issue - it does not change the behaviour at all.
Note that the pathological behaviour is present in Open MPI 2.0.2 as
well as in /1.10.x, and Intel OmniPath (OPA) network-capable nodes are
affected only.


[I guess that should have been "too" rather than "only".  It's loading
the openib btl that is the problem.]


The known workaround is to disable InfiniBand failback by '--mca btl
^tcp,openib' on nodes with OPA network. (On IB nodes, the same tweak
lead to 5% performance improvement on single-node jobs;


It was a lot more than that in my cp2k test.


but obviously
disabling IB on nodes connected via IB is not a solution for
multi-node jobs, huh).


But it works OK with libfabric (ofi mtl).  Is there a problem with
libfabric?

Has anyone reported this issue to the cp2k people?  I know it's not
their problem, but I assume they'd like to know for users' sake,
particularly if it's not going to be addressed.  I wonder what else
might be affected.
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24

Re: [OMPI users] openib/mpi_alloc_mem pathology [#20160912-1315]

2017-03-16 Thread Paul Kapinos

Hi,

On 03/16/17 10:35, Alfio Lazzaro wrote:

We would like to ask you which version of CP2K you are using in your tests

Release 4.1



and
if you can share with us your input file and output log.


The question goes to Mr Mathias Schumacher, on CC:

Best
Paul Kapinos

(Our internal ticketing system also on CC:)


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-13 Thread Paul Kapinos

Nathan,
unfortunately '--mca memory_linux_disable 1' does not help with this issue - it
does not change the behaviour at all.
 Note that the pathological behaviour is present in Open MPI 2.0.2 as well as
in 1.10.x, and only nodes with an Intel OmniPath (OPA) network are affected.


The known workaround is to disable the InfiniBand failback with '--mca btl
^tcp,openib' on nodes with an OPA network. (On IB nodes, the same tweak led to a
5% performance improvement in single-node jobs; but obviously disabling IB on
nodes connected via IB is not a solution for multi-node jobs, huh).
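
For reference, below is a minimal micro-benchmark sketch (not the code we
profiled; file name, iteration count and buffer size are made-up assumptions)
that simply times repeated MPI_Alloc_mem()/MPI_Free_mem() pairs. Running it on
an OPA node with and without '--mca btl ^tcp,openib' should make the difference
visible without needing CP2K.

/* allocbench.c - sketch only: time MPI_Alloc_mem/MPI_Free_mem in a loop.
 * Build and run e.g.:  mpicc allocbench.c -o allocbench
 *                      mpiexec -np 1 ./allocbench                        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int      iterations = 10000;
    const MPI_Aint size       = 1 << 20;   /* 1 MiB per allocation (arbitrary) */
    void  *buf;
    double t0, t_alloc = 0.0, t_free = 0.0;
    int    i;

    MPI_Init(&argc, &argv);
    for (i = 0; i < iterations; i++) {
        t0 = MPI_Wtime();
        MPI_Alloc_mem(size, MPI_INFO_NULL, &buf);   /* may register memory */
        t_alloc += MPI_Wtime() - t0;

        t0 = MPI_Wtime();
        MPI_Free_mem(buf);                          /* may deregister memory */
        t_free += MPI_Wtime() - t0;
    }
    printf("MPI_Alloc_mem: %.3f s   MPI_Free_mem: %.3f s   (%d iterations)\n",
           t_alloc, t_free, iterations);
    MPI_Finalize();
    return 0;
}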



On 03/07/17 20:22, Nathan Hjelm wrote:

If this is with 1.10.x or older run with --mca memory_linux_disable 1. There is 
a bad interaction between ptmalloc2 and psm2 support. This problem is not 
present in v2.0.x and newer.

-Nathan


On Mar 7, 2017, at 10:30 AM, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:

Hi Dave,



On 03/06/17 18:09, Dave Love wrote:
I've been looking at a new version of an application (cp2k, for for what
it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't


Welcome to the club! :o)
In our measures we see some 70% of time in 'mpi_free_mem'... and 15x 
performance loss if using Open MPI vs. Intel MPI. So it goes.

https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html



think it did so the previous version I looked at.  I found on an
IB-based system it's spending about half its time in those allocation
routines (according to its own profiling) -- a tad surprising.

It turns out that's due to some pathological interaction with openib,
and just having openib loaded.  It shows up on a single-node run iff I
don't suppress the openib btl, and doesn't for multi-node PSM runs iff I
suppress openib (on a mixed Mellanox/Infinipath system).


we're lucky - our issue is on Intel OmniPath (OPA) network (and we will junk IB 
hardware in near future, I think) - so we disabled the IB transport failback,
--mca btl ^tcp,openib

For single-node jobs this will also help on plain IB nodes, likely. (you can 
disable IB if you do not use it)



Can anyone say why, and whether there's a workaround?  (I can't easily
diagnose what it's up to as ptrace is turned off on the system
concerned, and I can't find anything relevant in archives.)

I had the idea to try libfabric instead for multi-node jobs, and that
doesn't show the pathological behaviour iff openib is suppressed.
However, it requires ompi 1.10, not 1.8, which I was trying to use.
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-07 Thread Paul Kapinos

Hi Dave,


On 03/06/17 18:09, Dave Love wrote:

I've been looking at a new version of an application (cp2k, for for what
it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't


Welcome to the club! :o)
In our measurements we see some 70% of the time spent in 'mpi_free_mem'... and a
15x performance loss when using Open MPI vs. Intel MPI. So it goes.


https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html



think it did so the previous version I looked at.  I found on an
IB-based system it's spending about half its time in those allocation
routines (according to its own profiling) -- a tad surprising.

It turns out that's due to some pathological interaction with openib,
and just having openib loaded.  It shows up on a single-node run iff I
don't suppress the openib btl, and doesn't for multi-node PSM runs iff I
suppress openib (on a mixed Mellanox/Infinipath system).


we're lucky - our issue is on the Intel OmniPath (OPA) network (and we will junk
the IB hardware in the near future, I think) - so we disabled the IB transport
failback:

--mca btl ^tcp,openib

For single-node jobs this will likely also help on plain IB nodes. (You can
disable IB if you do not use it.)




Can anyone say why, and whether there's a workaround?  (I can't easily
diagnose what it's up to as ptrace is turned off on the system
concerned, and I can't find anything relevant in archives.)

I had the idea to try libfabric instead for multi-node jobs, and that
doesn't show the pathological behaviour iff openib is suppressed.
However, it requires ompi 1.10, not 1.8, which I was trying to use.
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-03-03 Thread Paul Kapinos

Hi Mark,


On 02/18/17 09:14, Mark Dixon wrote:

On Fri, 17 Feb 2017, r...@open-mpi.org wrote:


Depends on the version, but if you are using something in the v2.x range, you
should be okay with just one installed version




How good is MPI_THREAD_MULTIPLE support these days and how far up the wishlist
is it, please?


Note that in the 1.10.x series (even in 1.10.6), enabling MPI_THREAD_MULTIPLE
leads to a (silent) shutdown of the InfiniBand fabric for that application => SLOW!


The 2.x versions (tested: 2.0.1) handle MPI_THREAD_MULTIPLE on InfiniBand the
right way up; however, due to the absence of memory hooks (= no aligned memory
allocation) we get 20% less bandwidth on IB with the 2.x versions compared to
the 1.10.x versions of Open MPI (with or without MPI_THREAD_MULTIPLE support).
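
As a minimal sketch (hypothetical file name, nothing version-specific), one can
at least verify which thread level a given build grants by requesting
MPI_THREAD_MULTIPLE via MPI_Init_thread and printing the provided level - keeping
in mind that the InfiniBand shutdown on 1.10.x mentioned above is silent, so this
only reports the thread level, not which fabric is actually used:

/* threadcheck.c - sketch: request MPI_THREAD_MULTIPLE, report what we got. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("only thread level %d provided, MPI_THREAD_MULTIPLE is %d\n",
               provided, MPI_THREAD_MULTIPLE);
    else
        printf("MPI_THREAD_MULTIPLE is available\n");
    MPI_Finalize();
    return 0;
}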


On the Intel OmniPath network both of the above issues seem to be absent, but
due to a performance bug in MPI_Free_mem your application can be horribly slow
(seen: CP2K) if the InfiniBand failback on OPA is not disabled manually, see

https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html

Best,

Paul Kapinos



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-03-03 Thread Paul Kapinos

Hi,

On 03/03/17 12:41, Mark Dixon wrote:

Your 20% memory bandwidth performance hit on 2.x and the OPA problem are
concerning - will look at that. Are there tickets open for them?


OPA performance issue with CP2K (15x slowdown):
https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html
(cf. the thread) The workaround is to disable the IB failback on OPA:
> --mca btl ^tcp,openib
With this tweak on OPA, Open MPI's CP2K is less than 10% slower than Intel MPI's
(the same result as on InfiniBand) - which is much, much better than 1500%, huh.
However, Open MPI's CP2K is still slower than Intel MPI's due to a worse
MPI_Alltoallv, as far as I understood the profiles.

I will mail the CP2K developers soon...



20% bandwidth loss with Open MPI 2.x: cf.
https://www.mail-archive.com/devel@lists.open-mpi.org/msg00043.html
- Nathan Hjelm said the hooks were removed intentionally. We have a (nasty)
workaround, cf.

https://www.mail-archive.com/devel@lists.open-mpi.org/msg00052.html
As far as I can see, this issue affects InfiniBand only.


Best

Paul

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Performance issues: 1.10.x vs 2.x

2017-05-05 Thread Paul Kapinos

On 05/05/17 12:10, marcin.krotkiewski wrote:
> in my case it was enough to allocate my own arrays using posix_memalign.
Be happy. This did not work for Fortran codes..



But since that worked, it means that 1.10.6 deals somehow better with unaligned
data. Anyone knows the reason for this?


In the 1.10.x series there were 'memory hooks' - Open MPI took some care about
the alignment. These were removed in the 2.x series, cf. the whole thread at my link.






--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] openib/mpi_alloc_mem pathology [#20160912-1315]

2017-10-19 Thread Paul Kapinos
Hi all,
sorry for the long long latency - this message was buried in my mailbox for
months



On 03/16/2017 10:35 AM, Alfio Lazzaro wrote:
> Hello Dave and others,
> we jump in the discussion as CP2K developers.
> We would like to ask you which version of CP2K you are using in your tests 
version 4.1 (release)

> and
> if you can share with us your input file and output log.

The input file is the property of Mathias Schumacher (CC:) and we need his
permission to provide it.



> Some clarifications on the way we use MPI allocate/free:
> 1) only buffers used for MPI communications are allocated with MPI 
> allocate/free
> 2) in general we use memory pools, therefore we reach a limit in the buffers
> sizes after some iterations, i.e. they are not reallocated anymore
> 3) there are some cases where we don't use memory pools, but their overall
> contribution should be very small. You can run with the CALLGRAPH option
> (https://www.cp2k.org/dev:profiling#the_cp2k_callgraph) to get more insight
> where those allocations/deallocations are.

We ran the data set again with the CALLGRAPH option. Please have a look at the
attached files. You see a callgraph file (from rank 0 of the 24 used) and some
exported call tree views.

We can see that the *allocate* routines (mp_[de|]allocate_[i|d]) are called 33k
resp. 28k times (multiply this by the 24 processes per node, i.e. roughly 1.5
million calls per node in total). In the 'good case' (Intel MPI, and Open MPI
with the workaround) these calls take only a fraction of 1% of the time; in the
'bad case' (Open MPI w/o the workaround, attached) the two mp_deallocate_[i|d]
calls use 81% of the time in 'Self', huh. That's mainly the observation we made
a long time ago: if, on a node with the Intel OmniPath fabric, the failback to
InfiniBand is not prohibited, MPI_Free_mem() takes ages.
(I'm not familiar with CCachegrind, so forgive me if I'm wrong here.)

Have a nice day,

Paul Kapinos



-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


20171019-callgraph.tar.gz
Description: application/gzip


smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] openib/mpi_alloc_mem pathology [#20160912-1315]

2017-10-20 Thread Paul Kapinos
On 10/20/2017 12:24 PM, Dave Love wrote:
> Paul Kapinos <kapi...@itc.rwth-aachen.de> writes:
> 
>> Hi all,
>> sorry for the long long latency - this message was buried in my mailbox for
>> months
>>
>>
>>
>> On 03/16/2017 10:35 AM, Alfio Lazzaro wrote:
>>> Hello Dave and others,
>>> we jump in the discussion as CP2K developers.
>>> We would like to ask you which version of CP2K you are using in your tests 
>> version 4.1 (release)
>>
>>> and
>>> if you can share with us your input file and output log.
>>
>> The input file is property of Mathias Schumacher (CC:) and we need a 
>> permission
>> of him to provide it.
> 
> I lost track of this, but the problem went away using libfabric instead
> of openib, so I left it at that, though libfabric hurt IMB pingpong
> latency compared with openib.
> 
> I seem to remember there's a workaround in the cp2k development source,
> but that obviously doesn't solve the general problem.
> 

The issue has two facets:
- CP2K uses a lot of MPI_Alloc_mem() / MPI_Free_mem() calls. This was addressed
by the CP2K developers (private mail from Alfio Lazzaro):
> in the new CP2K release (next week will have version 5.1), I have reduced the
> amount of MPI allocations. I have also added a flag to avoid any MPI
> allocations, that you can add in the CP2K input file:
> 
> &GLOBAL
>   use_mpi_allocator F
> &END GLOBAL
We will test the new release once it is available. (Note: the user still has to
think about disabling use_mpi_allocator if in doubt.)



- in an Open MPI compiled for both(1) IB and OPA, on a node with OPA, using the
*default* configuration (failback to 'openib' *not prohibited*), MPI_Free_mem()
calls suddenly last 1 or so times longer, starting to dominate the
application run time.
Known workaround: prohibit the failback to 'openib' BTL by '-mca btl
^tcp,openib' - that's what we implemented.

It's up to the Open MPI developers whether they would like to follow up on this
'small' performance issue.

Best,
Paul Kapinos

P.S. It was hard work even to locate this issue, as only 2 tools (of the 7 or 8
tried) were able to point to the evil call...

(1) Yes, we use the same Open MPI installation on islands with InfiniBand, with
OmniPath, and even on Ethernet-only nodes.


-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



smime.p7s
Description: S/MIME Cryptographic Signature
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

2022-01-04 Thread Paul Kapinos via users

Dear Jeff,
I should like to point out that the NAG Fortran compiler is [and likely their 
developers are] the most picky and overly didactic Fortran compiler [developers] 
I know.


(I have worked closely with more than 5 vendors and dozens of compiler versions,
and I reported some 200 bugs during the early development stage of the Mercurium
Fortran compiler https://github.com/bsc-pm/mcxx and dozens more to Intel's
'ifort' - sorry for praising myself :-)


In about 5 cases I firmly believed 'that is a bug in the NAG compiler!' because
it did not compile code accepted (and often working!) with all other compilers -
Intel, gfortran, Sun/Oracle Studio, PGI... Then I tried to open a case with NAG
(once or twice, IIRC), and to read the Fortran language standard, and in *all*
cases - without exception! - NAG's interpretation of the standard was the
*right* one. (I cannot say that about gfortran and Intel, by the way.)


So these guys may be snarky, but they can Fortran, definitely. And if the Open
MPI bindings can be compiled by this compiler, they are likely very
standard-conforming.


Have a nice day and a nice year 2022,

Paul Kapinos



On 12/30/21 16:27, Jeff Squyres (jsquyres) via users wrote:

Snarky comments from the NAG tech support people aside, if they could be a 
little more specific about what non-conformant Fortran code they're referring 
to, we'd be happy to work with them to get it fixed.

I'm one of the few people in the Open MPI dev community who has a clue about 
Fortran, and I'm *very far* from being a Fortran expert.  Modern Fortran is a 
legitimately complicated language.  So it doesn't surprise me that we might 
have some code in our configure tests that isn't quite right.

Let's also keep in mind that the state of F2008 support varies widely across 
compilers and versions.  The current Open MPI configure tests straddle the line 
of trying to find *enough* F2008 support in a given compiler to be sufficient 
for the mpi_f08 module without being so overly proscriptive as to disqualify 
compilers that aren't fully F2008-compliant.  Frankly, the state of F2008 
support across the various Fortran compilers was a mess when we wrote those 
configure tests; we had to cobble together a variety of complicated tests to 
figure out if any given compiler supported enough F2008 support for some / all 
of the mpi_f08 module.  That's why the configure tests are... complicated.

--
Jeff Squyres
jsquy...@cisco.com


From: users  on behalf of Matt Thompson via users 

Sent: Thursday, December 23, 2021 11:41 AM
To: Wadud Miah
Cc: Matt Thompson; Open MPI Users
Subject: Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

I heard back from NAG:

Regarding OpenMPI, we have attempted the build ourselves but cannot make sense of the configure script. Only 
the OpenMPI maintainers can do something about that and it looks like they assume that all compilers will 
just swallow non-conforming Fortran code. The error downgrading options for NAG compiler remain 
"-dusty", "-mismatch" and "-mismatch_all" and none of them seem to help with 
the mpi_f08 module of OpenMPI. If there is a bug in the NAG Fortran Compiler that is responsible for this, we 
would love to hear about it, but at the moment we are not aware of such.

So it might mean the configure script itself might need to be altered to use 
F2008 conforming code?

On Thu, Dec 23, 2021 at 8:31 AM Wadud Miah 
mailto:wmiah...@gmail.com>> wrote:
You can contact NAG support at supp...@nag.co.uk<mailto:supp...@nag.co.uk> but 
they will look into this in the new year.

Regards,

On Thu, 23 Dec 2021, 13:18 Matt Thompson via users, 
mailto:users@lists.open-mpi.org>> wrote:
Oh. Yes, I am on macOS. The Linux cluster I work on doesn't have NAG 7.1 on 
it...mainly because I haven't asked for it. Until NAG fix the bug we are 
seeing, I figured why bother the admins.

Still, it does *seem* like it should work. I might ask NAG support about it.

On Wed, Dec 22, 2021 at 6:28 PM Tom Kacvinsky 
mailto:tkacv...@gmail.com>> wrote:
On Wed, Dec 22, 2021 at 5:45 PM Tom Kacvinsky 
mailto:tkacv...@gmail.com>> wrote:


On Wed, Dec 22, 2021 at 4:11 PM Matt Thompson 
mailto:fort...@gmail.com>> wrote:


All,

When I build Open MPI with NAG, I have to pass in:

   FCFLAGS="-mismatch_all -fpp"

this flag tells nagfor to downgrade some errors with interfaces to warnings:

-mismatch_all
  Further downgrade consistency checking of procedure argument 
lists so that calls to routines in the same file which are
  incorrect will produce warnings instead of error messages.  
This option disables -C=calls.

The fpp flag is how you tell NAG to do preprocessing (it doesn't automatically 
do it with .F90 files).

I also have to pass in a lot of other flags as seen here:

https://github.com/mathomp4/parcelmodulefiles/blob/main/Compiler/