Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Zhen Wang
Jeff,

Thanks.

Best regards,
Zhen

On Thu, May 5, 2016 at 8:45 PM, Jeff Squyres (jsquyres) 
wrote:

> It's taking so long because you are sleeping for 0.1 seconds between calls
> to MPI_Test().
>
> The TCP transport is only sending a few fragments of your message during
> each iteration through MPI_Test (because, by definition, it has to return
> "immediately").  Other transports do better at handing off large messages
> like this to hardware for asynchronous progress.
>
This agrees with what I observed: larger messages need more calls to
MPI_Test. What do you mean by other transports?

>
> Additionally, the upcoming v2.0.0 release includes a non-default option to
> enable an asynchronous progress thread for the TCP transport.  We're up to
> v2.0.0rc2; you can give that async TCP support a whirl, if you want.  Pass
> "--mca btl_tcp_progress_thread 1" on the mpirun command line to enable the
> TCP progress thread.
>
Does this mean there's an additional thread that transfers data in the
background?
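(For reference, trying that option would just mean adding the flag to the run
command quoted earlier, e.g. something like the line below; this is only an
illustration and assumes mpirun comes from a v2.0.0rc2 build, since the flag
is described above as new in v2.0.0:)

  mpirun --mca btl_tcp_progress_thread 1 -n 2 ./a.out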

>
>
> > On May 4, 2016, at 7:40 PM, Zhen Wang  wrote:
> >
> > Hi,
> >
> > I'm having a problem with Isend, Recv and Test in Linux Mint 16 Petra.
> The source is attached.
> >
> > Open MPI 1.10.2 is configured with
> > ./configure --enable-debug --prefix=/home//Tool/openmpi-1.10.2-debug
> >
> > The source is built with
> > ~/Tool/openmpi-1.10.2-debug/bin/mpiCC a5.cpp
> >
> > and run in one node with
> > ~/Tool/openmpi-1.10.2-debug/bin/mpirun -n 2 ./a.out
> >
> > The output is at the end. What puzzles me is why MPI_Test has to be
> > called so many times, and why it takes so long to send a message. Am I
> > doing something wrong? I'm simulating a more complicated program: MPI 0
> > Isends data to MPI 1, computes (usleep here), and calls Test to check if
> > data are sent. MPI 1 Recvs data, and computes.
> >
> > Thanks in advance.
> >
> >
> > Best regards,
> > Zhen
> >
> > MPI 0: Isend of 0 started at 20:32:35.
> > MPI 1: Recv of 0 started at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:35.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:36.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:37.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:38.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 1: Recv of 0 finished at 20:32:39.
> > MPI 0: MPI_Test of 0 at 20:32:39.
> > MPI 0: Isend of 0 finished at 20:32:39.
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29085.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29105.php
>


Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Jeff Squyres (jsquyres)
It's taking so long because you are sleeping for 0.1 seconds between calls to
MPI_Test().

The TCP transport is only sending a few fragments of your message during each
iteration through MPI_Test (because, by definition, it has to return
"immediately").  Other transports do better at handing off large messages like
this to hardware for asynchronous progress.

Additionally, the upcoming v2.0.0 release includes a non-default option to
enable an asynchronous progress thread for the TCP transport.  We're up to
v2.0.0rc2; you can give that async TCP support a whirl, if you want.  Pass
"--mca btl_tcp_progress_thread 1" on the mpirun command line to enable the TCP
progress thread.
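For readers following the thread, here is a minimal sketch of the send-side
pattern being discussed (this is not the attached a5.cpp, which is not
reproduced in the archive; the message size is a placeholder, while the 0.1 s
sleep matches what is described above):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1024 * 1024;           /* placeholder: 1M ints (~4 MB) */
    int *buf = calloc(count, sizeof(int));

    if (rank == 0) {
        MPI_Request req;
        int done = 0;
        MPI_Isend(buf, count, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        while (!done) {
            usleep(100000);                  /* "compute" for 0.1 s */
            /* With the TCP BTL, a few more fragments go out on each call. */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }
        printf("Isend finished\n");
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Recv finished\n");
    }

    free(buf);
    MPI_Finalize();
    return 0;
}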


> On May 4, 2016, at 7:40 PM, Zhen Wang  wrote:
> 
> Hi,
> 
> I'm having a problem with Isend, Recv and Test in Linux Mint 16 Petra. The 
> source is attached.
> 
> Open MPI 1.10.2 is configured with
> ./configure --enable-debug --prefix=/home//Tool/openmpi-1.10.2-debug
> 
> The source is built with
> ~/Tool/openmpi-1.10.2-debug/bin/mpiCC a5.cpp
> 
> and run in one node with
> ~/Tool/openmpi-1.10.2-debug/bin/mpirun -n 2 ./a.out
> 
> The output is at the end. What puzzles me is why MPI_Test has to be called so
> many times, and why it takes so long to send a message. Am I doing something
> wrong? I'm simulating a more complicated program: MPI 0 Isends data to MPI 1,
> computes (usleep here), and calls Test to check if data are sent. MPI 1 Recvs
> data, and computes.
> 
> Thanks in advance.
> 
> 
> Best regards,
> Zhen
> 
> MPI 0: Isend of 0 started at 20:32:35.
> MPI 1: Recv of 0 started at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:35.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:36.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:37.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:38.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 1: Recv of 0 finished at 20:32:39.
> MPI 0: MPI_Test of 0 at 20:32:39.
> MPI 0: Isend of 0 finished at 20:32:39.
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29085.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Problems using 1.10.2 with MOFED 3.1-1.1.0.1

2016-05-05 Thread Andy Riebs

  
  
Sorry, my output listing was incomplete -- the program did run after
the "No OpenFabrics" message, but (I presume) ran over Ethernet
rather than InfiniBand. So I can't really say what was causing it to
fail.

Andy

On 05/05/2016 06:09 PM, Nathan Hjelm wrote:


  
It should work fine with ob1 (the default). Did you determine what was
causing it to fail?

-Nathan

On Thu, May 05, 2016 at 06:04:55PM -0400, Andy Riebs wrote:

  
   For anyone like me who happens to google this in the future, the solution
   was to set OMPI_MCA_pml=yalla

   Many thanks Josh!

   On 05/05/2016 12:52 PM, Joshua Ladd wrote:

 We are working with Andy offline.

 Josh
 On Thu, May 5, 2016 at 7:32 AM, Andy Riebs  wrote:

   I've built 1.10.2 with all my favorite configuration options, but I
   get messages such as this (one for each rank with
   orte_base_help_aggregate=0) when I try to run on a MOFED system:

   $ shmemrun -H hades02,hades03 $PWD/shmem.out
   --
   No OpenFabrics connection schemes reported that they were able to be
   used on a specific port.  As such, the openib BTL (OpenFabrics
   support) will be disabled for this port.

 Local host:   hades03
 Local device: mlx4_0
 Local port:   2
 CPCs attempted:   rdmacm, udcm
   --

   My configure options:
   config_opts="--prefix=${INSTALL_DIR} \
   --without-mpi-param-check \
   --with-knem=/opt/mellanox/hpcx/knem \
   --with-mxm=/opt/mellanox/mxm  \
   --with-mxm-libdir=/opt/mellanox/mxm/lib \
   --with-fca=/opt/mellanox/fca \
   --with-pmi=${INSTALL_ROOT}/slurm \
   --without-psm --disable-dlopen \
   --disable-vt \
   --enable-orterun-prefix-by-default \
   --enable-debug-symbols"

   There aren't any obvious error messages in the build log -- what am I
   missing?

   Andy

   --
   Andy Riebs
   andy.ri...@hpe.com
   Hewlett-Packard Enterprise
   High Performance Computing Software Engineering
   +1 404 648 9024
   My opinions are not necessarily those of HPE

   ___
   users mailing list
   us...@open-mpi.org
   Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
   Link to this post:
   http://www.open-mpi.org/community/lists/users/2016/05/29094.php

 ___
 users mailing list
 us...@open-mpi.org
 Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29100.php

  
  

  
___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29101.php

  
  

  
  
  
  ___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29102.php


  



Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Zhen Wang
2016-05-05 9:27 GMT-05:00 Gilles Gouaillardet :

> Out of curiosity, can you try
> mpirun --mca btl self,sm ...
>
Same as before. Many MPI_Test calls.

> and
> mpirun --mca btl self,vader ...
>
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  VirtualBox
Framework: btl
Component: vader
--
*** An error occurred in MPI_Init
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  mca_bml_base_open() failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[VirtualBox:2188] Local abort before MPI_INIT completed successfully; not
able to aggregate error messages, and not able to guarantee that all other
processes were killed!
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9235,1],0]
  Exit code:1
--
[VirtualBox:02186] 1 more process has sent help message help-mca-base.txt /
find-available:not-valid
[VirtualBox:02186] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
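(Side note: one way to check which btl components a given build actually
contains is to ask that build's own ompi_info, e.g. something like the line
below; the path is the same install used above:)

  ~/Tool/openmpi-1.10.2-debug/bin/ompi_info | grep "MCA btl"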


>
> and see if one performs better than the other?
>
> Cheers,
>
> Gilles
>
> On Thursday, May 5, 2016, Zhen Wang  wrote:
>
>> Gilles,
>>
>> Thanks for your reply.
>>
>> Best regards,
>> Zhen
>>
>> On Wed, May 4, 2016 at 8:43 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Note there is no progress thread in openmpi 1.10
>>> from a pragmatic point of view, that means that for "large" messages, no
>>> data is sent in MPI_Isend, and the data is sent when MPI "progresses" e.g.
>>> call a MPI_Test, MPI_Probe, MPI_Recv or some similar subroutine.
>>> in your example, the data is transferred after the first usleep
>>> completes.
>>>
>> I agree.
>>
>>>
>>> that being said, it takes quite a while, and there could be an issue.
>>> what if you use MPI_Send instead () ?
>>>
>> Works as expected.
>>
>> MPI 1: Recv of 0 started at 08:37:10.
>> MPI 1: Recv of 0 finished at 08:37:10.
>> MPI 0: Send of 0 started at 08:37:10.
>> MPI 0: Send of 0 finished at 08:37:10.
>>
>>
>>> what if you send/Recv a large message first (to "warmup" connections),
>>> MPI_Barrier, and then start your MPI_Isend ?
>>>
>> Not working. For what I want to accomplish, is my code the right way to
>> go? Is there an alternative method? Thanks.
>>
>> MPI 1: Recv of 0 started at 08:38:46.
>> MPI 0: Isend of 0 started at 08:38:46.
>> MPI 0: Isend of 1 started at 08:38:46.
>> MPI 0: Isend of 2 started at 08:38:46.
>> MPI 0: Isend of 3 started at 08:38:46.
>> MPI 0: Isend of 4 started at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:46.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:47.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:48.
>> MPI 0: MPI_Test of 0 at 08:38:49.
>> 

Re: [OMPI users] Problems using 1.10.2 with MOFED 3.1-1.1.0.1

2016-05-05 Thread Nathan Hjelm

It should work fine with ob1 (the default). Did you determine what was
causing it to fail?

-Nathan

On Thu, May 05, 2016 at 06:04:55PM -0400, Andy Riebs wrote:
>For anyone like me who happens to google this in the future, the solution
>was to set OMPI_MCA_pml=yalla
> 
>Many thanks Josh!
> 
>On 05/05/2016 12:52 PM, Joshua Ladd wrote:
> 
>  We are working with Andy offline.
> 
>  Josh
>  On Thu, May 5, 2016 at 7:32 AM, Andy Riebs  wrote:
> 
>I've built 1.10.2 with all my favorite configuration options, but I
>get messages such as this (one for each rank with
>orte_base_help_aggregate=0) when I try to run on a MOFED system:
> 
>$ shmemrun -H hades02,hades03 $PWD/shmem.out
>
> --
>No OpenFabrics connection schemes reported that they were able to be
>used on a specific port.  As such, the openib BTL (OpenFabrics
>support) will be disabled for this port.
> 
>  Local host:   hades03
>  Local device: mlx4_0
>  Local port:   2
>  CPCs attempted:   rdmacm, udcm
>
> --
> 
>My configure options:
>config_opts="--prefix=${INSTALL_DIR} \
>--without-mpi-param-check \
>--with-knem=/opt/mellanox/hpcx/knem \
>--with-mxm=/opt/mellanox/mxm  \
>--with-mxm-libdir=/opt/mellanox/mxm/lib \
>--with-fca=/opt/mellanox/fca \
>--with-pmi=${INSTALL_ROOT}/slurm \
>--without-psm --disable-dlopen \
>--disable-vt \
>--enable-orterun-prefix-by-default \
>--enable-debug-symbols"
> 
>There aren't any obvious error messages in the build log -- what am I
>missing?
> 
>Andy
> 
>--
>Andy Riebs
>andy.ri...@hpe.com
>Hewlett-Packard Enterprise
>High Performance Computing Software Engineering
>+1 404 648 9024
>My opinions are not necessarily those of HPE
> 
>___
>users mailing list
>us...@open-mpi.org
>Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post:
>http://www.open-mpi.org/community/lists/users/2016/05/29094.php
> 
>  ___
>  users mailing list
>  us...@open-mpi.org
>  Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>  Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29100.php

> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29101.php



pgppkcMHX75O6.pgp
Description: PGP signature


Re: [OMPI users] Problems using 1.10.2 with MOFED 3.1-1.1.0.1

2016-05-05 Thread Andy Riebs

  
  
For anyone like me who happens to google this in the future, the
solution was to set OMPI_MCA_pml=yalla

Many thanks Josh!
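(For completeness, in shell form that workaround is just an environment
setting before the same launch command; this is only an illustration of the
variable named above:)

  export OMPI_MCA_pml=yalla
  shmemrun -H hades02,hades03 $PWD/shmem.out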

On 05/05/2016 12:52 PM, Joshua Ladd wrote:

We are working with Andy offline.

Josh

On Thu, May 5, 2016 at 7:32 AM, Andy Riebs  wrote:

I've built 1.10.2 with all my favorite configuration options, but I get
messages such as this (one for each rank with orte_base_help_aggregate=0)
when I try to run on a MOFED system:

$ shmemrun -H hades02,hades03 $PWD/shmem.out
--
No OpenFabrics connection schemes reported that they were
able to be
used on a specific port.  As such, the openib BTL
(OpenFabrics
support) will be disabled for this port.

  Local host:           hades03
  Local device:         mlx4_0
  Local port:           2
  CPCs attempted:       rdmacm, udcm
--

My configure options:
config_opts="--prefix=${INSTALL_DIR} \
        --without-mpi-param-check \
        --with-knem=/opt/mellanox/hpcx/knem \
        --with-mxm=/opt/mellanox/mxm  \
        --with-mxm-libdir=/opt/mellanox/mxm/lib \
        --with-fca=/opt/mellanox/fca \
        --with-pmi=${INSTALL_ROOT}/slurm \
        --without-psm --disable-dlopen \
        --disable-vt \
        --enable-orterun-prefix-by-default \
        --enable-debug-symbols"


There aren't any obvious error messages in the build log --
what am I missing?

Andy

-- 
Andy Riebs
andy.ri...@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE

___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29094.php
  


  
  
  
  
  ___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29100.php


  



Re: [OMPI users] Problems using 1.10.2 with MOFED 3.1-1.1.0.1

2016-05-05 Thread Joshua Ladd
We are working with Andy offline.

Josh

On Thu, May 5, 2016 at 7:32 AM, Andy Riebs  wrote:

> I've built 1.10.2 with all my favorite configuration options, but I get
> messages such as this (one for each rank with orte_base_help_aggregate=0)
> when I try to run on a MOFED system:
>
> $ shmemrun -H hades02,hades03 $PWD/shmem.out
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   hades03
>   Local device: mlx4_0
>   Local port:   2
>   CPCs attempted:   rdmacm, udcm
> --
>
> My configure options:
> config_opts="--prefix=${INSTALL_DIR} \
> --without-mpi-param-check \
> --with-knem=/opt/mellanox/hpcx/knem \
> --with-mxm=/opt/mellanox/mxm  \
> --with-mxm-libdir=/opt/mellanox/mxm/lib \
> --with-fca=/opt/mellanox/fca \
> --with-pmi=${INSTALL_ROOT}/slurm \
> --without-psm --disable-dlopen \
> --disable-vt \
> --enable-orterun-prefix-by-default \
> --enable-debug-symbols"
>
>
> There aren't any obvious error messages in the build log -- what am I
> missing?
>
> Andy
>
> --
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29094.php
>


Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Jeff Squyres (jsquyres)
Giacomo --

Are you able to run anything that is compiled by that Intel compiler 
installation?
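(For example, a check as small as the one below would answer that; it is just
an illustration and assumes the icc from that installation is on the PATH.)

  echo 'int main(void) { return 0; }' > hello.c
  icc hello.c -o hello && ./hello && echo "icc-built binary runs"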



> On May 5, 2016, at 12:02 PM, Gus Correa  wrote:
> 
> Hi Giacomo
> 
> Some programs fail with segmentation fault
> because the stack size is too small.
> [But others because of bugs in memory allocation/management, etc.]
> 
> Have you tried
> 
> ulimit -s unlimited
> 
> before you run the program?
> 
> Are you using a single machine or a cluster?
> If you're using InfiniBand you may also need to make the locked memory
> unlimited:
> 
> ulimit -l unlimited
> 
> I hope this helps,
> Gus Correa
> 
> On 05/05/2016 05:15 AM, Giacomo Rossi wrote:
>>  gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
>> GNU gdb (GDB) 7.11
>> Copyright (C) 2016 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> 
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-pc-linux-gnu".
>> Type "show configuration" for configuration details.
>> For bug reporting instructions, please see:
>> .
>> Find the GDB manual and other documentation resources online at:
>> .
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word"...
>> Reading symbols from /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90...(no
>> debugging symbols found)...done.
>> (gdb) r -v
>> Starting program: /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76858f38 in ?? ()
>> (gdb) bt
>> #0  0x76858f38 in ?? ()
>> #1  0x77de5828 in _dl_relocate_object () from
>> /lib64/ld-linux-x86-64.so.2
>> #2  0x77ddcfa3 in dl_main () from /lib64/ld-linux-x86-64.so.2
>> #3  0x77df029c in _dl_sysdep_start () from
>> /lib64/ld-linux-x86-64.so.2
>> #4  0x774a in _dl_start () from /lib64/ld-linux-x86-64.so.2
>> #5  0x77dd9d98 in _start () from /lib64/ld-linux-x86-64.so.2
>> #6  0x0002 in ?? ()
>> #7  0x7fffaa8a in ?? ()
>> #8  0x7fffaab6 in ?? ()
>> #9  0x in ?? ()
>> 
>>Giacomo Rossi Ph.D., Space Engineer
>> 
>>Research Fellow at Dept. of Mechanical and Aerospace Engineering,
>>"Sapienza" University of Rome
>>*p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e:
>>*giacom...@gmail.com 
>>
>>Member of Fortran-FOSS-programmers
>>
>> 
>> 
>> 2016-05-05 10:44 GMT+02:00 Giacomo Rossi > >:
>> 
>>Here the result of ldd command:
>>'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
>>linux-vdso.so.1 (0x7ffcacbbe000)
>>libopen-pal.so.13 =>
>>/opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13
>>(0x7fa9597a9000)
>>libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000)
>>libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x7fa95929a000)
>>libdl.so.2 => /usr/lib/libdl.so.2 (0x7fa959096000)
>>librt.so.1 => /usr/lib/librt.so.1 (0x7fa958e8e000)
>>libutil.so.1 => /usr/lib/libutil.so.1 (0x7fa958c8b000)
>>libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fa958a75000)
>>libpthread.so.0 => /usr/lib/libpthread.so.0 (0x7fa958858000)
>>libc.so.6 => /usr/lib/libc.so.6 (0x7fa9584b7000)
>>libimf.so =>
>>
>> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so
>>(0x7fa957fb9000)
>>libsvml.so =>
>>
>> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so
>>(0x7fa9570ad000)
>>libirng.so =>
>>
>> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so
>>(0x7fa956d3b000)
>>libintlc.so.5 =>
>>
>> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5
>>(0x7fa956acf000)
>>/lib64/ld-linux-x86-64.so.2 (0x7fa959ab9000)'
>> 
>>I can't provide a core file, because I can't compile or launch any
>>program with mpifort... I always get the 'core dumped' error, also
>>when I try to compile a program with mpifort, and of course there
>>isn't any core file.
>> 
>> 
>>Giacomo Rossi Ph.D., Space Engineer
>> 
>>Research Fellow at Dept. of Mechanical and Aerospace
>>Engineering, "Sapienza" University of Rome
>>*p: *(+39) 0692927207  | *m**:
>>*(+39) 3408816643  | *e:
>>*giacom...@gmail.com 
>>
>>Member of Fortran-FOSS-programmers

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gus Correa

Hi Giacomo

Some programs fail with segmentation fault
because the stack size is too small.
[But others because of bugs in memory allocation/management, etc.]

Have you tried

ulimit -s unlimited

before you run the program?

Are you using a single machine or a cluster?
If you're using InfiniBand you may also need to make the locked memory
unlimited:


ulimit -l unlimited

I hope this helps,
Gus Correa

On 05/05/2016 05:15 AM, Giacomo Rossi wrote:

  gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90...(no
debugging symbols found)...done.
(gdb) r -v
Starting program: /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v

Program received signal SIGSEGV, Segmentation fault.
0x76858f38 in ?? ()
(gdb) bt
#0  0x76858f38 in ?? ()
#1  0x77de5828 in _dl_relocate_object () from
/lib64/ld-linux-x86-64.so.2
#2  0x77ddcfa3 in dl_main () from /lib64/ld-linux-x86-64.so.2
#3  0x77df029c in _dl_sysdep_start () from
/lib64/ld-linux-x86-64.so.2
#4  0x774a in _dl_start () from /lib64/ld-linux-x86-64.so.2
#5  0x77dd9d98 in _start () from /lib64/ld-linux-x86-64.so.2
#6  0x0002 in ?? ()
#7  0x7fffaa8a in ?? ()
#8  0x7fffaab6 in ?? ()
#9  0x in ?? ()

Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering,
"Sapienza" University of Rome
*p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e:
*giacom...@gmail.com 

Member of Fortran-FOSS-programmers



2016-05-05 10:44 GMT+02:00 Giacomo Rossi >:

Here the result of ldd command:
'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
linux-vdso.so.1 (0x7ffcacbbe000)
libopen-pal.so.13 =>
/opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13
(0x7fa9597a9000)
libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000)
libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x7fa95929a000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x7fa959096000)
librt.so.1 => /usr/lib/librt.so.1 (0x7fa958e8e000)
libutil.so.1 => /usr/lib/libutil.so.1 (0x7fa958c8b000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fa958a75000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x7fa958858000)
libc.so.6 => /usr/lib/libc.so.6 (0x7fa9584b7000)
libimf.so =>

/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so
(0x7fa957fb9000)
libsvml.so =>

/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so
(0x7fa9570ad000)
libirng.so =>

/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so
(0x7fa956d3b000)
libintlc.so.5 =>

/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5
(0x7fa956acf000)
/lib64/ld-linux-x86-64.so.2 (0x7fa959ab9000)'

I can't provide a core file, because I can't compile or launch any
program with mpifort... I always get the 'core dumped' error, also
when I try to compile a program with mpifort, and of course there
isn't any core file.


Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace
Engineering, "Sapienza" University of Rome
*p: *(+39) 0692927207  | *m**:
*(+39) 3408816643  | *e:
*giacom...@gmail.com 

Member of Fortran-FOSS-programmers



2016-05-05 8:50 GMT+02:00 Giacomo Rossi >:

I’ve installed the latest version of Intel Parallel Studio
(16.0.3), then I’ve downloaded the latest version of openmpi
(1.10.2) and I’ve compiled it with

`./configure CC=icc CXX=icpc F77=ifort FC=ifort
--prefix=/opt/openmpi/1.10.2/intel/16.0.3`

then I've 

Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Gilles Gouaillardet
Out of curiosity, can you try
mpirun --mca btl self,sm ...
and
mpirun --mca btl self,vader ...

and see if one performs better than the other?

Cheers,

Gilles

On Thursday, May 5, 2016, Zhen Wang  wrote:

> Gilles,
>
> Thanks for your reply.
>
> Best regards,
> Zhen
>
> On Wed, May 4, 2016 at 8:43 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
>> Note there is no progress thread in openmpi 1.10
>> from a pragmatic point of view, that means that for "large" messages, no
>> data is sent in MPI_Isend, and the data is sent when MPI "progresses" e.g.
>> call a MPI_Test, MPI_Probe, MPI_Recv or some similar subroutine.
>> in your example, the data is transferred after the first usleep completes.
>>
> I agree.
>
>>
>> that being said, it takes quite a while, and there could be an issue.
>> what if you use MPI_Send instead () ?
>>
> Works as expected.
>
> MPI 1: Recv of 0 started at 08:37:10.
> MPI 1: Recv of 0 finished at 08:37:10.
> MPI 0: Send of 0 started at 08:37:10.
> MPI 0: Send of 0 finished at 08:37:10.
>
>
>> what if you send/Recv a large message first (to "warmup" connections),
>> MPI_Barrier, and then start your MPI_Isend ?
>>
> Not working. For what I want to accomplish, is my code the right way to
> go? Is there an alternative method? Thanks.
>
> MPI 1: Recv of 0 started at 08:38:46.
> MPI 0: Isend of 0 started at 08:38:46.
> MPI 0: Isend of 1 started at 08:38:46.
> MPI 0: Isend of 2 started at 08:38:46.
> MPI 0: Isend of 3 started at 08:38:46.
> MPI 0: Isend of 4 started at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:46.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:47.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:48.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:49.
> MPI 0: MPI_Test of 0 at 08:38:50.
> MPI 0: MPI_Test of 0 at 08:38:50.
> MPI 0: MPI_Test of 0 at 08:38:50.
> MPI 0: MPI_Test of 0 at 08:38:50.
> MPI 1: Recv of 0 finished at 08:38:50.
> MPI 1: Recv of 1 started at 08:38:50.
> MPI 0: MPI_Test of 0 at 08:38:50.
> MPI 0: Isend of 0 finished at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:50.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:51.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:52.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:53.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 1: Recv of 1 finished at 08:38:54.
> MPI 1: Recv of 2 started at 08:38:54.
> MPI 0: MPI_Test of 1 at 08:38:54.
> MPI 0: Isend of 1 finished at 08:38:54.
> MPI 0: MPI_Test of 2 at 08:38:54.
> MPI 0: MPI_Test of 2 at 08:38:54.
> MPI 0: MPI_Test of 2 at 08:38:54.
> MPI 0: MPI_Test of 2 at 08:38:55.

Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-05 Thread Siegmar Gross

Hi Gilles,

is the following output helpful to find the error? I've put
another output below the output from gdb, which shows that
things are a little bit "random" if I use only 3+2 or 4+1
Sparc machines.


tyr spawn 127 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
...
Reading symbols from 
/export2/prog/SunOS_sparc/openmpi-1.10.3_64_cc/bin/orterun...done.
(gdb) set args -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master
(gdb) run
Starting program: /usr/local/openmpi-1.10.3_64_cc/bin/mpiexec -np 1 --host 
tyr,sunpc1,linpc1,ruester spawn_multiple_master
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP2]

Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 3 slave processes.

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id,
file ../../openmpi-v1.10.2-163-g42da15d/ompi/group/group_init.c, line 215,
function ompi_group_increment_proc_count

[ruester:17809] *** Process received signal ***
[ruester:17809] Signal: Abort (6)
[ruester:17809] Signal code:  (-1)
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0x113d4
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x188c
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Init+0x26c
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x18
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x108
[ruester:17809] *** End of error message ***
--
mpiexec noticed that process rank 2 with PID 0 on node ruester exited on signal 
6 (Abort).
--
[LWP2 exited]
[New Thread 2]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy 
query
(gdb) bt
#0  0x7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0x7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0x7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0x7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0x7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0x7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0x7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0x7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0x7e5f9718 in dlopen_close (handle=0x100)
at 
../../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/dl/dlopen/dl_dlopen_module.c:144
#9  0x7e5f364c in opal_dl_close (handle=0xff7d700200ff)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/dl/base/dl_base_fns.c:53
#10 0x7e546714 in ri_destructor (obj=0x1200)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_component_repository.c:357
#11 0x7e543840 in opal_obj_run_destructors (object=0xff7f607a6cff)
at ../../../../openmpi-v1.10.2-163-g42da15d/opal/class/opal_object.h:451
#12 0x7e545f54 in mca_base_component_repository_release 
(component=0xff7c801df0ff)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_component_repository.c:223
#13 0x7e54d0d8 in mca_base_component_unload 
(component=0xff7d3000, output_id=-1610596097)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:47
#14 0x7e54d17c in mca_base_component_close (component=0x100, 
output_id=-1878702080)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:60
#15 0x7e54d28c in mca_base_components_close (output_id=1942099968, 
components=0xff,
skip=0xff7f61c5a800)
at 
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:86
#16 0x7e54d1cc in mca_base_framework_components_close 
(framework=0x100ff, skip=0x10018ebb000)
at 

Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Zhen Wang
Gilles,

Thanks for your reply.

Best regards,
Zhen

On Wed, May 4, 2016 at 8:43 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Note there is no progress thread in openmpi 1.10
> from a pragmatic point of view, that means that for "large" messages, no
> data is sent in MPI_Isend, and the data is sent when MPI "progresses" e.g.
> call a MPI_Test, MPI_Probe, MPI_Recv or some similar subroutine.
> in your example, the data is transferred after the first usleep completes.
>
I agree.

>
> that being said, it takes quite a while, and there could be an issue.
> what if you use MPI_Send instead () ?
>
Works as expected.

MPI 1: Recv of 0 started at 08:37:10.
MPI 1: Recv of 0 finished at 08:37:10.
MPI 0: Send of 0 started at 08:37:10.
MPI 0: Send of 0 finished at 08:37:10.


> what if you send/Recv a large message first (to "warmup" connections),
> MPI_Barrier, and then start your MPI_Isend ?
>
Not working. For what I want to accomplish, is my code the right way to go?
Is there an alternative method? Thanks.
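(For reference, Gilles' warmup suggestion corresponds roughly to the sketch
below; the 4 MB warmup size is arbitrary and this is not the actual test code
from this thread.)

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 1. Warm up the connection with one large blocking exchange. */
    const int n = 4 * 1024 * 1024;               /* arbitrary 4 MB */
    char *buf = calloc(n, 1);
    if (rank == 0)
        MPI_Send(buf, n, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, n, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* 2. Make sure both ranks are past the warmup. */
    MPI_Barrier(MPI_COMM_WORLD);

    /* 3. Only now start the MPI_Isend / usleep / MPI_Test loop
       (omitted here; same shape as in the earlier messages). */

    free(buf);
    MPI_Finalize();
    return 0;
}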

MPI 1: Recv of 0 started at 08:38:46.
MPI 0: Isend of 0 started at 08:38:46.
MPI 0: Isend of 1 started at 08:38:46.
MPI 0: Isend of 2 started at 08:38:46.
MPI 0: Isend of 3 started at 08:38:46.
MPI 0: Isend of 4 started at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:46.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:47.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:48.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:49.
MPI 0: MPI_Test of 0 at 08:38:50.
MPI 0: MPI_Test of 0 at 08:38:50.
MPI 0: MPI_Test of 0 at 08:38:50.
MPI 0: MPI_Test of 0 at 08:38:50.
MPI 1: Recv of 0 finished at 08:38:50.
MPI 1: Recv of 1 started at 08:38:50.
MPI 0: MPI_Test of 0 at 08:38:50.
MPI 0: Isend of 0 finished at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:50.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:51.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:52.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:53.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 1: Recv of 1 finished at 08:38:54.
MPI 1: Recv of 2 started at 08:38:54.
MPI 0: MPI_Test of 1 at 08:38:54.
MPI 0: Isend of 1 finished at 08:38:54.
MPI 0: MPI_Test of 2 at 08:38:54.
MPI 0: MPI_Test of 2 at 08:38:54.
MPI 0: MPI_Test of 2 at 08:38:54.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:55.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 08:38:56.
MPI 0: MPI_Test of 2 at 

[OMPI users] Problems using 1.10.2 with MOFED 3.1-1.1.0.1

2016-05-05 Thread Andy Riebs
I've built 1.10.2 with all my favorite configuration options, but I get 
messages such as this (one for each rank with 
orte_base_help_aggregate=0) when I try to run on a MOFED system:


$ shmemrun -H hades02,hades03 $PWD/shmem.out
--
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:   hades03
  Local device: mlx4_0
  Local port:   2
  CPCs attempted:   rdmacm, udcm
--

My configure options:
config_opts="--prefix=${INSTALL_DIR} \
--without-mpi-param-check \
--with-knem=/opt/mellanox/hpcx/knem \
--with-mxm=/opt/mellanox/mxm  \
--with-mxm-libdir=/opt/mellanox/mxm/lib \
--with-fca=/opt/mellanox/fca \
--with-pmi=${INSTALL_ROOT}/slurm \
--without-psm --disable-dlopen \
--disable-vt \
--enable-orterun-prefix-by-default \
--enable-debug-symbols"


There aren't any obvious error messages in the build log -- what am I 
missing?


Andy

--
Andy Riebs
andy.ri...@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE



Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
 gdb /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90...(no
debugging symbols found)...done.
(gdb) r -v
Starting program: /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v

Program received signal SIGSEGV, Segmentation fault.
0x76858f38 in ?? ()
(gdb) bt
#0  0x76858f38 in ?? ()
#1  0x77de5828 in _dl_relocate_object () from
/lib64/ld-linux-x86-64.so.2
#2  0x77ddcfa3 in dl_main () from /lib64/ld-linux-x86-64.so.2
#3  0x77df029c in _dl_sysdep_start () from
/lib64/ld-linux-x86-64.so.2
#4  0x774a in _dl_start () from /lib64/ld-linux-x86-64.so.2
#5  0x77dd9d98 in _start () from /lib64/ld-linux-x86-64.so.2
#6  0x0002 in ?? ()
#7  0x7fffaa8a in ?? ()
#8  0x7fffaab6 in ?? ()
#9  0x in ?? ()

Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
University of Rome
*p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e: *giacom...@gmail.com

Member of Fortran-FOSS-programmers



2016-05-05 10:44 GMT+02:00 Giacomo Rossi :

> Here the result of ldd command:
> 'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
> linux-vdso.so.1 (0x7ffcacbbe000)
> libopen-pal.so.13 =>
> /opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13 (0x7fa9597a9000)
> libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000)
> libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x7fa95929a000)
> libdl.so.2 => /usr/lib/libdl.so.2 (0x7fa959096000)
> librt.so.1 => /usr/lib/librt.so.1 (0x7fa958e8e000)
> libutil.so.1 => /usr/lib/libutil.so.1 (0x7fa958c8b000)
> libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fa958a75000)
> libpthread.so.0 => /usr/lib/libpthread.so.0 (0x7fa958858000)
> libc.so.6 => /usr/lib/libc.so.6 (0x7fa9584b7000)
> libimf.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so
> (0x7fa957fb9000)
> libsvml.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so
> (0x7fa9570ad000)
> libirng.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so
> (0x7fa956d3b000)
> libintlc.so.5 =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5
> (0x7fa956acf000)
> /lib64/ld-linux-x86-64.so.2 (0x7fa959ab9000)'
>
> I can't provide a core file, because I can't compile or launch any program
> with mpifort... I always get the 'core dumped' error, also when I try to
> compile a program with mpifort, and of course there isn't any core file.
>
>
> Giacomo Rossi Ph.D., Space Engineer
>
> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
> University of Rome
> *p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e: *giacom...@gmail.com
> 
> Member of Fortran-FOSS-programmers
> 
>
>
> 2016-05-05 8:50 GMT+02:00 Giacomo Rossi :
>
>> I’ve installed the latest version of Intel Parallel Studio (16.0.3), then
>> I’ve downloaded the latest version of openmpi (1.10.2) and I’ve compiled it
>> with
>>
>> `./configure CC=icc CXX=icpc F77=ifort FC=ifort
>> --prefix=/opt/openmpi/1.10.2/intel/16.0.3`
>>
>> then I've installed it and everything seems ok, but when I try the simple
>> command
>>
>> ' /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v'
>>
>> I receive the following error
>>
>> 'Segmentation fault (core dumped)'
>>
>> I'm on ArchLinux, with kernel 4.5.1-1-ARCH; I've attached to this email
>> the config.log file compressed with bzip2.
>>
>> Any help will be appreciated!
>>
>> Giacomo Rossi Ph.D., Space Engineer
>>
>> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
>> University of Rome
>> *p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e: *
>> giacom...@gmail.com
>> 
>> Member of Fortran-FOSS-programmers
>> 
>>
>>
>>
>
>


Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-05 Thread Gilles Gouaillardet
Siegmar,

is this Solaris 10 specific (e.g. does Solaris 11 work fine)?

(I only have an x86_64 VM with Solaris 11 and the Sun compilers ...)

Cheers,

Gilles

On Thursday, May 5, 2016, Siegmar Gross <
siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Ralph and Gilles,
>
> Am 04.05.2016 um 20:02 schrieb rhc54:
>
>> @ggouaillardet  Where does this stand?
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly or view it on GitHub
>> 
>>
>
> With my last installed version of openmpi-v1.10.x all of my
> spawn programs fail on Solaris Sparc and x86_64 with the same
> error for both compilers (gcc-5.1.0 and Sun C 5.13). Everything
> works as expected on Linux. Tomorrow I'm back in my office and
> I can try to build and test the latest version.
>
> sunpc1 fd1026 108 ompi_info | grep -e "OPAL repo" -e "C compiler absolute"
>   OPAL repo revision: v1.10.2-163-g42da15d
>  C compiler absolute: /opt/solstudio12.4/bin/cc
> sunpc1 fd1026 114 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
> spawn_master
> [sunpc1:00957] *** Process received signal ***
> [sunpc1:00957] Signal: Segmentation Fault (11)
> [sunpc1:00957] Signal code: Address not mapped (1)
> [sunpc1:00957] Failing at address: 0
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
> /lib/amd64/libc.so.1:0xdd6b6
> /lib/amd64/libc.so.1:0xd1f82
> /lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
> /lib/amd64/libc.so.1:vsnprintf+0x51
> /lib/amd64/libc.so.1:vasprintf+0x49
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
> /home/fd1026/SunOS/x86_64/bin/spawn_master:main+0x21
> [sunpc1:00957] *** End of error message ***
> --
> mpiexec noticed that process rank 0 with PID 957 on node sunpc1 exited on
> signal 11 (Segmentation Fault).
> --
>
>
>
> sunpc1 fd1026 115 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
> spawn_multiple_master
> [sunpc1:00960] *** Process received signal ***
> [sunpc1:00960] Signal: Segmentation Fault (11)
> [sunpc1:00960] Signal code: Address not mapped (1)
> [sunpc1:00960] Failing at address: 0
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
> /lib/amd64/libc.so.1:0xdd6b6
> /lib/amd64/libc.so.1:0xd1f82
> /lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
> /lib/amd64/libc.so.1:vsnprintf+0x51
> /lib/amd64/libc.so.1:vasprintf+0x49
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
> /home/fd1026/SunOS/x86_64/bin/spawn_multiple_master:main+0x5d
> [sunpc1:00960] *** End of error message ***
> --
> mpiexec noticed that process rank 0 with PID 960 on node sunpc1 exited on
> signal 11 (Segmentation Fault).
> --
>
>
>
> sunpc1 fd1026 116 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
> spawn_intra_comm
> [sunpc1:00963] *** Process received signal ***
> [sunpc1:00963] Signal: Segmentation Fault (11)
> [sunpc1:00963] Signal code: Address not mapped (1)
> [sunpc1:00963] Failing at address: 0
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
> /lib/amd64/libc.so.1:0xdd6b6
> /lib/amd64/libc.so.1:0xd1f82
> /lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
> /lib/amd64/libc.so.1:vsnprintf+0x51
> /lib/amd64/libc.so.1:vasprintf+0x49
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
>
> /export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
> 

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gilles Gouaillardet
Giacomo,

one option is to (if your shell is bash)
ulimit -c unlimited
mpif90 -v
you should get a core file

another option is to
gdb /.../mpif90
r -v
bt

Cheers,

Gilles

On Thursday, May 5, 2016, Giacomo Rossi  wrote:

> Here the result of ldd command:
> 'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
> linux-vdso.so.1 (0x7ffcacbbe000)
> libopen-pal.so.13 =>
> /opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13 (0x7fa9597a9000)
> libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000)
> libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x7fa95929a000)
> libdl.so.2 => /usr/lib/libdl.so.2 (0x7fa959096000)
> librt.so.1 => /usr/lib/librt.so.1 (0x7fa958e8e000)
> libutil.so.1 => /usr/lib/libutil.so.1 (0x7fa958c8b000)
> libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fa958a75000)
> libpthread.so.0 => /usr/lib/libpthread.so.0 (0x7fa958858000)
> libc.so.6 => /usr/lib/libc.so.6 (0x7fa9584b7000)
> libimf.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so
> (0x7fa957fb9000)
> libsvml.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so
> (0x7fa9570ad000)
> libirng.so =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so
> (0x7fa956d3b000)
> libintlc.so.5 =>
> /home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5
> (0x7fa956acf000)
> /lib64/ld-linux-x86-64.so.2 (0x7fa959ab9000)'
>
> I can't provide a core file, because I can't compile or launch any program
> with mpifort... I always get the 'core dumped' error, also when I try to
> compile a program with mpifort, and of course there isn't any core file.
>
>
> Giacomo Rossi Ph.D., Space Engineer
>
> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
> University of Rome
> *p: *(+39) 0692927207 | *m**: *(+39) 3408816643 | *e: *giacom...@gmail.com
> 
> 
> Member of Fortran-FOSS-programmers
> 
>
>
> 2016-05-05 8:50 GMT+02:00 Giacomo Rossi  >:
>
>> I’ve installed the latest version of Intel Parallel Studio (16.0.3), then
>> I’ve downloaded the latest version of openmpi (1.10.2) and I’ve compiled it
>> with
>>
>> `./configure CC=icc CXX=icpc F77=ifort FC=ifort
>> --prefix=/opt/openmpi/1.10.2/intel/16.0.3`
>>
>> then I've installed it and everything seems ok, but when I try the simple
>> command
>>
>> ' /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v'
>>
>> I receive the following error
>>
>> 'Segmentation fault (core dumped)'
>>
>> I'm on ArchLinux, with kernel 4.5.1-1-ARCH; I've attached the config.log
>> file, compressed with bzip2, to this email.
>>
>> Any help will be appreciated!
>>
>> Giacomo Rossi Ph.D., Space Engineer
>>
>> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
>> University of Rome
>> p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com
>> 
>> Member of Fortran-FOSS-programmers
>> 
>>
>> ​
>>
>
>


Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-05 Thread Siegmar Gross

Hi Ralph and Gilles,

On 04.05.2016 at 20:02, rhc54 wrote:

@ggouaillardet  Where does this stand?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub



With my last installed version of openmpi-v1.10.x, all of my
spawn programs fail on Solaris SPARC and x86_64 with the same
error for both compilers (gcc-5.1.0 and Sun C 5.13). Everything
works as expected on Linux. Tomorrow I'm back in my office and
I can try to build and test the latest version.

sunpc1 fd1026 108 ompi_info | grep -e "OPAL repo" -e "C compiler absolute"
  OPAL repo revision: v1.10.2-163-g42da15d
 C compiler absolute: /opt/solstudio12.4/bin/cc
sunpc1 fd1026 114 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1 
spawn_master

[sunpc1:00957] *** Process received signal ***
[sunpc1:00957] Signal: Segmentation Fault (11)
[sunpc1:00957] Signal code: Address not mapped (1)
[sunpc1:00957] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_master:main+0x21
[sunpc1:00957] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 957 on node sunpc1 exited on 
signal 11 (Segmentation Fault).

--



sunpc1 fd1026 115 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1 
spawn_multiple_master

[sunpc1:00960] *** Process received signal ***
[sunpc1:00960] Signal: Segmentation Fault (11)
[sunpc1:00960] Signal code: Address not mapped (1)
[sunpc1:00960] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_multiple_master:main+0x5d
[sunpc1:00960] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 960 on node sunpc1 exited on 
signal 11 (Segmentation Fault).

--



sunpc1 fd1026 116 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1 
spawn_intra_comm

[sunpc1:00963] *** Process received signal ***
[sunpc1:00963] Signal: Segmentation Fault (11)
[sunpc1:00963] Signal code: Address not mapped (1)
[sunpc1:00963] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_intra_comm:main+0x23
[sunpc1:00963] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 963 on node sunpc1 exited on 
signal 11 (Segmentation Fault).

--
sunpc1 fd1026 117


Kind regards

Siegmar
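
For readers who do not have these tests at hand: the failing programs exercise
the standard MPI_Comm_spawn pattern. A minimal sketch of that pattern is shown
below (a hypothetical reduction for illustration only, not the actual
spawn_master source; the worker binary name and child count are made up):

#include <stdio.h>
#include <mpi.h>

/* Hypothetical reduction of a spawn_master-style test: the parent spawns
 * a few copies of a worker executable over an inter-communicator.
 * "spawn_worker" and NUM_CHILDREN are placeholders, not the real names. */
#define NUM_CHILDREN 4

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    int errcodes[NUM_CHILDREN];

    MPI_Init(&argc, &argv);     /* the backtraces above crash in here */

    /* Parent spawns NUM_CHILDREN copies of the worker executable. */
    MPI_Comm_spawn("spawn_worker", MPI_ARGV_NULL, NUM_CHILDREN,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &intercomm, errcodes);
    printf("parent: spawned %d children\n", NUM_CHILDREN);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

Note that the backtraces above fail inside MPI_Init itself, in the
orte_show_help / vasprintf path, i.e. before MPI_Comm_spawn is ever reached.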


Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Giacomo Rossi
Here is the result of the ldd command:
'ldd /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90
linux-vdso.so.1 (0x7ffcacbbe000)
libopen-pal.so.13 => /opt/openmpi/1.10.2/intel/16.0.3/lib/libopen-pal.so.13
(0x7fa9597a9000)
libm.so.6 => /usr/lib/libm.so.6 (0x7fa9594a4000)
libpciaccess.so.0 => /usr/lib/libpciaccess.so.0 (0x7fa95929a000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x7fa959096000)
librt.so.1 => /usr/lib/librt.so.1 (0x7fa958e8e000)
libutil.so.1 => /usr/lib/libutil.so.1 (0x7fa958c8b000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fa958a75000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x7fa958858000)
libc.so.6 => /usr/lib/libc.so.6 (0x7fa9584b7000)
libimf.so =>
/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libimf.so
(0x7fa957fb9000)
libsvml.so =>
/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libsvml.so
(0x7fa9570ad000)
libirng.so =>
/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libirng.so
(0x7fa956d3b000)
libintlc.so.5 =>
/home/giacomo/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64/libintlc.so.5
(0x7fa956acf000)
/lib64/ld-linux-x86-64.so.2 (0x7fa959ab9000)'

I can't provide a core file, because I can't compile or launch any program
with mpifort... I always get the 'core dumped' error when I try to
compile a program with mpifort as well, and of course no core file is produced.


Giacomo Rossi Ph.D., Space Engineer

Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
University of Rome
p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com

Member of Fortran-FOSS-programmers



2016-05-05 8:50 GMT+02:00 Giacomo Rossi :

> I’ve installed the latest version of Intel Parallel Studio (16.0.3), then
> I’ve downloaded the latest version of openmpi (1.10.2) and I’ve compiled it
> with
>
> `./configure CC=icc CXX=icpc F77=ifort FC=ifort
> --prefix=/opt/openmpi/1.10.2/intel/16.0.3`
>
> then I've installed it and everything seems ok, but when I try the simple
> command
>
> ' /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v'
>
> I receive the following error
>
> 'Segmentation fault (core dumped)'
>
> I'm on ArchLinux, with kernel 4.5.1-1-ARCH; I've attached the config.log
> file, compressed with bzip2, to this email.
>
> Any help will be appreciated!
>
> Giacomo Rossi Ph.D., Space Engineer
>
> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
> University of Rome
> p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com
> 
> Member of Fortran-FOSS-programmers
> 
>
> ​
>


[OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Gilles Gouaillardet
Giacomo,

could you also open the core file with gdb and post the backtrace?

can you also run
ldd mpif90
and confirm that no Intel MPI library is used?

btw, the Open MPI Fortran wrapper is now mpifort

Cheers,

Gilles

On Thursday, May 5, 2016, Giacomo Rossi wrote:

> I’ve installed the latest version of Intel Parallel Studio (16.0.3), then
> I’ve downloaded the latest version of openmpi (1.10.2) and I’ve compiled it
> with
>
> `./configure CC=icc CXX=icpc F77=ifort FC=ifort
> --prefix=/opt/openmpi/1.10.2/intel/16.0.3`
>
> then I've installed it and everything seems ok, but when I try the simple
> command
>
> ' /opt/openmpi/1.10.2/intel/16.0.3/bin/mpif90 -v'
>
> I receive the following error
>
> 'Segmentation fault (core dumped)'
>
> I'm on ArchLinux, with kernel 4.5.1-1-ARCH; I've attached the config.log
> file, compressed with bzip2, to this email.
>
> Any help will be appreciated!
>
> Giacomo Rossi Ph.D., Space Engineer
>
> Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza"
> University of Rome
> p: (+39) 0692927207 | m: (+39) 3408816643 | e: giacom...@gmail.com
>
> Member of Fortran-FOSS-programmers
> 
>
> ​
>