Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-25 Thread Brock Palen
Yes,

ompi_info --all

works.

ompi_info --param all all

[brockp@flux-login1 34241]$ ompi_info --param all all
Error getting SCIF driver version 
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.
[brockp@flux-login1 34241]$ 


ompi_info --param all all --level 9 
(gives me what I expect).
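(For reference, the same pattern works per framework or per component; the
framework/component names below are just illustrative choices, not the only
valid ones:

    ompi_info --param btl all --level 9    # every parameter of every BTL component
    ompi_info --param btl tcp --level 9    # only the tcp BTL's parameters
    ompi_info --all                        # everything, including all MCA parameter levels
)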

Thanks,

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 24, 2014, at 10:22 AM, Jeff Squyres (jsquyres)  
wrote:

> Brock --
> 
> Can you run with "ompi_info --all"?
> 
> With "--param all all", ompi_info in v1.8.x is defaulting to only showing 
> level 1 MCA params.  It's showing you all possible components and variables, 
> but only level 1.
> 
> Or you could also use "--level 9" to show all 9 levels.  Here's the relevant 
> section from the README:
> 
> -
> The following options may be helpful:
> 
> --all       Show a *lot* of information about your Open MPI
>             installation.
> --parsable  Display all the information in an easily
>             grep/cut/awk/sed-able format.
> --param <framework> <component>
>             A <framework> of "all" and a <component> of "all" will
>             show all parameters to all components.  Otherwise, the
>             parameters of all the components in a specific framework,
>             or just the parameters of a specific component can be
>             displayed by using an appropriate <framework> and/or
>             <component> name.
> --level <level>
>             By default, ompi_info only shows "Level 1" MCA parameters
>             -- parameters that can affect whether MPI processes can
>             run successfully or not (e.g., determining which network
>             interfaces to use).  The --level option will display all
>             MCA parameters from level 1 to <level> (the max <level>
>             value is 9).  Use "ompi_info --param <framework>
>             <component> --level 9" to see *all* MCA parameters for a
>             given component.  See "The Modular Component Architecture
>             (MCA)" section, below, for a fuller explanation.
> 
> 
> 
> 
> 
> On Jun 24, 2014, at 5:19 AM, Ralph Castain  wrote:
> 
>> That's odd - it shouldn't truncate the output. I'll take a look later today 
>> - we're all gathered for a developer's conference this week, so I'll be able 
>> to poke at this with Nathan.
>> 
>> 
>> 
>> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen  wrote:
>> Perfection, flexible, extensible, so nice.
>> 
>> BTW, this doesn't happen with older versions,
>> 
>> [brockp@flux-login2 34241]$ ompi_info --param all all
>> Error getting SCIF driver version
>> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>>  data source: default, level: 1 user/basic, type:
>>  string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to use for MPI communication
>>  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
>>  with btl_tcp_if_exclude.
>> MCA btl: parameter "btl_tcp_if_exclude" (current value:
>>  "127.0.0.1/8,sppp", data source: default, level: 1
>>  user/basic, type: string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to NOT use for MPI
>>  communication -- all devices not matching these
>>  specifications will be used (e.g.,
>>  "eth0,192.168.0.0/16").  If set to a non-default
>>  value, it is mutually exclusive with
>>  btl_tcp_if_include.
>> 
>> 
>> This is normally much longer.  And yes, we don't have the Phi stuff installed
>> on all nodes; it's strange that 'all all' is now very short, though ompi_info -a
>> still works.
>> 
>> 
>> 
>> 

[OMPI users] Fwd: openmpi linking problem

2014-06-25 Thread Sergii Veremieiev
Dear Sir/Madam,

I'm trying to run a parallel 64-bit finite element analysis code on my
desktop (Windows 7, Cygwin, Open MPI 1.7.5, 64 GB RAM, 6-core Intel
Core i7-3930K CPU) via the "mpirun -np 6 executable" command. The code runs
fine, but if I increase the number of elements beyond a critical point
(roughly more than 100k) the built-in MUMPS library returns an error message
(please see below). Could you please advise what the problem might be? I have
checked in Task Manager that the code uses about 3-6 GB per process, or about
20 GB in total, which is much less than the roughly 55 GB of physical memory
available on the system. Is there perhaps a per-process memory limit in
Windows? Thank you.

Best regards,

Sergii


mpirun has exited due to process rank 1 with PID 6028 on
node exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.
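(One thing worth checking under Cygwin -- this is only a guess, not a confirmed
diagnosis -- is whether the shell that launches mpirun imposes per-process
address-space or data-segment limits that a 3-6 GB process would exceed:

    ulimit -a    # show all limits in effect for this shell
    ulimit -v    # virtual memory limit (KB); ideally "unlimited"
    ulimit -d    # data segment size limit (KB)

If any of these are finite, raising them before running mpirun may help. It is
also worth confirming that the executable and every library it links, including
MUMPS and Open MPI, are genuinely 64-bit builds, since a 32-bit component would
cap each process at a few GB regardless of installed RAM.)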


Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Fischer, Greg A.
I looked through my configure log, and that option is not enabled. Thanks for 
the suggestion.
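(A quick way to double-check from the installed build itself, rather than from
the configure log -- the exact wording of the output varies a little between
versions:

    ompi_info | grep -i "thread"             # e.g. "Thread support: posix (MPI_THREAD_MULTIPLE: no, ...)"
    ompi_info | grep -i "configure command"  # echoes the configure line the build was produced with
)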

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime 
Boissonneault
Sent: Wednesday, June 25, 2014 10:51 AM
To: Open MPI Users
Subject: Re: [OMPI users] poor performance using the openib btl

Hi,
I recovered the name of the option that caused problems for us. It is 
--enable-mpi-thread-multiple

This option enables threading within OPAL, which was buggy (at least in the 1.6.x
series). I don't know whether it has been fixed in the 1.8 series.

I do not see your configure line in the attached file, so I cannot tell whether it
was enabled or not.

Maxime

Le 2014-06-25 10:46, Fischer, Greg A. a écrit :
Attached are the results of "grep thread" on my configure output. There appears 
to be some amount of threading, but is there anything I should look for in 
particular?

I see Mike Dubman's questions on the mailing list website, but his message 
didn't appear to make it to my inbox. The answers to his questions are:

[binford:fischega] $ rpm -qa | grep ofed
ofed-doc-1.5.4.1-0.11.5
ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5
ofed-1.5.4.1-0.11.5

Distro: SLES11 SP3

HCA:
[binf102:fischega] $ /usr/sbin/ibstat
CA 'mlx4_0'
CA type: MT26428

Command line (path and LD_LIBRARY_PATH are set correctly):
mpirun -x LD_LIBRARY_PATH -mca btl openib,sm,self -mca btl_openib_verbose 1 -np 
31 $CTF_EXEC

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime 
Boissonneault
Sent: Tuesday, June 24, 2014 6:41 PM
To: Open MPI Users
Subject: Re: [OMPI users] poor performance using the openib btl

What are your threading options for OpenMPI (when it was built)?

I have seen the OpenIB BTL lock up completely before when some level of
threading was enabled.

Maxime Boissonneault


Le 2014-06-24 18:18, Fischer, Greg A. a écrit :
Hello openmpi-users,

A few weeks ago, I posted to the list about difficulties I was having getting 
openib to work with Torque (see "openib segfaults with Torque", June 6, 2014). 
The issues were related to Torque imposing restrictive limits on locked memory, 
and have since been resolved.

However, now that I've had some time to test the applications, I'm seeing 
abysmal performance over the openib layer. Applications run with the tcp btl 
execute about 10x faster than with the openib btl. Clearly something still 
isn't quite right.

I tried running with "-mca btl_openib_verbose 1", but didn't see anything 
resembling a smoking gun. How should I go about determining the source of the 
problem? (This uses the same OpenMPI Version 1.8.1 / SLES11 SP3 / GCC 4.8.3 
setup discussed previously.)

Thanks,
Greg
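(Some general diagnostics that often help narrow this kind of problem down; the
benchmark names are examples of commonly available tools, not specific
recommendations from this thread:

    # Inside a batch job, confirm the locked-memory limit openib needs is unlimited:
    ulimit -l

    # Compare raw point-to-point performance over each transport, e.g. with the
    # OSU micro-benchmarks or IMB PingPong, if installed:
    mpirun -np 2 --mca btl openib,self ./osu_bw
    mpirun -np 2 --mca btl tcp,self ./osu_bw

    # Verbs-level performance, independent of MPI (run on a server node first,
    # then point a client node at it):
    ibv_rc_pingpong
    ibv_rc_pingpong <server-hostname>
)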










--

-

Maxime Boissonneault

Analyste de calcul - Calcul Québec, Université Laval

Ph. D. en physique








--

-

Maxime Boissonneault

Analyste de calcul - Calcul Québec, Université Laval

Ph. D. en physique


Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Maxime Boissonneault

Hi,
I recovered the name of the option that caused problems for us. It is 
--enable-mpi-thread-multiple


This option enables threading within OPAL, which was buggy (at least in the
1.6.x series). I don't know whether it has been fixed in the 1.8 series.


I do not see your configure line in the attached file, so I cannot tell whether
it was enabled or not.


Maxime

Le 2014-06-25 10:46, Fischer, Greg A. a écrit :


Attached are the results of "grep thread" on my configure output. 
There appears to be some amount of threading, but is there anything I 
should look for in particular?


I see Mike Dubman's questions on the mailing list website, but his 
message didn't appear to make it to my inbox. The answers to his 
questions are:


[binford:fischega] $ rpm -qa | grep ofed

ofed-doc-1.5.4.1-0.11.5

ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5

ofed-1.5.4.1-0.11.5

Distro: SLES11 SP3

HCA:

[binf102:fischega] $ /usr/sbin/ibstat

CA 'mlx4_0'

CA type: MT26428

Command line (path and LD_LIBRARY_PATH are set correctly):

mpirun -x LD_LIBRARY_PATH -mca btl openib,sm,self -mca 
btl_openib_verbose 1 -np 31 $CTF_EXEC


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime
Boissonneault

Sent: Tuesday, June 24, 2014 6:41 PM
To: Open MPI Users
Subject: Re: [OMPI users] poor performance using the openib btl

What are your threading options for OpenMPI (when it was built)?

I have seen the OpenIB BTL lock up completely before when some level of
threading was enabled.


Maxime Boissonneault


Le 2014-06-24 18:18, Fischer, Greg A. a écrit :

Hello openmpi-users,

A few weeks ago, I posted to the list about difficulties I was
having getting openib to work with Torque (see "openib segfaults
with Torque", June 6, 2014). The issues were related to Torque
imposing restrictive limits on locked memory, and have since been
resolved.

However, now that I've had some time to test the applications, I'm
seeing abysmal performance over the openib layer. Applications run
with the tcp btl execute about 10x faster than with the openib
btl. Clearly something still isn't quite right.

I tried running with "-mca btl_openib_verbose 1", but didn't see
anything resembling a smoking gun. How should I go about
determining the source of the problem? (This uses the same OpenMPI
Version 1.8.1 / SLES11 SP3 / GCC 4.8.3 setup discussed previously.)

Thanks,

Greg








--
-
Maxime Boissonneault
Analyste de calcul - Calcul Québec, Université Laval
Ph. D. en physique





--
-
Maxime Boissonneault
Analyste de calcul - Calcul Québec, Université Laval
Ph. D. en physique



Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Fischer, Greg A.
Attached are the results of "grep thread" on my configure output. There appears 
to be some amount of threading, but is there anything I should look for in 
particular?

I see Mike Dubman's questions on the mailing list website, but his message 
didn't appear to make it to my inbox. The answers to his questions are:

[binford:fischega] $ rpm -qa | grep ofed
ofed-doc-1.5.4.1-0.11.5
ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5
ofed-1.5.4.1-0.11.5

Distro: SLES11 SP3

HCA:
[binf102:fischega] $ /usr/sbin/ibstat
CA 'mlx4_0'
CA type: MT26428

Command line (path and LD_LIBRARY_PATH are set correctly):
mpirun -x LD_LIBRARY_PATH -mca btl openib,sm,self -mca btl_openib_verbose 1 -np 
31 $CTF_EXEC

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime 
Boissonneault
Sent: Tuesday, June 24, 2014 6:41 PM
To: Open MPI Users
Subject: Re: [OMPI users] poor performance using the openib btl

What are your threading options for OpenMPI (when it was built)?

I have seen the OpenIB BTL lock up completely before when some level of
threading was enabled.

Maxime Boissonneault


Le 2014-06-24 18:18, Fischer, Greg A. a écrit :
Hello openmpi-users,

A few weeks ago, I posted to the list about difficulties I was having getting 
openib to work with Torque (see "openib segfaults with Torque", June 6, 2014). 
The issues were related to Torque imposing restrictive limits on locked memory, 
and have since been resolved.

However, now that I've had some time to test the applications, I'm seeing 
abysmal performance over the openib layer. Applications run with the tcp btl 
execute about 10x faster than with the openib btl. Clearly something still 
isn't quite right.

I tried running with "-mca btl_openib_verbose 1", but didn't see anything 
resembling a smoking gun. How should I go about determining the source of the 
problem? (This uses the same OpenMPI Version 1.8.1 / SLES11 SP3 / GCC 4.8.3 
setup discussed previously.)

Thanks,
Greg








--

-

Maxime Boissonneault

Analyste de calcul - Calcul Québec, Université Laval

Ph. D. en physique
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking if C compiler and POSIX threads work as is... no
checking if C++ compiler and POSIX threads work as is... no
checking if Fortran compiler and POSIX threads work as is... no
checking if C compiler and POSIX threads work with -Kthread... no
checking if C compiler and POSIX threads work with -kthread... no
checking if C compiler and POSIX threads work with -pthread... yes
checking if C++ compiler and POSIX threads work with -Kthread... no
checking if C++ compiler and POSIX threads work with -kthread... no
checking if C++ compiler and POSIX threads work with -pthread... yes
checking if Fortran compiler and POSIX threads work with -Kthread... no
checking if Fortran compiler and POSIX threads work with -kthread... no
checking if Fortran compiler and POSIX threads work with -pthread... yes
checking for pthread_mutexattr_setpshared... yes
checking for pthread_condattr_setpshared... yes
checking for working POSIX threads package... yes
checking for type of thread support... posix
checking if threads have different pids (pthreads on linux)... no
checking for pthread_t... yes
checking pthread_np.h usability... no
checking pthread_np.h presence... no
checking for pthread_np.h... no
checking whether pthread_setaffinity_np is declared... yes
checking whether pthread_getaffinity_np is declared... yes
checking for library containing pthread_getthrds_np... no
checking for pthread_mutex_lock... yes
checking libevent configuration args... --disable-dns --disable-http 
--disable-rpc --disable-openssl --enable-thread-support --disable-evport
configure: running /bin/sh 
'../../../../../../openmpi-1.8.1/opal/mca/event/libevent2021/libevent/configure'
 --disable-dns --disable-http --disable-rpc --disable-openssl 
--enable-thread-support --disable-evport  
'--prefix=/casl/vera_ib/gcc-4.8.3/toolset/openmpi-1.8.1' --cache-file=/dev/null 
--srcdir=../../../../../../openmpi-1.8.1/opal/mca/event/libevent2021/libevent 
--disable-option-checking
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for the pthreads library -lpthreads... no
checking whether pthreads work without any flags... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... no
checking size of pthread_t... 8
config.status: creating libevent_pthreads.pc
checking for thread support (needed for rdmacm/udcm)... posix
configure: running /bin/sh 
'../../../../../../openmpi-1.8.1/ompi/mca/io/romio/romio/configure'  

Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Mike Dubman
Hi
What OFED/MOFED version are you using? What HCA, distro, and command line?
M
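(Typical commands for collecting that information, in case they are useful;
whether ofed_info is present depends on how OFED/MOFED was installed:

    ofed_info -s             # short OFED/MOFED version string
    ibstat                   # HCA model, ports, firmware
    ibv_devinfo              # verbs-level device details
    cat /etc/SuSE-release    # distro/release on SLES; /etc/os-release on newer systems
)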


On Wed, Jun 25, 2014 at 1:40 AM, Maxime Boissonneault <
maxime.boissonnea...@calculquebec.ca> wrote:

>  What are your threading options for OpenMPI (when it was built)?
>
> I have seen the OpenIB BTL lock up completely before when some level of
> threading was enabled.
>
> Maxime Boissonneault
>
>
> Le 2014-06-24 18:18, Fischer, Greg A. a écrit :
>
>  Hello openmpi-users,
>
>
>
> A few weeks ago, I posted to the list about difficulties I was having
> getting openib to work with Torque (see “openib segfaults with Torque”,
> June 6, 2014). The issues were related to Torque imposing restrictive
> limits on locked memory, and have since been resolved.
>
>
>
> However, now that I’ve had some time to test the applications, I’m seeing
> abysmal performance over the openib layer. Applications run with the tcp
> btl execute about 10x faster than with the openib btl. Clearly something
> still isn’t quite right.
>
>
>
> I tried running with “-mca btl_openib_verbose 1”, but didn’t see anything
> resembling a smoking gun. How should I go about determining the source of
> the problem? (This uses the same OpenMPI Version 1.8.1 / SLES11 SP3 / GCC
> 4.8.3 setup discussed previously.)
>
>
>
> Thanks,
>
> Greg
>
>
>
>
>
> --
> -
> Maxime Boissonneault
> Analyste de calcul - Calcul Québec, Université Laval
> Ph. D. en physique
>
>
>