Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Maxime Boissonneault
Do you know the topology of the cores allocated by Torque (i.e., were 
they all on the same nodes, or 8 per node, or a heterogeneous 
distribution, for example)?
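
One way to answer the topology question is to count how many lines each host has in $PBS_NODEFILE, since Torque writes one line per allocated slot. The following is only a rough sketch: the function names slots_per_host and describe_layout are made up for illustration and are not part of Torque or Open MPI.

```python
import os
from collections import Counter


def slots_per_host(nodefile_lines):
    """Count how many slots (lines) each host has in a Torque nodefile."""
    hosts = (line.strip() for line in nodefile_lines)
    return Counter(host for host in hosts if host)


def describe_layout(counts):
    """Classify an allocation as single node, uniform, or heterogeneous."""
    if not counts:
        return "empty"
    if len(counts) == 1:
        return "single node"
    if len(set(counts.values())) == 1:
        return "uniform"
    return "heterogeneous"


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a Torque job; do nothing otherwise.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            counts = slots_per_host(fh)
        for host, n in sorted(counts.items()):
            print(f"{host}: {n}")
        print(describe_layout(counts))
```

Run inside the job, this prints one "host: count" line per node, which can be compared against what mpirun -report-bindings shows.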



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/09/25375.php




-- 
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics



Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
Yes, the request to Torque was procs=64.

We are using cpusets.

Running mpirun without -np 64 spawns 64 hostname processes.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Ralph Castain
FWIW: that warning has been removed from the upcoming 1.8.3 release





Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Maxime Boissonneault

Hi,
Just an idea here. Do you use cpusets within Torque? Did you request 
enough cores from Torque?
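
To illustrate the cpuset idea: if the cpuset on a node exposes fewer CPUs than the number of processes mapped there, a bind-to-core request would look overloaded, as in the "#processes: 2 / #cpus: 1" message. Below is a rough sketch of that comparison, not a description of what Open MPI does internally. The helper names are hypothetical, and os.sched_getaffinity counts logical CPUs (hardware threads), which may differ from the core count Open MPI uses.

```python
import os


def allowed_cpu_count():
    """Number of CPUs this process may run on (respects cpusets on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity (an assumption).
    return os.cpu_count() or 1


def overloaded(processes_on_host, cpus_available):
    """True if more processes would be bound than CPUs are available."""
    return processes_on_host > cpus_available


if __name__ == "__main__":
    print("CPUs usable by this process:", allowed_cpu_count())
```

With the numbers from the warning, overloaded(2, 1) is true, which is the condition the message describes.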


Maxime Boissonneault




-- 
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics



Re: [OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Reuti

On 23.09.2014 at 19:53, Brock Palen wrote:

> I found a fun head-scratcher: with Open MPI 1.8.2 and Torque 5, built with TM 
> support, on heterogeneous core layouts I get this fun thing:
> mpirun -report-bindings hostname   <- Works

And you get 64 lines of output?


> mpirun -report-bindings -np 64 hostname   <- Wat?
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>   Bind to:     CORE
>   Node:        nyx5518
>   #processes:  2
>   #cpus:       1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --

How many cores are physically installed on this machine - two as mentioned 
above?

-- Reuti


> I ran with --oversubscribe and got the expected host list, which matched 
> $PBS_NODEFILE and was 64 entries long:
> 
> mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname
> 
> What did I do wrong? I'm stumped as to why one works and the other doesn't, 
> yet the one that doesn't appears correct if you force it.



[OMPI users] Strange affinity messages with 1.8 and torque 5

2014-09-23 Thread Brock Palen
I found a fun head-scratcher: with Open MPI 1.8.2 and Torque 5, built with TM 
support, on heterogeneous core layouts I get this fun thing:
mpirun -report-bindings hostname          <- Works
mpirun -report-bindings -np 64 hostname   <- Wat?
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        nyx5518
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--


I ran with --oversubscribe and got the expected host list, which matched 
$PBS_NODEFILE and was 64 entries long:

mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname

What did I do wrong? I'm stumped as to why one works and the other doesn't, 
yet the one that doesn't appears correct if you force it.


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985




