Re: [OMPI users] Strange affinity messages with 1.8 and torque 5
Do you know the topology of the cores allocated by Torque (for example, were they all on the same node, 8 per node, or a heterogeneous distribution)?

On 2014-09-23 15:05, Brock Palen wrote:
> Yes, the request to Torque was procs=64, and we are using cpusets.
> The mpirun without -np 64 launches 64 hostname processes.

-- 
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
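One way to answer the topology question is to summarize the node file that Torque hands to the job. The sketch below is only an illustration: the function name `slots_per_host` is made up here, and it assumes the file named by $PBS_NODEFILE lists one hostname per line, repeated once per allocated slot (which is consistent with the 64-entry file mentioned in this thread).

```shell
#!/bin/sh
# Hypothetical helper, not part of Torque or Open MPI.
# Prints "<host> <slots>" for each host found in a node file.
# Assumption: one hostname per line, repeated once per allocated slot.
slots_per_host() {
    # $1: path to the node file, e.g. "$PBS_NODEFILE"
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Illustrative use inside a running job:
#   slots_per_host "$PBS_NODEFILE"
```

A host that shows up with a single slot while others have 8 or more would indicate the kind of heterogeneous distribution asked about here.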
Re: [OMPI users] Strange affinity messages with 1.8 and torque 5
Yes, the request to Torque was procs=64, and we are using cpusets. The mpirun without -np 64 launches 64 hostname processes.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Sep 23, 2014, at 3:02 PM, Ralph Castain wrote:
> FWIW: that warning has been removed from the upcoming 1.8.3 release
Re: [OMPI users] Strange affinity messages with 1.8 and torque 5
FWIW: that warning has been removed from the upcoming 1.8.3 release.

On Sep 23, 2014, at 11:45 AM, Reuti wrote:
> On 23.09.2014 at 19:53, Brock Palen wrote:
>> mpirun -report-bindings -np 64 hostname <- Wat?
>> --
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>> Bind to: CORE
>> Node: nyx5518
>> #processes: 2
>> #cpus: 1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --
>
> How many cores are physically installed on this machine - two as mentioned
> above?
Re: [OMPI users] Strange affinity messages with 1.8 and torque 5
Hi,

Just an idea here. Do you use cpusets within Torque? Did you request enough cores from Torque?

Maxime Boissonneault

On 2014-09-23 13:53, Brock Palen wrote:
> I found a fun head scratcher, with openmpi 1.8.2 with torque 5 built with TM
> support, on hetero core layouts I get the fun thing:
> mpirun -report-bindings hostname <- Works
> mpirun -report-bindings -np 64 hostname <- Wat?
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
> Bind to: CORE
> Node: nyx5518
> #processes: 2
> #cpus: 1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --

-- 
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
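To look into the cpuset question, one option is to count the CPUs a process inside the job is actually allowed to run on and compare that with the number of slots Torque assigned to the node. The following is a rough sketch under stated assumptions: it is Linux-only, it relies on the `Cpus_allowed_list` field of /proc/self/status, and both function names are invented for this example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Torque or Open MPI.

# Count the CPUs in a Linux CPU list such as "0-3,8,10-11".
count_cpus_in_list() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 1 && $1 != "" { n += 1 }            # single CPU, e.g. "8"
        NF == 2             { n += $2 - $1 + 1 }  # range, e.g. "0-3"
        END                 { print n + 0 }'
}

# Print the CPU list the current process may run on (Linux only).
# The awk child reads its own status, which it inherits from this shell.
allowed_cpu_list() {
    awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
}

# Illustrative use on a compute node inside the job:
#   count_cpus_in_list "$(allowed_cpu_list)"
```

If the count on a node such as nyx5518 comes out as 1 while two ranks are mapped there, that would match the "#processes: 2 #cpus: 1" report quoted in this message.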
Re: [OMPI users] Strange affinity messages with 1.8 and torque 5
On 23.09.2014 at 19:53, Brock Palen wrote:

> I found a fun head scratcher, with openmpi 1.8.2 with torque 5 built with TM
> support, on hetero core layouts I get the fun thing:
> mpirun -report-bindings hostname <- Works

And you get 64 lines of output?

> mpirun -report-bindings -np 64 hostname <- Wat?
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
> Bind to: CORE
> Node: nyx5518
> #processes: 2
> #cpus: 1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --

How many cores are physically installed on this machine - two as mentioned above?

-- Reuti
[OMPI users] Strange affinity messages with 1.8 and torque 5
I found a fun head scratcher. With openmpi 1.8.2 with torque 5 built with TM support, on hetero core layouts I get the fun thing:

mpirun -report-bindings hostname <- Works
mpirun -report-bindings -np 64 hostname <- Wat?
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

Bind to: CORE
Node: nyx5518
#processes: 2
#cpus: 1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

I ran with --oversubscribe and got the expected host list, which matched $PBS_NODEFILE and was 64 entries long:

mpirun -overload-allowed -report-bindings -np 64 --oversubscribe hostname

What did I do wrong? I'm stumped why one works and one doesn't, yet the one that doesn't appears correct if you force it.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985