Re: [OMPI users] Oversubscribing When Running Locally

2018-01-24 Thread Gilles Gouaillardet
Benjamin,

There was no need to open a new thread with the same title and a
slightly modified question; it just added some confusion.

If you want to allow oversubscription by default, you can insert the
following line in your
/etc/openmpi-mca-params.conf (update the path if needed)

rmaps_base_oversubscribe = true

FWIW

you can also do that on a per-user basis by adding the same line in
$HOME/.openmpi/mca-params.conf

last but not least, that can also be achieved via an environment variable
export OMPI_MCA_rmaps_base_oversubscribe=true

and as already answered, via the command line
mpirun --oversubscribe ...
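
For instance, pulling those options together (a minimal sketch assuming Open MPI
3.0.x, a bash shell, and a placeholder program ./a.out):

# system-wide default (the path depends on your installation prefix)
echo "rmaps_base_oversubscribe = true" | sudo tee -a /etc/openmpi-mca-params.conf

# per-user default
mkdir -p $HOME/.openmpi
echo "rmaps_base_oversubscribe = true" >> $HOME/.openmpi/mca-params.conf

# per-shell, via the environment
export OMPI_MCA_rmaps_base_oversubscribe=true
mpirun -np 3 ./a.out

# per-run, on the command line only
mpirun --oversubscribe -np 3 ./a.out

You can check which value is actually in effect with something like
ompi_info --all | grep rmaps_base_oversubscribe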


Cheers,

Gilles

On Thu, Jan 25, 2018 at 7:57 AM, Jeff Squyres (jsquyres)
 wrote:
> Ben --
>
> Did you not see Jeff Hammond's reply earlier today?
>
> https://www.mail-archive.com/users@lists.open-mpi.org//msg31964.html
>
>
>> On Jan 24, 2018, at 5:40 PM, Benjamin Brock  wrote:
>>
>> Recently, when I try to run something locally with OpenMPI with more than 
>> two ranks (I have a dual-core machine), I get the friendly message
>>
>> --
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>>   ./kmer_generic_hash
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --
>>
>> Why is oversubscription now disabled by default when running without a 
>> hostfile?  And how can I turn this off?  Is the recommended way to do this 
>> editing /etc/openmpi/openmpi-default-hostfile?
>>
>> I'm using default OpenMPI 3.0.0 on Arch Linux.
>>
>> Cheers,
>>
>> Ben
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Oversubscribing When Running Locally

2018-01-24 Thread Jeff Squyres (jsquyres)
Ben --

Did you not see Jeff Hammond's reply earlier today?

https://www.mail-archive.com/users@lists.open-mpi.org//msg31964.html


> On Jan 24, 2018, at 5:40 PM, Benjamin Brock  wrote:
> 
> Recently, when I try to run something locally with OpenMPI with more than two 
> ranks (I have a dual-core machine), I get the friendly message
> 
> --
> There are not enough slots available in the system to satisfy the 3 slots
> that were requested by the application:
>   ./kmer_generic_hash
> 
> Either request fewer slots for your application, or make more slots available
> for use.
> --
> 
> Why is oversubscription now disabled by default when running without a 
> hostfile?  And how can I turn this off?  Is the recommended way to do this 
> editing /etc/openmpi/openmpi-default-hostfile?
> 
> I'm using default OpenMPI 3.0.0 on Arch Linux.
> 
> Cheers,
> 
> Ben
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users


-- 
Jeff Squyres
jsquy...@cisco.com





[OMPI users] Oversubscribing When Running Locally

2018-01-24 Thread Benjamin Brock
Recently, when I try to run something locally with OpenMPI with more than
two ranks (I have a dual-core machine), I get the friendly message

--
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
  ./kmer_generic_hash

Either request fewer slots for your application, or make more slots
available
for use.
--

Why is oversubscription now disabled by default when running without a
hostfile?  And how can I turn this off?  Is the recommended way to do this
editing /etc/openmpi/openmpi-default-hostfile?

I'm using default OpenMPI 3.0.0 on Arch Linux.

Cheers,

Ben

Re: [OMPI users] Oversubscribing

2018-01-24 Thread Jeff Hammond
mpirun --oversubscribe $OTHER_ARGS

Jeff

On Wed, Jan 24, 2018 at 12:13 PM, Benjamin Brock 
wrote:
>
> Recently, when I try to run something locally with OpenMPI with more than
two ranks (I have a dual-core machine), I get the friendly message
>
> --
> There are not enough slots available in the system to satisfy the 3 slots
> that were requested by the application:
>   ./kmer_generic_hash
>
> Either request fewer slots for your application, or make more slots
available
> for use.
> --
>
> Why is oversubscription now disabled by default when running without a
hostfile?  And how can I turn this off?
>
> I'm using default OpenMPI 3.0.0 on Arch Linux.
>
> Cheers,
>
> Ben
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users




--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/

[OMPI users] Oversubscribing

2018-01-24 Thread Benjamin Brock
Recently, when I try to run something locally with OpenMPI with more than
two ranks (I have a dual-core machine), I get the friendly message

--
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
  ./kmer_generic_hash

Either request fewer slots for your application, or make more slots
available
for use.
--

Why is oversubscription now disabled by default when running without a
hostfile?  And how can I turn this off?

I'm using default OpenMPI 3.0.0 on Arch Linux.

Cheers,

Ben

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-11 Thread Ralph Castain
You are more than welcome - we really appreciate your spotting the problem!

As a side note: you commented about how this works now even if you don’t set 
the “yield” MCA param. Just as an FYI: we automatically set the “yield” param 
for you when we detect that you are oversubscribing the node as we know this 
will otherwise kill performance. So you can use the MCA param to force us to 
“not yield” in that scenario - otherwise, we will always protect you.
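
For example (a rough sketch, not an exact recipe; ./myprog is a placeholder and
the node is assumed to have 2 cores):

# oversubscribed run: 1.8 detects it and quietly sets yield-when-idle for you
mpirun -np 8 ./myprog

# explicitly force the aggressive, non-yielding mode anyway
mpirun --mca mpi_yield_when_idle 0 -np 8 ./myprog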

HTH
Ralph


> On Dec 10, 2014, at 11:18 AM, Eric Chamberland 
>  wrote:
> 
>> On 12/10/2014 12:55 PM, Ralph Castain wrote:
>>> Tarball now available on web site
>>> 
>>> 
>>> http://www.open-mpi.org/nightly/v1.8/
 
 I’ll run the tarball generator now so you can try the nightly tarball.
>> 
>> ok, retrieved openmpi-v1.8.3-236-ga21cb20 and it compiled, linked, and
>> executed nicely when oversubscribing.
> 
> Sorry, forgot something:
> 
> thanks a lot!!! ;-)
> 
> Eric
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25955.php



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Eric Chamberland

On 12/10/2014 12:55 PM, Ralph Castain wrote:

Tarball now available on web site


http://www.open-mpi.org/nightly/v1.8/


I’ll run the tarball generator now so you can try the nightly tarball.


ok, retrieved openmpi-v1.8.3-236-ga21cb20 and it compiled, linked, and
executed nicely when oversubscribing.


Sorry, forgot something:

thanks a lot!!! ;-)

Eric



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Eric Chamberland

On 12/10/2014 12:55 PM, Ralph Castain wrote:

Tarball now available on web site


http://www.open-mpi.org/nightly/v1.8/

On Dec 10, 2014, at 9:40 AM, Ralph Castain wrote:

I’ll run the tarball generator now so you can try the nightly tarball.


ok, retrieved openmpi-v1.8.3-236-ga21cb20 and it compiled, linked, and 
executed nicely when oversubscribing.


Eric



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
Tarball now available on web site


http://www.open-mpi.org/nightly/v1.8/ 


> On Dec 10, 2014, at 9:40 AM, Ralph Castain  wrote:
> 
> I’ll run the tarball generator now so you can try the nightly tarball.
> 
>> On Dec 10, 2014, at 9:20 AM, Eric Chamberland 
>>  wrote:
>> 
>> On 12/10/2014 10:40 AM, Ralph Castain wrote:
>>> You should be able to apply the patch - I don’t think that section of
>>> code differs from what is in the 1.8 repo.
>> 
>> it compiles, link, but gives me a segmentation violation now:
>> 
>> #0  0x7f1827b00e91 in mca_allocator_component_lookup () from 
>> /opt/openmpi-1.8.3_patchyield/lib64/libmpi.so.1
>> #1  0x7f1821aee378 in mca_pml_ob1_component_init () from 
>> /opt/openmpi-1.8_git_opt/lib64/openmpi/mca_pml_ob1.so
>> #2  0x7f182cd72141 in mca_pml_base_select () from 
>> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
>> #3  0x7f182cd22610 in ompi_mpi_init () from 
>> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
>> #4  0x7f182cd428b6 in PMPI_Init_thread () from 
>> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
>> #5  0x7f182f03db9c in PetscInitialize (argc=0x7fffb4ca28ec, 
>> args=0x7fffb4ca28e0, file=0x0, help=0x0) at 
>> /home/mefpp_ericc/petsc-3.5.2/src/sys/objects/pinit.c:781
>> #6  0x7f183492b2bc in PETScInitialisation::PETScInitialisation 
>> (this=0x7fffb4ca33f0, pArgc=0x7fffb4ca28ec, pArgv=0x7fffb4ca28e0) at 
>> /home/mefpp_ericc/GIREF/src/commun/Petsc/PETScInitialisation.cc:122
>> #7  0x004d in main (pArgc=12, pArgv=0x7fffb4ca42b8) at 
>> /home/mefpp_ericc/GIREF/app/src/Test.Parallele/Test.PAScatter.cc:72
>> 
>> 
>>> 
>>> The sha for 1.8.3 can be found on the web site (see right-most column in
>>> table):
>>> 
>>> http://www.open-mpi.org/software/ompi/v1.8/
>> 
>> I see them, but can't find these into the git repo...  I meant: where are 
>> stocked the git SHAs for each releases?  *Forgive-me*, just found the 
>> "ompi-release" repo... sorry...
>> 
>> I would like to see if a4fff57720 (ompi) is included in ompi-release? (It 
>> seems not).
>> 
>> Should it be applied too?
> 
> Don’t know - let me see
> 
>> 
>> Eric
>> 
> 



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
I’ll run the tarball generator now so you can try the nightly tarball.

> On Dec 10, 2014, at 9:20 AM, Eric Chamberland 
>  wrote:
> 
> On 12/10/2014 10:40 AM, Ralph Castain wrote:
>> You should be able to apply the patch - I don’t think that section of
>> code differs from what is in the 1.8 repo.
> 
> it compiles, link, but gives me a segmentation violation now:
> 
> #0  0x7f1827b00e91 in mca_allocator_component_lookup () from 
> /opt/openmpi-1.8.3_patchyield/lib64/libmpi.so.1
> #1  0x7f1821aee378 in mca_pml_ob1_component_init () from 
> /opt/openmpi-1.8_git_opt/lib64/openmpi/mca_pml_ob1.so
> #2  0x7f182cd72141 in mca_pml_base_select () from 
> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
> #3  0x7f182cd22610 in ompi_mpi_init () from 
> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
> #4  0x7f182cd428b6 in PMPI_Init_thread () from 
> /opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
> #5  0x7f182f03db9c in PetscInitialize (argc=0x7fffb4ca28ec, 
> args=0x7fffb4ca28e0, file=0x0, help=0x0) at 
> /home/mefpp_ericc/petsc-3.5.2/src/sys/objects/pinit.c:781
> #6  0x7f183492b2bc in PETScInitialisation::PETScInitialisation 
> (this=0x7fffb4ca33f0, pArgc=0x7fffb4ca28ec, pArgv=0x7fffb4ca28e0) at 
> /home/mefpp_ericc/GIREF/src/commun/Petsc/PETScInitialisation.cc:122
> #7  0x004d in main (pArgc=12, pArgv=0x7fffb4ca42b8) at 
> /home/mefpp_ericc/GIREF/app/src/Test.Parallele/Test.PAScatter.cc:72
> 
> 
>> 
>> The sha for 1.8.3 can be found on the web site (see right-most column in
>> table):
>> 
>> http://www.open-mpi.org/software/ompi/v1.8/
> 
> I see them, but can't find these into the git repo...  I meant: where are 
> stocked the git SHAs for each releases?  *Forgive-me*, just found the 
> "ompi-release" repo... sorry...
> 
> I would like to see if a4fff57720 (ompi) is included in ompi-release? (It 
> seems not).
> 
> Should it be applied too?

Don’t know - let me see

> 
> Eric
> 



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Eric Chamberland

On 12/10/2014 10:40 AM, Ralph Castain wrote:

You should be able to apply the patch - I don’t think that section of
code differs from what is in the 1.8 repo.


it compiles and links, but gives me a segmentation violation now:

#0  0x7f1827b00e91 in mca_allocator_component_lookup () from 
/opt/openmpi-1.8.3_patchyield/lib64/libmpi.so.1
#1  0x7f1821aee378 in mca_pml_ob1_component_init () from 
/opt/openmpi-1.8_git_opt/lib64/openmpi/mca_pml_ob1.so
#2  0x7f182cd72141 in mca_pml_base_select () from 
/opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
#3  0x7f182cd22610 in ompi_mpi_init () from 
/opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
#4  0x7f182cd428b6 in PMPI_Init_thread () from 
/opt/openmpi-1.8_git_opt/lib64/libmpi.so.0
#5  0x7f182f03db9c in PetscInitialize (argc=0x7fffb4ca28ec, 
args=0x7fffb4ca28e0, file=0x0, help=0x0) at 
/home/mefpp_ericc/petsc-3.5.2/src/sys/objects/pinit.c:781
#6  0x7f183492b2bc in PETScInitialisation::PETScInitialisation 
(this=0x7fffb4ca33f0, pArgc=0x7fffb4ca28ec, pArgv=0x7fffb4ca28e0) at 
/home/mefpp_ericc/GIREF/src/commun/Petsc/PETScInitialisation.cc:122
#7  0x004d in main (pArgc=12, pArgv=0x7fffb4ca42b8) at 
/home/mefpp_ericc/GIREF/app/src/Test.Parallele/Test.PAScatter.cc:72





The sha for 1.8.3 can be found on the web site (see right-most column in
table):

http://www.open-mpi.org/software/ompi/v1.8/


I see them, but I can't find them in the git repo...  I meant: where 
are the git SHAs for each release stored?  *Forgive me*, I just found 
the "ompi-release" repo... sorry...


I would like to see whether a4fff57720 (ompi) is included in ompi-release 
(it seems not).


Should it be applied too?

Eric



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Ralph Castain
You should be able to apply the patch - I don’t think that section of code 
differs from what is in the 1.8 repo.

The sha for 1.8.3 can be found on the web site (see right-most column in table):

http://www.open-mpi.org/software/ompi/v1.8/ 



> On Dec 10, 2014, at 7:35 AM, Eric Chamberland 
>  wrote:
> 
> Hi Nathan,
> 
> I pulled your commit  d0da29351f9 and tested it against our example.
> 
> It now works perfectly.  Strangely, I can even unset 
> "OMPI_MCA_mpi_yield_when_idle=1" and it doesn't seems to last longer.
> 
> Can I apply the patch to a fresh "1.8.3" and it should work?
> 
> Other question: how can I retrieve the SHA for 1.8.3?  (Should they be tagged 
> in the repository? Is it normal if I just see a "dev" tag??)
> 
> Thanks,
> 
> Eric
> 
> 
> On 12/09/2014 04:19 PM, Nathan Hjelm wrote:
>> 
>> yield when idle is broken on 1.8. Fixing now.
>> 
>> -Nathan
>> 
>> On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
>>> Hmmm….well, it looks like we are doing the right thing and running unbound 
>>> when oversubscribed like this. I don’t have any brilliant idea why it would 
>>> be running so slowly in that situation when compared with 1.6.5 - it could 
>>> be that yield-when-idle is borked. I’ll try to dig into that notion a bit.
>>> 
>>> 
 On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
  wrote:
 
 Hi again,
 
 I sorted and "seded" (cat outpout.1.00 |sed 's/default/default 
 value/g'|sed 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
 
 mpirun --output-filename output -mca mpi_show_mca_params all 
 --report-bindings -np 32 myprog
 
 between a launch with 165 vs 183.
 
 The diff may be interesting but I can't interpret everything that is 
 written...
 
 The files are attached...
 
 Thanks,
 
 Eric
 
 On 12/09/2014 01:02 PM, Eric Chamberland wrote:
> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>> Can you provide an example cmd line you use to launch one of these
>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>> series, and we bind by default in 1.8 - the combination may be causing
>> you a problem.
> 
> I very simply launch:
> 
> "mpirun -np 32 myprog"
> 
> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
> 
> Eric
> 
>> 
>> 
>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> we were used to do oversubscribing just to do code validation in
>>> nightly automated parallel runs of our code.
>>> 
>>> I just compiled openmpi 1.8.3 and launched the whole suit of
>>> sequential/parallel tests and noticed a *major* slowdown in
>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>>> 
>>> For example, on my computer (2 cpu), a validation test of 64
>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>>> execute, while the very same test compiled with 1.6.5 took only 7.4
>>> seconds!
>>> 
>>> To have this result with 1.6.5 we had to set the variable
>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
>>> 1.8.3 when I launch more processes than number of core in my
>>> computer, even if it is still mentioned to work (see
>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>>> However, when I launch with fewer processes than number of core, then
>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>>> same behavior in 1.6.5.
>>> 
>>> I tried to launch with a host file like this:
>>> 
>>> localhost slots=2
>>> 
>>> but it changed nothing...
>>> 
>>> What do I do wrong?
>>> 
>>> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
>>> 
>>> Is there a compilation option that I have to enable in 1.8.3?
>>> 
>>> Here are the config.log and "ompi_info --all" files for both versions
>>> of mpi:
>>> 
>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>> 
>> ___
>> users mailing list
>> 

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-10 Thread Eric Chamberland

Hi Nathan,

I pulled your commit  d0da29351f9 and tested it against our example.

It now works perfectly.  Strangely, I can even unset 
"OMPI_MCA_mpi_yield_when_idle=1" and it doesn't seem to take any longer.


Can I apply the patch to a fresh "1.8.3", and should it work?

One other question: how can I retrieve the SHA for 1.8.3?  (Should releases be 
tagged in the repository? Is it normal that I just see a "dev" tag?)


Thanks,

Eric


On 12/09/2014 04:19 PM, Nathan Hjelm wrote:


yield when idle is broken on 1.8. Fixing now.

-Nathan

On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:

Hmmm….well, it looks like we are doing the right thing and running unbound when 
oversubscribed like this. I don’t have any brilliant idea why it would be 
running so slowly in that situation when compared with 1.6.5 - it could be that 
yield-when-idle is borked. I’ll try to dig into that notion a bit.



On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
 wrote:

Hi again,

I sorted and "seded" (cat outpout.1.00 |sed 's/default/default value/g'|sed 
's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:

mpirun --output-filename output -mca mpi_show_mca_params all --report-bindings 
-np 32 myprog

between a launch with 165 vs 183.

The diff may be interesting but I can't interpret everything that is written...

The files are attached...

Thanks,

Eric

On 12/09/2014 01:02 PM, Eric Chamberland wrote:

On 12/09/2014 12:24 PM, Ralph Castain wrote:

Can you provide an example cmd line you use to launch one of these
tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
series, and we bind by default in 1.8 - the combination may be causing
you a problem.


I very simply launch:

"mpirun -np 32 myprog"

Maybe the result of "-mca mpi_show_mca_params all" would be insightful?

Eric





On Dec 9, 2014, at 9:14 AM, Eric Chamberland
 wrote:

Hi,

we were used to do oversubscribing just to do code validation in
nightly automated parallel runs of our code.

I just compiled openmpi 1.8.3 and launched the whole suit of
sequential/parallel tests and noticed a *major* slowdown in
oversubscribed parallel tests with 1.8.3 compared to 1.6.5.

For example, on my computer (2 cpu), a validation test of 64
processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
execute, while the very same test compiled with 1.6.5 took only 7.4
seconds!

To have this result with 1.6.5 we had to set the variable
"OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
1.8.3 when I launch more processes than number of core in my
computer, even if it is still mentioned to work (see
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
However, when I launch with fewer processes than number of core, then
it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
same behavior in 1.6.5.

I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...

What do I do wrong?

Is it possible to retrieve "performances" of 1.6.5 for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions
of mpi:

http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25936.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25938.php



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25940.php





___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/12/25942.php




Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Eric Chamberland

On 12/09/2014 04:19 PM, Nathan Hjelm wrote:


yield when idle is broken on 1.8. Fixing now.


ok, thanks a lot!  will wait for the fix!

Eric



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Nathan Hjelm

yield when idle is broken on 1.8. Fixing now.

-Nathan

On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
> Hmmm….well, it looks like we are doing the right thing and running unbound 
> when oversubscribed like this. I don’t have any brilliant idea why it would 
> be running so slowly in that situation when compared with 1.6.5 - it could be 
> that yield-when-idle is borked. I’ll try to dig into that notion a bit.
> 
> 
> > On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
> >  wrote:
> > 
> > Hi again,
> > 
> > I sorted and "seded" (cat outpout.1.00 |sed 's/default/default value/g'|sed 
> > 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
> > 
> > mpirun --output-filename output -mca mpi_show_mca_params all 
> > --report-bindings -np 32 myprog
> > 
> > between a launch with 165 vs 183.
> > 
> > The diff may be interesting but I can't interpret everything that is 
> > written...
> > 
> > The files are attached...
> > 
> > Thanks,
> > 
> > Eric
> > 
> > On 12/09/2014 01:02 PM, Eric Chamberland wrote:
> >> On 12/09/2014 12:24 PM, Ralph Castain wrote:
> >>> Can you provide an example cmd line you use to launch one of these
> >>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
> >>> series, and we bind by default in 1.8 - the combination may be causing
> >>> you a problem.
> >> 
> >> I very simply launch:
> >> 
> >> "mpirun -np 32 myprog"
> >> 
> >> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
> >> 
> >> Eric
> >> 
> >>> 
> >>> 
>  On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>   wrote:
>  
>  Hi,
>  
>  we were used to do oversubscribing just to do code validation in
>  nightly automated parallel runs of our code.
>  
>  I just compiled openmpi 1.8.3 and launched the whole suit of
>  sequential/parallel tests and noticed a *major* slowdown in
>  oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>  
>  For example, on my computer (2 cpu), a validation test of 64
>  processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>  execute, while the very same test compiled with 1.6.5 took only 7.4
>  seconds!
>  
>  To have this result with 1.6.5 we had to set the variable
>  "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
>  1.8.3 when I launch more processes than number of core in my
>  computer, even if it is still mentioned to work (see
>  http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>  However, when I launch with fewer processes than number of core, then
>  it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>  same behavior in 1.6.5.
>  
>  I tried to launch with a host file like this:
>  
>  localhost slots=2
>  
>  but it changed nothing...
>  
>  What do I do wrong?
>  
>  Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
>  
>  Is there a compilation option that I have to enable in 1.8.3?
>  
>  Here are the config.log and "ompi_info --all" files for both versions
>  of mpi:
>  
>  http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>  http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>  http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>  http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>  
>  Thanks,
>  
>  Eric
>  
>  
>  
>  
>  ___
>  users mailing list
>  us...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>  Link to this post:
>  http://www.open-mpi.org/community/lists/users/2014/12/25936.php
> >>> 
> >>> ___
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/users/2014/12/25938.php
> >>> 
> >> 
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2014/12/25940.php
> > 
> > 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25942.php




Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
Hmmm….well, it looks like we are doing the right thing and running unbound when 
oversubscribed like this. I don’t have any brilliant idea why it would be 
running so slowly in that situation when compared with 1.6.5 - it could be that 
yield-when-idle is borked. I’ll try to dig into that notion a bit.


> On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
>  wrote:
> 
> Hi again,
> 
> I sorted and "seded" (cat outpout.1.00 |sed 's/default/default value/g'|sed 
> 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:
> 
> mpirun --output-filename output -mca mpi_show_mca_params all 
> --report-bindings -np 32 myprog
> 
> between a launch with 165 vs 183.
> 
> The diff may be interesting but I can't interpret everything that is 
> written...
> 
> The files are attached...
> 
> Thanks,
> 
> Eric
> 
> On 12/09/2014 01:02 PM, Eric Chamberland wrote:
>> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>>> Can you provide an example cmd line you use to launch one of these
>>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>>> series, and we bind by default in 1.8 - the combination may be causing
>>> you a problem.
>> 
>> I very simply launch:
>> 
>> "mpirun -np 32 myprog"
>> 
>> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
>> 
>> Eric
>> 
>>> 
>>> 
 On Dec 9, 2014, at 9:14 AM, Eric Chamberland
  wrote:
 
 Hi,
 
 we were used to do oversubscribing just to do code validation in
 nightly automated parallel runs of our code.
 
 I just compiled openmpi 1.8.3 and launched the whole suit of
 sequential/parallel tests and noticed a *major* slowdown in
 oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
 
 For example, on my computer (2 cpu), a validation test of 64
 processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
 execute, while the very same test compiled with 1.6.5 took only 7.4
 seconds!
 
 To have this result with 1.6.5 we had to set the variable
 "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
 1.8.3 when I launch more processes than number of core in my
 computer, even if it is still mentioned to work (see
 http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
 However, when I launch with fewer processes than number of core, then
 it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
 same behavior in 1.6.5.
 
 I tried to launch with a host file like this:
 
 localhost slots=2
 
 but it changed nothing...
 
 What do I do wrong?
 
 Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
 
 Is there a compilation option that I have to enable in 1.8.3?
 
 Here are the config.log and "ompi_info --all" files for both versions
 of mpi:
 
 http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
 http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
 http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
 http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
 
 Thanks,
 
 Eric
 
 
 
 
 ___
 users mailing list
 us...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post:
 http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/12/25938.php
>>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/12/25940.php
> 
> 



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Eric Chamberland

Hi again,

I sorted and "seded" (cat outpout.1.00 |sed 's/default/default 
value/g'|sed 's/true/1/g' |sed 's/false/0/g') the output.1.00 file from:


mpirun --output-filename output -mca mpi_show_mca_params all 
--report-bindings -np 32 myprog


between a launch with 1.6.5 vs 1.8.3.

The diff may be interesting but I can't interpret everything that is 
written...


The files are attached...

Thanks,

Eric

On 12/09/2014 01:02 PM, Eric Chamberland wrote:

On 12/09/2014 12:24 PM, Ralph Castain wrote:

Can you provide an example cmd line you use to launch one of these
tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
series, and we bind by default in 1.8 - the combination may be causing
you a problem.


I very simply launch:

"mpirun -np 32 myprog"

Maybe the result of "-mca mpi_show_mca_params all" would be insightful?

Eric





On Dec 9, 2014, at 9:14 AM, Eric Chamberland
 wrote:

Hi,

we were used to do oversubscribing just to do code validation in
nightly automated parallel runs of our code.

I just compiled openmpi 1.8.3 and launched the whole suit of
sequential/parallel tests and noticed a *major* slowdown in
oversubscribed parallel tests with 1.8.3 compared to 1.6.5.

For example, on my computer (2 cpu), a validation test of 64
processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
execute, while the very same test compiled with 1.6.5 took only 7.4
seconds!

To have this result with 1.6.5 we had to set the variable
"OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in
1.8.3 when I launch more processes than number of core in my
computer, even if it is still mentioned to work (see
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
However, when I launch with fewer processes than number of core, then
it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
same behavior in 1.6.5.

I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...

What do I do wrong?

Is it possible to retrieve "performances" of 1.6.5 for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions
of mpi:

http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25936.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25938.php



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25940.php


 [Attached file (truncated in the archive): a dump of MCA parameters, all shown 
 at their default values (allocator_*, backtrace_*, bml_*, btl_self_*, btl_sm_*).]

Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Eric Chamberland

On 12/09/2014 12:24 PM, Ralph Castain wrote:

Can you provide an example cmd line you use to launch one of these tests using 
1.8.3? Some of the options changed between the 1.6 and 1.8 series, and we bind 
by default in 1.8 - the combination may be causing you a problem.


I very simply launch:

"mpirun -np 32 myprog"

Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
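
(Something along those lines, a sketch assuming the 1.8-series option names:

mpirun -mca mpi_show_mca_params all -np 32 myprog 2>&1 | grep yield_when_idle
)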

Eric





On Dec 9, 2014, at 9:14 AM, Eric Chamberland  
wrote:

Hi,

we were used to do oversubscribing just to do code validation in nightly 
automated parallel runs of our code.

I just compiled openmpi 1.8.3 and launched the whole suit of 
sequential/parallel tests and noticed a *major* slowdown in oversubscribed 
parallel tests with 1.8.3 compared to 1.6.5.

For example, on my computer (2 cpu), a validation test of 64 processes launched 
with 1.8.3 took 1500 seconds (~29 minutes) to execute, while the very same test 
compiled with 1.6.5 took only 7.4 seconds!

To have this result with 1.6.5 we had to set the variable 
"OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in 1.8.3 when I launch 
more processes than number of core in my computer, even if it is still mentioned to work (see 
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded). However, when I launch 
with fewer processes than number of core, then it is faster without 
"OMPI_MCA_mpi_yield_when_idle=1", which is the same behavior in 1.6.5.

I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...

What do I do wrong?

Is it possible to retrieve "performances" of 1.6.5 for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions of mpi:

http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/12/25936.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/12/25938.php





Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
Not for that many procs - we default to binding to socket for anything more 
than 2 procs

> On Dec 9, 2014, at 9:24 AM, Nathan Hjelm  wrote:
> 
> 
> One thing that changed between 1.6 and 1.8 is the default binding
> policy. Open MPI 1.6 did not bind by default but 1.8 binds to core. You
> can unset the binding policy by adding --bind-to none.
> 
> -Nathan Hjelm
> HPC-5, LANL
> 
> On Tue, Dec 09, 2014 at 12:14:32PM -0500, Eric Chamberland wrote:
>> Hi,
>> 
>> we were used to do oversubscribing just to do code validation in nightly
>> automated parallel runs of our code.
>> 
>> I just compiled openmpi 1.8.3 and launched the whole suit of
>> sequential/parallel tests and noticed a *major* slowdown in oversubscribed
>> parallel tests with 1.8.3 compared to 1.6.5.
>> 
>> For example, on my computer (2 cpu), a validation test of 64 processes
>> launched with 1.8.3 took 1500 seconds (~29 minutes) to execute, while the
>> very same test compiled with 1.6.5 took only 7.4 seconds!
>> 
>> To have this result with 1.6.5 we had to set the variable
>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in 1.8.3
>> when I launch more processes than number of core in my computer, even if it
>> is still mentioned to work (see
>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>> However, when I launch with fewer processes than number of core, then it is
>> faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same behavior
>> in 1.6.5.
>> 
>> I tried to launch with a host file like this:
>> 
>> localhost slots=2
>> 
>> but it changed nothing...
>> 
>> What do I do wrong?
>> 
>> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
>> 
>> Is there a compilation option that I have to enable in 1.8.3?
>> 
>> Here are the config.log and "ompi_info --all" files for both versions of
>> mpi:
>> 
>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>> 
>> Thanks,
>> 
>> Eric
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25937.php



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Ralph Castain
Can you provide an example cmd line you use to launch one of these tests using 
1.8.3? Some of the options changed between the 1.6 and 1.8 series, and we bind 
by default in 1.8 - the combination may be causing you a problem.


> On Dec 9, 2014, at 9:14 AM, Eric Chamberland 
>  wrote:
> 
> Hi,
> 
> we were used to do oversubscribing just to do code validation in nightly 
> automated parallel runs of our code.
> 
> I just compiled openmpi 1.8.3 and launched the whole suit of 
> sequential/parallel tests and noticed a *major* slowdown in oversubscribed 
> parallel tests with 1.8.3 compared to 1.6.5.
> 
> For example, on my computer (2 cpu), a validation test of 64 processes 
> launched with 1.8.3 took 1500 seconds (~29 minutes) to execute, while the 
> very same test compiled with 1.6.5 took only 7.4 seconds!
> 
> To have this result with 1.6.5 we had to set the variable 
> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in 1.8.3 
> when I launch more processes than number of core in my computer, even if it 
> is still mentioned to work (see 
> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded). 
> However, when I launch with fewer processes than number of core, then it is 
> faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same behavior 
> in 1.6.5.
> 
> I tried to launch with a host file like this:
> 
> localhost slots=2
> 
> but it changed nothing...
> 
> What do I do wrong?
> 
> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
> 
> Is there a compilation option that I have to enable in 1.8.3?
> 
> Here are the config.log and "ompi_info --all" files for both versions of mpi:
> 
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
> 
> Thanks,
> 
> Eric
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25936.php



Re: [OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Nathan Hjelm

One thing that changed between 1.6 and 1.8 is the default binding
policy. Open MPI 1.6 did not bind by default but 1.8 binds to core. You
can unset the binding policy by adding --bind-to none.
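
For instance (a sketch assuming the 1.8-series mpirun; myprog is the placeholder
used elsewhere in this thread):

# default in 1.8: ranks get bound (to core, or to socket when more than 2 procs are launched)
mpirun --report-bindings -np 32 myprog

# turn binding off, which is closer to the 1.6 behaviour when oversubscribing
mpirun --bind-to none -np 32 myprog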

-Nathan Hjelm
HPC-5, LANL

On Tue, Dec 09, 2014 at 12:14:32PM -0500, Eric Chamberland wrote:
> Hi,
> 
> we were used to do oversubscribing just to do code validation in nightly
> automated parallel runs of our code.
> 
> I just compiled openmpi 1.8.3 and launched the whole suit of
> sequential/parallel tests and noticed a *major* slowdown in oversubscribed
> parallel tests with 1.8.3 compared to 1.6.5.
> 
> For example, on my computer (2 cpu), a validation test of 64 processes
> launched with 1.8.3 took 1500 seconds (~29 minutes) to execute, while the
> very same test compiled with 1.6.5 took only 7.4 seconds!
> 
> To have this result with 1.6.5 we had to set the variable
> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effects in 1.8.3
> when I launch more processes than number of core in my computer, even if it
> is still mentioned to work (see
> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
> However, when I launch with fewer processes than number of core, then it is
> faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same behavior
> in 1.6.5.
> 
> I tried to launch with a host file like this:
> 
> localhost slots=2
> 
> but it changed nothing...
> 
> What do I do wrong?
> 
> Is it possible to retrieve "performances" of 1.6.5 for oversubscription?
> 
> Is there a compilation option that I have to enable in 1.8.3?
> 
> Here are the config.log and "ompi_info --all" files for both versions of
> mpi:
> 
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
> 
> Thanks,
> 
> Eric
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25936.php




[OMPI users] Oversubscribing in 1.8.3 vs 1.6.5

2014-12-09 Thread Eric Chamberland

Hi,

we have been using oversubscription just to do code validation in nightly 
automated parallel runs of our code.


I just compiled openmpi 1.8.3 and launched the whole suite of 
sequential/parallel tests and noticed a *major* slowdown in 
oversubscribed parallel tests with 1.8.3 compared to 1.6.5.


For example, on my computer (2 CPUs), a validation test of 64 processes 
launched with 1.8.3 took 1500 seconds (~29 minutes) to execute, while 
the very same test compiled with 1.6.5 took only 7.4 seconds!


To get this result with 1.6.5 we had to set the variable 
"OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in 
1.8.3 when I launch more processes than the number of cores in my computer, 
even though it is still documented to work (see 
http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded). However, 
when I launch with fewer processes than the number of cores, it is 
faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the same 
behavior as in 1.6.5.


I tried to launch with a host file like this:

localhost slots=2

but it changed nothing...
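
To spell the workflow out (a sketch only; my_hostfile and ./myprog are 
placeholders for the real files):

export OMPI_MCA_mpi_yield_when_idle=1
echo "localhost slots=2" > my_hostfile
mpirun -np 64 --hostfile my_hostfile ./myprog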

What am I doing wrong?

Is it possible to get back the 1.6.5 performance for oversubscription?

Is there a compilation option that I have to enable in 1.8.3?

Here are the config.log and "ompi_info --all" files for both versions of 
mpi:


http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz

Thanks,

Eric






Re: [OMPI users] Oversubscribing a subset of a machine's cores

2008-02-11 Thread Torje Henriksen

Hi,

Thanks for the heads up Joseph, you sent me in the right direction.  
Very helpful indeed, although the command that seems to be doing the  
trick on my system is


$taskset -c X ...
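
For example, to pack 8 oversubscribed ranks onto cores 0-3 and leave the other 
four cores idle (a sketch only; it assumes the MPI processes inherit the 
launcher's affinity mask and are not re-bound by Open MPI):

$taskset -c 0-3 mpirun -np 8 --hostfile my_hostfile ./my_mpiprog

numactl --physcpubind=0-3 (as suggested below) should give the same effect 
where numactl is available.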


Best regards,

Torje Henriksen

On Feb 7, 2008, at 2:47 PM, Joe Landman wrote:


Torje Henriksen wrote:

[...]

Still, all eight cores are being used. I can see why you would want  
to

use all cores, and I can see that oversubscribing a sub-set of the
cores might seem silly.  My question is, is it possible to do what I
want to do without hacking the open mpi code?


Could you get numactl to help you do what you want?  That is, for the
code, somehow tweak the launcher to run

numactl --physcpubind=X ...

or similar?



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
   http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Oversubscribing a subset of a machine's cores

2008-02-07 Thread Torje Henriksen

Hi,

I have a slightly odd problem, that you might not think is important  
at all. Anyways, here it goes:


I'm using a single eight-core machine. I want to oversubscribe four of  
the cores and leave the other four idle. My approach is to make a  
hostfile:


localhost slot=4 # shouldn't this limit the core count to 4?


and run the command:

$mpirun -np 8 --hostfile my_hostfile ./my_mpiprog

or the command:

$mpirun -np 8 --host localhost,localhost,localhost,localhost ./ 
my_mpiprog



Still, all eight cores are being used. I can see why you would want to  
use all cores, and I can see that oversubscribing a sub-set of the  
cores might seem silly.  My question is, is it possible to do what I  
want to do without hacking the open mpi code?


Guess I just wanted to know is there is a solution I overlooked before  
I start hacking like a madman :)



Thanks

Torje Henriksen