FWIW: I just committed that patch to the 1.8 repo, so it will be in tomorrow’s 
nightly 1.8 tarball:

http://www.open-mpi.org/nightly/v1.8/


> On Dec 10, 2014, at 7:40 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> You should be able to apply the patch - I don’t think that section of code 
> differs from what is in the 1.8 repo.
> 
> The sha for 1.8.3 can be found on the web site (see right-most column in 
> table):
> 
> http://www.open-mpi.org/software/ompi/v1.8/
> 
> 
>> On Dec 10, 2014, at 7:35 AM, Eric Chamberland 
>> <eric.chamberl...@giref.ulaval.ca> wrote:
>> 
>> Hi Nathan,
>> 
>> I pulled your commit d0da29351f9 and tested it against our example.
>> 
>> It now works perfectly.  Strangely, I can even unset 
>> "OMPI_MCA_mpi_yield_when_idle=1" and it doesn't seem to take any longer.
>> 
>> Can I apply the patch to a fresh "1.8.3" and expect it to work?
>> 
>> Another question: how can I retrieve the SHA for 1.8.3?  (Should releases 
>> be tagged in the repository? Is it normal that I only see a "dev" tag?)
>> 
>> Thanks,
>> 
>> Eric
>> 
>> 
>> On 12/09/2014 04:19 PM, Nathan Hjelm wrote:
>>> 
>>> yield when idle is broken on 1.8. Fixing now.
>>> 
>>> -Nathan
>>> 
>>> On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
>>>> Hmmm….well, it looks like we are doing the right thing and running unbound 
>>>> when oversubscribed like this. I don’t have any brilliant idea why it 
>>>> would be running so slowly in that situation when compared with 1.6.5 - it 
>>>> could be that yield-when-idle is borked. I’ll try to dig into that notion 
>>>> a bit.
>>>> 
>>>> 
>>>>> On Dec 9, 2014, at 10:39 AM, Eric Chamberland 
>>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>> 
>>>>> Hi again,
>>>>> 
>>>>> I sorted the output.1.00 file and "seded" it with:
>>>>> 
>>>>> cat output.1.00 | sed 's/default/default value/g' | sed 's/true/1/g' | sed 's/false/0/g'
>>>>> 
>>>>> The file comes from:
>>>>> 
>>>>> mpirun --output-filename output -mca mpi_show_mca_params all --report-bindings -np 32 myprog
>>>>> 
>>>>> run once with 1.6.5 and once with 1.8.3.
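
For anyone reproducing this comparison, the steps described here (rewrite one dump so its wording matches the other, sort both, then diff) can be sketched as a small shell helper. This is only a sketch: the function name normalize_params and the file names in the usage comment are made up for illustration, and the three substitutions are simply the ones quoted in the message.

```shell
#!/bin/sh
# Rewrite one parameter dump so its wording matches the other version's,
# then sort it. The substitutions are the ones quoted in the message;
# chaining them with -e is equivalent to piping three sed calls.
normalize_params() {
    sed -e 's/default/default value/g' \
        -e 's/true/1/g' \
        -e 's/false/0/g' "$1" | sort
}

# Usage (file names are hypothetical):
#   normalize_params output.183 > output.183.norm
#   sort output.165 > output.165.sorted
#   diff output.165.sorted output.183.norm
```

Apply the helper only to the dump that needs rewording; running it on a file that already says "default value" would turn that into "default value value".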
>>>>> 
>>>>> The diff may be interesting but I can't interpret everything that is 
>>>>> written...
>>>>> 
>>>>> The files are attached...
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Eric
>>>>> 
>>>>> On 12/09/2014 01:02 PM, Eric Chamberland wrote:
>>>>>> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>>>>>>> Can you provide an example cmd line you use to launch one of these
>>>>>>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>>>>>>> series, and we bind by default in 1.8 - the combination may be causing
>>>>>>> you a problem.
>>>>>> 
>>>>>> I very simply launch:
>>>>>> 
>>>>>> "mpirun -np 32 myprog"
>>>>>> 
>>>>>> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
>>>>>> 
>>>>>> Eric
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>>>>>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> we routinely oversubscribe, just for code validation in the
>>>>>>>> nightly automated parallel runs of our code.
>>>>>>>> 
>>>>>>>> I just compiled openmpi 1.8.3 and launched the whole suite of
>>>>>>>> sequential/parallel tests and noticed a *major* slowdown in
>>>>>>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>>>>>>>> 
>>>>>>>> For example, on my computer (2 cpu), a validation test of 64
>>>>>>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>>>>>>>> execute, while the very same test compiled with 1.6.5 took only 7.4
>>>>>>>> seconds!
>>>>>>>> 
>>>>>>>> To get this result with 1.6.5 we had to set the variable
>>>>>>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in
>>>>>>>> 1.8.3 when I launch more processes than the number of cores in my
>>>>>>>> computer, even though the FAQ still says it works (see
>>>>>>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>>>>>>>> However, when I launch fewer processes than the number of cores,
>>>>>>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>>>>>>>> same behavior as in 1.6.5.
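
The observation above (yielding helps when oversubscribed and costs time otherwise) suggests picking the setting from the process count. A minimal sketch, assuming a POSIX shell: yield_setting is a hypothetical helper, not part of Open MPI, and getconf _NPROCESSORS_ONLN is assumed to be available for counting cores (it is on Linux and macOS).

```shell
#!/bin/sh
# Print 1 when more processes are requested than cores are available
# (oversubscribed), 0 otherwise. Both arguments are plain integers.
yield_setting() {
    np=$1
    cores=$2
    if [ "$np" -gt "$cores" ]; then
        echo 1
    else
        echo 0
    fi
}

# Usage (myprog is the program name used in the thread):
#   cores=$(getconf _NPROCESSORS_ONLN)
#   OMPI_MCA_mpi_yield_when_idle=$(yield_setting 64 "$cores") mpirun -np 64 myprog
```

Whether the variable actually takes effect depends on the Open MPI version in use, which is exactly what this thread is about.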
>>>>>>>> 
>>>>>>>> I tried to launch with a host file like this:
>>>>>>>> 
>>>>>>>> localhost slots=2
>>>>>>>> 
>>>>>>>> but it changed nothing...
>>>>>>>> 
>>>>>>>> What am I doing wrong?
>>>>>>>> 
>>>>>>>> Is it possible to get back the 1.6.5 performance for
>>>>>>>> oversubscription?
>>>>>>>> 
>>>>>>>> Is there a compilation option that I have to enable in 1.8.3?
>>>>>>>> 
>>>>>>>> Here are the config.log and "ompi_info --all" files for both versions
>>>>>>>> of mpi:
>>>>>>>> 
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Eric
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>>>>>>> 
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/users/2014/12/25938.php
>>>>>>> 
>>>>>> 
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2014/12/25940.php
>>>>> 
>>>>> <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
>>>> 
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2014/12/25942.php
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/12/25947.php
