FWIW: I just committed that patch to the 1.8 repo, so it will be in tomorrow’s nightly 1.8 tarball:
http://www.open-mpi.org/nightly/v1.8/

> On Dec 10, 2014, at 7:40 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> You should be able to apply the patch - I don’t think that section of code
> differs from what is in the 1.8 repo.
>
> The sha for 1.8.3 can be found on the web site (see right-most column in
> table):
>
> http://www.open-mpi.org/software/ompi/v1.8/
>
>
>> On Dec 10, 2014, at 7:35 AM, Eric Chamberland
>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>
>> Hi Nathan,
>>
>> I pulled your commit d0da29351f9 and tested it against our example.
>>
>> It now works perfectly. Strangely, I can even unset
>> "OMPI_MCA_mpi_yield_when_idle=1" and it doesn't seem to take any longer.
>>
>> If I apply the patch to a fresh "1.8.3", should it work?
>>
>> Other question: how can I retrieve the SHA for 1.8.3? (Should they be
>> tagged in the repository? Is it normal that I only see a "dev" tag?)
>>
>> Thanks,
>>
>> Eric
>>
>>
>> On 12/09/2014 04:19 PM, Nathan Hjelm wrote:
>>>
>>> yield when idle is broken on 1.8. Fixing now.
>>>
>>> -Nathan
>>>
>>> On Tue, Dec 09, 2014 at 01:02:08PM -0800, Ralph Castain wrote:
>>>> Hmmm….well, it looks like we are doing the right thing and running unbound
>>>> when oversubscribed like this. I don’t have any brilliant idea why it
>>>> would be running so slowly in that situation when compared with 1.6.5 - it
>>>> could be that yield-when-idle is borked. I’ll try to dig into that notion
>>>> a bit.
>>>>
>>>>
>>>>> On Dec 9, 2014, at 10:39 AM, Eric Chamberland
>>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> I sorted and "seded"
>>>>> (cat output.1.00 |sed 's/default/default value/g'|sed 's/true/1/g' |sed 's/false/0/g')
>>>>> the output.1.00 file from:
>>>>>
>>>>> mpirun --output-filename output -mca mpi_show_mca_params all --report-bindings -np 32 myprog
>>>>>
>>>>> for a launch with 1.6.5 and one with 1.8.3.
>>>>>
>>>>> The diff may be interesting but I can't interpret everything that is
>>>>> written...
>>>>>
>>>>> The files are attached...
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eric
>>>>>
>>>>> On 12/09/2014 01:02 PM, Eric Chamberland wrote:
>>>>>> On 12/09/2014 12:24 PM, Ralph Castain wrote:
>>>>>>> Can you provide an example cmd line you use to launch one of these
>>>>>>> tests using 1.8.3? Some of the options changed between the 1.6 and 1.8
>>>>>>> series, and we bind by default in 1.8 - the combination may be causing
>>>>>>> you a problem.
>>>>>>
>>>>>> I very simply launch:
>>>>>>
>>>>>> "mpirun -np 32 myprog"
>>>>>>
>>>>>> Maybe the result of "-mca mpi_show_mca_params all" would be insightful?
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Dec 9, 2014, at 9:14 AM, Eric Chamberland
>>>>>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> we used to oversubscribe just to do code validation in
>>>>>>>> nightly automated parallel runs of our code.
>>>>>>>>
>>>>>>>> I just compiled openmpi 1.8.3 and launched the whole suite of
>>>>>>>> sequential/parallel tests and noticed a *major* slowdown in
>>>>>>>> oversubscribed parallel tests with 1.8.3 compared to 1.6.5.
>>>>>>>>
>>>>>>>> For example, on my computer (2 cpu), a validation test of 64
>>>>>>>> processes launched with 1.8.3 took 1500 seconds (~29 minutes) to
>>>>>>>> execute, while the very same test compiled with 1.6.5 took only 7.4
>>>>>>>> seconds!
>>>>>>>>
>>>>>>>> To get this result with 1.6.5 we had to set the variable
>>>>>>>> "OMPI_MCA_mpi_yield_when_idle=1", but it seems to have no effect in
>>>>>>>> 1.8.3 when I launch more processes than the number of cores in my
>>>>>>>> computer, even though the FAQ still says it works (see
>>>>>>>> http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded).
>>>>>>>> However, when I launch with fewer processes than the number of cores,
>>>>>>>> it is faster without "OMPI_MCA_mpi_yield_when_idle=1", which is the
>>>>>>>> same behavior as in 1.6.5.
>>>>>>>>
>>>>>>>> I tried to launch with a host file like this:
>>>>>>>>
>>>>>>>> localhost slots=2
>>>>>>>>
>>>>>>>> but it changed nothing...
>>>>>>>>
>>>>>>>> What am I doing wrong?
>>>>>>>>
>>>>>>>> Is it possible to get back the 1.6.5 performance for
>>>>>>>> oversubscription?
>>>>>>>>
>>>>>>>> Is there a compilation option that I have to enable in 1.8.3?
>>>>>>>>
>>>>>>>> Here are the config.log and "ompi_info --all" files for both versions
>>>>>>>> of mpi:
>>>>>>>>
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.165.log.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.183.log.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.165.txt.gz
>>>>>>>> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.183.txt.gz
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Eric
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/12/25936.php
>>>>>
>>>>> <output.1.00.filtre.165.sorted><output.1.00.filtre.183.sorted.seded>
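
For anyone who wants to repeat the parameter comparison from this thread without the shell pipeline, here is a minimal Python sketch. It applies the three sed substitutions quoted above (s/default/default value/g, s/true/1/g, s/false/0/g) and then diffs the sorted dumps. It is an illustration only, not anything from the Open MPI tree. The function names are made up, and it assumes, going by the attachment names, that only the 1.8.3 dump needs rewriting. Lines are treated as opaque text, since the layout of the mpi_show_mca_params output is not shown in the thread.

```python
import difflib


def rewrite_like_sed(line):
    """Apply the three sed substitutions quoted in the thread, in the same order."""
    line = line.replace("default", "default value")
    line = line.replace("true", "1")
    return line.replace("false", "0")


def diff_param_dumps(old_lines, new_lines, old_name="1.6.5", new_name="1.8.3"):
    """Rewrite the newer dump, sort both dumps, and return unified-diff lines.

    Assumption: only the newer dump is rewritten, as the attachment names
    in the thread suggest. Sorting happens after the rewrite so that both
    sides end up in the same order.
    """
    old_sorted = sorted(line.rstrip("\n") for line in old_lines)
    new_sorted = sorted(rewrite_like_sed(line.rstrip("\n")) for line in new_lines)
    return list(
        difflib.unified_diff(
            old_sorted, new_sorted, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )
```

Passing the lines of the two dump files (for example from `open(path).readlines()`) returns the diff as a list of strings. An empty list means the dumps agree once the 1.8.3 side has been rewritten.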