Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 06/09/13 14:14, Christopher Samuel wrote:

> However, modifying the test program confirms that variable is getting
> propagated as expected with both mpirun and srun for 1.6.5 and the 1.7
> snapshot. :-(

Investigating further by setting:

export OMPI_MCA_orte_report_bindings=1
export SLURM_CPU_BIND=core
export SLURM_CPU_BIND_VERBOSE=verbose

reveals that only OMPI 1.6.5 with mpirun reports bindings being set (see
below). We cannot understand why Slurm doesn't *appear* to be setting
bindings, as we have the correct settings according to the documentation.

Whilst this may explain the difference between 1.6.5 mpirun and srun, it
doesn't explain why the 1.7 snapshot is so much better, as you'd expect
both to be hurt in the same way.

==OPENMPI 1.6.5==

==mpirun==

[barcoo003:03633] System has detected external process binding to cores 0001
[barcoo003:03633] MCW rank 0 bound to socket 0[core 0]: [B]
[barcoo004:04504] MCW rank 1 bound to socket 0[core 0]: [B]
Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar 2

==srun==

Hello, World, I am 0 of 2 on host barcoo003 from app number 1 universe size 2 universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 1 universe size 2 universe envar NULL

==OPENMPI 1.7.3==

DANGER: YOU ARE LOADING A TEST VERSION OF OPENMPI. THIS MAY BE BAD.

==mpirun==

Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar 2
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar 2

==srun==

Hello, World, I am 0 of 2 on host barcoo003 from app number 0 universe size 2 universe envar NULL
Hello, World, I am 1 of 2 on host barcoo004 from app number 0 universe size 2 universe envar NULL

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/  http://twitter.com/vlsci
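One way to settle what the kernel actually applied, independent of what either launcher reports, is to have each rank read its own affinity mask with sched_getaffinity(2). A minimal sketch (Linux-specific; not part of the original test program):

/* binding_check.c: print each rank's CPU affinity mask as seen by the
 * kernel. Linux-specific sketch; build with: mpicc -o binding_check binding_check.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    char host[256];
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    /* Ask the kernel, not the launcher, which cores we may run on. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d on %s bound to %d core(s):", rank, host, CPU_COUNT(&mask));
        for (int c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &mask))
                printf(" %d", c);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

An unbound rank will report every core on the node; a core-bound rank reports exactly one.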
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 06/09/13 00:23, Hjelm, Nathan T wrote:

> I assume that process binding is enabled for both mpirun and srun?
> If not that could account for a difference between the runtimes.

You raise an interesting point; we have been doing that with:

[samuel@barcoo ~]$ module show openmpi 2>&1 | grep binding
setenv OMPI_MCA_orte_process_binding core

However, modifying the test program confirms that variable is getting
propagated as expected with both mpirun and srun for 1.6.5 and the 1.7
snapshot. :-(

cheers,
Chris
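The modified test program itself isn't included in the thread; a check along these lines, with each rank echoing the MCA variable it inherited, would confirm the propagation (the variable name comes from the module file above, everything else is illustrative):

/* envcheck.c: sketch of a per-rank environment check (the actual modified
 * test program is not included in the thread). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The binding request travels to each rank as an environment
     * variable, so simply echo what this process inherited. */
    const char *v = getenv("OMPI_MCA_orte_process_binding");
    printf("rank %d sees OMPI_MCA_orte_process_binding=%s\n", rank, v ? v : "NULL");

    MPI_Finalize();
    return 0;
}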
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

I assume that process binding is enabled for both mpirun and srun? If not,
that could account for a difference between the runtimes.

-Nathan

From: devel [devel-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: Thursday, September 05, 2013 8:19 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

No, nothing significant there. Afraid I've exhausted my thoughts on why the
difference might exist. Anyone else care to chime in? [...]
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

No, nothing significant there. Afraid I've exhausted my thoughts on why the
difference might exist. Anyone else care to chime in?

On Sep 4, 2013, at 9:34 PM, Christopher Samuel wrote:

> Very interesting: the rank-to-node mappings are identical in all
> cases (mpirun and srun for 1.6.5 and my test 1.7.3 snapshot), but
> what is different is as follows. [...]
> Are these differences significant?
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Hi Ralph,

On 05/09/13 12:50, Ralph Castain wrote:

> Jeff and I were looking at a similar issue today and suddenly
> realized that the mappings were different - i.e., what ranks are
> on what nodes differs depending on how you launch. You might want
> to check if that's the issue here as well. Just launch the
> attached program using mpirun vs srun and check to see if the maps
> are the same or not.

Very interesting: the rank-to-node mappings are identical in all
cases (mpirun and srun for 1.6.5 and my test 1.7.3 snapshot), but what
is different is as follows.

For the 1.6.5 build I see mpirun report:

number 0 universe size 64 universe envar 64

whereas srun reports:

number 1 universe size 64 universe envar NULL

For the 1.7.3 snapshot both report "number 0", so the only difference
there is that mpirun has:

envar 64

whereas srun has:

envar NULL

Are these differences significant?

I'm intrigued that the problem child (srun 1.6.5) is the only one
where number is 1.

All the best,
Chris
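The fields above correspond to standard MPI_COMM_WORLD attributes plus an Open MPI environment variable: "number" is MPI_APPNUM, "universe size" is MPI_UNIVERSE_SIZE, and "envar" is OMPI_UNIVERSE_SIZE, which mpirun exports but srun does not. The attached hello_nodename.c is not reproduced in the archive, but a program printing the same fields would look roughly like this (a sketch, not the actual attachment):

/* Sketch of a test program printing the fields discussed above; the real
 * hello_nodename.c attachment is not in the archive. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, flag, len;
    int *appnum, *usize;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* "app number": which app in an MPMD-style launch this rank belongs to. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
    int an = flag ? *appnum : -1;

    /* "universe size": how many processes the runtime thinks could exist. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &usize, &flag);
    int us = flag ? *usize : -1;

    /* "universe envar": what Open MPI's launcher exported, if anything. */
    const char *envar = getenv("OMPI_UNIVERSE_SIZE");

    printf("Hello, World, I am %d of %d on host %s from app number %d "
           "universe size %d universe envar %s\n",
           rank, size, host, an, us, envar ? envar : "NULL");

    MPI_Finalize();
    return 0;
}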
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Jeff and I were looking at a similar issue today and suddenly realized that
the mappings were different - i.e., what ranks are on what nodes differs
depending on how you launch. You might want to check if that's the issue
here as well. Just launch the attached program using mpirun vs srun and
check to see if the maps are the same or not.

Ralph

[Attachment: hello_nodename.c]

On Sep 4, 2013, at 7:15 PM, Christopher Samuel wrote:

> NAMD helpfully prints benchmark and timing numbers during the initial
> part of the simulation, so here's what they say. [...]
> Hope this is useful!
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 04/09/13 18:33, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI
> initialization stages by profiling the application (especially the
> MPI_Init call).

NAMD helpfully prints benchmark and timing numbers during the initial
part of the simulation, so here's what they say. For both seconds per
step and days per nanosecond of simulation, less is better.

I've included the benchmark numbers (every 100 steps or so from the
start) and the final timing number after 25000 steps. It looks to me
(as a sysadmin and not an MD person) that the final timing line reports
both CPU time and wallclock time in seconds per step.

64 cores over 10 nodes:

OMPI 1.7.3a1r29103 mpirun

Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory

TIMING: 25000 CPU: 8247.2, 0.330157/step Wall: 8247.2, 0.330157/step, 0.0229276 hours remaining, 921.894531 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory

TIMING: 25000 CPU: 7390.15, 0.296/step Wall: 7390.15, 0.296/step, 0.020 hours remaining, 915.746094 MB of memory in use.

64 cores over 18 nodes:

OMPI 1.6.5 mpirun

Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory

TIMING: 25000 CPU: 7754.17, 0.312071/step Wall: 7754.17, 0.312071/step, 0.0216716 hours remaining, 950.929688 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory

TIMING: 25000 CPU: 7420.91, 0.296029/step Wall: 7420.91, 0.296029/step, 0.0205575 hours remaining, 916.312500 MB of memory in use.

Hope this is useful!

All the best,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On Sep 4, 2013, at 4:33 AM, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI initialization
> stages by profiling the application (especially the MPI_Init call).

You can also try just launching "MPI hello world" (i.e., examples/hello_c.c).
It just calls MPI_INIT / MPI_FINALIZE.

Additionally, you might want to try launching the ring program, too
(examples/ring_c.c). That program sends a small message around in a ring,
which forces some MPI communication to occur, and therefore does at least
some level of setup in the BTLs, etc. (remember: most BTLs are lazy-connect,
so they don't actually do anything until the first send, so a simple "ring"
program sets up *some* BTL connections, but not nearly all of them).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
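For reference, a ring pass in the spirit of examples/ring_c.c (a sketch, not the file from the Open MPI tree):

/* Minimal ring sketch: each rank receives from its left neighbour and
 * sends to its right, forcing at least one BTL connection per pair.
 * Run with at least two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    if (rank == 0) {
        token = 42;  /* rank 0 starts the ring, then waits for it to return */
        MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("token made it around %d ranks\n", size);
    } else {
        MPI_Recv(&token, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}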
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

This is 1.7.3 - there is no comm thread in ORTE in that version.

On Sep 4, 2013, at 1:33 AM, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI initialization
> stages by profiling the application (especially the MPI_Init call).
>
> Another possible cause of slowdown might be the communication thread in
> the ORTE. If it remains active outside the initialization it will
> definitely disturb the application, by taking away critical resources.
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

You can confirm that the slowdown happens during the MPI initialization
stages by profiling the application (especially the MPI_Init call).

Another possible cause of slowdown might be the communication thread in
the ORTE. If it remains active outside the initialization it will
definitely disturb the application, by taking away critical resources.

George.

On Sep 4, 2013, at 05:59, Christopher Samuel wrote:

> I'm testing with what would be our most used application in aggregate
> across our systems, the NAMD molecular dynamics code [...] it's doing
> a lot more than that and has a reputation for being a *very* chatty
> MPI code.
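Profiling just MPI_Init needs no full profiler: wrap the call in a wall-clock timer. MPI_Wtime can't be used before MPI_Init, so an OS clock has to stand in (a sketch):

/* init_timer.c: measure how long MPI_Init itself takes on each rank.
 * clock_gettime is used because MPI_Wtime is not callable before
 * MPI_Init; on older glibc, link with -lrt. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char *argv[])
{
    double t0 = now();
    MPI_Init(&argc, &argv);
    double t1 = now();

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: MPI_Init took %.3f s\n", rank, t1 - t0);

    MPI_Finalize();
    return 0;
}

If the mpirun-vs-srun gap shows up here, the problem is in startup and wireup; if MPI_Init times match, the difference lies in the steady-state communication.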
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 04/09/13 11:29, Ralph Castain wrote:

> Your code is obviously doing something much more than just
> launching and wiring up, so it is difficult to assess the
> difference in speed between 1.6.5 and 1.7.3 - my guess is that it
> has to do with changes in the MPI transport layer and nothing to do
> with PMI or not.

I'm testing with what would be our most used application in aggregate
across our systems, the NAMD molecular dynamics code from here:

http://www.ks.uiuc.edu/Research/namd/

so yes, you're quite right, it's doing a lot more than that and has a
reputation for being a *very* chatty MPI code.

For comparison, whilst users see GROMACS also suffer with srun under
1.6.5, they don't see anything like the slowdown that NAMD gets.

All the best,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Your code is obviously doing something much more than just launching and
wiring up, so it is difficult to assess the difference in speed between
1.6.5 and 1.7.3 - my guess is that it has to do with changes in the MPI
transport layer and nothing to do with PMI or not.

Likewise, I can't imagine any differences in wireup method accounting for
the 500 seconds of difference in execution time between the two versions
when using the same launch method. I launch more than 10 nodes in far less
time than that, so again I expect this has to do with something in the MPI
layer.

The real question is why you see so much difference between launching via
mpirun vs srun. Like I said, the launch and wireup times on such small
scales are negligible, so somehow you are winding up selecting different
MPI transport options. You can test this by just running "hello world"
instead - I'll bet the mpirun vs srun time differences are a second or two
at most.

Perhaps Jeff or someone else can suggest some debug flags you could use to
understand these differences?

On Sep 3, 2013, at 6:13 PM, Christopher Samuel wrote:

> 64 cores over 18 nodes:
>
> Open-MPI 1.6.5 with mpirun - 7842 seconds
> Open-MPI 1.7.3a1r29103 with srun - 7522 seconds [...]
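The "hello world" in question is essentially examples/hello_c.c, which does roughly the following (a sketch, not the actual file) - since it does no real communication, any mpirun-vs-srun difference in its runtime is almost pure launch and wireup cost:

/* Roughly what examples/hello_c.c does (a sketch, not the actual file):
 * init, identify yourself, finalize. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}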
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 03/09/13 10:56, Ralph Castain wrote:

> Yeah - --with-pmi=

Actually I found that just --with-pmi=/usr/local/slurm/latest worked. :-)

I've got some initial numbers for 64 cores. As I mentioned, the system I
found this on initially is so busy at the moment that I won't be able to
run anything bigger for a while, so I'm going to move my testing to another
system which is a bit quieter, but slower (it's Nehalem vs SandyBridge).

All the tests below are with the same NAMD 2.9 binary and within the same
Slurm job, so it runs on the same cores each time. It's nice to find that
C code at least seems to be backwardly compatible!

64 cores over 18 nodes:

Open-MPI 1.6.5 with mpirun - 7842 seconds
Open-MPI 1.7.3a1r29103 with srun - 7522 seconds

so that's about a 4% speedup.

64 cores over 10 nodes:

Open-MPI 1.7.3a1r29103 with mpirun - 8341 seconds
Open-MPI 1.7.3a1r29103 with srun - 7476 seconds

So that's about 11% faster, and the mpirun speed has decreased, though of
course that build uses PMI, so perhaps that's the cause?

cheers,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Yeah - --with-pmi=

On Sep 2, 2013, at 5:27 PM, Christopher Samuel wrote:

> Stupid question, but never having played with PMI before, is it just
> a case of appending the --with-pmi option to our current configure?
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 31/08/13 02:42, Ralph Castain wrote:

> We did some work on the OMPI side and removed the O(N) calls to
> "get", so it should behave better now. If you get the chance,
> please try the 1.7.3 nightly tarball. We hope to officially release
> it soon.

Stupid question, but never having played with PMI before, is it just
a case of appending the --with-pmi option to our current configure?

thanks,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 31/08/13 02:42, Ralph Castain wrote:

> Hi Chris et al

Hiya,

> We did some work on the OMPI side and removed the O(N) calls to
> "get", so it should behave better now. If you get the chance,
> please try the 1.7.3 nightly tarball. We hope to officially release
> it soon.

Thanks so much, I'll get our folks to rebuild a test version of NAMD
against 1.7.3a1r29103, which I built this afternoon.

It might be some time until I can get a test job of a suitable size to
run, though; it looks like our systems are flat out!

All the best,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 24/07/13 09:42, Ralph Castain wrote:

> Not to 1.6 series, but it is in the about-to-be-released 1.7.3,
> and will be there from that point onwards.

Oh dear, I cannot delay this machine any longer to change to 1.7.x. :-(

> Still waiting to see if it resolves the difference.

When I've got the current rush out of the way I'll try a private build
of 1.7 and see how that goes with NAMD.

cheers!
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Not to 1.6 series, but it is in the about-to-be-released 1.7.3, and will be
there from that point onwards. Still waiting to see if it resolves the
difference.

On Jul 23, 2013, at 4:28 PM, Christopher Samuel wrote:

> Can I ask, if the PMI2 ideas work out is that likely to get backported
> to OMPI 1.6.x?
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

On 23/07/13 19:34, Joshua Ladd wrote:

> Hi, Chris

Hi Joshua,

I've quoted you in full as I don't think your message made it through
to the slurm-dev list (at least I've not received it from there yet).

> Funny you should mention this now. We identified and diagnosed the
> issue some time ago as a combination of SLURM's PMI1 implementation
> and some of, what I'll call, OMPI's topology requirements (probably
> not the right word.) [...]
> Stay tuned, Chris. Hopefully we will have some data by the end of
> the week.

Wonderful, great to know that what we're seeing is actually real and
not just pilot error on our part! We're happy enough to tell users to
keep on using mpirun, which they are used to from our other Intel
systems, and to only use srun if the code requires it (one or two
commercial apps that use Intel MPI).

Can I ask, if the PMI2 ideas work out is that likely to get backported
to OMPI 1.6.x?

All the best,
Chris
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Hi, Chris

Funny you should mention this now. We identified and diagnosed the issue
some time ago as a combination of SLURM's PMI1 implementation and some of,
what I'll call, OMPI's topology requirements (probably not the right word.)
Here's what is happening, in a nutshell, when you launch with srun:

1. Each process pushes its endpoint data up to the PMI "cloud" via PMI put
(I think it's about five or six puts; bottom line, O(1).)
2. Then it executes a PMI commit and PMI barrier to ensure all other
processes have finished committing their data to the "cloud".
3. Subsequent to this, each process executes O(N) (N is the number of procs
in the job) PMI gets in order to get all of the endpoint data for every
process, regardless of whether or not it communicates with that endpoint.

"We" (MLNX et al.) undertook an in-depth scaling study of this and
identified several poorly scaling pieces, with the worst offenders being:

1. PMI Barrier scales worse than linear.
2. At scale, the PMI get phase starts to look quadratic.

The proposed solution that "we" (OMPI + SLURM) have come up with is to
modify OMPI to support PMI2 and to use SLURM 2.6, which has support for
PMI2 and is (allegedly) much more scalable than PMI1. Several folks in the
combined communities are working hard, as we speak, trying to get this
functional to see if it indeed makes a difference. Stay tuned, Chris.
Hopefully we will have some data by the end of the week.

Best regards,
Josh

Joshua S. Ladd, PhD
HPC Algorithms Engineer
Mellanox Technologies
Email: josh...@mellanox.com
Cell: +1 (865) 258 - 8898

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Christopher Samuel
Sent: Tuesday, July 23, 2013 3:06 AM
To: slurm-dev; Open MPI Developers
Subject: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun

Hi there slurm-dev and OMPI devel lists,

Bringing up a new IBM SandyBridge cluster I'm running a NAMD test case and
noticed that if I run it with srun rather than mpirun it goes over 20%
slower. These are all launched from an sbatch script too.

Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.

Here are some timings, as reported as the WallClock time by NAMD itself
(so not including startup/teardown overhead from Slurm).

srun:

run1/slurm-93744.out:WallClock: 695.079773 CPUTime: 695.079773
run4/slurm-94011.out:WallClock: 723.907959 CPUTime: 723.907959
run5/slurm-94013.out:WallClock: 726.156799 CPUTime: 726.156799
run6/slurm-94017.out:WallClock: 724.828918 CPUTime: 724.828918

Average of 717 seconds.

mpirun:

run2/slurm-93746.out:WallClock: 559.311035 CPUTime: 559.311035
run3/slurm-93910.out:WallClock: 544.116333 CPUTime: 544.116333
run7/slurm-94019.out:WallClock: 586.072693 CPUTime: 586.072693

Average of 563 seconds.

So that's about 27% slower.

Everything is identical (they're all symlinks to the same golden master)
*except* for the srun / mpirun, which is modified by copying the batch
script and substituting mpirun for srun.

When they are running I can see that for jobs launched with srun they are
direct children of slurmstepd, whereas when started with mpirun they are
children of Open-MPI's orted (or mpirun on the launch node), which itself
is a child of slurmstepd.

Has anyone else seen anything like this, or got any ideas?

cheers,
Chris
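The put/commit/barrier/get sequence Josh describes maps directly onto SLURM's PMI1 API. Schematically, each of the N ranks does something like the following during startup (a sketch against the PMI1 interface; the key names and value contents are invented for illustration):

/* Sketch of the PMI1 exchange pattern described above, as each of the N
 * ranks would execute it during startup. Key names and value contents are
 * invented for illustration; compile against SLURM's PMI1 library,
 * e.g. cc pmi_sketch.c -lpmi */
#include <stdio.h>
#include <pmi.h>

int main(void)
{
    int spawned, rank, size, max_name, max_key, max_val;
    PMI_Init(&spawned);
    PMI_Get_rank(&rank);
    PMI_Get_size(&size);
    PMI_KVS_Get_name_length_max(&max_name);
    PMI_KVS_Get_key_length_max(&max_key);
    PMI_KVS_Get_value_length_max(&max_val);

    char kvs[max_name], key[max_key], val[max_val];
    PMI_KVS_Get_my_name(kvs, max_name);

    /* Step 1: O(1) puts of our own endpoint data into the PMI "cloud". */
    snprintf(key, sizeof(key), "endpoint-%d", rank);
    snprintf(val, sizeof(val), "contact-info-for-rank-%d", rank);
    PMI_KVS_Put(kvs, key, val);

    /* Step 2: commit, then barrier, so everyone's data is visible. */
    PMI_KVS_Commit(kvs);
    PMI_Barrier();

    /* Step 3: O(N) gets - every rank fetches every other rank's endpoint,
     * whether or not it will ever talk to it. N ranks each doing N gets
     * is where the quadratic behaviour at scale comes from. */
    for (int peer = 0; peer < size; peer++) {
        snprintf(key, sizeof(key), "endpoint-%d", peer);
        PMI_KVS_Get(kvs, key, val, max_val);
    }

    PMI_Finalize();
    return 0;
}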