Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd,

Many thanks, now it is also clear to me how the tests are verified. This means I can trust my energy calculations now.

Thanks again,
Steffi

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd Páll
Sent: Wednesday, 10 April 2019 23:44
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

On Wed, Apr 10, 2019 at 4:19 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear Szilárd and Jon,
>
> many thanks for your support.
>
> The system was Ubuntu 18.04 LTS, gcc 7.3, and CUDA 9.2.
> We have now upgraded gcc (to 8.2) and CUDA (to 10.1).
>
> Now the regression tests all pass.
> The tests Szilárd asked for before also all run. Even just using
> mdrun -nt 80 works.

Great, this confirms that there was indeed a strange compatibility issue, as Jon suggested.

> Many thanks! It seems that this was the origin of the problem.
>
> Just to be sure, I would like to have a look at the short-range values
> of the complex tests. As before, some passed even without having the
> right values.

What do you mean by that?

> Is there a way to compare, or a list with the correct outcomes?

When the regression tests are executed, the output by default lists all commands that do the test runs as well as those that verify the outputs, e.g.:

$ perl gmxtest.pl complex
[...]
Testing acetonitrilRF . . .
gmx grompp -f ./grompp.mdp -c ./conf -r ./conf -p ./topol -maxwarn 10 >grompp.out 2>grompp.err
gmx check -s1 ./reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err
gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1
gmx check -e ./reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05 -lastener Potential >checkpot.out 2>checkpot.err
gmx check -f ./reference_s.trr -f2 traj.trr -tol 0.001 -abstol 0.05 >checkforce.out 2>checkforce.err
PASSED but check mdp file differences

The gmx check commands do the checking, using the reference_s|d files to compare against.

--
Szilárd

> Anyway, here is the link to the tarball of the complex folder in case
> there is interest:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
> Many thanks again for your help.
>
> Best wishes,
> Steffi
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Jonathan Vincent
> Sent: Tuesday, 9 April 2019 22:13
> To: gmx-us...@gromacs.org
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Which operating system are you running on? We have seen some strange
> behavior with large numbers of threads, gcc 7.3, and a newish version
> of glibc. Specifically the default combination that comes with Ubuntu
> 18.04 LTS, but it might be more generic than that.
>
> My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is
> required for CUDA support of gcc 8), which seemed to fix the problem
> in that case.
>
> If you still have problems we can look at this some more.
>
> Jon
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of
> Szilárd Páll
> Sent: 09 April 2019 20:08
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> One more test I realized may be relevant, considering that we had a
> similar report earlier this year on similar CPU hardware:
> can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?
>
> --
> Szilárd
>
> On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll
> wrote:
>
> > Dear Stefanie,
> >
> > On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> > stefanie.tafelme...@zae-bayern.de> wrote:
> >
> >> Hi Szilárd,
> >>
> >> thanks for your advice.
> >> I performed the tests.
> >> Both performed without errors.
> >>
> >
> > OK, that excludes simple and obvious issues.
> > Wild guess, but can you run those again, but this time prefix the
> > command with "taskset -c 22-32"? This makes the tests use cores
> > 22-32, just to check if using a specific set of cores may somehow
> > trigger an error.
> >
> > What CUDA version did you use to compile the memtest tool -- was it
> > the same (CUDA 9.2) as the one used for building GROMACS?
> >
> > Just to get it right; I have to ask in more detail, because the connection
> >> between is th
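The checkpot step Szilárd lists above can also be run by hand on a single test directory to inspect a suspicious short-range energy. A minimal sketch, assuming the regression-test layout shown in the output above (reference files next to the freshly produced ener.edr); the command is only printed here as a dry run:

```shell
#!/bin/sh
# Compare the potential energy of a finished test run against the reference
# energy file, using the same tolerances the test harness uses.
# The directory name is an example from the thread; adjust to the test of interest.
dir=complex/acetonitrilRF
cmd="gmx check -e $dir/reference_s.edr -e2 $dir/ener.edr -tol 0.001 -abstol 0.05 -lastener Potential"
echo "$cmd"   # dry run; run the printed command inside the test tree for real
```

gmx check prints the per-term differences, so an off short-range energy shows up directly, the same information the harness redirects into checkpot.out.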
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi,

On Wed, Apr 10, 2019 at 4:19 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear Szilárd and Jon,
>
> many thanks for your support.
>
> The system was Ubuntu 18.04 LTS, gcc 7.3, and CUDA 9.2.
> We have now upgraded gcc (to 8.2) and CUDA (to 10.1).
>
> Now the regression tests all pass.
> The tests Szilárd asked for before also all run. Even just using
> mdrun -nt 80 works.

Great, this confirms that there was indeed a strange compatibility issue, as Jon suggested.

> Many thanks! It seems that this was the origin of the problem.
>
> Just to be sure, I would like to have a look at the short-range values
> of the complex tests. As before, some passed even without having the
> right values.

What do you mean by that?

> Is there a way to compare, or a list with the correct outcomes?

When the regression tests are executed, the output by default lists all commands that do the test runs as well as those that verify the outputs, e.g.:

$ perl gmxtest.pl complex
[...]
Testing acetonitrilRF . . .
gmx grompp -f ./grompp.mdp -c ./conf -r ./conf -p ./topol -maxwarn 10 >grompp.out 2>grompp.err
gmx check -s1 ./reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err
gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1
gmx check -e ./reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05 -lastener Potential >checkpot.out 2>checkpot.err
gmx check -f ./reference_s.trr -f2 traj.trr -tol 0.001 -abstol 0.05 >checkforce.out 2>checkforce.err
PASSED but check mdp file differences

The gmx check commands do the checking, using the reference_s|d files to compare against.

--
Szilárd

> Anyway, here is the link to the tarball of the complex folder in case
> there is interest:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
> Many thanks again for your help.
>
> Best wishes,
> Steffi
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Jonathan Vincent
> Sent: Tuesday, 9 April 2019 22:13
> To: gmx-us...@gromacs.org
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Which operating system are you running on? We have seen some strange
> behavior with large numbers of threads, gcc 7.3, and a newish version
> of glibc. Specifically the default combination that comes with Ubuntu
> 18.04 LTS, but it might be more generic than that.
>
> My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is
> required for CUDA support of gcc 8), which seemed to fix the problem
> in that case.
>
> If you still have problems we can look at this some more.
>
> Jon
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of
> Szilárd Páll
> Sent: 09 April 2019 20:08
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> One more test I realized may be relevant, considering that we had a
> similar report earlier this year on similar CPU hardware:
> can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?
>
> --
> Szilárd
>
> On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll
> wrote:
>
> > Dear Stefanie,
> >
> > On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> > stefanie.tafelme...@zae-bayern.de> wrote:
> >
> >> Hi Szilárd,
> >>
> >> thanks for your advice.
> >> I performed the tests.
> >> Both performed without errors.
> >>
> >
> > OK, that excludes simple and obvious issues.
> > Wild guess, but can you run those again, but this time prefix the
> > command with "taskset -c 22-32"? This makes the tests use cores
> > 22-32, just to check if using a specific set of cores may somehow
> > trigger an error.
> >
> > What CUDA version did you use to compile the memtest tool -- was it
> > the same (CUDA 9.2) as the one used for building GROMACS?
> >
> > Just to get it right; I have to ask in more detail, because the connection
> >> between the CPU/GPU and the calculation distribution is still a bit
> >> blurry to me:
> >>
> >> If the output of the regression tests shows that a test crashes after
> >> 1-2 steps, does this mean there is an issue with the transfer between
> >> the CPU and GPU?
> >> As far as I understood, the short-range calculation part is normally
> >> split into nonbonded -> GPU and bonded -> CPU?
> >
> > The -nb/
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Dear Szilárd and Jon,

many thanks for your support.

The system was Ubuntu 18.04 LTS, gcc 7.3, and CUDA 9.2.
We have now upgraded gcc (to 8.2) and CUDA (to 10.1).

Now the regression tests all pass.
The tests Szilárd asked for before also all run. Even just using mdrun -nt 80 works.

Many thanks! It seems that this was the origin of the problem.

Just to be sure, I would like to have a look at the short-range values of the complex tests. As before, some passed even without having the right values.
Is there a way to compare, or a list with the correct outcomes?

Anyway, here is the link to the tarball of the complex folder in case there is interest:
https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge

Many thanks again for your help.

Best wishes,
Steffi

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Jonathan Vincent
Sent: Tuesday, 9 April 2019 22:13
To: gmx-us...@gromacs.org
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

Which operating system are you running on? We have seen some strange behavior with large numbers of threads, gcc 7.3, and a newish version of glibc. Specifically the default combination that comes with Ubuntu 18.04 LTS, but it might be more generic than that.

My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is required for CUDA support of gcc 8), which seemed to fix the problem in that case.

If you still have problems we can look at this some more.

Jon

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of Szilárd Páll
Sent: 09 April 2019 20:08
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

One more test I realized may be relevant, considering that we had a similar report earlier this year on similar CPU hardware: can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?

--
Szilárd

On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the
> command with "taskset -c 22-32"? This makes the tests use cores 22-32,
> just to check if using a specific set of cores may somehow trigger an
> error.
>
> What CUDA version did you use to compile the memtest tool -- was it
> the same (CUDA 9.2) as the one used for building GROMACS?
>
>> Just to get it right; I have to ask in more detail, because the
>> connection between the CPU/GPU and the calculation distribution is
>> still a bit blurry to me:
>>
>> If the output of the regression tests shows that a test crashes after
>> 1-2 steps, does this mean there is an issue with the transfer between
>> the CPU and GPU?
>> As far as I understood, the short-range calculation part is normally
>> split into nonbonded -> GPU and bonded -> CPU?
>
> The -nb/-pme/-bonded flags control which tasks execute where (if not
> specified, defaults control this); the output contains a report which
> summarizes where the major force tasks are executed. E.g., this is
> from one of your log files, which tells that PP (i.e. particle tasks
> like short-range nonbonded) and the full PME tasks are offloaded to
> the GPU with ID 0 (to check which GPU that is, look at the "Hardware
> detection" section of the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument", which is still
> mysterious to me and has shown up both in your repeated manual runs
> and in the regression tests; this aborts the run.
> - Failing regression tests with either invalid results or crashes
> (besides the above abort): to be honest, I do not know what causes
> these.
>
> The latter errors indicate incorrect results; in your last "complex"
> tests tarball I saw some tests failing with LINCS errors (and
> indicating NaN values) and a good fraction of tests failing with
> GPU-side assertions -- both of which suggest that t
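The -nb/-pme/-bonded offload split discussed above can be exercised systematically. A sketch of the single-step combinations used in this thread; topol.tpr is a placeholder input, and the loop only prints the commands (drop the echo to run them for real):

```shell
#!/bin/sh
# Enumerate the offload combinations from the thread as zero-step runs:
# each places the nonbonded (-nb), PME (-pme) and bonded (-bonded) force
# tasks explicitly on the CPU or the GPU.
for combo in \
    "-nb gpu -pme cpu -bonded cpu" \
    "-nb gpu -pme gpu -bonded cpu" \
    "-nb gpu -pme gpu -bonded gpu"; do
    echo gmx mdrun -s topol.tpr -nsteps 0 $combo
done
```

After each run, the "Mapping of GPU IDs to ... GPU tasks" report in the log confirms where each task actually landed.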
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi,

Which operating system are you running on? We have seen some strange behavior with large numbers of threads, gcc 7.3, and a newish version of glibc. Specifically the default combination that comes with Ubuntu 18.04 LTS, but it might be more generic than that.

My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is required for CUDA support of gcc 8), which seemed to fix the problem in that case.

If you still have problems we can look at this some more.

Jon

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of Szilárd Páll
Sent: 09 April 2019 20:08
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

One more test I realized may be relevant, considering that we had a similar report earlier this year on similar CPU hardware: can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?

--
Szilárd

On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the
> command with "taskset -c 22-32"? This makes the tests use cores 22-32,
> just to check if using a specific set of cores may somehow trigger an
> error.
>
> What CUDA version did you use to compile the memtest tool -- was it
> the same (CUDA 9.2) as the one used for building GROMACS?
>
>> Just to get it right; I have to ask in more detail, because the
>> connection between the CPU/GPU and the calculation distribution is
>> still a bit blurry to me:
>>
>> If the output of the regression tests shows that a test crashes after
>> 1-2 steps, does this mean there is an issue with the transfer between
>> the CPU and GPU?
>> As far as I understood, the short-range calculation part is normally
>> split into nonbonded -> GPU and bonded -> CPU?
>
> The -nb/-pme/-bonded flags control which tasks execute where (if not
> specified, defaults control this); the output contains a report which
> summarizes where the major force tasks are executed. E.g., this is
> from one of your log files, which tells that PP (i.e. particle tasks
> like short-range nonbonded) and the full PME tasks are offloaded to
> the GPU with ID 0 (to check which GPU that is, look at the "Hardware
> detection" section of the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument", which is still
> mysterious to me and has shown up both in your repeated manual runs
> and in the regression tests; this aborts the run.
> - Failing regression tests with either invalid results or crashes
> (besides the above abort): to be honest, I do not know what causes
> these.
>
> The latter errors indicate incorrect results; in your last "complex"
> tests tarball I saw some tests failing with LINCS errors (and
> indicating NaN values) and a good fraction of tests failing with
> GPU-side assertions -- both of which suggest that things do go wrong
> on the GPU.
>
>> And does this mean that maybe the calculations I do also have wrong
>> energies? Can I trust my results?
>
> At this point I can unfortunately not recommend running production
> simulations on this machine.
>
> I will try to continue exploring the possible errors, and I hope you
> can help out with some tests:
>
> - Please run the complex regression tests (using the RelWithAssert
> binary) with the CUDA_LAUNCH_BLOCKING environment variable set. This
> may allow us to reason better about the source of the errors. Also,
> you can reconfigure with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid
> the 88-OpenMP-thread errors in tests that you encountered yourself.
>
> - Can you please recompile GROMACS with CUDA 10 and check if either of
> the two kinds of errors reproduces? (If it does, and if you can
> upgrade the driver, I suggest upgrading to CUDA 10.1.)
>
>> Many thanks again for your support.
>> Best wishes,
>
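Szilárd's two debugging requests above could look like the following. A sketch only, assuming a perl-driven regression-test checkout and an existing CMake build directory; the commands are printed rather than executed:

```shell
#!/bin/sh
# 1) CUDA_LAUNCH_BLOCKING=1 serializes CUDA kernel launches, so a failing
#    kernel is reported at its launch site instead of at a later async call.
rerun="CUDA_LAUNCH_BLOCKING=1 perl gmxtest.pl complex"
# 2) Raise the compiled-in OpenMP thread cap above the machine's 88 hardware
#    threads (the "." build-directory path is an assumption).
reconf="cmake . -DGMX_OPENMP_MAX_THREADS=128"
printf '%s\n%s\n' "$rerun" "$reconf"   # dry run: print, do not execute
```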
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi,

One more test I realized may be relevant, considering that we had a similar report earlier this year on similar CPU hardware: can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?

--
Szilárd

On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the
> command with "taskset -c 22-32"? This makes the tests use cores 22-32,
> just to check if using a specific set of cores may somehow trigger an
> error.
>
> What CUDA version did you use to compile the memtest tool -- was it
> the same (CUDA 9.2) as the one used for building GROMACS?
>
>> Just to get it right; I have to ask in more detail, because the
>> connection between the CPU/GPU and the calculation distribution is
>> still a bit blurry to me:
>>
>> If the output of the regression tests shows that a test crashes after
>> 1-2 steps, does this mean there is an issue with the transfer between
>> the CPU and GPU?
>> As far as I understood, the short-range calculation part is normally
>> split into nonbonded -> GPU and bonded -> CPU?
>
> The -nb/-pme/-bonded flags control which tasks execute where (if not
> specified, defaults control this); the output contains a report which
> summarizes where the major force tasks are executed. E.g., this is
> from one of your log files, which tells that PP (i.e. particle tasks
> like short-range nonbonded) and the full PME tasks are offloaded to
> the GPU with ID 0 (to check which GPU that is, look at the "Hardware
> detection" section of the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument", which is still
> mysterious to me and has shown up both in your repeated manual runs
> and in the regression tests; this aborts the run.
> - Failing regression tests with either invalid results or crashes
> (besides the above abort): to be honest, I do not know what causes
> these.
>
> The latter errors indicate incorrect results; in your last "complex"
> tests tarball I saw some tests failing with LINCS errors (and
> indicating NaN values) and a good fraction of tests failing with
> GPU-side assertions -- both of which suggest that things do go wrong
> on the GPU.
>
>> And does this mean that maybe the calculations I do also have wrong
>> energies? Can I trust my results?
>
> At this point I can unfortunately not recommend running production
> simulations on this machine.
>
> I will try to continue exploring the possible errors, and I hope you
> can help out with some tests:
>
> - Please run the complex regression tests (using the RelWithAssert
> binary) with the CUDA_LAUNCH_BLOCKING environment variable set. This
> may allow us to reason better about the source of the errors. Also,
> you can reconfigure with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid
> the 88-OpenMP-thread errors in tests that you encountered yourself.
>
> - Can you please recompile GROMACS with CUDA 10 and check if either of
> the two kinds of errors reproduces? (If it does, and if you can
> upgrade the driver, I suggest upgrading to CUDA 10.1.)
>
>> Many thanks again for your support.
>> Best wishes,
>> Steffi
>
> --
> Szilárd

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
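The -DGMX_SIMD=AVX2_256 request above amounts to reconfiguring and rebuilding GROMACS with a narrower SIMD flavor than the auto-detected one. A sketch with placeholder paths (source directory and parallelism are assumptions; -DGMX_GPU=ON is the CUDA switch in the 2019-era build system); printed rather than executed:

```shell
#!/bin/sh
# Reconfigure an existing build tree to force the AVX2_256 SIMD kernels,
# then rebuild, reinstall, and rerun the regression tests.
src=/path/to/gromacs   # placeholder source path
steps="cmake $src -DGMX_GPU=ON -DGMX_SIMD=AVX2_256 && make -j && make install"
echo "$steps"   # dry run; execute these in the build directory for real
```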
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Dear Stefanie,

On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks for your advice.
> I performed the tests.
> Both performed without errors.

OK, that excludes simple and obvious issues.
Wild guess, but can you run those again, but this time prefix the command with "taskset -c 22-32"? This makes the tests use cores 22-32, just to check if using a specific set of cores may somehow trigger an error.

What CUDA version did you use to compile the memtest tool -- was it the same (CUDA 9.2) as the one used for building GROMACS?

> Just to get it right; I have to ask in more detail, because the
> connection between the CPU/GPU and the calculation distribution is
> still a bit blurry to me:
>
> If the output of the regression tests shows that a test crashes after
> 1-2 steps, does this mean there is an issue with the transfer between
> the CPU and GPU?
> As far as I understood, the short-range calculation part is normally
> split into nonbonded -> GPU and bonded -> CPU?

The -nb/-pme/-bonded flags control which tasks execute where (if not specified, defaults control this); the output contains a report which summarizes where the major force tasks are executed. E.g., this is from one of your log files, which tells that PP (i.e. particle tasks like short-range nonbonded) and the full PME tasks are offloaded to the GPU with ID 0 (to check which GPU that is, look at the "Hardware detection" section of the log):

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PME tasks will do all aspects on the GPU

For more details, please see
http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus

We have seen two types of errors so far:
- "Asynchronous H2D copy failed: invalid argument", which is still mysterious to me and has shown up both in your repeated manual runs and in the regression tests; this aborts the run.
- Failing regression tests with either invalid results or crashes (besides the above abort): to be honest, I do not know what causes these.

The latter errors indicate incorrect results; in your last "complex" tests tarball I saw some tests failing with LINCS errors (and indicating NaN values) and a good fraction of tests failing with GPU-side assertions -- both of which suggest that things do go wrong on the GPU.

> And does this mean that maybe the calculations I do also have wrong
> energies? Can I trust my results?

At this point I can unfortunately not recommend running production simulations on this machine.

I will try to continue exploring the possible errors, and I hope you can help out with some tests:

- Please run the complex regression tests (using the RelWithAssert binary) with the CUDA_LAUNCH_BLOCKING environment variable set. This may allow us to reason better about the source of the errors. Also, you can reconfigure with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid the 88-OpenMP-thread errors in tests that you encountered yourself.

- Can you please recompile GROMACS with CUDA 10 and check if either of the two kinds of errors reproduces? (If it does, and if you can upgrade the driver, I suggest upgrading to CUDA 10.1.)

> Many thanks again for your support.
> Best wishes,
> Steffi

--
Szilárd

> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Szilárd Páll
> Sent: Friday, 29 March 2019 01:24
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> The standard output of the first set of runs is also something I was
> interested in, but I've found the equivalent in the
> complex/TESTDIR/mdrun.out files. What I see in the regression test
> output is that the force/energy results are simply not correct; some
> tests simply crash after 1-2 steps, but others do complete (like
> nbnxn-free-energy/) and the short-range energies are clearly far off.
>
> I suggest checking whether there may be a hardware issue:
>
> - run this memory testing tool:
> git clone
> https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
> cd cuda_memtest
> make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
> ./cuda_memtest
>
> - compile and run the gpu-burn tool:
> git clone https://github.com/wilicc/gpu-burn
> cd gpu-burn
> make
> then run
> gpu-burn 300
> to test for 5 minutes.
>
> --
> Szilárd
>
> On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie <
> stefanie.tafelme..
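The taskset experiment suggested above, spelled out; the binary names and input file are examples, and the prefix works the same for the memtest tool and for mdrun itself. Printed as a dry run:

```shell
#!/bin/sh
# taskset restricts the process to cores 22-32, so the kernel cannot
# schedule it onto other cores; useful for checking whether a specific
# core range triggers the errors.
pin="taskset -c 22-32"
echo "$pin ./cuda_memtest"                      # GPU memory test, pinned
echo "$pin gmx mdrun -s topol.tpr -nsteps 0"    # zero-step mdrun, pinned
```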
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd,

thanks for your advice.
I performed the tests.
Both performed without errors.

Just to get it right; I have to ask in more detail, because the connection between the CPU/GPU and the calculation distribution is still a bit blurry to me:

If the output of the regression tests shows that a test crashes after 1-2 steps, does this mean there is an issue with the transfer between the CPU and GPU?
As far as I understood, the short-range calculation part is normally split into nonbonded -> GPU and bonded -> CPU?

And does this mean that maybe the calculations I do also have wrong energies? Can I trust my results?

Many thanks again for your support.
Best wishes,
Steffi

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd Páll
Sent: Friday, 29 March 2019 01:24
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

The standard output of the first set of runs is also something I was interested in, but I've found the equivalent in the complex/TESTDIR/mdrun.out files. What I see in the regression test output is that the force/energy results are simply not correct; some tests simply crash after 1-2 steps, but others do complete (like nbnxn-free-energy/) and the short-range energies are clearly far off.

I suggest checking whether there may be a hardware issue:

- run this memory testing tool:
git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
cd cuda_memtest
make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
./cuda_memtest

- compile and run the gpu-burn tool:
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
then run
gpu-burn 300
to test for 5 minutes.

--
Szilárd

On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> Thanks again!
>
> Regarding the tests:
> -ntmpi 1 -ntomp 22 -pin on -pinstride 1: 2 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy /
> https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9
> Including:
> -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 ran
> https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX
> -nsteps 0 -nb gpu -pme gpu -bonded cpu: 2 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj /
> https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe
> -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 ran
> https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc
>
> Including:
> -ntmpi 1 -ntomp 22 -pin on -pinstride 2: 1 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ /
> https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8
> -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 ran
> https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ
> -nsteps 0 -nb gpu -pme gpu -bonded cpu: 1 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5 /
> https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6
> -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 ran
> https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ
>
> Including:
> -ntmpi 1 -ntomp 22 -pin on -pinstride 4: 1 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB /
> https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt
> -nsteps 0 -nb gpu -pme cpu -bonded cpu: 3 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM /
> https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS
> -nsteps 0 -nb gpu -pme gpu -bonded cpu: 3 out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455 /
> https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs
> -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 ran
> https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB
>
> Regarding the regression test:
> Here is the link to the tarball:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
> Thanks again for all your support and fingers crossed!
>
> Best wishes,
> Steffi
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Szilárd Páll
> Sent: Wednesday, 27 March 2019 20:27
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Steffi,
>
> On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de>
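The pinning sweep whose per-run results are linked above can be scripted. A sketch that prints the 3 x 5 matrix of commands from the thread (topol.tpr is a placeholder input; drop the echo to execute):

```shell
#!/bin/sh
# One thread-MPI rank, 22 OpenMP threads, pin strides 1/2/4, five repeats
# each, matching the configurations whose logs are reported in the thread.
count=0
for stride in 1 2 4; do
    for rep in 1 2 3 4 5; do
        echo "gmx mdrun -s topol.tpr -ntmpi 1 -ntomp 22 -pin on -pinstride $stride"
        count=$((count + 1))
    done
done
echo "total runs: $count"
```

-pinstride controls how far apart the pinned threads are placed in hardware-thread units, which is why varying it changes which cores the 22 threads actually occupy.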
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi, The standard output of the first set of runs is also something I was interested in, but I've found the equivalent in the complex/TESTDIR/mdrun.out files. What I see in the regresiontests output is that the forces/energies results are simply not correct; some tests simply crash after 1-2 steps, but others do complete (like the nbnxn-free-energy/) and the short-range energies a clearly far off. I suggest to try to check if there may be hardware issue: - run this memory testing tool: git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git cd cuda_memtest make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0' ./cuda_memtest - compile and run the gpu-burn tool: git clone https://github.com/wilicc/gpu-burn cd gpu-burn make then run gpu-burn 300 to test for 5 minutes. -- Szilárd On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi Szilárd, > > Thanks again! > > Regarding the test: > -ntmpi 1 -ntomp 22 -pin on -pinstride 1: 2 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy / > https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9 > Including: > -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 run > https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX > -nsteps 0 -nb gpu -pme gpu -bonded cpu: 2 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj / > https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe > -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run > https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc > > Including: > -ntmpi 1 -ntomp 22 -pin on -pinstride 2: 1 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ / > https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8 > -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 run > https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ > -nsteps 0 -nb gpu -pme gpu -bonded cpu: 1 out of 5 run > 
https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5 / > https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6 > -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run > https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ > > Including: > -ntmpi 1 -ntomp 22 -pin on -pinstride 4: 1 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB / > https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt > -nsteps 0 -nb gpu -pme cpu -bonded cpu: 3 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM / > https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS > -nsteps 0 -nb gpu -pme gpu -bonded cpu: 3 out of 5 run > https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455 / > https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs > -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run > https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB > > > Regarding the regressiontest: > Here is the link to the tarball: > https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge > > > Thanks again for all your support and fingers crossed! > > Best wishes, > Steffi > > > > > > -Ursprüngliche Nachricht- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Mittwoch, 27. März 2019 20:27 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs > > Hi Steffi, > > On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie < > stefanie.tafelme...@zae-bayern.de> wrote: > > > Hi Szilárd, > > > > thanks again! 
> > Here are the links for the log files, that didn't run: > > Old patch: > > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > > https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3 > > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > > https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF > > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > > https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy > > > > New patch: > > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > > https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w > > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > > https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB > > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* > > https://it
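The hardware checks suggested in the message above amount to a short command sequence. A minimal sketch that only prints the commands so they can be reviewed before running; the CFLAGS are the ones quoted in the thread (the sm_30 arch flag may need adjusting for your GPU), and the gpu_burn binary name is an assumption about the repo's build output:

```shell
# Print the GPU hardware-check sequence suggested in the thread
# (cuda_memtest for GPU memory errors, gpu-burn as a stress test).
# Commands are printed, not executed, so they can be inspected first.
hw_check_cmds() {
  cat <<'EOF'
git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
make -C cuda_memtest cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
./cuda_memtest/cuda_memtest
git clone https://github.com/wilicc/gpu-burn
make -C gpu-burn
./gpu-burn/gpu_burn 300
EOF
}
hw_check_cmds
```

The 300-second burn corresponds to the 5-minute test mentioned in the thread.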
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd, Thanks again! Regarding the test: -ntmpi 1 -ntomp 22 -pin on -pinstride 1: 2 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy / https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9 Including: -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 run https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX -nsteps 0 -nb gpu -pme gpu -bonded cpu: 2 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj / https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc Including: -ntmpi 1 -ntomp 22 -pin on -pinstride 2: 1 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ / https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8 -nsteps 0 -nb gpu -pme cpu -bonded cpu: 0 run https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ -nsteps 0 -nb gpu -pme gpu -bonded cpu: 1 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5 / https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6 -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ Including: -ntmpi 1 -ntomp 22 -pin on -pinstride 4: 1 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB / https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt -nsteps 0 -nb gpu -pme cpu -bonded cpu: 3 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM / https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS -nsteps 0 -nb gpu -pme gpu -bonded cpu: 3 out of 5 run https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455 / https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs -nsteps 0 -nb gpu -pme gpu -bonded gpu: 0 run https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB Regarding the regressiontest: Here 
is the link to the tarball: https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge Thanks again for all your support and fingers crossed! Best wishes, Steffi -Ursprüngliche Nachricht- Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd Páll Gesendet: Mittwoch, 27. März 2019 20:27 An: Discussion list for GROMACS users Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs Hi Steffi, On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi Szilárd, > > thanks again! > Here are the links for the log files, that didn't run: > Old patch: > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3 > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy > > New patch: > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF > This still doesn't tell much more unfortunately. Two more things to try (can be combined) - please set build with setting first cmake . -DCMAKE_BUILD_TYPE=RelWithAssert this may give us some extra debugging information during runs - please use this patch now -- it will print some additional stuff to the standard error output so please grab that and share it: https://termbin.com/zq4q (you can redirect the output e.g. 
by gmx mdrun > mdrun.out 2>&1) - try running (with the above binary build + patch) the above failing cases repeated a few times: -nsteps 0 -nb gpu -pme cpu -bonded cpu -nsteps 0 -nb gpu -pme gpu -bonded cpu -nsteps 0 -nb gpu -pme gpu -bonded gpu > Regarding the Regressiontest: > > Sorry I didn't get it at the first time. > If the md.log files are enough here is a folder for the failed parts of > the complex regression test: > https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2 > > If you need any other files or the full directories please let me know. > Hmmm, looks like there are more issues here; some log files look truncated, others indicate termination by LINCS errors. Yes, the mdrun.out and checkpot* files would be useful. How about just making a tarball of the whole complex directory and sharing that?
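The debug cycle described above (rebuild with RelWithAssert, apply the patch, then repeat the three failing configurations several times) can be scripted. A sketch, with `gmx` assumed on PATH; the run commands are only printed here so they can be inspected, then piped to `sh` to execute:

```shell
# First, per the thread, rebuild with assertions for extra diagnostics:
#   cmake . -DCMAKE_BUILD_TYPE=RelWithAssert && make
# Then enumerate the three failing configurations, five repeats each.
repeat_run_cmds() {
  n=0
  for flags in "-pme cpu -bonded cpu" "-pme gpu -bonded cpu" "-pme gpu -bonded gpu"; do
    for rep in 1 2 3 4 5; do
      n=$((n + 1))
      # Each run gets its own output file so intermittent failures are kept.
      printf 'gmx mdrun -nsteps 0 -nb gpu %s > mdrun_%02d.out 2>&1\n' "$flags" "$n"
    done
  done
}
repeat_run_cmds
```

Running `repeat_run_cmds | sh` in a prepared run directory would execute all 15 runs, capturing standard output and error as the thread requests.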
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Steffi, On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi Szilárd, > > thanks again! > Here are the links for the log files, that didn't run: > Old patch: > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3 > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy > > New patch: > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* > https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF > This still doesn't tell much more unfortunately. Two more things to try (can be combined): - please first rebuild with cmake . -DCMAKE_BUILD_TYPE=RelWithAssert; this may give us some extra debugging information during runs - please use this patch now -- it will print some additional stuff to the standard error output so please grab that and share it: https://termbin.com/zq4q (you can redirect the output e.g. by gmx mdrun > mdrun.out 2>&1) - try running (with the above binary build + patch) the above failing cases repeated a few times: -nsteps 0 -nb gpu -pme cpu -bonded cpu -nsteps 0 -nb gpu -pme gpu -bonded cpu -nsteps 0 -nb gpu -pme gpu -bonded gpu > Regarding the Regressiontest: > > Sorry I didn't get it at the first time. > If the md.log files are enough here is a folder for the failed parts of > the complex regression test: > https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2 > > If you need any other files or the full directories please let me know. 
> Hmmm, looks like there are more issues here; some log files look truncated, others indicate termination by LINCS errors. Yes, the mdrun.out and checkpot* files would be useful. How about just making a tarball of the whole complex directory and sharing that? Hopefully these tests will shed some light on what the issue is. Cheers, -- Szilard Again, a lot of thanks for your support. > Best wishes, > Steffi > > > > > > > > > > > -Ursprüngliche Nachricht- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Dienstag, 26. März 2019 16:57 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs > > Hi Steffi, > > Thanks for running the tests; yes, the patch file was meant to be applied > to the unchanged GROMACS 2019 code. > > Please also share the log files from the failed runs, not just the > copy-paste of the fatal error -- as a result of the additional check there > might have been a note printed which I was after. > > Regarding the regression tests, what I would like to have is the actual > directories of the tests that failed, i.e. as your log indicates a few of > the complex tests at least. > > Cheers, > -- > Szilárd > > On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie < > stefanie.tafelme...@zae-bayern.de> wrote: > > > Hi Szilárd, > > > > thanks again for your answer. > > Regarding the tests: > > without the new patch: > > > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > > and > > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > > > > > > With the new patch (devicebuffer.cuh had to be the original, right? 
The > > already patched didn't work as the lines didn't fit, as far as I > > understood.): > > > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > > and > > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd, thanks again! Here are the links for the log files that didn't run: Old patch: -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3 -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy New patch: -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF Regarding the Regressiontest: Sorry I didn't get it at the first time. If the md.log files are enough here is a folder for the failed parts of the complex regression test: https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2 If you need any other files or the full directories please let me know. Again, a lot of thanks for your support. Best wishes, Steffi -Ursprüngliche Nachricht- Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd Páll Gesendet: Dienstag, 26. März 2019 16:57 An: Discussion list for GROMACS users Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs Hi Steffi, Thanks for running the tests; yes, the patch file was meant to be applied to the unchanged GROMACS 2019 code. Please also share the log files from the failed runs, not just the copy-paste of the fatal error -- as a result of the additional check there might have been a note printed which I was after. Regarding the regression tests, what I would like to have is the actual directories of the tests that failed, i.e. as your log indicates a few of the complex tests at least. 
Cheers, -- Szilárd On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi Szilárd, > > thanks again for your answer. > Regarding the tests: > without the new patch: > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > and > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > > > With the new patch (devicebuffer.cuh had to be the original, right? The > already patched didn't work as the lines didn't fit, as far as I > understood.): > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > and > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* > > * Fatal error: > Asynchronous H2D copy failed: invalid argument > > > Regarding the regressiontest: > The LastTest.log is available here: > https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi > this was not given in the log: > The following tests FAILED: > 42 - regressiontests/complex (Timeout) > 46 - regressiontests/essentialdynamics (Failed) > Errors while running CTest > CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target > 'CMakeFiles/run-ctest-nophys' failed > make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8 > CMakeFiles/Makefile2:1397: recipe for target > 'CMakeFiles/run-ctest-nophys.dir/all'failed > make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2 > CMakeFiles/Makefile2:1177: recipe for target > 'CMakeFiles/check.dir/rule' failed > make[1]: *** [CMakeFiles/check.dir/rule] Error 2 > Makefile:626: recipe for target 'check' failed > 
make: *** [check] Error 2 > > Many thanks again. > Best wishes, > Steffi > > > > > > -Ursprüngliche Nachricht----- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Montag, 25. März 2019 20:13 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and groma
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Steffi, Thanks for running the tests; yes, the patch file was meant to be applied to the unchanged GROMACS 2019 code. Please also share the log files from the failed runs, not just the copy-paste of the fatal error -- as a result of the additional check there might have been a note printed which I was after. Regarding the regression tests, what I would like to have is the actual directories of the tests that failed, i.e. as your log indicates a few of the complex tests at least. Cheers, -- Szilárd On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi Szilárd, > > thanks again for your answer. > Regarding the tests: > without the new patch: > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > and > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran > > > With the new patch (devicebuffer.cuh had to be the original, right? 
The > already patched didn't work as the lines didn't fit, as far as I > understood.): > > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran > and > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* > > * Fatal error: > Asynchronous H2D copy failed: invalid argument > > > Regarding the regressiontest: > The LastTest.log is available here: > https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi > this was not given in the log: > The following tests FAILED: > 42 - regressiontests/complex (Timeout) > 46 - regressiontests/essentialdynamics (Failed) > Errors while running CTest > CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target > 'CMakeFiles/run-ctest-nophys' failed > make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8 > CMakeFiles/Makefile2:1397: recipe for target > 'CMakeFiles/run-ctest-nophys.dir/all'failed > make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2 > CMakeFiles/Makefile2:1177: recipe for target > 'CMakeFiles/check.dir/rule' failed > make[1]: *** [CMakeFiles/check.dir/rule] Error 2 > Makefile:626: recipe for target 'check' failed > make: *** [check] Error 2 > > Many thanks again. > Best wishes, > Steffi > > > > > > -Ursprüngliche Nachricht- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Montag, 25. März 2019 20:13 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs > > Hi, > > > > -- > Szilárd > > > On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie < > stefanie.tafelme...@zae-bayern.de> wrote: > > > Hi, > > > > Many thanks again. 
> > > > Regarding the tests: > > - ntmpi 1 -ntomp 22 -pin on > > >OK, so this suggests that your previously successful 22-thread runs did > > not > > turn on pinning, I assume? > > It seems so, yet it does not run successfully each time. But if done with > > 20-threads, which works usually without error, it does not look like the > > pinning is turned on. > > > > Pinning is only turned on if mdrun can safely assume that the cores of the > node are not shared by multiple applications. This assumption can only be > made if all hardware threads of the entire node are used the run itself > (i.e. in your case 2x22 cores with HyperThreadince hence 2 threads each = > 88 threads). > > -ntmpi 1 -ntomp 1 -pin on; runs > > -ntmpi 1 -ntomp 2 -pin on; runs > > > > - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs > > - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs > > > > After patch supplied: > > - ntmpi 1 -ntomp 22 -pin on; sometime runs - sometimes doesn't* -> > > md_run.log at : > > https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T > > > > md_norun.log at: > > https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi > > - ntmpi 1 -ntomp 22 -pin off; sometime runs - sometimes doesn
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd, thanks again for your answer. Regarding the tests: without the new patch: -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran and -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran With the new patch (devicebuffer.cuh had to be the original, right? The already patched didn't work as the lines didn't fit, as far as I understood.): -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran and -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran* -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran* -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran* * Fatal error: Asynchronous H2D copy failed: invalid argument Regarding the regressiontest: The LastTest.log is available here: https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi this was not given in the log: The following tests FAILED: 42 - regressiontests/complex (Timeout) 46 - regressiontests/essentialdynamics (Failed) Errors while running CTest CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target 'CMakeFiles/run-ctest-nophys' failed make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8 CMakeFiles/Makefile2:1397: recipe for target 'CMakeFiles/run-ctest-nophys.dir/all'failed make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2 CMakeFiles/Makefile2:1177: recipe for target 'CMakeFiles/check.dir/rule' failed make[1]: *** [CMakeFiles/check.dir/rule] Error 2 Makefile:626: recipe for target 'check' failed make: *** [check] Error 2 Many thanks again. 
Best wishes, Steffi -Ursprüngliche Nachricht- Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd Páll Gesendet: Montag, 25. März 2019 20:13 An: Discussion list for GROMACS users Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs Hi, -- Szilárd On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi, > > Many thanks again. > > Regarding the tests: > - ntmpi 1 -ntomp 22 -pin on > >OK, so this suggests that your previously successful 22-thread runs did > not > turn on pinning, I assume? > It seems so, yet it does not run successfully each time. But if done with > 20-threads, which works usually without error, it does not look like the > pinning is turned on. > Pinning is only turned on if mdrun can safely assume that the cores of the node are not shared by multiple applications. This assumption can only be made if all hardware threads of the entire node are used by the run itself (i.e. in your case 2x22 cores with HyperThreading, hence 2 threads each = 88 threads). -ntmpi 1 -ntomp 1 -pin on; runs > -ntmpi 1 -ntomp 2 -pin on; runs > > - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs > - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs > > After patch supplied: > - ntmpi 1 -ntomp 22 -pin on; sometime runs - sometimes doesn't* -> > md_run.log at : > https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T > > md_norun.log at: > https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi > - ntmpi 1 -ntomp 22 -pin off; sometime runs - sometimes doesn't* (ran > before) > - ntmpi 1 -ntomp 23 -pin off; doesn't work* (ran before) > > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* > > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work* (ran before) > The suspicious thing is that the patch I made only improves the verbosity of the error reporting, it should have no impact on whether the error is triggered or not. 
Considering the above behavior it seems that pinning (at least the patterns tried) has no influence on whether the runs work. Can you please try: -ntmpi 1 -ntomp 11 -pin on -pinstride 1 -ntmpi 1 -ntomp 11 -pin on -pinstride 2 -ntmpi 1 -ntomp 11 -pin on -pinstride 4 -ntmpi 1 -ntomp 11 -pin on -pinstride 8 and -ntmpi 1 -ntomp 22 -pin on -pinstride 1 -ntmpi 1 -ntomp 22 -pin on -pinstride 2 -ntmpi 1 -ntomp 22 -pin on -pinstride 4 And please run these 5 times each (-nsteps 0 is fine to make things quick). Also, please use this patch https://termbin.com/r8kk the same way as you did the one before; it adds another check that might shed some light on what's going on. - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
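The pinning test matrix requested above (seven combinations, five repeats each) can be generated mechanically rather than typed by hand. A sketch that prints the seven command lines, with `gmx` assumed on PATH; each printed line would then be run 5 times as requested:

```shell
# Enumerate the -ntomp/-pinstride combinations requested in the thread:
# -ntomp 11 with pinstride 1/2/4/8, and -ntomp 22 with pinstride 1/2/4.
pin_matrix_cmds() {
  for s in 1 2 4 8; do
    echo "gmx mdrun -nsteps 0 -ntmpi 1 -ntomp 11 -pin on -pinstride $s"
  done
  for s in 1 2 4; do
    echo "gmx mdrun -nsteps 0 -ntmpi 1 -ntomp 22 -pin on -pinstride $s"
  done
}
pin_matrix_cmds
```

Note that -ntomp 22 with -pinstride 8 is omitted deliberately: on this node (88 hardware threads) 22 threads at stride 8 would exceed the available thread range.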
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi, -- Szilárd On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi, > > Many thanks again. > > Regarding the tests: > - ntmpi 1 -ntomp 22 -pin on > >OK, so this suggests that your previously successful 22-thread runs did > not > turn on pinning, I assume? > It seems so, yet it does not run successfully each time. But if done with > 20-threads, which works usually without error, it does not look like the > pinning is turned on. > Pinning is only turned on if mdrun can safely assume that the cores of the node are not shared by multiple applications. This assumption can only be made if all hardware threads of the entire node are used by the run itself (i.e. in your case 2x22 cores with HyperThreading, hence 2 threads each = 88 threads). -ntmpi 1 -ntomp 1 -pin on; runs > -ntmpi 1 -ntomp 2 -pin on; runs > > - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs > - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs > > After patch supplied: > - ntmpi 1 -ntomp 22 -pin on; sometime runs - sometimes doesn't* -> > md_run.log at : > https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T > > md_norun.log at: > https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi > - ntmpi 1 -ntomp 22 -pin off; sometime runs - sometimes doesn't* (ran > before) > - ntmpi 1 -ntomp 23 -pin off; doesn't work* (ran before) > > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* > > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work* (ran before) > The suspicious thing is that the patch I made only improves the verbosity of the error reporting, it should have no impact on whether the error is triggered or not. Considering the above behavior it seems that pinning (at least the patterns tried) has no influence on whether the runs work. 
Can you please try: -ntmpi 1 -ntomp 11 -pin on -pinstride 1 -ntmpi 1 -ntomp 11 -pin on -pinstride 2 -ntmpi 1 -ntomp 11 -pin on -pinstride 4 -ntmpi 1 -ntomp 11 -pin on -pinstride 8 and -ntmpi 1 -ntomp 22 -pin on -pinstride 1 -ntmpi 1 -ntomp 22 -pin on -pinstride 2 -ntmpi 1 -ntomp 22 -pin on -pinstride 4 And please run these 5 times each (-nsteps 0 is fine to make things quick). Also, please use this patch https://termbin.com/r8kk The same way as you did the one before, it adds another check that might shed some light on what's going on. - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs > - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs > > * Fatal error: > Asynchronous H2D copy failed: invalid argument > > When compiling, the make check shows that the regressiontest-complex and > regressiontest-essential dynamics fail. > I am not sure if this is correlated? > It might be, please share the outputs of the regressiontests. -- Szilárd > Many thanks in advance. > Best wishes, > Steffi > > > > > -Ursprüngliche Nachricht- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Freitag, 15. März 2019 17:57 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs > > On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie < > stefanie.tafelme...@zae-bayern.de> wrote: > > > Hi, > > > > about the tests: > > - ntmpi 1 -ntomp 22 -pin on; doesn't work* > > > > OK, so this suggests that your previously successful 22-thread runs did not > turn on pinning, I assume? > Can you please try: > -ntmpi 1 -ntomp 1 -pin on > -ntmpi 1 -ntomp 2 -pin on > that is to check does pinning work at all? > Also, please try one/both of the above (assuming they fail with) same > binary, but CPU-only run, i.e. 
> -ntmpi 1 -ntomp 1 -pin on -nb cpu > > > > - ntmpi 1 -ntomp 22 -pin off; runs > > - ntmpi 1 -ntomp 23 -pin off; runs > > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* > > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs > > - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work** > > - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work** > > > > Just to confirm, can you please run the **'s with either -ntmpi 24 (to > avoid the DD error). > > > > > > *Error as known. > > > > **The number of ranks you selected (23) contains a large prime factor 23. > > In > > most cases this will lead to bad performance. Choose a number with > smaller > > prime factors or set the decomposition (option -dd) manually. > > > > The log file is at: > > https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8 > > > > Will have a look and get back with more later.
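The pinning rule explained in this message comes down to simple arithmetic: mdrun only auto-enables affinity pinning when the run occupies every hardware thread of the node. For the machine in this thread that threshold works out as follows (node topology taken from the thread; a trivial sanity-check sketch):

```shell
# Node size per the thread: 2 sockets x 22 cores/socket x 2 hyperthreads/core.
sockets=2
cores_per_socket=22
threads_per_core=2
total_hw_threads=$((sockets * cores_per_socket * threads_per_core))
# A run must use all of these threads before mdrun assumes it owns the node.
echo "$total_hw_threads"   # prints 88
```

This is why runs with 20 or 22 threads never pin automatically unless -pin on is given explicitly.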
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi, Many thanks again. Regarding the tests: - ntmpi 1 -ntomp 22 -pin on >OK, so this suggests that your previously successful 22-thread runs did not turn on pinning, I assume? It seems so, yet it does not run successfully each time. But if done with 20 threads, which usually works without error, it does not look like the pinning is turned on. -ntmpi 1 -ntomp 1 -pin on; runs -ntmpi 1 -ntomp 2 -pin on; runs - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs After patch supplied: - ntmpi 1 -ntomp 22 -pin on; sometimes runs - sometimes doesn't* -> md_run.log at : https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T md_norun.log at: https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi - ntmpi 1 -ntomp 22 -pin off; sometimes runs - sometimes doesn't* (ran before) - ntmpi 1 -ntomp 23 -pin off; doesn't work* (ran before) - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work* (ran before) - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs * Fatal error: Asynchronous H2D copy failed: invalid argument When compiling, the make check shows that the regressiontests/complex and regressiontests/essentialdynamics tests fail. I am not sure if this is correlated? Many thanks in advance. Best wishes, Steffi -Ursprüngliche Nachricht- Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd Páll Gesendet: Freitag, 15. März 2019 17:57 An: Discussion list for GROMACS users Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie < stefanie.tafelme...@zae-bayern.de> wrote: > Hi, > > about the tests: > - ntmpi 1 -ntomp 22 -pin on; doesn't work* > OK, so this suggests that your previously successful 22-thread runs did not turn on pinning, I assume? 
Can you please try: -ntmpi 1 -ntomp 1 -pin on -ntmpi 1 -ntomp 2 -pin on that is to check does pinning work at all? Also, please try one/both of the above (assuming they fail with) same binary, but CPU-only run, i.e. -ntmpi 1 -ntomp 1 -pin on -nb cpu > - ntmpi 1 -ntomp 22 -pin off; runs > - ntmpi 1 -ntomp 23 -pin off; runs > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs > - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work** > - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work** > Just to confirm, can you please run the **'s with either -ntmpi 24 (to avoid the DD error). > > *Error as known. > > **The number of ranks you selected (23) contains a large prime factor 23. > In > most cases this will lead to bad performance. Choose a number with smaller > prime factors or set the decomposition (option -dd) manually. > > The log file is at: > https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8 > Will have a look and get back with more later. > > Many thanks again, > Steffi > > -Ursprüngliche Nachricht- > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto: > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd > Páll > Gesendet: Freitag, 15. März 2019 16:27 > An: Discussion list for GROMACS users > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs > > Hi, > > Please share log files with an external service attachments are not > accepted on the list. 
> > Also, when checking the error with the patch supplied, please run the > following cases -- no long runs are needed just want to know which of these > runs and which of these doesn't: > - ntmpi 1 -ntomp 22 -pin on > - ntmpi 1 -ntomp 22 -pin off > - ntmpi 1 -ntomp 23 -pin off > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on > - ntmpi 23 -ntomp 1 -pinstride 1 -pin on > - ntmpi 23 -ntomp 1 -pinstride 2 -pin on > > Thanks, > -- > Szilárd > > > On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie < > stefanie.tafelme...@zae-bayern.de> wrote: > > > Hi Szilárd, > > > > thanks for the quick reply. > > About the first suggestion, I'll try and give feedback soon. > > > > Regarding the second, I attached the log-file for the case of > > mdrun -v -nt 25 > > Which ends in the known error message. > > > > Again, thanks a lot for your information and help. > > > > Best wishes, > > Steffi > > > > > > > > -Ursprüngliche Nachricht-
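The boundary at 22 threads lines up with the socket layout reported later in the thread (2 sockets, 22 cores per socket, 2 hardware threads per core). The sketch below is not from the thread and is not GROMACS code; it is a simplified, hypothetical model of stride-based pinning that only shows why 22 pinned threads can stay on one socket while a 23rd must touch the second one:

```python
# Hypothetical, simplified model of stride-based thread pinning on the
# 2-socket Xeon Gold 6152 machine from this thread (22 cores/socket,
# 2 hardware threads/core). Real GROMACS pinning logic may differ in detail.

SOCKETS = 2
CORES_PER_SOCKET = 22
HWTHREADS_PER_CORE = 2

# Logical CPU ids as the topology listing reports them:
# socket 0, core 0 -> [0 44]; socket 0, core 1 -> [1 45]; ...
def logical_cpu(socket: int, core: int, hwthread: int) -> int:
    return hwthread * SOCKETS * CORES_PER_SOCKET + socket * CORES_PER_SOCKET + core

# Hardware threads in topology order: 0, 44, 1, 45, ..., 21, 65, 22, 66, ...
TOPOLOGY_ORDER = [logical_cpu(s, c, h)
                  for s in range(SOCKETS)
                  for c in range(CORES_PER_SOCKET)
                  for h in range(HWTHREADS_PER_CORE)]

def pinned_cpus(nthreads: int, pinstride: int) -> list:
    """Pin thread i to the (i * pinstride)-th hardware thread in topology order."""
    return [TOPOLOGY_ORDER[i * pinstride] for i in range(nthreads)]

def sockets_used(cpus) -> set:
    # Socket 0 owns logical ids 0-21 and 44-65; socket 1 owns the rest.
    return {0 if (cpu % 44) < 22 else 1 for cpu in cpus}

# 22 threads at stride 2 occupy exactly the 22 physical cores of socket 0;
# the 23rd thread is the first to land on socket 1.
assert sockets_used(pinned_cpus(22, 2)) == {0}
assert sockets_used(pinned_cpus(23, 2)) == {0, 1}
```

Whether crossing the socket boundary is actually what triggers the H2D-copy failure is exactly what the pin on/off tests in this thread are probing; the model only shows that 22 to 23 threads is where the pinned layout changes character.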
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Did you use a binary compiled from the patched sources? If so, can you please also share the exact error message on the standard output?
--
Szilárd

On Fri, Mar 15, 2019 at 5:57 PM Szilárd Páll wrote:
> [...]
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <stefanie.tafelme...@zae-bayern.de> wrote:
> Hi,
>
> about the tests:
> - ntmpi 1 -ntomp 22 -pin on; doesn't work*

OK, so this suggests that your previously successful 22-thread runs did not turn on pinning, I assume?

Can you please try:
-ntmpi 1 -ntomp 1 -pin on
-ntmpi 1 -ntomp 2 -pin on
that is, to check whether pinning works at all?
Also, please try one or both of the above (assuming they fail) with the same binary but a CPU-only run, i.e.
-ntmpi 1 -ntomp 1 -pin on -nb cpu

> - ntmpi 1 -ntomp 22 -pin off; runs
> - ntmpi 1 -ntomp 23 -pin off; runs
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**

Just to confirm, can you please rerun the **'s with -ntmpi 24 (to avoid the DD error)?

> *Error as known.
>
> **The number of ranks you selected (23) contains a large prime factor 23. In
> most cases this will lead to bad performance. Choose a number with smaller
> prime factors or set the decomposition (option -dd) manually.
>
> The log file is at:
> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8

Will have a look and get back with more later.

> Many thanks again,
> Steffi
>
> [...]
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi,

about the tests:
- ntmpi 1 -ntomp 22 -pin on; doesn't work*
- ntmpi 1 -ntomp 22 -pin off; runs
- ntmpi 1 -ntomp 23 -pin off; runs
- ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
- ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
- ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**

*Error as known.

**The number of ranks you selected (23) contains a large prime factor 23. In most cases this will lead to bad performance. Choose a number with smaller prime factors or set the decomposition (option -dd) manually.

The log file is at:
https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8

Many thanks again,
Steffi

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of Szilárd Páll
Sent: Friday, 15 March 2019 16:27
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

[...]
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi,

Please share log files via an external service; attachments are not accepted on the list.

Also, when checking the error with the patch supplied, please run the following cases -- no long runs are needed, I just want to know which of these runs and which doesn't:
- ntmpi 1 -ntomp 22 -pin on
- ntmpi 1 -ntomp 22 -pin off
- ntmpi 1 -ntomp 23 -pin off
- ntmpi 1 -ntomp 23 -pinstride 1 -pin on
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on
- ntmpi 23 -ntomp 1 -pinstride 1 -pin on
- ntmpi 23 -ntomp 1 -pinstride 2 -pin on

Thanks,
--
Szilárd

On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <stefanie.tafelme...@zae-bayern.de> wrote:
> [...]
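If the seven pin/stride cases requested earlier in this message need to be rerun after each rebuild, the command lines can be generated mechanically. This is a hypothetical helper, not from the thread; the `-nsteps 0` flag (to keep each probe short, since the error appears at setup time) and the assumption of a prepared `topol.tpr` in the working directory are mine:

```python
# Hypothetical helper that builds the mdrun command line for each of the
# seven pin/stride cases. Assumes a prepared topol.tpr in the working
# directory; actually executing the commands requires a GROMACS build.
CASES = [
    "-ntmpi 1 -ntomp 22 -pin on",
    "-ntmpi 1 -ntomp 22 -pin off",
    "-ntmpi 1 -ntomp 23 -pin off",
    "-ntmpi 1 -ntomp 23 -pinstride 1 -pin on",
    "-ntmpi 1 -ntomp 23 -pinstride 2 -pin on",
    "-ntmpi 23 -ntomp 1 -pinstride 1 -pin on",
    "-ntmpi 23 -ntomp 1 -pinstride 2 -pin on",
]

def mdrun_commands(cases=CASES):
    # -nsteps 0 keeps each probe short; the failure shows up at setup time.
    return [f"gmx mdrun -s topol.tpr -nsteps 0 {case}" for case in cases]

for cmd in mdrun_commands():
    print(cmd)  # e.g. pass each to subprocess.run(shlex.split(cmd)) to execute
```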
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Szilárd,

thanks for the quick reply.
About the first suggestion, I'll try and give feedback soon.

Regarding the second, I attached the log file for the case of
mdrun -v -nt 25
which ends in the known error message.

Again, thanks a lot for your information and help.

Best wishes,
Steffi

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of Szilárd Páll
Sent: Friday, 15 March 2019 15:30
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

[...]
Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
Hi Stefanie,

Unless and until the error and the performance-related concerns prove to be related, let's keep them separate.

I'd first focus on the former. To be honest, I've never encountered an issue where using more than a certain number of threads makes the run abort with that error. To investigate further, can you please apply the following patch file, which will hopefully give more context to the error:
https://termbin.com/uhgp
(e.g. you can execute the following to accomplish that:
curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 < devicebuffer.cuh.patch)

Regarding the performance-related questions, can you please share a full log file of the runs so we can see the machine config, simulation system/settings, etc.? Without that it is hard to judge what is best for your case. However, if you only have a single GPU (which seems to be the case based on the log excerpts) alongside those two rather beefy CPUs, then you will likely not get much benefit from using all cores, and it is normal that you see little to no improvement from using the cores of a second CPU socket.

Cheers,
--
Szilárd

On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <stefanie.tafelme...@zae-bayern.de> wrote:
> [...]
[gmx-users] WG: WG: Issue with CUDA and gromacs
Dear all,

I was not sure if the earlier email reached you, but again many thanks for your reply, Szilárd.

As written below, we are still facing a problem with the performance of our workstation.
I wrote before because of the error message that keeps occurring for mdrun simulations:

Assertion failed:
Condition: stat == cudaSuccess
Asynchronous H2D copy failed

As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are now the newest ones.

If I run mdrun without further settings, it leads to this error message. If I run it and choose the thread count directly, mdrun performs well, but only for -nt numbers between 1 and 22. Higher values again lead to the aforementioned error message.

To investigate in more detail, I tried different values for -nt, -ntmpi and -ntomp, also combined with -npme:
- The best performance in the sense of ns/day is with -nt 22, respectively -ntomp 22, alone. But then only 22 threads are involved, which is fine if I run more than one mdrun simultaneously, as I can distribute the other 66 threads. The GPU usage is then around 65%.
- A similarly good performance is reached with mdrun -ntmpi 4 -ntomp 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The GPU usage is then around 50%.

I read the information on
http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
which was very helpful, but some things are still not clear to me:
I was wondering if there is any other way to enhance performance. What is the reason that the -nt maximum is at 22 threads? Could this be connected to the sockets (see details below) of our workstation?
It is also not clear to me how a thread count (-nt) higher than 22 can lead to the error regarding the asynchronous H2D copy.

Please excuse all these questions. I would appreciate it a lot if you might have a hint for this problem as well.
Best regards,
Steffi

-----

The workstation details are:
Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
    Family: 6 Model: 85 Stepping: 4
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: 2
  Hardware topology: Basic
    Sockets, cores, and logical processors:
      Socket 0: [ 0 44] [ 1 45] [ 2 46] [ 3 47] [ 4 48] [ 5 49] [ 6 50] [ 7 51] [ 8 52] [ 9 53] [ 10 54] [ 11 55] [ 12 56] [ 13 57] [ 14 58] [ 15 59] [ 16 60] [ 17 61] [ 18 62] [ 19 63] [ 20 64] [ 21 65]
      Socket 1: [ 22 66] [ 23 67] [ 24 68] [ 25 69] [ 26 70] [ 27 71] [ 28 72] [ 29 73] [ 30 74] [ 31 75] [ 32 76] [ 33 77] [ 34 78] [ 35 79] [ 36 80] [ 37 81] [ 38 82] [ 39 83] [ 40 84] [ 41 85] [ 42 86] [ 43 87]
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro P6000, compute cap.: 6.1, ECC: no, stat: compatible

-----

-----Original Message-----
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of Szilárd Páll
Sent: Thursday, 31 January 2019 17:15
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: Issue with CUDA and gromacs

On Thu, Jan 31, 2019 at 2:14 PM Szilárd Páll wrote:
>
> On Wed, Jan 30, 2019 at 5:15 PM Tafelmeier, Stefanie wrote:
> >
> > Dear all,
> >
> > We are facing an issue with the CUDA toolkit.
> > We tried several combinations of GROMACS versions and CUDA toolkits. No
> > toolkit older than 9.2 was possible to try, as there are no NVIDIA
> > drivers available for a Quadro P6000.
>
> Install the latest 410.xx drivers and it will work; the NVIDIA driver
> download website (https://www.nvidia.com/Download/index.aspx)
> recommends 410.93.
>
> Here's a CUDA 10-compatible driver running on a system with a P6000:
> https://termbin.com/ofzo

Sorry, I misread that as "CUDA >=9.2 was not possible". Note that the driver is backward compatible, so you can use a new driver with older CUDA versions. Also note that the oldest driver for which NVIDIA claims P6000 support is 390.59, which is, as far as I know, one generation older than the 396 that the CUDA 9.2 toolkit came with. This is, however, not something I'd recommend pursuing; use a new driver from the official site with any CUDA version that GROMACS supports and it should be fine.

> > GROMACS | CUDA | Error message
> > 2019    | 10.0 | gmx mdrun: Assertion failed: Condition: stat == cudaSuccess; Asynchronous H2D copy failed
> > 2019    | 9.2  | gmx mdrun: Assertion failed: Condition: stat == cudaSuccess; Asynchronous H2D copy failed
> > 2018.5  | 9.2  | gmx
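Szilárd's point that drivers are backward compatible can be made concrete. The minimum-driver numbers in this sketch are quoted from NVIDIA's CUDA-toolkit compatibility table as I recall it, not from this thread; verify them against the current table before relying on them:

```python
# Minimum Linux driver required by each CUDA toolkit, per NVIDIA's
# compatibility table (illustrative values; verify against the current table).
MIN_DRIVER = {
    "9.2":  (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
}

def driver_supports(driver: str, cuda: str) -> bool:
    """A driver that meets a toolkit's minimum also runs binaries built
    with any older toolkit, since drivers are backward compatible."""
    return tuple(int(part) for part in driver.split(".")) >= MIN_DRIVER[cuda]

# The 410.93 driver recommended for the P6000 covers CUDA 10.0 and,
# by backward compatibility, CUDA 9.2; it is too old for CUDA 10.1.
assert driver_supports("410.93", "10.0")
assert driver_supports("410.93", "9.2")
assert not driver_supports("410.93", "10.1")
```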