Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-11 Thread Tafelmeier, Stefanie
Hi Szilárd,

Many thanks, now it is also clear to me how the tests are verified.

This means I can trust my energy calculations now.

Thanks again,
Steffi


-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of 
Szilárd Páll
Sent: Wednesday, 10 April 2019 23:44
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,
On Wed, Apr 10, 2019 at 4:19 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear Szilárd and Jon,
>
> many thanks for your support.
>
> The system was Ubuntu 18.04 LTS, gcc 7.3 and CUDA 9.2.
> We have now upgraded gcc (to 8.2) and CUDA (to 10.1).
>
> Now the regressiontests all pass.
> Also the tests Szilárd asked for before are all running. Even just using mdrun
> -nt 80 works.
>

Great, this confirms that there was indeed a strange compatibility issue as
Jon suggested.

Many thanks! It seems that this was the origin of the problem.
>
> Just to be sure, I would like to have a look at the short-range values of
> the complex test. As before, some passed even without having the right
> values.
>

What do you mean by that?


> Is there a way to compare or a list with the correct outcome?
>

When the regressiontests are executed, the output by default lists all
commands that do the test runs as well as those that verify the outputs,
e.g.

$ perl gmxtest.pl complex
[...]
Testing acetonitrilRF . . . gmx grompp -f ./grompp.mdp -c ./conf -r ./conf
-p ./topol -maxwarn 10  >grompp.out 2>grompp.err
gmx check -s1 ./reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001
>checktpr.out 2>checktpr.err
gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1
gmx check -e ./reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05
-lastener Potential >checkpot.out 2>checkpot.err
gmx check -f ./reference_s.trr -f2 traj.trr -tol 0.001 -abstol 0.05
>checkforce.out 2>checkforce.err
PASSED but check mdp file differences

The gmx check commands do the checking and use the reference_s|d files to
compare against.
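
To look at the short-range terms specifically, you can rerun the check by
hand from inside a test directory; a sketch (the exact energy term names,
e.g. 'Coulomb (SR)' or 'LJ (SR)', are whatever your md.log energy table
lists, -lastener compares all terms up to and including the named one, and
reference_s vs reference_d correspond to single and double precision builds):

cd complex/acetonitrilRF
gmx check -e ./reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05 \
-lastener 'Coulomb (SR)'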

--
Szilárd


> Anyway, here is the link to the tar-ball of the complex folder in case
> there is interest:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
> Many thanks again for your help.
>
> Best wishes,
> Steffi
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Jonathan Vincent
> Sent: Tuesday, 9 April 2019 22:13
> To: gmx-us...@gromacs.org
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Which operating system are you running on? We have seen some strange
> behavior with large numbers of threads, gcc 7.3 and a newish version of
> glibc. Specifically the default combination that comes with Ubuntu 18.04
> LTS, but it might be more generic than that.
>
> My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is
> required for CUDA support of gcc 8), which seemed to fix the problem in
> that case.
>
> If you still have problems we can look at this some more.
>
> Jon
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of Szilárd
> Páll
> Sent: 09 April 2019 20:08
> To: Discussion list for GROMACS users 
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> One more test I realized may be relevant, considering that we had a
> similar report earlier this year on similar CPU hardware:
> can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?
>
> --
> Szilárd
>
>
> On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll 
> wrote:
>
> > Dear Stefanie,
> >
> > On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> > stefanie.tafelme...@zae-bayern.de> wrote:
> >
> >> Hi Szilárd,
> >>
> >> thanks for your advice.
> >> I performed the tests.
> >> Both performed without errors.
> >>
> >
> > OK, that excludes simple and obvious issues.
> > Wild guess, but can you run those again, but this time prefix the
> > command with "taskset -c 22-32"
> > ? This makes the tests use cores 22-32 just to check if using a
> > specific set of cores may somehow trigger an error.
> >
> > What CUDA version did you use to compile the memtest tool -- was it
> > the same (CUDA 9.2) as the one used for building GROMACS?
> >
> > Just to get it right; I have to ask in more detail, because the
> > connection
> >> between is th

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-10 Thread Szilárd Páll
Hi,
On Wed, Apr 10, 2019 at 4:19 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear Szilárd and Jon,
>
> many thanks for your support.
>
> The system was Ubuntu 18.04 LTS, gcc 7.3 and CUDA 9.2.
> We have now upgraded gcc (to 8.2) and CUDA (to 10.1).
>
> Now the regressiontests all pass.
> Also the tests Szilárd asked for before are all running. Even just using mdrun
> -nt 80 works.
>

Great, this confirms that there was indeed a strange compatibility issue as
Jon suggested.

Many thanks! It seems that this was the origin of the problem.
>
> Just to be sure, I would like to have a look at the short-range values of
> the complex test. As before, some passed even without having the right
> values.
>

What do you mean by that?


> Is there a way to compare or a list with the correct outcome?
>

When the regressiontests are executed, the output by default lists all
commands that do the test runs as well as those that verify the outputs,
e.g.

$ perl gmxtest.pl complex
[...]
Testing acetonitrilRF . . . gmx grompp -f ./grompp.mdp -c ./conf -r ./conf
-p ./topol -maxwarn 10  >grompp.out 2>grompp.err
gmx check -s1 ./reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001
>checktpr.out 2>checktpr.err
gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1
gmx check -e ./reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05
-lastener Potential >checkpot.out 2>checkpot.err
gmx check -f ./reference_s.trr -f2 traj.trr -tol 0.001 -abstol 0.05
>checkforce.out 2>checkforce.err
PASSED but check mdp file differences

The gmx check commands do the checking and use the reference_s|d files to
compare against.

--
Szilárd


> Anyway, here is the link to the tar-ball of the complex folder in case
> there is interest:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
> Many thanks again for your help.
>
> Best wishes,
> Steffi
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of
> Jonathan Vincent
> Sent: Tuesday, 9 April 2019 22:13
> To: gmx-us...@gromacs.org
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Which operating system are you running on? We have seen some strange
> behavior with large numbers of threads, gcc 7.3 and a newish version of
> glibc. Specifically the default combination that comes with Ubuntu 18.04
> LTS, but it might be more generic than that.
>
> My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is
> required for CUDA support of gcc 8), which seemed to fix the problem in
> that case.
>
> If you still have problems we can look at this some more.
>
> Jon
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se> On Behalf Of Szilárd
> Páll
> Sent: 09 April 2019 20:08
> To: Discussion list for GROMACS users 
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> One more test I realized may be relevant, considering that we had a
> similar report earlier this year on similar CPU hardware:
> can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?
>
> --
> Szilárd
>
>
> On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll 
> wrote:
>
> > Dear Stefanie,
> >
> > On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> > stefanie.tafelme...@zae-bayern.de> wrote:
> >
> >> Hi Szilárd,
> >>
> >> thanks for your advice.
> >> I performed the tests.
> >> Both performed without errors.
> >>
> >
> > OK, that excludes simple and obvious issues.
> > Wild guess, but can you run those again, but this time prefix the
> > command with "taskset -c 22-32"
> > ? This makes the tests use cores 22-32 just to check if using a
> > specific set of cores may somehow trigger an error.
> >
> > What CUDA version did you use to compile the memtest tool -- was it
> > the same (CUDA 9.2) as the one used for building GROMACS?
> >
> > Just to get it right; I have to ask in more detail, because the
> > connection
> >> between the CPU/GPU and the calculation distribution is still a bit
> >> blurry to me:
> >>
> >> If the output of the regressiontests shows that the test crashes after
> >> 1-2 steps, this means there is an issue with the transfer between
> >> the CPU and GPU?
> >> As far as I understood, the short-range calculation part is normally split
> >> into nonbonded -> GPU and bonded -> CPU?
> >>
> >
> > The -nb/

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-10 Thread Tafelmeier, Stefanie
Dear Szilárd and Jon,

many thanks for your support.

The system was Ubuntu 18.04 LTS, gcc 7.3 and CUDA 9.2.
We have now upgraded gcc (to 8.2) and CUDA (to 10.1). 

Now the regressiontests all pass.
Also the tests Szilárd asked for before are all running. Even just using mdrun -nt 80 
works.

Many thanks! It seems that this was the origin of the problem.

Just to be sure, I would like to have a look at the short-range values of the 
complex test. As before, some passed even without having the right values.
Is there a way to compare or a list with the correct outcome?
Anyway, here is the link to the tar-ball of the complex folder in case there is 
interest:   
https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge 

Many thanks again for your help.

Best wishes,
Steffi 




-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of 
Jonathan Vincent
Sent: Tuesday, 9 April 2019 22:13
To: gmx-us...@gromacs.org
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

Which operating system are you running on? We have seen some strange behavior 
with large numbers of threads, gcc 7.3 and a newish version of glibc. 
Specifically the default combination that comes with Ubuntu 18.04 LTS, but it 
might be more generic than that. 

My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is required 
for CUDA support of gcc 8), which seemed to fix the problem in that case.

If you still have problems we can look at this some more.

Jon

-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
 On Behalf Of Szilárd Páll
Sent: 09 April 2019 20:08
To: Discussion list for GROMACS users 
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

One more test I realized may be relevant, considering that we had a similar 
report earlier this year on similar CPU hardware:
can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?

--
Szilárd


On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll  wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie < 
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the 
> command with "taskset -c 22-32"
> ? This makes the tests use cores 22-32 just to check if using a 
> specific set of cores may somehow trigger an error.
>
> What CUDA version did you use to compile the memtest tool -- was it 
> the same (CUDA 9.2) as the one used for building GROMACS?
>
> Just to get it right; I have to ask in more detail, because the 
> connection
>> between the CPU/GPU and the calculation distribution is still a bit 
>> blurry to me:
>>
>> If the output of the regressiontests shows that the test crashes after 
>> 1-2 steps, this means there is an issue with the transfer between 
>> the CPU and GPU?
>> As far as I understood, the short-range calculation part is normally split 
>> into nonbonded -> GPU and bonded -> CPU?
>>
>
> The -nb/-pme/-bonded flags control which task executes where (if not
> specified, defaults control this); the output contains a report which
> summarizes where the major force tasks are executed, e.g. this is from
> one of your log files, which shows that PP (i.e. particle tasks like
> short-range
> nonbonded) and the full PME tasks are offloaded to a GPU with ID 0 
> (and to check which GPU is that you can look at the "Hardware 
> detection" section of the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
>   PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU 
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-perfor
> mance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument" which is still 
> mysterious to me and has showed up both in your repeated manual runs 
> as well as the regressiontest; as this aborts the run
> - Failing regressiontests with either invalid results or crashes
> (beyond the above abort): to be honest I do not know what causes these,
> but the results suggest the following.
>
> The latter errors indicate incorrect results, in your last "complex" 
> tests tarball I saw some tests failing with LINCS errors (and 
> indicating NaN
> values) and a good fraction of tests failing with GPU-side 
> assertions -- both of which suggest that t

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-09 Thread Jonathan Vincent
Hi,

Which operating system are you running on? We have seen some strange behavior 
with large numbers of threads, gcc 7.3 and a newish version of glibc. 
Specifically the default combination that comes with Ubuntu 18.04 LTS, but it 
might be more generic than that. 

My suggestion would be to update to gcc 8.3 and CUDA 10.1 (which is required 
for CUDA support of gcc 8), which seemed to fix the problem in that case.
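
In case it helps, pointing a fresh GROMACS build at the newer toolchain
could look roughly like this (a sketch only; the package names and the
CUDA install path are assumptions for a stock Ubuntu 18.04 setup):

sudo apt install gcc-8 g++-8
cmake .. -DGMX_GPU=ON -DCMAKE_C_COMPILER=gcc-8 -DCMAKE_CXX_COMPILER=g++-8 \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1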

If you still have problems we can look at this some more.

Jon

-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
 On Behalf Of Szilárd Páll
Sent: 09 April 2019 20:08
To: Discussion list for GROMACS users 
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

One more test I realized may be relevant, considering that we had a similar 
report earlier this year on similar CPU hardware:
can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?

--
Szilárd


On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll  wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie < 
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the 
> command with "taskset -c 22-32"
> ? This makes the tests use cores 22-32 just to check if using a 
> specific set of cores may somehow trigger an error.
>
> What CUDA version did you use to compile the memtest tool -- was it 
> the same (CUDA 9.2) as the one used for building GROMACS?
>
> Just to get it right; I have to ask in more detail, because the 
> connection
>> between the CPU/GPU and the calculation distribution is still a bit 
>> blurry to me:
>>
>> If the output of the regressiontests shows that the test crashes after 
>> 1-2 steps, this means there is an issue with the transfer between 
>> the CPU and GPU?
>> As far as I understood, the short-range calculation part is normally split 
>> into nonbonded -> GPU and bonded -> CPU?
>>
>
> The -nb/-pme/-bonded flags control which task executes where (if not 
> specified, defaults control this); the output contains a report which 
> summarizes where the major force tasks are executed, e.g. this is from 
> one of your log files, which shows that PP (i.e. particle tasks like 
> short-range
> nonbonded) and the full PME tasks are offloaded to a GPU with ID 0 
> (and to check which GPU is that you can look at the "Hardware 
> detection" section of the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
>   PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU 
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-perfor
> mance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument" which is still 
> mysterious to me and has showed up both in your repeated manual runs 
> as well as the regressiontest; as this aborts the run
> - Failing regressiontests with either invalid results or crashes
> (beyond the above abort): to be honest I do not know what causes these,
> but the results suggest the following.
>
> The latter errors indicate incorrect results, in your last "complex" 
> tests tarball I saw some tests failing with LINCS errors (and 
> indicating NaN
> values) and a good fraction of tests failing with GPU-side 
> assertions -- both of which suggest that things do go wrong on the GPU.
>
> And does this mean that maybe the calculations I do also have wrong
>> energies? Can I trust my results?
>>
>
> At this point I can unfortunately not recommend running production 
> simulations on this machine.
>
> I will try to continue exploring the possible errors and I hope you can 
> help out with some tests:
>
> - Please run the complex regressiontests (using the RelWithAssert 
> binary) by setting the CUDA_LAUNCH_BLOCKING environment variable. This 
> may allow us to reason better about the source of the errors. Also you 
> can reconfigure with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid the 
> 88 OpenMP thread errors in tests that you encountered yourself.
>
> - Can you please recompile GROMACS with CUDA 10 and check if 
> either of the two kinds of errors reproduces. (If it does, and if you can 
> upgrade the driver, I suggest upgrading to CUDA 10.1).
>
>
>
>>
>> Many thanks again for your support.
>> Best wishes,
>

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-09 Thread Szilárd Páll
Hi,

One more test I realized may be relevant, considering that we had a
similar report earlier this year on similar CPU hardware:
can you please compile with -DGMX_SIMD=AVX2_256 and rerun the tests?
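
(Concretely, from the existing build directory something like this should do
-- a sketch, keeping whatever other options you originally configured with:

cmake . -DGMX_SIMD=AVX2_256
make -j

and then rerun the regressiontests as before.)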

--
Szilárd


On Tue, Apr 9, 2019 at 8:35 PM Szilárd Páll  wrote:

> Dear Stefanie,
>
> On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi Szilárd,
>>
>> thanks for your advice.
>> I performed the tests.
>> Both performed without errors.
>>
>
> OK, that excludes simple and obvious issues.
> Wild guess, but can you run those again, but this time prefix the command
> with
> "taskset -c 22-32"
> ? This makes the tests use cores 22-32 just to check if using a specific
> set of cores may somehow trigger an error.
>
> What CUDA version did you use to compile the memtest tool -- was it the
> same (CUDA 9.2) as the one used for building GROMACS?
>
> Just to get it right; I have to ask in more detail, because the connection
>> between the CPU/GPU and the calculation distribution is still a bit blurry
>> to me:
>>
>> If the output of the regressiontests shows that the test crashes after 1-2
>> steps, this means there is an issue with the transfer between the CPU
>> and GPU?
>> As far as I understood, the short-range calculation part is normally split into
>> nonbonded -> GPU and bonded -> CPU?
>>
>
> The -nb/-pme/-bonded flags control which task executes where (if not
> specified, defaults control this); the output contains a report which
> summarizes where the major force tasks are executed, e.g. this is from one
> of your log files, which shows that PP (i.e. particle tasks like short-range
> nonbonded) and the full PME tasks are offloaded to a GPU with ID 0 (and to
> check which GPU is that you can look at the "Hardware detection" section of
> the log):
>
> 1 GPU selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
>   PP:0,PME:0
> PP tasks will do (non-perturbed) short-ranged interactions on the GPU
> PME tasks will do all aspects on the GPU
>
> For more details, please see
> http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus
>
> We have seen two types of errors so far:
> - "Asynchronous H2D copy failed: invalid argument" which is still
> mysterious to me and has showed up both in your repeated manual runs as
> well as the regressiontest; as this aborts the run
> - Failing regressiontests with either invalid results or crashes (beyond
> the above abort): to be honest I do not know what causes these, but the
> results suggest the following.
>
> The latter errors indicate incorrect results, in your last "complex" tests
> tarball I saw some tests failing with LINCS errors (and indicating NaN
> values) and a good fraction of tests failing with GPU-side assertions --
> both of which suggest that things do go wrong on the GPU.
>
> And does this mean that maybe the calculations I do also have wrong
>> energies? Can I trust my results?
>>
>
> At this point I can unfortunately not recommend running production
> simulations on this machine.
>
> I will try to continue exploring the possible errors and I hope you can help
> out with some tests:
>
> - Please run the complex regressiontests (using the RelWithAssert binary)
> by setting the CUDA_LAUNCH_BLOCKING environment variable. This may allow us
> to reason better about the source of the errors. Also you can reconfigure
> with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid the 88 OpenMP thread
> errors in tests that you encountered yourself.
>
> - Can you please recompile GROMACS with CUDA 10 and check if either
> of the two kinds of errors reproduces. (If it does, and if you can upgrade the
> driver, I suggest upgrading to CUDA 10.1).
>
>
>
>>
>> Many thanks again for your support.
>> Best wishes,
>> Steffi
>>
>>
> --
> Szilárd
>
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-09 Thread Szilárd Páll
Dear Stefanie,

On Fri, Apr 5, 2019 at 11:48 AM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks for your advice.
> I performed the tests.
> Both performed without errors.
>

OK, that excludes simple and obvious issues.
Wild guess, but can you run those again, but this time prefix the command
with
"taskset -c 22-32"
? This makes the tests use cores 22-32 just to check if using a specific
set of cores may somehow trigger an error.
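
For example -- a sketch; the binary names assume the tools were built as
described earlier in this thread:

taskset -c 22-32 ./cuda_memtest
taskset -c 22-32 ./gpu_burn 300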

What CUDA version did you use to compile the memtest tool -- was it the
same (CUDA 9.2) as the one used for building GROMACS?

Just to get it right; I have to ask in more detail, because the connection
> between the CPU/GPU and the calculation distribution is still a bit blurry
> to me:
>
> If the output of the regressiontests shows that the test crashes after 1-2
> steps, this means there is an issue with the transfer between the CPU
> and GPU?
> As far as I understood, the short-range calculation part is normally split into
> nonbonded -> GPU and bonded -> CPU?
>

The -nb/-pme/-bonded flags control which task executes where (if not
specified, defaults control this); the output contains a report which
summarizes where the major force tasks are executed, e.g. this is from one
of your log files, which shows that PP (i.e. particle tasks like short-range
nonbonded) and the full PME tasks are offloaded to a GPU with ID 0 (and to
check which GPU is that you can look at the "Hardware detection" section of
the log):

1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PME tasks will do all aspects on the GPU

For more details, please see
http://manual.gromacs.org/documentation/2019.1/user-guide/mdrun-performance.html#running-mdrun-with-gpus
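
(So, to request a specific split explicitly, one could run e.g. -- a sketch:

gmx mdrun -nb gpu -pme gpu -bonded cpu -gpu_id 0

which puts the short-range nonbonded and the PME work on GPU 0 and keeps
the bonded work on the CPU.)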

We have seen two types of errors so far:
- "Asynchronous H2D copy failed: invalid argument" which is still
mysterious to me and has showed up both in your repeated manual runs as
well as the regressiontest; as this aborts the run
- Failing regressiontests with either invalid results or crashes (beyond
the above abort): to be honest I do not know what causes these, but the
results suggest the following.

The latter errors indicate incorrect results, in your last "complex" tests
tarball I saw some tests failing with LINCS errors (and indicating NaN
values) and a good fraction of tests failing with GPU-side assertions --
both of which suggest that things do go wrong on the GPU.

And does this mean that maybe the calculations I do also have wrong
> energies? Can I trust my results?
>

At this point I can unfortunately not recommend running production
simulations on this machine.

I will try to continue exploring the possible errors and I hope you can help
out with some tests (an example invocation is sketched after the list below):

- Please run the complex regressiontests (using the RelWithAssert binary)
by setting the CUDA_LAUNCH_BLOCKING environment variable. This may allow us
to reason better about the source of the errors. Also you can reconfigure
with cmake -DGMX_OPENMP_MAX_THREADS=128 to avoid the 88 OpenMP thread
errors in tests that you encountered yourself.

- Can you please recompile GROMACS with CUDA 10 and check if either
of the two kinds of errors reproduces. (If it does, and if you can upgrade the
driver, I suggest upgrading to CUDA 10.1).
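
(For example -- a sketch, run from the regressiontests directory with the
RelWithAssert gmx first in your PATH:

CUDA_LAUNCH_BLOCKING=1 perl gmxtest.pl complex

Setting CUDA_LAUNCH_BLOCKING=1 makes every CUDA call synchronous, so an
error is reported at the call that caused it rather than at a later
asynchronous check.)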



>
> Many thanks again for your support.
> Best wishes,
> Steffi
>
>
--
Szilárd


>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Friday, 29 March 2019 01:24
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> The standard output of the first set of runs is also something I was
> interested in, but I've found the equivalent in the
> complex/TESTDIR/mdrun.out files. What I see in the regressiontests output is
> that the forces/energies results are simply not correct; some tests simply
> crash after 1-2 steps, but others do complete (like the nbnxn-free-energy/)
> and the short-range energies are clearly far off.
>
> I suggest trying to check if there may be a hardware issue:
>
> - run this memory testing tool:
> git clone
> https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
> cd cuda_memtest
> make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
> ./cuda_memtest
>
> - compile and run the gpu-burn tool:
> git clone https://github.com/wilicc/gpu-burn
> cd gpu-burn
> make
> then run
> gpu-burn 300
> to test for 5 minutes.
>
> --
> Szilárd
>
>
> On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie <
> stefanie.tafelme..

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-04-05 Thread Tafelmeier, Stefanie
Hi Szilárd,

thanks for your advice.
I performed the tests.
Both performed without errors.

Just to get it right; I have to ask in more detail, because the connection 
between the CPU/GPU and the calculation distribution is still a bit blurry to me:

If the output of the regressiontests shows that the test crashes after 1-2 
steps, this means there is an issue with the transfer between the CPU and 
GPU?
As far as I understood, the short-range calculation part is normally split into 
nonbonded -> GPU and bonded -> CPU?

And does this mean that maybe the calculations I do also have wrong energies? 
Can I trust my results?

Many thanks again for your support.
Best wishes,
Steffi




-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of 
Szilárd Páll
Sent: Friday, 29 March 2019 01:24
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

The standard output of the first set of runs is also something I was
interested in, but I've found the equivalent in the
complex/TESTDIR/mdrun.out files. What I see in the regressiontests output is
that the forces/energies results are simply not correct; some tests simply
crash after 1-2 steps, but others do complete (like the nbnxn-free-energy/)
and the short-range energies are clearly far off.

I suggest trying to check if there may be a hardware issue:

- run this memory testing tool:
git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
cd cuda_memtest
make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
./cuda_memtest

- compile and run the gpu-burn tool:
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
then run
gpu-burn 300
to test for 5 minutes.

--
Szilárd


On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> Thanks again!
>
> Regarding the test:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 1:  2 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy  /
> https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9
> Including:
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   2 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj   /
> https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc
>
> Including:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 2:  1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ  /
> https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5  /
> https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ
>
> Including:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 4:  1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB  /
> https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   3 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM  /
> https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   3 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455  /
> https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB
>
>
> Regarding the regressiontest:
> Here is the link to the tarball:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
>
> Thanks again for all your support and fingers crossed!
>
> Best wishes,
> Steffi
>
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Wednesday, 27 March 2019 20:27
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Steffi,
>
> On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de>

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-29 Thread Szilárd Páll
Hi,

The standard output of the first set of runs is also something I was
interested in, but I've found the equivalent in the
complex/TESTDIR/mdrun.out files. What I see in the regressiontests output is
that the forces/energies results are simply not correct; some tests simply
crash after 1-2 steps, but others do complete (like the nbnxn-free-energy/)
and the short-range energies are clearly far off.

I suggest trying to check if there may be a hardware issue:

- run this memory testing tool:
git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
cd cuda_memtest
make cuda_memtest CFLAGS='-arch sm_30 -DSM_20 -O3 -DENABLE_NVML=0'
./cuda_memtest

- compile and run the gpu-burn tool:
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
then run
gpu-burn 300
to test for 5 minutes.

--
Szilárd


On Thu, Mar 28, 2019 at 3:46 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> Thanks again!
>
> Regarding the test:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 1:  2 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy  /
> https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9
> Including:
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   2 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj   /
> https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc
>
> Including:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 2:  1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ  /
> https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5  /
> https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ
>
> Including:
>   -ntmpi 1 -ntomp 22 -pin on -pinstride 4:  1 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB  /
> https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt
>   -nsteps 0 -nb gpu -pme cpu -bonded cpu:   3 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM  /
> https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS
>   -nsteps 0 -nb gpu -pme gpu -bonded cpu:   3 out of 5 run
> https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455  /
> https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs
>   -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run
> https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB
>
>
> Regarding the regressiontest:
> Here is the link to the tarball:
> https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge
>
>
> Thanks again for all your support and fingers crossed!
>
> Best wishes,
> Steffi
>
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Wednesday, 27 March 2019 20:27
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Steffi,
>
> On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi Szilárd,
> >
> > thanks again!
> > Here are the links for the log files, that didn't run:
> > Old patch:
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> > https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> > https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
> > https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy
> >
> > New patch:
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> > https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> > https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB
> >  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*
> > https://it

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-28 Thread Tafelmeier, Stefanie
Hi Szilárd,

Thanks again!

Regarding the test:
  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:  2 out of 5 run  
https://it-service.zae-bayern.de/Team/index.php/s/XEQrYqq4pikGmMy  /  
https://it-service.zae-bayern.de/Team/index.php/s/YBdKKJ9c7zQpEg9 
Including:  
  -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run   
https://it-service.zae-bayern.de/Team/index.php/s/YiByc7iXW5AW9ZX 
  -nsteps 0 -nb gpu -pme gpu -bonded cpu:   2 out of 5 run  
https://it-service.zae-bayern.de/Team/index.php/s/JNPXQnEgYtTAxGj   /  
https://it-service.zae-bayern.de/Team/index.php/s/6aq6BQwwbBELqWe 
  -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run   
https://it-service.zae-bayern.de/Team/index.php/s/yj4RAqPMFsDNgTc 

Including:  
  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:  1 out of 5 run 
https://it-service.zae-bayern.de/Team/index.php/s/q5jHbdJ2EygtDaQ  /  
https://it-service.zae-bayern.de/Team/index.php/s/sRPccwHRxojW9J8 
  -nsteps 0 -nb gpu -pme cpu -bonded cpu:   0 run
https://it-service.zae-bayern.de/Team/index.php/s/GdKk5N68CY7BGxJ 
  -nsteps 0 -nb gpu -pme gpu -bonded cpu:   1 out of 5 run  
https://it-service.zae-bayern.de/Team/index.php/s/orwzKJMampWwDo5  /  
https://it-service.zae-bayern.de/Team/index.php/s/JXApT4tFtxQWxG6 
  -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run   
https://it-service.zae-bayern.de/Team/index.php/s/8YKK7Zxax22RfGQ 

Including:  
  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:  1 out of 5 run
https://it-service.zae-bayern.de/Team/index.php/s/szZjzaxmwfimrgB  / 
https://it-service.zae-bayern.de/Team/index.php/s/QdTd2an9dbE9BSt 
  -nsteps 0 -nb gpu -pme cpu -bonded cpu:   3 out of 5 run 
https://it-service.zae-bayern.de/Team/index.php/s/DPoqKrgcWfF5PKM  /  
https://it-service.zae-bayern.de/Team/index.php/s/3NbsGHtCPsf7zFS 
  -nsteps 0 -nb gpu -pme gpu -bonded cpu:   3 out of 5 run 
https://it-service.zae-bayern.de/Team/index.php/s/WqP4tXjrR8i3455  /  
https://it-service.zae-bayern.de/Team/index.php/s/DACGc86xxKR6pWs 
  -nsteps 0 -nb gpu -pme gpu -bonded gpu:   0 run   
https://it-service.zae-bayern.de/Team/index.php/s/3nKdwA28KySLEdB 


Regarding the regressiontest:
Here is the link to the tarball: 
https://it-service.zae-bayern.de/Team/index.php/s/mMyt3MPEfRrn8Ge 


Thanks again for all your support and fingers crossed!

Best wishes,
Steffi





-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of 
Szilárd Páll
Sent: Wednesday, 27 March 2019 20:27
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi Steffi,

On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks again!
> Here are the links for the log files, that didn't run:
> Old patch:
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy
>
> New patch:
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF
>

This still doesn't tell us much more, unfortunately.

Two more things to try (can be combined):
- please first rebuild after configuring with
cmake . -DCMAKE_BUILD_TYPE=RelWithAssert
(this may give us some extra debugging information during runs)
- please use this patch now -- it will print some additional stuff to the
standard error output so please grab that and share it:
https://termbin.com/zq4q
(you can redirect the output e.g. by gmx mdrun > mdrun.out 2>&1)
- try running (with the above binary build + patch) the above failing cases
repeated a few times:
  -nsteps 0 -nb gpu -pme cpu -bonded cpu
  -nsteps 0 -nb gpu -pme gpu -bonded cpu
  -nsteps 0 -nb gpu -pme gpu -bonded gpu



> Regarding the Regressiontest:
>
> Sorry I didn't get it the first time.
> If the md.log files are enough here is a folder for the failed parts of
> the complex regression test:
> https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2
>
> If you need any other files or the full directories please let me know.
>

Hmmm, looks like there are more issues here; some log files look truncated,
others indicate termination by LINCS errors. Yes, the mdrun.out and
checkpot* files would be useful. How

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-27 Thread Szilárd Páll
Hi Steffi,

On Wed, Mar 27, 2019 at 1:08 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks again!
> Here are the links for the log files, that didn't run:
> Old patch:
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
> https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy
>
> New patch:
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB
>  -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*
> https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF
>

This still doesn't tell us much more, unfortunately.

Two more things to try (can be combined):
- please first rebuild after configuring with
cmake . -DCMAKE_BUILD_TYPE=RelWithAssert
(this may give us some extra debugging information during runs)
- please use this patch now -- it will print some additional stuff to the
standard error output, so please grab that and share it (one way to apply
it is sketched after this list):
https://termbin.com/zq4q
(you can redirect the output e.g. by gmx mdrun > mdrun.out 2>&1)
- try running (with the above binary build + patch) the above failing cases
repeated a few times:
  -nsteps 0 -nb gpu -pme cpu -bonded cpu
  -nsteps 0 -nb gpu -pme gpu -bonded cpu
  -nsteps 0 -nb gpu -pme gpu -bonded gpu
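
(One way to apply the patch, sketched -- the -p level depends on how the
patch was generated, so adjust it if the patch does not apply cleanly:

wget -O extra-check.patch https://termbin.com/zq4q
patch -p1 < extra-check.patch
make -j
)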



> Regarding the Regressiontest:
>
> Sorry I didn't get it the first time.
> If the md.log files are enough here is a folder for the failed parts of
> the complex regression test:
> https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2
>
> If you need any other files or the full directories please let me know.
>

Hmmm, looks like there are more issues here; some log files look truncated,
others indicate termination by LINCS errors. Yes, the mdrun.out and
checkpot* files would be useful. How about just making a tarball of the
whole complex directory and sharing that?

Hopefully these tests will shed some light on what the issue is.

Cheers,
--
Szilard

Again, a lot of thanks for your support.


> Best wishes,
> Steffi
>
>
>
>
>
>
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Tuesday, 26 March 2019 16:57
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Steffi,
>
> Thanks for running the tests; yes, the patch file was meant to be applied
> to the unchanged GROMACS 2019 code.
>
> Please also share the log files from the failed runs, not just the
> copy-paste of the fatal error -- as a result of the additional check there
> might have been a note printed which I was after.
>
> Regarding the regression tests, what I would like to have is the actual
> directories of the tests that failed, i.e. as your log indicates a few of
> the complex tests at least.
>
> Cheers,
> --
> Szilárd
>
> On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi Szilárd,
> >
> > thanks again for your answer.
> > Regarding the tests:
> > without the new patch:
> >
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> > and
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
> >
> >
> > With the new patch (devicebuffer.cuh had to be the original, right? The
> > already patched one didn't work, as the lines didn't fit, as far as I
> > understood.):
> >
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> > -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> > and
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> > -ntmpi 1 -ntomp 22 -pin on -pinstride 4:

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-27 Thread Tafelmeier, Stefanie
Hi Szilárd,

thanks again!
Here are the links for the log files, that didn't run:
Old patch:
 -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*  
https://it-service.zae-bayern.de/Team/index.php/s/b4AYiMCoHeNgJH3 
 -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*  
https://it-service.zae-bayern.de/Team/index.php/s/JEP2iwFFZCebZLF 
 -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran   
https://it-service.zae-bayern.de/Team/index.php/s/apra2zS7FHdqDQy 

New patch:
 -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*  
https://it-service.zae-bayern.de/Team/index.php/s/jAD52jBgNddrS3w 
 -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*  
https://it-service.zae-bayern.de/Team/index.php/s/bcRjtz7r9NekzKB 
 -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*  
https://it-service.zae-bayern.de/Team/index.php/s/b3zp8DNztjE6ssF 


Regarding the Regressiontest:

Sorry I didn't get it at the first time.
If the md.log files are enough here is a folder for the failed parts of the 
complex regression test:
https://it-service.zae-bayern.de/Team/index.php/s/64KAQBgNoPm4rJ2 

If you need any other files or the full directories please let me know.

Again, a lot of thanks for your support.

Best wishes,
Steffi 










-Original Message-
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of 
Szilárd Páll
Sent: Tuesday, 26 March 2019 16:57
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi Steffi,

Thanks for running the tests; yes, the patch file was meant to be applied
to the unchanged GROMACS 2019 code.

Please also share the log files from the failed runs, not just the
copy-paste of the fatal error -- as a result of the additional check there
might have been a note printed which I was after.

Regarding the regression tests, what I would like to have is the actual
directories of the tests that failed, i.e. as your log indicates a few of
the complex tests at least.

Cheers,
--
Szilárd

On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks again for your answer.
> Regarding the tests:
> without the new patch:
>
> -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> and
> -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
>
>
> With the new patch (devicebuffer.cuh had to be the original, right? The
> already patched one didn't work, as the lines didn't fit, as far as I
> understood.):
>
> -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> and
> -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*
>
> * Fatal error:
> Asynchronous H2D copy failed: invalid argument
>
>
> Regarding the regressiontest:
> The LastTest.log is available here:
> https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi
> this part was not given in the log:
> The following tests FAILED:
>  42 - regressiontests/complex (Timeout)
>  46 - regressiontests/essentialdynamics (Failed)
> Errors while running CTest
> CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target
> 'CMakeFiles/run-ctest-nophys' failed
> make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
> CMakeFiles/Makefile2:1397: recipe for target
> 'CMakeFiles/run-ctest-nophys.dir/all'failed
> make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
> CMakeFiles/Makefile2:1177: recipe for target
> 'CMakeFiles/check.dir/rule' failed
> make[1]: *** [CMakeFiles/check.dir/rule] Error 2
> Makefile:626: recipe for target 'check' failed
> make: *** [check] Error 2
>
> Many thanks again.
> Best wishes,
> Steffi
>
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Monday, 25 March 2019 20:13
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and groma

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-26 Thread Szilárd Páll
Hi Steffi,

Thanks for running the tests; yes, the patch file was meant to be applied
to the unchanged GROMACS 2019 code.

Please also share the log files from the failed runs, not just the
copy-paste of the fatal error -- as a result of the additional check there
might have been a note printed which I was after.

Regarding the regression tests, what I would like to have is the actual
directories of the tests that failed, i.e. as your log indicates a few of
the complex tests at least.

Cheers,
--
Szilárd

On Tue, Mar 26, 2019 at 1:44 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks again for your answer.
> Regarding the tests:
> without the new patch:
>
> -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> and
> -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran
>
>
> With the new patch (devicebuffer.cuh had to be the original, right? The
> already patched one didn't work, as the lines didn't fit, as far as I
> understood.):
>
> -ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
> -ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
> and
> -ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
> -ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*
>
> * Fatal error:
> Asynchronous H2D copy failed: invalid argument
>
>
> Regarding the regressiontest:
> The LastTest.log is available here:
> https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi
> this part was not given in the log:
> The following tests FAILED:
>  42 - regressiontests/complex (Timeout)
>  46 - regressiontests/essentialdynamics (Failed)
> Errors while running CTest
> CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target
> 'CMakeFiles/run-ctest-nophys' failed
> make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
> CMakeFiles/Makefile2:1397: recipe for target
> 'CMakeFiles/run-ctest-nophys.dir/all'failed
> make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
> CMakeFiles/Makefile2:1177: recipe for target
> 'CMakeFiles/check.dir/rule' failed
> make[1]: *** [CMakeFiles/check.dir/rule] Error 2
> Makefile:626: recipe for target 'check' failed
> make: *** [check] Error 2
>
> Many thanks again.
> Best wishes,
> Steffi
>
>
>
>
>
> -Original Message-
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On Behalf Of Szilárd
> Páll
> Sent: Monday, 25 March 2019 20:13
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
>
>
> --
> Szilárd
>
>
> On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi,
> >
> > Many thanks again.
> >
> > Regarding the tests:
> > - ntmpi 1 -ntomp 22 -pin on
> > >OK, so this suggests that your previously successful 22-thread runs did
> > not
> > turn on pinning, I assume?
> > It seems so, yet it does not run successfully each time. But if done with
> > 20 threads, which usually works without error, it does not look like the
> > pinning is turned on.
> >
>
> Pinning is only turned on if mdrun can safely assume that the cores of the
> node are not shared by multiple applications. This assumption can only be
made if all hardware threads of the entire node are used by the run itself
(i.e. in your case 2x22 cores with HyperThreading, hence 2 threads each =
> 88 threads).
>
> -ntmpi 1 -ntomp 1 -pin on; runs
> > -ntmpi 1 -ntomp 2 -pin on; runs
> >
> > - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
> > - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs
> >
> > After patch supplied:
> - ntmpi 1 -ntomp 22 -pin on; sometimes runs - sometimes doesn't*   ->
> > md_run.log at :
> > https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T
> >
> >  md_norun.log at:
> > https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi
- ntmpi 1 -ntomp 22 -pin off; sometimes runs - sometimes doesn't

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-26 Thread Tafelmeier, Stefanie
Hi Szilárd,

thanks again for your answer.
Regarding the tests:
without the new patch:

-ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
and
-ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
-ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
-ntmpi 1 -ntomp 22 -pin on -pinstride 4:one out of 5 ran


With the new patch (devicebuffer.cuh had to be the original, right? The already 
patched one didn't work, as the lines didn't fit, as far as I understood.):

-ntmpi 1 -ntomp 11 -pin on -pinstride 1:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 2:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 4:all ran
-ntmpi 1 -ntomp 11 -pin on -pinstride 8:all ran
and
-ntmpi 1 -ntomp 22 -pin on -pinstride 1:none ran*
-ntmpi 1 -ntomp 22 -pin on -pinstride 2:none ran*
-ntmpi 1 -ntomp 22 -pin on -pinstride 4:none ran*

* Fatal error:
Asynchronous H2D copy failed: invalid argument


Regarding the regressiontest:
The LastTest.log is available here:
https://it-service.zae-bayern.de/Team/index.php/s/3sdki7Cf2x2CEQi 
this part was not given in the log:
The following tests FAILED:
 42 - regressiontests/complex (Timeout)
 46 - regressiontests/essentialdynamics (Failed)
Errors while running CTest
CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target 
'CMakeFiles/run-ctest-nophys' failed
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
CMakeFiles/Makefile2:1397: recipe for target 
'CMakeFiles/run-ctest-nophys.dir/all'failed
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
CMakeFiles/Makefile2:1177: recipe for target 
'CMakeFiles/check.dir/rule' failed
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
Makefile:626: recipe for target 'check' failed
make: *** [check] Error 2
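
For reference, the failure summary above comes from the make check console
output, while the full per-test output lands in Testing/Temporary/LastTest.log
under the build directory. A minimal sketch for re-running just the two
failing tests (assuming a standard CMake build directory named build/):

cd build
ctest -R "regressiontests/(complex|essentialdynamics)" --output-on-failure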

Many thanks again. 
Best wishes,
Steffi





-Ursprüngliche Nachricht-
Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von 
Szilárd Páll
Gesendet: Montag, 25. März 2019 20:13
An: Discussion list for GROMACS users
Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,



--
Szilárd


On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi,
>
> Many thanks again.
>
> Regarding the tests:
> - ntmpi 1 -ntomp 22 -pin on
> >OK, so this suggests that your previously successful 22-thread runs did
> not
> turn on pinning, I assume?
> It seems so, yet it does not run successfully every time. But if done with
> 20 threads, which usually works without error, it does not look like
> pinning is turned on.
>

Pinning is only turned on if mdrun can safely assume that the cores of the
node are not shared by multiple applications. This assumption can only be
made if all hardware threads of the entire node are used by the run itself
(i.e. in your case 2x22 cores with HyperThreading, hence 2 threads each =
88 threads).

-ntmpi 1 -ntomp 1 -pin on; runs
> -ntmpi 1 -ntomp 2 -pin on; runs
>
> - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
> - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs
>
> After patch supplied:
> - ntmpi 1 -ntomp 22 -pin on; sometimes runs - sometimes doesn't*   ->
> md_run.log at :
> https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T
>
>  md_norun.log at:
> https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi
> - ntmpi 1 -ntomp 22 -pin off; sometimes runs - sometimes doesn't*   (ran
> before)
> - ntmpi 1 -ntomp 23 -pin off; doesn't work*  (ran before)
>
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
>
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work*  (ran before)
>


The suspicious thing is that the patch I made only improves the verbosity
of the error reporting; it should have no impact on whether the error is
triggered or not. Considering the above behavior, it seems that pinning (at
least the patterns tried) has no influence on whether the runs work.

Can you please try:
-ntmpi 1 -ntomp 11 -pin on -pinstride 1
-ntmpi 1 -ntomp 11 -pin on -pinstride 2
-ntmpi 1 -ntomp 11 -pin on -pinstride 4
-ntmpi 1 -ntomp 11 -pin on -pinstride 8
and
-ntmpi 1 -ntomp 22 -pin on -pinstride 1
-ntmpi 1 -ntomp 22 -pin on -pinstride 2
-ntmpi 1 -ntomp 22 -pin on -pinstride 4

And please run these 5 times each (-nsteps 0 is fine to make things quick).

Also, please use this patch:
https://termbin.com/r8kk
the same way as you did the one before; it adds another check that might
shed some light on what's going on.

- ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-25 Thread Szilárd Páll
Hi,



--
Szilárd


On Mon, Mar 18, 2019 at 2:34 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi,
>
> Many thanks again.
>
> Regarding the tests:
> - ntmpi 1 -ntomp 22 -pin on
> >OK, so this suggests that your previously successful 22-thread runs did
> not
> turn on pinning, I assume?
> It seems so, yet it does not run successfully every time. But if done with
> 20 threads, which usually works without error, it does not look like
> pinning is turned on.
>

Pinning is only turned on if mdrun can safely assume that the cores of the
node are not shared by multiple applications. This assumption can only be
made if all hardware threads of the entire node are used by the run itself
(i.e. in your case 2x22 cores with HyperThreading, hence 2 threads each =
88 threads).
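
In concrete terms, on this 2x22-core node mdrun pins automatically only when
the run covers all 88 hardware threads; a partial-node run has to request
pinning explicitly. A minimal sketch (topol.tpr is a placeholder input, and
the offset/stride values may need adjusting to the node's topology):

gmx mdrun -s topol.tpr -ntmpi 2 -ntomp 44 -nsteps 0   # full node: pinning enabled automatically
gmx mdrun -s topol.tpr -ntomp 22 -pin on -pinoffset 0 -pinstride 1 -nsteps 0   # partial node: pin explicitly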

-ntmpi 1 -ntomp 1 -pin on; runs
> -ntmpi 1 -ntomp 2 -pin on; runs
>
> - ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
> - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs
>
> After patch supplied:
> - ntmpi 1 -ntomp 22 -pin on; sometimes runs - sometimes doesn't*   ->
> md_run.log at :
> https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T
>
>  md_norun.log at:
> https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi
> - ntmpi 1 -ntomp 22 -pin off; sometimes runs - sometimes doesn't*   (ran
> before)
> - ntmpi 1 -ntomp 23 -pin off; doesn't work*  (ran before)
>
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
>
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work*  (ran before)
>


The suspicious thing is that the patch I made only improves the verbosity
of the error reporting; it should have no impact on whether the error is
triggered or not. Considering the above behavior, it seems that pinning (at
least the patterns tried) has no influence on whether the runs work.

Can you please try:
-ntmpi 1 -ntomp 11 -pin on -pinstride 1
-ntmpi 1 -ntomp 11 -pin on -pinstride 2
-ntmpi 1 -ntomp 11 -pin on -pinstride 4
-ntmpi 1 -ntomp 11 -pin on -pinstride 8
and
-ntmpi 1 -ntomp 22 -pin on -pinstride 1
-ntmpi 1 -ntomp 22 -pin on -pinstride 2
-ntmpi 1 -ntomp 22 -pin on -pinstride 4

And please run these 5 times each (-nsteps 0 is fine to make things quick).
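
The five repetitions per case are easy to script; a minimal sketch for one of
the cases (topol.tpr and the pintest output prefix are placeholders):

for i in 1 2 3 4 5; do
    gmx mdrun -s topol.tpr -deffnm pintest$i -nsteps 0 -ntmpi 1 -ntomp 22 -pin on -pinstride 1
done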

Also, please use this patch:
https://termbin.com/r8kk
the same way as you did the one before; it adds another check that might
shed some light on what's going on.

- ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
> - ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs
>
> * Fatal error:
> Asynchronous H2D copy failed: invalid argument
>
> When compiling, make check shows that the regressiontests/complex and
> regressiontests/essentialdynamics tests fail.
> I am not sure whether this is correlated.
>

It might be; please share the outputs of the regressiontests.

--
Szilárd


> Many thanks in advance.
> Best wishes,
> Steffi
>
>
>
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Freitag, 15. März 2019 17:57
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi,
> >
> > about the tests:
> > - ntmpi 1 -ntomp 22 -pin on; doesn't work*
> >
>
> OK, so this suggests that your previously successful 22-thread runs did not
> turn on pinning, I assume?
> Can you please try:
> -ntmpi 1 -ntomp 1 -pin on
> -ntmpi 1 -ntomp 2 -pin on
> that is, to check whether pinning works at all.
> Also, please try one/both of the above (assuming they fail) with the same
> binary, but as a CPU-only run, i.e.
> -ntmpi 1 -ntomp 1 -pin on -nb cpu
>
>
> > - ntmpi 1 -ntomp 22 -pin off; runs
> > - ntmpi 1 -ntomp 23 -pin off; runs
> > - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
> > - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
> > - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
> > - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**
> >
>
> Just to confirm, can you please re-run the ** cases with -ntmpi 24 (to
> avoid the DD error)?
>
>
> >
> > *Error as known.
> >
> > **The number of ranks you selected (23) contains a large prime factor 23.
> > In
> > most cases this will lead to bad performance. Choose a number with
> smaller
> > prime factors or set the decomposition (option -dd) manually.
> >
> > The log file is at:
> > https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8
> >
>
> Will have a look and get back with more later.

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-18 Thread Tafelmeier, Stefanie
Hi,

Many thanks again. 

Regarding the tests:
- ntmpi 1 -ntomp 22 -pin on
>OK, so this suggests that your previously successful 22-thread runs did not
turn on pinning, I assume?
It seems so, yet it does not run successfully every time. But if done with
20 threads, which usually works without error, it does not look like
pinning is turned on.

-ntmpi 1 -ntomp 1 -pin on; runs
-ntmpi 1 -ntomp 2 -pin on; runs

- ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
- ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs

After patch supplied:
- ntmpi 1 -ntomp 22 -pin on; sometimes runs - sometimes doesn't*   -> md_run.log
at : https://it-service.zae-bayern.de/Team/index.php/s/ezXWnQ2pGNeFx6T  
 
md_norun.log at: 
https://it-service.zae-bayern.de/Team/index.php/s/wYPY7dWEJdwmqJi 
- ntmpi 1 -ntomp 22 -pin off; sometimes runs - sometimes doesn't*   (ran before)

- ntmpi 1 -ntomp 23 -pin off; doesn't work*  (ran before)   

- ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work* 
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on; doesn't work*  (ran before)   

- ntmpi 24 -ntomp 1 -pinstride 1 -pin on; runs
- ntmpi 24 -ntomp 1 -pinstride 2 -pin on; runs

* Fatal error:
Asynchronous H2D copy failed: invalid argument

When compiling, make check shows that the regressiontests/complex and
regressiontests/essentialdynamics tests fail.
I am not sure whether this is correlated.

Many thanks in advance.
Best wishes,
Steffi




-Ursprüngliche Nachricht-
Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von 
Szilárd Páll
Gesendet: Freitag, 15. März 2019 17:57
An: Discussion list for GROMACS users
Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi,
>
> about the tests:
> - ntmpi 1 -ntomp 22 -pin on; doesn't work*
>

OK, so this suggests that your previously successful 22-thread runs did not
turn on pinning, I assume?
Can you please try:
-ntmpi 1 -ntomp 1 -pin on
-ntmpi 1 -ntomp 2 -pin on
that is, to check whether pinning works at all.
Also, please try one/both of the above (assuming they fail) with the same
binary, but as a CPU-only run, i.e.
-ntmpi 1 -ntomp 1 -pin on -nb cpu


> - ntmpi 1 -ntomp 22 -pin off; runs
> - ntmpi 1 -ntomp 23 -pin off; runs
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**
>

Just to confirm, can you please re-run the ** cases with -ntmpi 24 (to
avoid the DD error)?


>
> *Error as known.
>
> **The number of ranks you selected (23) contains a large prime factor 23.
> In
> most cases this will lead to bad performance. Choose a number with smaller
> prime factors or set the decomposition (option -dd) manually.
>
> The log file is at:
> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8
>

Will have a look and get back with more later.


>
> Many thanks again,
> Steffi
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Freitag, 15. März 2019 16:27
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Please share log files via an external service; attachments are not
> accepted on the list.
>
> Also, when checking the error with the patch supplied, please run the
> following cases -- no long runs are needed, we just want to know which of
> these run and which don't:
> - ntmpi 1 -ntomp 22 -pin on
> - ntmpi 1 -ntomp 22 -pin off
> - ntmpi 1 -ntomp 23 -pin off
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on
>
> Thanks,
> --
> Szilárd
>
>
> On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi Szilárd,
> >
> > thanks for the quick reply.
> > About the first suggestion, I'll try and give feedback soon.
> >
> > Regarding the second, I attached the log-file for the case of
> > mdrun -v -nt 25
> > Which ends in the known error message.
> >
> > Again, thanks a lot for your information and help.
> >
> > Best wishes,
> > Steffi
> >
> >
> >
> > -Ursprüngliche Nachricht-

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Szilárd Páll
Did you use a binary generated from the patched sources? If so, can you
please also share the exact error message from the standard output?
--
Szilárd


On Fri, Mar 15, 2019 at 5:57 PM Szilárd Páll  wrote:

> On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
>> Hi,
>>
>> about the tests:
>> - ntmpi 1 -ntomp 22 -pin on; doesn't work*
>>
>
> OK, so this suggests that your previously successful 22-thread runs did
> not turn on pinning, I assume?
> Can you please try:
> -ntmpi 1 -ntomp 1 -pin on
> -ntmpi 1 -ntomp 2 -pin on
> that is, to check whether pinning works at all.
> Also, please try one/both of the above (assuming they fail) with the same
> binary, but as a CPU-only run, i.e.
> -ntmpi 1 -ntomp 1 -pin on -nb cpu
>
>
>> - ntmpi 1 -ntomp 22 -pin off; runs
>> - ntmpi 1 -ntomp 23 -pin off; runs
>> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
>> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
>> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
>> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**
>>
>
> Just to confirm, can you please re-run the ** cases with -ntmpi 24 (to
> avoid the DD error)?
>
>
>>
>> *Error as known.
>>
>> **The number of ranks you selected (23) contains a large prime factor 23.
>> In
>> most cases this will lead to bad performance. Choose a number with smaller
>> prime factors or set the decomposition (option -dd) manually.
>>
>> The log file is at:
>> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8
>>
>
> Will have a look and get back with more later.
>
>
>>
>> Many thanks again,
>> Steffi
>>
>> -Ursprüngliche Nachricht-----
>> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
>> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von
>> Szilárd Páll
>> Gesendet: Freitag, 15. März 2019 16:27
>> An: Discussion list for GROMACS users
>> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>>
>> Hi,
>>
>> Please share log files via an external service; attachments are not
>> accepted on the list.
>>
>> Also, when checking the error with the patch supplied, please run the
>> following cases -- no long runs are needed, we just want to know which of
>> these run and which don't:
>> - ntmpi 1 -ntomp 22 -pin on
>> - ntmpi 1 -ntomp 22 -pin off
>> - ntmpi 1 -ntomp 23 -pin off
>> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on
>> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on
>> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on
>> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on
>>
>> Thanks,
>> --
>> Szilárd
>>
>>
>> On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
>> stefanie.tafelme...@zae-bayern.de> wrote:
>>
>> > Hi Szilárd,
>> >
>> > thanks for the quick reply.
>> > About the first suggestion, I'll try and give feedback soon.
>> >
>> > Regarding the second, I attached the log-file for the case of
>> > mdrun -v -nt 25
>> > Which ends in the known error message.
>> >
>> > Again, thanks a lot for your information and help.
>> >
>> > Best wishes,
>> > Steffi
>> >
>> >
>> >
>> > -Ursprüngliche Nachricht-
>> > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
>> > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von
>> Szilárd
>> > Páll
>> > Gesendet: Freitag, 15. März 2019 15:30
>> > An: Discussion list for GROMACS users
>> > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>> >
>> > Hi Stefanie,
>> >
>> > Unless and until the error and performance-related concerns prove to be
>> > related, let's keep those separate.
>> >
>> > I'd first focus on the former. To be honest, I've never encountered
>> > such an issue where, if you use more than a certain number of threads,
>> > the run aborts with that error. To investigate further, can you please
>> > apply the following patch file, which will hopefully give more context
>> > to the error:
>> > https://termbin.com/uhgp
>> > (e.g. you can execute the following to accomplish that:
>> > curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
>> > devicebuffer.cuh.patch)
>> >
>> > Regarding the performance-related questions, can you pleas

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Szilárd Páll
On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi,
>
> about the tests:
> - ntmpi 1 -ntomp 22 -pin on; doesn't work*
>

OK, so this suggests that your previously successful 22-thread runs did not
turn on pinning, I assume?
Can you please try:
-ntmpi 1 -ntomp 1 -pin on
-ntmpi 1 -ntomp 2 -pin on
that is, to check whether pinning works at all.
Also, please try one/both of the above (assuming they fail) with the same
binary, but as a CPU-only run, i.e.
-ntmpi 1 -ntomp 1 -pin on -nb cpu


> - ntmpi 1 -ntomp 22 -pin off; runs
> - ntmpi 1 -ntomp 23 -pin off; runs
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**
>

Just to confirm, can you please re-run the ** cases with -ntmpi 24 (to
avoid the DD error)?


>
> *Error as known.
>
> **The number of ranks you selected (23) contains a large prime factor 23.
> In
> most cases this will lead to bad performance. Choose a number with smaller
> prime factors or set the decomposition (option -dd) manually.
>
> The log file is at:
> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8
>

Will have a look and get back with more later.


>
> Many thanks again,
> Steffi
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Freitag, 15. März 2019 16:27
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Please share log files via an external service; attachments are not
> accepted on the list.
>
> Also, when checking the error with the patch supplied, please run the
> following cases -- no long runs are needed, we just want to know which of
> these run and which don't:
> - ntmpi 1 -ntomp 22 -pin on
> - ntmpi 1 -ntomp 22 -pin off
> - ntmpi 1 -ntomp 23 -pin off
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on
>
> Thanks,
> --
> Szilárd
>
>
> On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi Szilárd,
> >
> > thanks for the quick reply.
> > About the first suggestion, I'll try and give feedback soon.
> >
> > Regarding the second, I attached the log-file for the case of
> > mdrun -v -nt 25
> > Which ends in the known error message.
> >
> > Again, thanks a lot for your information and help.
> >
> > Best wishes,
> > Steffi
> >
> >
> >
> > -----Ursprüngliche Nachricht-
> > Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von
> Szilárd
> > Páll
> > Gesendet: Freitag, 15. März 2019 15:30
> > An: Discussion list for GROMACS users
> > Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
> >
> > Hi Stefanie,
> >
> > Unless and until the error and performance-related concerns prove to be
> > related, let's keep those separate.
> >
> > I'd first focus on the former. To be honest, I've never encountered
> > such an issue where, if you use more than a certain number of threads,
> > the run aborts with that error. To investigate further, can you please
> > apply the following patch file, which will hopefully give more context
> > to the error:
> > https://termbin.com/uhgp
> > (e.g. you can execute the following to accomplish that:
> > curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
> > devicebuffer.cuh.patch)
> >
> > Regarding the performance-related questions, can you please share a full
> > log file of the runs so we can see the machine config, simulation
> > system/settings, etc. Without that it is hard to judge what's best for
> > your case. However, if you only have a single GPU (which seems to be the
> > case based on the log excerpts) along those two rather beefy CPUs, then
> > you will likely not get much benefit from using all cores and it is
> > normal that you see little to no improvement from using cores of a
> > second CPU socket.
> >
> > Cheers,
> > --
> > Szilárd
> >
> >
> > On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
> > stefanie.tafelme...@zae-bayern.de> wrote:

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Tafelmeier, Stefanie
Hi,

about the tests:
- ntmpi 1 -ntomp 22 -pin on; doesn't work*  
- ntmpi 1 -ntomp 22 -pin off; runs
- ntmpi 1 -ntomp 23 -pin off; runs
- ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
- ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
- ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**

*Error as known.

**The number of ranks you selected (23) contains a large prime factor 23. In
most cases this will lead to bad performance. Choose a number with smaller
prime factors or set the decomposition (option -dd) manually.
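
As an illustration of the error's suggestion, a rank count with small prime
factors can also be combined with an explicit decomposition grid (a sketch;
the -dd grid must multiply out to the number of ranks, here 4x3x2 = 24):

gmx mdrun -ntmpi 24 -ntomp 1 -dd 4 3 2 -pin on -nsteps 0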

The log file is at: 
https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8 

Many thanks again,
Steffi

-Ursprüngliche Nachricht-
Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von 
Szilárd Páll
Gesendet: Freitag, 15. März 2019 16:27
An: Discussion list for GROMACS users
Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi,

Please share log files via an external service; attachments are not
accepted on the list.

Also, when checking the error with the patch supplied, please run the
following cases -- no long runs are needed, we just want to know which of
these run and which don't:
- ntmpi 1 -ntomp 22 -pin on
- ntmpi 1 -ntomp 22 -pin off
- ntmpi 1 -ntomp 23 -pin off
- ntmpi 1 -ntomp 23 -pinstride 1 -pin on
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on
- ntmpi 23 -ntomp 1 -pinstride 1 -pin on
- ntmpi 23 -ntomp 1 -pinstride 2 -pin on

Thanks,
--
Szilárd


On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks for the quick reply.
> About the first suggestion, I'll try and give feedback soon.
>
> Regarding the second, I attached the log-file for the case of
> mdrun -v -nt 25
> Which ends in the known error message.
>
> Again, thanks a lot for your information and help.
>
> Best wishes,
> Steffi
>
>
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Freitag, 15. März 2019 15:30
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Stefanie,
>
> Unless and until the error and performance-related concerns prove to be
> related, let's keep those separate.
>
> I'd first focus on the former. To be honest, I've never encountered such
> an issue where, if you use more than a certain number of threads, the run
> aborts with that error. To investigate further, can you please apply the
> following patch file, which will hopefully give more context to the error:
> https://termbin.com/uhgp
> (e.g. you can execute the following to accomplish that:
> curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
> devicebuffer.cuh.patch)
>
> Regarding the performance-related questions, can you please share a full
> log file of the runs so we can see the machine config, simulation
> system/settings, etc. Without that it is hard to judge what's best for your
> case. However, if you only have a single GPU (which seems to be the case
> based on the log excerpts) along those two rather beefy CPUs, then you will
> likely not get much benefit from using all cores and it is normal that you
> see little to no improvement from using cores of a second CPU socket.
>
> Cheers,
> --
> Szilárd
>
>
> On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Dear all,
> >
> > I was not sure if the email before reached you, but again many thanks for
> > your reply Szilárd.
> >
> > As written below, we are still facing a problem with the performance of
> > our workstation.
> > I wrote before because of the error message that keeps occurring for
> > mdrun simulations:
> >
> > Assertion failed:
> > Condition: stat == cudaSuccess
> > Asynchronous H2D copy failed
> >
> > As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are
> > the newest ones now.
> >
> > If I run mdrun without further settings, it will lead to this error
> > message. If I run it and choose the thread count directly, mdrun
> > performs well, but only for –nt numbers between 1 and 22. Higher ones
> > again lead to the aforementioned error message.
> >
> > In order to investigate in more detail, I tried different values for
> > –nt, –ntmpi, and –ntomp, also combined with –npme:
> > -   The best performance in the sense of ns/day is with –nt 22
>

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Szilárd Páll
Hi,

Please share log files via an external service; attachments are not
accepted on the list.

Also, when checking the error with the patch supplied, please run the
following cases -- no long runs are needed, we just want to know which of
these run and which don't:
- ntmpi 1 -ntomp 22 -pin on
- ntmpi 1 -ntomp 22 -pin off
- ntmpi 1 -ntomp 23 -pin off
- ntmpi 1 -ntomp 23 -pinstride 1 -pin on
- ntmpi 1 -ntomp 23 -pinstride 2 -pin on
- ntmpi 23 -ntomp 1 -pinstride 1 -pin on
- ntmpi 23 -ntomp 1 -pinstride 2 -pin on

Thanks,
--
Szilárd


On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Hi Szilárd,
>
> thanks for the quick reply.
> About the first suggestion, I'll try and give feedback soon.
>
> Regarding the second, I attached the log-file for the case of
> mdrun -v -nt 25
> Which ends in the known error message.
>
> Again, thanks a lot for your information and help.
>
> Best wishes,
> Steffi
>
>
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Freitag, 15. März 2019 15:30
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi Stefanie,
>
> Unless and until the error and performance-related concerns prove to be
> related, let's keep those separate.
>
> I'd first focus on the former. To be honest, I've never encountered such
> an issue where, if you use more than a certain number of threads, the run
> aborts with that error. To investigate further, can you please apply the
> following patch file, which will hopefully give more context to the error:
> https://termbin.com/uhgp
> (e.g. you can execute the following to accomplish that:
> curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
> devicebuffer.cuh.patch)
>
> Regarding the performance-related questions, can you please share a full
> log file of the runs so we can see the machine config, simulation
> system/settings, etc. Without that it is hard to judge what's best for your
> case. However, if you only have a single GPU (which seems to be the case
> based on the log excerpts) along those two rather beefy CPUs, then you will
> likely not get much benefit from using all cores and it is normal that you
> see little to no improvement from using cores of a second CPU socket.
>
> Cheers,
> --
> Szilárd
>
>
> On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
> stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Dear all,
> >
> > I was not sure if the email before reached you, but again many thanks for
> > your reply Szilárd.
> >
> > As written below, we are still facing a problem with the performance of
> > our workstation.
> > I wrote before because of the error message that keeps occurring for
> > mdrun simulations:
> >
> > Assertion failed:
> > Condition: stat == cudaSuccess
> > Asynchronous H2D copy failed
> >
> > As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are
> > the newest ones now.
> >
> > If I run mdrun without further settings, it will lead to this error
> > message. If I run it and choose the thread count directly, mdrun
> > performs well, but only for –nt numbers between 1 and 22. Higher ones
> > again lead to the aforementioned error message.
> >
> > In order to investigate in more detail, I tried different values for
> > –nt, –ntmpi, and –ntomp, also combined with –npme:
> > -   The best performance in the sense of ns/day is with –nt 22
> > respectively –ntomp 22 alone. But then only 22 threads are involved.
> Which
> > is fine if I run more than one mdrun simultaneously, as I can distribute
> > the other 66 threads. The GPU usage is then around 65%.
> > -   A similar good performance is reached with mdrun  -ntmpi 4 -ntomp
> > 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The GPU
> > usage is then around 50%.
> >
> > I read the information on
> > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> > which was very helpful, but some things are still not clear to me:
> > I was wondering if there is any other way to enhance the performance?
> > Or what is the reason that the –nt maximum is at 22 threads? Could this
> > be connected to the sockets (see details below) of our workstation?
> > It is not clear to me how a number of threads (-nt) higher than 22 can
> > lead to the error regarding the asynchronous H2D copy.
> >
> > 

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Tafelmeier, Stefanie
Hi Szilárd,

thanks for the quick reply.
About the first suggestion, I'll try and give feedback soon.

Regarding the second, I attached the log file for the case of
mdrun -v -nt 25
which ends in the known error message.

Again, thanks a lot for your information and help.

Best wishes,
Steffi



-Ursprüngliche Nachricht-
Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von 
Szilárd Páll
Gesendet: Freitag, 15. März 2019 15:30
An: Discussion list for GROMACS users
Betreff: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

Hi Stefanie,

Unless and until the error and performance-related concerns prove to be
related, let's keep those separate.

I'd first focus on the former. To be honest, I've never encountered such an
issue where, if you use more than a certain number of threads, the run
aborts with that error. To investigate further, can you please apply the
following patch file, which will hopefully give more context to the error:
https://termbin.com/uhgp
(e.g. you can execute the following to accomplish that:
curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
devicebuffer.cuh.patch)

Regarding the performance-related questions, can you please share a full
log file of the runs so we can see the machine config, simulation
system/settings, etc. Without that it is hard to judge what's best for your
case. However, if you only have a single GPU (which seems to be the case
based on the log excerpts) along those two rather beefy CPUs, then you will
likely not get much benefit from using all cores and it is normal that you
see little to no improvement from using cores of a second CPU socket.

Cheers,
--
Szilárd


On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear all,
>
> I was not sure if the email before reached you, but again many thanks for
> your reply Szilárd.
>
> As written below, we are still facing a problem with the performance of
> our workstation.
> I wrote before because of the error message that keeps occurring for
> mdrun simulations:
>
> Assertion failed:
> Condition: stat == cudaSuccess
> Asynchronous H2D copy failed
>
> As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are the
> newest ones now.
>
> If I run mdrun without further settings, it will lead to this error
> message. If I run it and choose the thread count directly, mdrun performs
> well, but only for –nt numbers between 1 and 22. Higher ones again lead
> to the aforementioned error message.
>
> In order to investigate in more detail, I tried different values for
> –nt, –ntmpi, and –ntomp, also combined with –npme:
> -   The best performance in the sense of ns/day is with –nt 22
> respectively –ntomp 22 alone. But then only 22 threads are involved. Which
> is fine if I run more than one mdrun simultaneously, as I can distribute
> the other 66 threads. The GPU usage is then around 65%.
> -   A similar good performance is reached with mdrun  -ntmpi 4 -ntomp
> 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The GPU
> usage is then around 50%.
>
> I read the information on
> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> which was very helpful, but some things are still not clear to me:
> I was wondering if there is any other way to enhance the performance? Or
> what is the reason that the –nt maximum is at 22 threads? Could this be
> connected to the sockets (see details below) of our workstation?
> It is not clear to me how a number of threads (-nt) higher than 22 can
> lead to the error regarding the asynchronous H2D copy.
>
> Please excuse all these questions. I would appreciate it a lot if you
> have a hint for this problem as well.
>
> Best regards,
> Steffi
>
> -
>
> The workstation details are:
> Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
> Hardware detected:
>
>   CPU info:
> Vendor: Intel
> Brand:  Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
> Family: 6   Model: 85   Stepping: 4
> Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh
> cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq
> pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt
> x2apic
>
> Number of AVX-512 FMA units: 2
>   Hardware topology: Basic
> Sockets, cores, and logical processors:
>   Socket  0: [   0  44] [   1  45] [   2  46] [   3  47] [   4  48] [
>  5  49] [   6  50] [   7  51] [   8  52] [   9  53] [  10  54] [  11  55]
> [  12  56] [  13  57] [  14  58] [  15  59] [  16  60] [  17  61] [  18
> 62] [  19  63] [  20  64] [  21  65]
>   Socket  1: [  

Re: [gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-15 Thread Szilárd Páll
Hi Stefanie,

Unless and until the error and performance-related concerns prove to be
related, let's keep those separate.

I'd first focus on the former. To be honest, I've never encountered such an
issue where, if you use more than a certain number of threads, the run
aborts with that error. To investigate further, can you please apply the
following patch file, which will hopefully give more context to the error:
https://termbin.com/uhgp
(e.g. you can execute the following to accomplish that:
curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
devicebuffer.cuh.patch)
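
For completeness, an end-to-end sketch of fetching the patch, applying it,
and rebuilding (gromacs-2019/ and build/ are placeholder directory names):

cd gromacs-2019                        # source tree root
curl https://termbin.com/uhgp > devicebuffer.cuh.patch
patch -p0 < devicebuffer.cuh.patch     # paths in the patch are relative to the tree root
cd build && make -j 8 && make install  # rebuild and reinstall the patched build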

Regarding the performance-related questions, can you please share a full
log file of the runs so we can see the machine config, simulation
system/settings, etc. Without that it is hard to judge what's best for your
case. However, if you only have a single GPU (which seems to be the case
based on the log excerpts) along those two rather beefy CPUs, then you will
likely not get much benefit from using all cores and it is normal that you
see little to no improvement from using cores of a second CPU socket.

Cheers,
--
Szilárd


On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie <
stefanie.tafelme...@zae-bayern.de> wrote:

> Dear all,
>
> I was not sure if the email before reached you, but again many thanks for
> your reply Szilárd.
>
> As written below, we are still facing a problem with the performance of
> our workstation.
> I wrote before because of the error message that keeps occurring for
> mdrun simulations:
>
> Assertion failed:
> Condition: stat == cudaSuccess
> Asynchronous H2D copy failed
>
> As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are the
> newest ones now.
>
> If I run mdrun without further settings, it will lead to this error
> message. If I run it and choose the thread count directly, mdrun performs
> well, but only for –nt numbers between 1 and 22. Higher ones again lead
> to the aforementioned error message.
>
> In order to investigate in more detail, I tried different values for
> –nt, –ntmpi, and –ntomp, also combined with –npme:
> -   The best performance in the sense of ns/day is with –nt 22
> respectively –ntomp 22 alone. But then only 22 threads are involved. Which
> is fine if I run more than one mdrun simultaneously, as I can distribute
> the other 66 threads. The GPU usage is then around 65%.
> -   A similar good performance is reached with mdrun  -ntmpi 4 -ntomp
> 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The GPU
> usage is then around 50%.
>
> I read the information on
> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> which was very helpful, but some things are still not clear to me:
> I was wondering if there is any other way to enhance the performance? Or
> what is the reason that the –nt maximum is at 22 threads? Could this be
> connected to the sockets (see details below) of our workstation?
> It is not clear to me how a number of threads (-nt) higher than 22 can
> lead to the error regarding the asynchronous H2D copy.
>
> Please excuse all these questions. I would appreciate it a lot if you
> have a hint for this problem as well.
>
> Best regards,
> Steffi
>
> -
>
> The workstation details are:
> Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
> Hardware detected:
>
>   CPU info:
> Vendor: Intel
> Brand:  Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
> Family: 6   Model: 85   Stepping: 4
> Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh
> cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq
> pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt
> x2apic
>
> Number of AVX-512 FMA units: 2
>   Hardware topology: Basic
> Sockets, cores, and logical processors:
>   Socket  0: [   0  44] [   1  45] [   2  46] [   3  47] [   4  48] [
>  5  49] [   6  50] [   7  51] [   8  52] [   9  53] [  10  54] [  11  55]
> [  12  56] [  13  57] [  14  58] [  15  59] [  16  60] [  17  61] [  18
> 62] [  19  63] [  20  64] [  21  65]
>   Socket  1: [  22  66] [  23  67] [  24  68] [  25  69] [  26  70] [
> 27  71] [  28  72] [  29  73] [  30  74] [  31  75] [  32  76] [  33  77]
> [  34  78] [  35  79] [  36  80] [  37  81] [  38  82] [  39  83] [  40
> 84] [  41  85] [  42  86] [  43  87]
>   GPU info:
> Number of GPUs detected: 1
> #0: NVIDIA Quadro P6000, compute cap.: 6.1, ECC:  no, stat: compatible
>
> -
>
>
>
> -Ursprüngliche Nachricht-
> Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von Szilárd
> Páll
> Gesendet: Donnerstag, 31. Januar 2019 17:15
> An: Discussion list for GROMACS users
> Betreff: Re: [gmx-users] WG: Issue with CUDA and gromacs
>
> On Thu, Jan 31, 2019 at 2:14 PM Szilárd Páll 
> wrote:
> >
> > On Wed, Jan 30, 2019 at 5:15 PM Tafelmeier, Stefanie
> >  wrote:
> > >
> > > Dear all,
> > >

[gmx-users] WG: WG: Issue with CUDA and gromacs

2019-03-14 Thread Tafelmeier, Stefanie
Dear all,

I was not sure if the email before reached you, but again many thanks for your 
reply Szilárd.

As written below, we are still facing a problem with the performance of our
workstation.
I wrote before because of the error message that keeps occurring for mdrun
simulations:

Assertion failed:
Condition: stat == cudaSuccess
Asynchronous H2D copy failed

As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are the
newest ones now.

If I run mdrun without further settings, it will lead to this error message.
If I run it and choose the thread count directly, mdrun performs well, but
only for –nt numbers between 1 and 22. Higher ones again lead to the
aforementioned error message.
 
In order to investigate in more detail, I tried different values for –nt,
–ntmpi, and –ntomp, also combined with –npme:
-   The best performance in the sense of ns/day is with –nt 22, respectively
–ntomp 22, alone. But then only 22 threads are involved, which is fine if I
run more than one mdrun simultaneously, as I can distribute the other 66
threads (see the sketch after this list). The GPU usage is then around 65%.
-   A similarly good performance is reached with mdrun -ntmpi 4 -ntomp 18
-npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The GPU usage is
then around 50%.
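
A sketch of that multi-run distribution with explicit pin offsets, so that
simultaneous runs do not land on the same cores (a.tpr and b.tpr are
hypothetical inputs; the offsets count hardware threads and may need
adjusting to the node's topology):

gmx mdrun -s a.tpr -ntomp 22 -pin on -pinoffset 0  -pinstride 1 &
gmx mdrun -s b.tpr -ntomp 22 -pin on -pinoffset 22 -pinstride 1 &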

I read the information on 
http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html 
which was very helpful, but some things are still not clear to me:
I was wondering if there is any other way to enhance the performance? Or what
is the reason that the –nt maximum is at 22 threads? Could this be connected
to the sockets (see details below) of our workstation?
It is not clear to me how a number of threads (-nt) higher than 22 can lead to
the error regarding the asynchronous H2D copy.

Please excuse all these questions. I would appreciate it a lot if you have a
hint for this problem as well.

Best regards,
Steffi

-

The workstation details are:
Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
Hardware detected:

  CPU info:
Vendor: Intel
Brand:  Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
Family: 6   Model: 85   Stepping: 4
Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov 
cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm 
pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic

Number of AVX-512 FMA units: 2
  Hardware topology: Basic
Sockets, cores, and logical processors:
  Socket  0: [   0  44] [   1  45] [   2  46] [   3  47] [   4  48] [   5  
49] [   6  50] [   7  51] [   8  52] [   9  53] [  10  54] [  11  55] [  12  
56] [  13  57] [  14  58] [  15  59] [  16  60] [  17  61] [  18  62] [  19  
63] [  20  64] [  21  65]
  Socket  1: [  22  66] [  23  67] [  24  68] [  25  69] [  26  70] [  27  
71] [  28  72] [  29  73] [  30  74] [  31  75] [  32  76] [  33  77] [  34  
78] [  35  79] [  36  80] [  37  81] [  38  82] [  39  83] [  40  84] [  41  
85] [  42  86] [  43  87]
  GPU info:
Number of GPUs detected: 1
#0: NVIDIA Quadro P6000, compute cap.: 6.1, ECC:  no, stat: compatible

-



-Ursprüngliche Nachricht-
Von: gromacs.org_gmx-users-boun...@maillist.sys.kth.se 
[mailto:gromacs.org_gmx-users-boun...@maillist.sys.kth.se] Im Auftrag von 
Szilárd Páll
Gesendet: Donnerstag, 31. Januar 2019 17:15
An: Discussion list for GROMACS users
Betreff: Re: [gmx-users] WG: Issue with CUDA and gromacs

On Thu, Jan 31, 2019 at 2:14 PM Szilárd Páll  wrote:
>
> On Wed, Jan 30, 2019 at 5:15 PM Tafelmeier, Stefanie
>  wrote:
> >
> > Dear all,
> >
> > We are facing an issue with the CUDA toolkit.
> > We tried several combinations of GROMACS versions and CUDA toolkits (see
> > the table below). No toolkit older than 9.2 was possible to try, as there
> > are no NVIDIA drivers available for a Quadro P6000.
>
> Install the latest 410.xx drivers and it will work; the NVIDIA driver
> download website (https://www.nvidia.com/Download/index.aspx)
> recommends 410.93.
>
> Here's a system with CUDA 10-compatible driver running o a system with
> a P6000: https://termbin.com/ofzo

Sorry, I misread that as "CUDA >=9.2 was not possible".

Note that the driver is backward compatible, so you can use a new
driver with older CUDA versions.

Also note that the oldest driver for which NVIDIA claims P6000 support
is 390.59, which is, as far as I know, one generation older than the 396
that the CUDA 9.2 toolkit came with. This is, however, not something I'd
recommend pursuing; use a new driver from the official site with any
CUDA version that GROMACS supports and it should be fine.
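
A quick way to verify which driver is actually installed (nvidia-smi ships
with the driver):

nvidia-smi --query-gpu=driver_version,name --format=csv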

>
> > Gromacs   CUDA   Error message
> > 2019      10.0   gmx mdrun: Assertion failed: Condition: stat == cudaSuccess; Asynchronous H2D copy failed
> > 2019      9.2    gmx mdrun: Assertion failed: Condition: stat == cudaSuccess; Asynchronous H2D copy failed
> > 2018.5    9.2    gmx