Just a quick update here regarding regression tests. On an old machine with
a single puny GTX 960, the 2018 build passes all tests with
Start 9: GpuUtilsUnitTests
9/39 Test #9: GpuUtilsUnitTests .................   Passed    5.64 sec
Hope this is useful.
Alex
Great to hear!
(Also note that one thing we have explicitly focused on is not only peak
performance, but getting as close to peak as possible with just a few CPU
cores! You should be able to get >75% of peak performance with just 3-5 Xeon
or 2-3 desktop cores rather than needing a full fast CPU.)
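For example, something along these lines (a sketch - the -deffnm name and the
thread count are placeholders, adjust them to your system):
  gmx mdrun -deffnm topol -ntmpi 1 -ntomp 4 -nb gpu -pme gpu -pin on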
--
Szilárd
With -pme gpu, I am reporting 383.032 ns/day vs 270 ns/day with the 2016.4
version. I _did not_ mistype. The system is essentially a cubic box of water
with some ions.
Incredible.
Alex
On Thu, Feb 8, 2018 at 12:27 PM, Szilárd Páll
wrote:
Note that the actual mdrun performance need not be affected, whether it's a
driver persistence issue (you'll just see a few seconds of lag at mdrun
startup) or some other CUDA application startup-related lag (an mdrun run
does mostly very different kinds of things than this particular set of unit
tests).
I keep getting bounce messages from the list, so in case things didn't get
posted...
1. We enabled PM -- still times out.
2. 3-4 days ago we had very fast runs with GPU (2016.4), so I don't know if
we miraculously broke everything to the point where our $25K box performs
worse than Mark's desktop.
On Thu, Feb 8, 2018 at 6:54 PM Szilárd Páll wrote:
Got it. Given all the messing around, I am rebuilding GMX and if make check
results are the same, will install. We have an angry postdoc here demanding
tools.
Thank you gentlemen.
Alex
On Thu, Feb 8, 2018 at 10:50 AM, Szilárd Páll
wrote:
BTW, do you have persistence mode (PM) set (see the nvidia-smi output)?
If you do not have PM set, nor an X server that keeps the driver loaded,
the driver gets loaded every time a CUDA application is started. This could
be causing the lag, which shows up as a long execution time for the tests.
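If it helps, something like this should show and set it (the second command
needs root; exact output format varies by driver version):
  nvidia-smi -q | grep -i persistence
  sudo nvidia-smi -pm 1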
Hi,
Assuming the other test binary has the same behaviour (succeeds when run
manually), the build is working correctly and you could install it for
general use. But I suspect its performance will suffer from whatever is
causing the slowdown (e.g. compare with old numbers). That's not really a
GROMACS problem.
On Thu, Feb 8, 2018 at 6:46 PM, Alex wrote:
Yes, your GROMACS build seems fine.
make check simply runs the test that I suggested you run manually (and
which finished successfully).
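That is, the two are roughly equivalent when run from the build directory:
  ctest -R GpuUtilsUnitTests --output-on-failure
  bin/gpu_utils-test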
Are you suggesting that i should accept these results and install the 2018
version?
Thanks,
Alex
On Thu, Feb 8, 2018 at 10:43 AM, Mark Abraham
wrote:
Hi,
PATH doesn't matter, only what ldd thinks matters.
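e.g. something like this (the path assumes you are in the build tree) shows
which CUDA libraries the binary actually resolves:
  ldd bin/gmx | grep -i -E 'cuda|cufft'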
I have opened https://redmine.gromacs.org/issues/2405 to note that the
implementation of these tests is perhaps proving more pain than usefulness
(judging from this thread and others I have seen).
Mark
On Thu, Feb 8, 2018 at 6:41 PM Alex wrote:
That is quite weird. We found that my PATH was pointing to the old gmx
installation while running these tests. Do you think that could cause
issues?
Alex
On Thu, Feb 8, 2018 at 10:36 AM, Mark Abraham
wrote:
Hi,
Great. The manual run took 74.5 seconds, failing the 30-second timeout. So
the code is fine.
But you have some crazy large overhead going on - gpu_utils-test runs in 7s
on my 2013 desktop with CUDA 9.1.
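For a directly comparable number, you could time the binary itself, e.g.:
  time bin/gpu_utils-test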
Mark
On Thu, Feb 8, 2018 at 6:29 PM Alex wrote:
uh, no sir.
> 9/39 Test #9: GpuUtilsUnitTests .................***Timeout  30.43 sec
On Thu, Feb 8, 2018 at 10:25 AM, Mark Abraham
wrote:
Hi,
Those all succeeded. Does make check now also succeed?
Mark
On Thu, Feb 8, 2018 at 6:24 PM Alex wrote:
Here you are:
[==========] Running 35 tests from 7 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from HostAllocatorTest/0, where TypeParam = int
[ RUN      ] HostAllocatorTest/0.EmptyMemoryAlwaysWorks
[       OK ] HostAllocatorTest/0.EmptyMemoryAlwaysWorks (5457 ms)
It might help to know which of the unit test(s) in that group stall. Can
you run it manually (bin/gpu_utils-test) and report back the standard
output?
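For instance (from the build directory; tee just keeps a copy of the output
for posting):
  bin/gpu_utils-test 2>&1 | tee gpu_utils.log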
--
Szilárd
On Thu, Feb 8, 2018 at 3:56 PM, Alex wrote:
Here's some additional info:
# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.12  Wed Dec 20 07:19:16
PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6)
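We'll also compare the loaded kernel driver against the installed toolkit,
e.g.:
  nvidia-smi        # driver version is shown in the header
  nvcc --version    # CUDA toolkit version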
Forwarding my colleague's email below; any suggestions highly appreciated.
Thanks!
Alex
***
I ran the minimal tests suggested in the CUDA installation guide
(bandwidthTest, deviceQuery) and then individually ran 10 of the samples
provided. However, many of the samples require a graphics display.
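(For the record, the samples were built and run roughly like this - the
paths assume a default CUDA 9.1 install:
  cuda-install-samples-9.1.sh ~/
  cd ~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery
  make && ./deviceQuery)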
I did hear yesterday that CUDA's own tests passed, but will update on
that in more detail as soon as people start showing up -- it's 8 am
right now... :)
Thanks Mark,
Alex
On 2/8/2018 7:59 AM, Mark Abraham wrote:
Hi,
OK, but it's not clear to me whether you followed the other advice - cleaning
out all the NVIDIA stuff (CUDA, runtime, drivers) - nor whether CUDA's own
tests work.
Mark
On Thu, Feb 8, 2018 at 3:57 PM Alex wrote:
Nope, still persists after reboot and no other jobs running:
9/39 Test #9: GpuUtilsUnitTests .................***Timeout  30.59 sec
Any additional suggestions?
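(I can also rerun just that test with a longer timeout to see how long it
really takes, e.g.:
  ctest -R GpuUtilsUnitTests --timeout 300 --output-on-failure)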
I am rebooting the box and kicking out all the jobs until we figure this
out.
Thanks!
Alex
On 2/8/2018 7:27 AM, Szilárd Páll wrote:
Mark, Peter -- thanks. Your comments make sense.
BTW, timeouts can be caused by contention from a stupid number of ranks/tMPI
threads hammering a single GPU (especially with 2 threads/core with HT), but
I'm not sure the tests are ever executed with such a huge rank count.
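One way to rule that out when running the binary manually is to pin it to a
few cores, e.g.:
  taskset -c 0-3 bin/gpu_utils-test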
--
Szilárd
On Thu, Feb 8, 2018 at 2:40 PM, Mark Abraham wrote:
Hi,
On Thu, Feb 8, 2018 at 2:15 PM Alex wrote:
Yup, start with rebooting before trying anything else. There are probably
still old drivers loaded in the kernel.
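You can check what is loaded, and try unloading by hand if you'd rather not
reboot (module names may differ by driver version):
  lsmod | grep nvidia
  sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia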
Peter
On 08-02-18 14:14, Alex wrote:
Mark and Peter,
Thanks for commenting. I was told that all CUDA tests passed, but I will
double check on how many of those were actually run. Also, we never
rebooted the box after the CUDA install, and finally we had a bunch of
gromacs (2016.4) jobs running, because we didn't want to interrupt them.
Hi,
Or leftovers of the drivers that are now mismatching. That has caused
timeouts for us.
Mark
On Thu, Feb 8, 2018 at 10:55 AM Peter Kroon wrote:
Hi,
With changing failures like this I would start to suspect the hardware as
well. Mark's suggestion of looking at simpler test programs than GMX is a
good one :)
Peter
On 08-02-18 09:10, Mark Abraham wrote:
> Hi,
>
> That suggests that your new CUDA installation is differently incomplete.
Update: we seem to have had a hiccup with an orphan CUDA install, and that
was causing issues. After wiping everything off and rebuilding, the errors
from the initial post disappeared. However, two tests failed during the
regression run:
95% tests passed, 2 tests failed out of 39
Label Time Summary:
GTest
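(To see which ones failed and why, I'll rerun with something like:
  ctest --rerun-failed --output-on-failure)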
Hi Mark,
Nothing has been installed yet, so the commands were issued from /build/bin,
and so I am not sure about the output of that mdrun-test (let me know what
exact command would make it more informative).
Thank you,
Alex
***
> ./gmx -version
GROMACS version: 2018
Precision:
Hi,
I checked back with the CUDA-facing GROMACS developers. They've run the
code with 9.1 and believe there's no intrinsic problem within GROMACS.
So I don't have much to suggest other than rebuilding everything cleanly,
as this is an internal, nondescript cuFFT/driver error that is not
And this is with:
> gcc --version
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
On Tue, Feb 6, 2018 at 1:18 PM, Alex wrote:
> Hi all,
>
> I've just built the latest version and regression tests are running. Here
> is one error:
>
> "Program: mdrun-test, version