Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> > That's just weird. The CUDA API error detected does not sound good -
> > perhaps it's a sign of a CUDA runtime bug?
>
> Maybe. I don't really have a known good configuration to work from, so
> locating the cause of the error is a bit of a shot in the dark. I
> suggest that you try CUDA 5.0 and see if that works.

Same error, unfortunately. I have to relinquish the hardware for now, so
I can't do any more probing into this. But thank you for your patience
and advice!
--
Anders Ossowicki
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
FYI: GROMACS is known to work on IVB-E + K20 hardware, so I'm still
leaning toward thinking that this is either a hardware or a CUDA
software error.

On Fri, Jan 31, 2014 at 2:57 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > > That's just weird. The CUDA API error detected does not sound good
> > > - perhaps it's a sign of a CUDA runtime bug?
> >
> > Maybe. I don't really have a known good configuration to work from,
> > so locating the cause of the error is a bit of a shot in the dark.

The next thing I'd have suggested is to plug the card into another
machine and/or plug another card into the machine and try again...

> > I suggest that you try CUDA 5.0 and see if that works.
>
> Same error, unfortunately. I have to relinquish the hardware for now,
> so I can't do any more probing into this. But thank you for your
> patience and advice!

... but if you can't, then you'll just have to take my word for it:
Xeon E5-2690 + GTX TITAN - rnase dodec 181 ns/day; rnase dodec+vsites
382 ns/day. ;)

Cheers,
Sz.
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> FYI: GROMACS is known to work on IVB-E + K20 hardware, so I'm still
> leaning toward thinking that this is either a hardware or a CUDA
> software error.

Good to know. If we go ahead with this hardware setup, I might return.

> The next thing I'd have suggested is to plug the card into another
> machine and/or plug another card into the machine and try again...

Unfortunately I don't have physical access to the testbed.

> ... but if you can't, then you'll just have to take my word for it:
> Xeon E5-2690 + GTX TITAN - rnase dodec 181 ns/day; rnase dodec+vsites
> 382 ns/day. ;)

Yeah, we know there's some serious performance to be gained, but we need
some bang/buck numbers for our workloads :-)
--
Anders Ossowicki
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
That sounds strange. Does the error happen at a particular step?
Assuming it does occur within the first 10 steps, here are a few things
to try:
- Run cuda-memcheck mdrun -nsteps 10;
- Try running with the GMX_EMULATE_GPU env. var. set. This will run the
  GPU acceleration code-path, but will use CPU kernels (equivalent to
  the CUDA implementation, but slow).
- Run with GMX_EMULATE_GPU using valgrind:
  GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10

Cheers,
--
Szilárd

On Thu, Jan 30, 2014 at 11:47 AM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> Thanks for your suggestions!
>
> > I would not make any assumptions though, but rather try a few things
> > first:
> > - Does the card pass a memtest
> >   (sourceforge.net/projects/cudagpumemtest/)?
>
> The memtest ran for about an hour with no errors.
>
> > - Does the installation pass the regressiontests?
>
> No. These four complex tests fail, all with the usual error:
>
> FAILED. Check mdrun.out, md.log files in nbnxn_pme
> FAILED. Check mdrun.out, md.log files in nbnxn_rf
> FAILED. Check mdrun.out, md.log files in nbnxn_rzero
> FAILED. Check mdrun.out, md.log files in nbnxn_vsite
>
> Everything else passes.
>
> > - Is the error reproducible with other inputs?
>
> Yes, so far anything that has caused Gromacs to engage the GPU has
> failed. Our own runs, the samples from the Gromacs website, and the
> four tests above.
>
> > Also note that with the default invocation of mdrun you are
> > attempting to use all cores/hardware threads in your machine (I
> > assume a 2x12-core IVB-E node with HT on).
>
> Two Xeon E5-2697V2 processors, yes. This is a test server for gauging
> the potential performance gains of GPGPU with our own runs. We'll
> stick to a proper CPU-GPU ratio for the performance measurements. This
> was just me trying to pare it down to the simplest invocation.
>
> We have had no trouble using other CUDA-enabled tools on this
> particular test server. NAMD, for example, works fine.
> --
> Anders Ossowicki
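[Editor's note: the three diagnostics suggested above can be chained in
a small script along these lines. This is only a sketch: it assumes the
topol.tpr from the thread's earlier grompp step is in the working
directory, and that mdrun, cuda-memcheck, and valgrind are on PATH.]

```shell
#!/bin/sh
# Sketch of the three diagnostics from the thread; topol.tpr is assumed
# to be in the current directory.

# 1. Short GPU run under cuda-memcheck to catch CUDA API/memory errors.
cuda-memcheck mdrun -nsteps 10

# 2. Same run on the GPU code path, but with the (slow) CPU emulation
#    kernels instead of the CUDA kernels.
GMX_EMULATE_GPU=1 mdrun -nsteps 10

# 3. The emulated run under valgrind, to look for host-side memory
#    errors in the same code path.
GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10
```

If the emulated run is clean under valgrind while the real GPU run
fails, that points at the device side (kernel, driver, or hardware)
rather than the host-side setup code.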
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> Does the error happen at a particular step? Assuming it does occur
> within the first 10 steps, here are a few things to try:

It happens immediately. As in:

$ time mdrun
snip
real    0m3.312s
user    0m6.768s
sys     0m1.968s
$

> - Run cuda-memcheck mdrun -nsteps 10;

A wild backtrace appeared!

starting mdrun 'RNASE ZF-1A in water'
10 steps,      0.0 ps.
========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x9d9) [0x3bc779]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1ff8) [0x275f98]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x3bf) [0x27a88f]
========= Host Frame:mdrun (do_md + 0x7fc7) [0x34267]
========= Host Frame:mdrun (mdrunner + 0x18a1) [0x11491]
========= Host Frame:mdrun (cmain + 0x1a30) [0x38cb0]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:mdrun [0x76a1]
=========

snip the usual error

> - Try running with the GMX_EMULATE_GPU env. var. set. This will run
>   the GPU acceleration code-path, but will use CPU kernels (equivalent
>   to the CUDA implementation, but slow).

This seems to run correctly.

> - Run with GMX_EMULATE_GPU using valgrind:
>   GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10

Valgrind dies immediately with

nztest@ubuntu:~/rnase_bench/rnase_cubic$ GMX_EMULATE_GPU=YesPlease valgrind mdrun -nsteps 10
==13510== Memcheck, a memory error detector
==13510== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==13510== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==13510== Command: mdrun -nsteps 10
==13510==
:-)  G  R  O  M  A  C  S  (-:
vex amd64->IR: unhandled instruction bytes: 0xC5 0xFA 0x2A 0xC2 0xC5 0xFA 0x59 0xD
==13510== valgrind: Unrecognised instruction at address 0x5b5ac9d.
==13510==    at 0x5B5AC9D: rando (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB0A4: pukeit (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB420: bromacs (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB933: CopyRight (in /usr/lib/libgmx.so.8)
==13510==    by 0x438E26: cmain (in /usr/bin/mdrun)
==13510==    by 0x65D976C: (below main) (libc-start.c:226)
--
Anders Ossowicki
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
On Thu, Jan 30, 2014 at 2:10 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > Does the error happen at a particular step? Assuming it does occur
> > within the first 10 steps, here are a few things to try:
>
> It happens immediately. As in:
>
> $ time mdrun
> snip
> real    0m3.312s
> user    0m6.768s
> sys     0m1.968s
> $

Well, with a 24k system a single iteration can be done in 2-3 ms, so
those 3.3 seconds are mostly initialization and some number of steps -
could be one, ten, or even a hundred.

> > - Run cuda-memcheck mdrun -nsteps 10;
>
> A wild backtrace appeared!
>
> starting mdrun 'RNASE ZF-1A in water'
> 10 steps,      0.0 ps.
> ========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
> ========= Saved host backtrace up to driver entry point at error
> ========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
> ========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x9d9) [0x3bc779]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1ff8) [0x275f98]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x3bf) [0x27a88f]
> ========= Host Frame:mdrun (do_md + 0x7fc7) [0x34267]
> ========= Host Frame:mdrun (mdrunner + 0x18a1) [0x11491]
> ========= Host Frame:mdrun (cmain + 0x1a30) [0x38cb0]
> ========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
> ========= Host Frame:mdrun [0x76a1]
> =========
>
> snip the usual error

That doesn't tell much, could you add -g to the CXX flags?

> > - Try running with the GMX_EMULATE_GPU env. var. set. This will run
> >   the GPU acceleration code-path, but will use CPU kernels
> >   (equivalent to the CUDA implementation, but slow).
>
> This seems to run correctly.

Does correctly mean that you've checked the results or that it completed
without a crash?
> > - Run with GMX_EMULATE_GPU using valgrind:
> >   GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10
>
> Valgrind dies immediately with
>
> nztest@ubuntu:~/rnase_bench/rnase_cubic$ GMX_EMULATE_GPU=YesPlease valgrind mdrun -nsteps 10
> ==13510== Memcheck, a memory error detector
> ==13510== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> ==13510== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> ==13510== Command: mdrun -nsteps 10
> ==13510==
> :-)  G  R  O  M  A  C  S  (-:
> vex amd64->IR: unhandled instruction bytes: 0xC5 0xFA 0x2A 0xC2 0xC5 0xFA 0x59 0xD
> ==13510== valgrind: Unrecognised instruction at address 0x5b5ac9d.
> ==13510==    at 0x5B5AC9D: rando (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB0A4: pukeit (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB420: bromacs (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB933: CopyRight (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x438E26: cmain (in /usr/bin/mdrun)
> ==13510==    by 0x65D976C: (below main) (libc-start.c:226)

Yeah, your valgrind does not support VEX-encoded instructions (=AVX).
Use SSE4.1 on the CPU, and AFAIK you may need to set
GMX_DISTRIBUTABLE_BINARY=ON. However, I do not expect this to shed more
light on the issue.
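[Editor's note: the two rebuild suggestions in this exchange - adding
-g for a symbolized backtrace, and switching to SSE4.1 kernels so this
valgrind can decode the instructions - map onto GROMACS 4.6 CMake
options roughly as follows. This is a sketch only: the build directory
layout and CUDA path are assumptions, and GMX_DISTRIBUTABLE_BINARY=ON
is taken from the "AFAIK" suggestion above.]

```shell
# Reconfigure GROMACS 4.6 with debug symbols (RelWithDebInfo keeps -g)
# and SSE4.1 kernels, since valgrind 3.7 cannot decode AVX instructions.
# The source/build paths and the CUDA location are assumptions.
cd /home/nztest/src/gromacs-4.6.5/build
cmake .. \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DGMX_CPU_ACCELERATION=SSE4.1 \
  -DGMX_DISTRIBUTABLE_BINARY=ON \
  -DGMX_GPU=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.5
make -j 24 && make install
```

With debug symbols in place, the cuda-memcheck host backtrace should
show file names and line numbers instead of bare offsets.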
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
On Thu, Jan 30, 2014 at 4:19 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > Well, with a 24k system a single iteration can be done in 2-3 ms, so
> > those 3.3 seconds are mostly initialization and some number of steps
> > - could be one, ten, or even a hundred.
>
> Sure, but it fails even with -nsteps 1.
>
> > That doesn't tell much, could you add -g to the CXX flags?
>
> Same thing:

There should be line numbers below - and perhaps a bit more information
on what's causing the error - at least that's what I'm hoping for.

One other thing you could try is to set coulombtype = reaction-field in
the mdp file and re-generate the tpr. These runs will use a different
CUDA kernel. Just guessing, it may not make much difference at all.

> starting mdrun 'RNASE ZF-1A in water'
> 1 steps,      0.0 ps.
> ========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
> ========= Saved host backtrace up to driver entry point at error
> ========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
> ========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x222) [0xd45ab5]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1d20) [0xc287a5]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x15d) [0xc2a986]
> ========= Host Frame:mdrun (do_md + 0x3cd4) [0x2450e]
> ========= Host Frame:mdrun (mdrunner + 0x1f14) [0x11b50]
> ========= Host Frame:mdrun (cmain + 0x1dee) [0x2a57d]
> ========= Host Frame:mdrun (main + 0x20) [0x31c18]
> ========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
> ========= Host Frame:mdrun [0x75e9]
> =========

> > - Try running with the GMX_EMULATE_GPU env. var. set. This will run
> >   the GPU acceleration code-path, but will use CPU kernels
> >   (equivalent to the CUDA implementation, but slow).
>
> This seems to run correctly.

> > Does correctly mean that you've checked the results or that it
> > completed without a crash?
>
> Just the latter.
> --
> Anders Ossowicki
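[Editor's note: the reaction-field experiment suggested above amounts to
one change in the .mdp followed by regenerating the .tpr. A sketch: the
two-line .mdp written below is a stand-in for the thread's real
pme_verlet.mdp, which has many more settings.]

```shell
# A minimal stand-in .mdp (the real pme_verlet.mdp from the thread has
# many more settings; these two lines are only for illustration).
printf 'coulombtype              = pme\nrlist                    = 0.9\n' > pme_verlet.mdp

# Flip the electrostatics to reaction-field, as suggested above.
sed -i 's/^coulombtype.*/coulombtype              = reaction-field/' pme_verlet.mdp

# Regenerate the run input and retry, reusing the thread's own commands:
#   grompp -f pme_verlet.mdp -c conf.gro -p topol.top
#   mdrun -nsteps 10
```

If the reaction-field run survives while the PME run crashes, the
problem is narrowed to the Ewald/PME kernel variant; if both crash the
same way, the kernel flavor is probably not the issue.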
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
Hi Anders,

This mail belongs to the users' list.

This type of error is typically a sign of the CUDA kernel failing due to
a nasty bug in the code or a hardware error. The dmesg message is
suspicious and may be a hint of a hardware error (see
https://www.kubuntuforums.net/showthread.php?64133-kwin-crashes-repeatedly).

I would not make any assumptions though, but rather try a few things
first:
- Does the card pass a memtest
  (sourceforge.net/projects/cudagpumemtest/)?
- Does the installation pass the regressiontests?
- Is the error reproducible with other inputs?

Also note that with the default invocation of mdrun you are attempting
to use all cores/hardware threads in your machine (I assume a 2x12-core
IVB-E node with HT on). This requires a huge number of OpenMP threads,
which will lead to pretty bad performance in the CPU code. Typically a
one-to-one CPU-GPU ratio is decent, especially with fast Intel Xeons.
Use only one socket for your tests or, if you plan to use a single GPU
per node, at least use only 1 thread/core - 24 threads in total.

Cheers,
--
Szilárd

On Wed, Jan 29, 2014 at 7:07 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> Hello,
>
> We are testing out Gromacs 4.6.5 with an Nvidia K20 card. We keep
> running into the error message below, no matter which setup we're
> trying. In the included case, it was the RNAse example from
> http://www.gromacs.org/GPU_acceleration.
>
> Furthermore, we get the following lines in dmesg as well:
>
> NVRM: GPU at :42:00: GPU-d0b07804-027a-5a02-43bc-fd7dc9064637
> NVRM: Xid (:42:00): 31, Ch 0003, engmask 0101, intr 1000
>
> Are we just completely out of luck with this card, or have we done
> something wrong? We've built Gromacs from source against the CUDA 5.5
> libraries straight from Nvidia. The system is Ubuntu 12.04. Gromacs
> works fine when it's not using the GPU.
> The card identifies itself as NVIDIA Corporation GK110GL [Tesla K20m]
> (rev a1)
>
> This is what we've done to trigger the error:
>
> $ grompp -f pme_verlet.mdp -c conf.gro -p topol.top
> $ mdrun
>
> Here is the output from mdrun. The error message tells me absolutely
> nothing, so any advice on how to proceed with debugging this would be
> much appreciated.
>
> Reading file topol.tpr, VERSION 4.6.5 (single precision)
> Changing nstlist from 10 to 40, rlist from 0.9 to 0.996
> Using 1 MPI thread
> Using 48 OpenMP threads
>
> 1 GPU detected:
>   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> 1 GPU auto-selected for this run.
> Mapping of GPU to the 1 PP rank in this node: #0
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> starting mdrun 'RNASE ZF-1A in water'
> 1 steps,     20.0 ps.
>
> ---
> Program mdrun, VERSION 4.6.5
> Source code file:
> /home/nztest/src/gromacs-4.6.5/src/mdlib/nbnxn_cuda/nbnxn_cuda.cu,
> line: 591
>
> Fatal error:
> cudaStreamSynchronize failed in cu_blockwait_nb: unspecified launch
> failure
>
> For more information and tips for troubleshooting, please check the
> GROMACS website at http://www.gromacs.org/Documentation/Errors
> ---
>
> Thanks in advance!
> --
> Best Regards
> Anders Ossowicki
--
Gromacs Developers mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
or send a mail to gmx-developers-requ...@gromacs.org.
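[Editor's note: the thread-count advice above (1 thread/core, i.e. 24
threads on this 2x12-core node instead of all 48 hardware threads) can
be expressed with mdrun's own options. A sketch, assuming the GROMACS
4.6 mdrun flags and the same working directory as the thread's example.]

```shell
# Limit mdrun to one OpenMP thread per physical core (24 on the
# 2x12-core IVB-E node described above) and pin the threads, instead
# of letting it spawn 48 threads across all hardware threads.
mdrun -ntomp 24 -pin on
```

For the eventual one-GPU-per-socket benchmarks, -ntomp 12 per rank
would match the one-socket-per-GPU suggestion above.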