Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> > That's just weird. The CUDA API error detected does not sound good -
> > perhaps it's a sign of a CUDA runtime bug?
>
> Maybe. I don't really have a known good configuration to work from, so
> locating the cause of the error is a bit of a shot in the dark. I
> suggest that you try CUDA 5.0 and see if that works.

Same error, unfortunately. I have to relinquish the hardware for now, so
I can't do any more probing into this. But thank you for your patience
and advice!
--
Anders Ossowicki
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
FYI: GROMACS is known to work on IVB-E + K20 hardware, so I'm still
leaning toward thinking that this is either a hardware or a CUDA
software error.

On Fri, Jan 31, 2014 at 2:57 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > > That's just weird. The CUDA API error detected does not sound good
> > > - perhaps it's a sign of a CUDA runtime bug?
> >
> > Maybe. I don't really have a known good configuration to work from,
> > so locating the cause of the error is a bit of a shot in the dark.

The next thing I'd have suggested is to plug the card into another
machine and/or plug another card into the machine and try again...

> > I suggest that you try CUDA 5.0 and see if that works.
>
> Same error, unfortunately. I have to relinquish the hardware for now,
> so I can't do any more probing into this. But thank you for your
> patience and advice!

... but if you can't, then you'll just have to take my word for it:
Xeon E5-2690 + GTX TITAN - rnase dodec 181 ns/day; rnase dodec+vsites
382 ns/day. ;)

Cheers,
Sz.
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> FYI: GROMACS is known to work on IVB-E + K20 hardware, so I'm still
> leaning toward thinking that this is either a hardware or a CUDA
> software error.

Good to know. If we go ahead with this hardware setup, I might return.

> The next thing I'd have suggested is to plug the card into another
> machine and/or plug another card into the machine and try again...

Unfortunately I don't have physical access to the testbed.

> ... but if you can't, then you'll just have to take my word for it:
> Xeon E5-2690 + GTX TITAN - rnase dodec 181 ns/day; rnase dodec+vsites
> 382 ns/day. ;)

Yeah, we know there's some serious performance to be gained, but we need
some bang/buck numbers for our workloads :-)
--
Anders Ossowicki
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
That sounds strange. Does the error happen at a particular step?
Assuming it does occur within the first 10 steps, here are a few things
to try:
- Run cuda-memcheck mdrun -nsteps 10;
- Try running with the GMX_EMULATE_GPU env. var. set. This will run the
  GPU acceleration code-path, but will use CPU kernels (equivalent to
  the CUDA implementation, but slow).
- Run with GMX_EMULATE_GPU using valgrind:
  GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10

Cheers,
--
Szilárd

On Thu, Jan 30, 2014 at 11:47 AM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> Thanks for your suggestions!
>
> > I would not make any assumptions though, but rather try a few things
> > first:
> > - Does the card pass a memtest
> >   (sourceforge.net/projects/cudagpumemtest/)?
>
> The memtest ran for about an hour with no errors.
>
> > - Does the installation pass the regressiontests?
>
> No. These four complex tests fail, all with the usual error:
>
> FAILED. Check mdrun.out, md.log files in nbnxn_pme
> FAILED. Check mdrun.out, md.log files in nbnxn_rf
> FAILED. Check mdrun.out, md.log files in nbnxn_rzero
> FAILED. Check mdrun.out, md.log files in nbnxn_vsite
>
> Everything else passes.
>
> > - Is the error reproducible with other inputs?
>
> Yes, so far anything that has caused Gromacs to engage the GPU has
> failed. Our own runs, the samples from the Gromacs website, and the
> four tests above.
>
> > Also note that with the default invocation of mdrun you are
> > attempting to use all cores/hardware threads in your machine (I
> > assume a 2x12-core IVB-E node with HT on).
>
> Two Xeon E5-2697V2 processors, yes. This is a test server for gauging
> the potential performance gains of GPGPU with our own runs. We'll
> stick to a proper CPU-GPU ratio for the performance measurements. This
> was just me trying to pare it down to the simplest invocation.
>
> We have had no trouble using other CUDA-enabled tools on this
> particular test server. NAMD, for example, works fine.
> --
> Anders Ossowicki
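[Editor's note: the three diagnostics suggested above can be chained in
a small script along these lines. This is only a sketch: it assumes the
topol.tpr from the thread's earlier grompp step is in the working
directory, and that mdrun, cuda-memcheck, and valgrind are on PATH.]

```shell
#!/bin/sh
# Sketch of the three diagnostics from the thread; topol.tpr is assumed
# to be in the current directory.

# 1. Short GPU run under cuda-memcheck to catch CUDA API/memory errors.
cuda-memcheck mdrun -nsteps 10

# 2. Same run on the GPU code path, but with the (slow) CPU emulation
#    kernels instead of the CUDA kernels.
GMX_EMULATE_GPU=1 mdrun -nsteps 10

# 3. The emulated run under valgrind, to look for host-side memory
#    errors in the same code path.
GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10
```

If the emulated run is clean under valgrind while the real GPU run
fails, that points at the device side (kernel, driver, or hardware)
rather than the host-side setup code.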
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
> Does the error happen at a particular step? Assuming it does occur
> within the first 10 steps, here are a few things to try:

It happens immediately. As in:

$ time mdrun
snip
real    0m3.312s
user    0m6.768s
sys     0m1.968s
$

> - Run cuda-memcheck mdrun -nsteps 10;

A wild backtrace appeared!

starting mdrun 'RNASE ZF-1A in water'
10 steps,      0.0 ps.
========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x9d9) [0x3bc779]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1ff8) [0x275f98]
========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x3bf) [0x27a88f]
========= Host Frame:mdrun (do_md + 0x7fc7) [0x34267]
========= Host Frame:mdrun (mdrunner + 0x18a1) [0x11491]
========= Host Frame:mdrun (cmain + 0x1a30) [0x38cb0]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:mdrun [0x76a1]
=========

snip the usual error

> - Try running with the GMX_EMULATE_GPU env. var. set. This will run
>   the GPU acceleration code-path, but will use CPU kernels (equivalent
>   to the CUDA implementation, but slow).

This seems to run correctly.

> - Run with GMX_EMULATE_GPU using valgrind:
>   GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10

Valgrind dies immediately with

nztest@ubuntu:~/rnase_bench/rnase_cubic$ GMX_EMULATE_GPU=YesPlease valgrind mdrun -nsteps 10
==13510== Memcheck, a memory error detector
==13510== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==13510== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==13510== Command: mdrun -nsteps 10
==13510==
:-)  G  R  O  M  A  C  S  (-:
vex amd64->IR: unhandled instruction bytes: 0xC5 0xFA 0x2A 0xC2 0xC5 0xFA 0x59 0xD
==13510== valgrind: Unrecognised instruction at address 0x5b5ac9d.
==13510==    at 0x5B5AC9D: rando (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB0A4: pukeit (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB420: bromacs (in /usr/lib/libgmx.so.8)
==13510==    by 0x5BAB933: CopyRight (in /usr/lib/libgmx.so.8)
==13510==    by 0x438E26: cmain (in /usr/bin/mdrun)
==13510==    by 0x65D976C: (below main) (libc-start.c:226)
--
Anders Ossowicki
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
On Thu, Jan 30, 2014 at 2:10 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > Does the error happen at a particular step? Assuming it does occur
> > within the first 10 steps, here are a few things to try:
>
> It happens immediately. As in:
>
> $ time mdrun
> snip
> real    0m3.312s
> user    0m6.768s
> sys     0m1.968s
> $

Well, with a 24k system a single iteration can be done in 2-3 ms, so
those 3.3 seconds are mostly initialization and some number of steps -
could be one, ten, or even a hundred.

> > - Run cuda-memcheck mdrun -nsteps 10;
>
> A wild backtrace appeared!
>
> starting mdrun 'RNASE ZF-1A in water'
> 10 steps,      0.0 ps.
> ========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
> ========= Saved host backtrace up to driver entry point at error
> ========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
> ========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x9d9) [0x3bc779]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1ff8) [0x275f98]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x3bf) [0x27a88f]
> ========= Host Frame:mdrun (do_md + 0x7fc7) [0x34267]
> ========= Host Frame:mdrun (mdrunner + 0x18a1) [0x11491]
> ========= Host Frame:mdrun (cmain + 0x1a30) [0x38cb0]
> ========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
> ========= Host Frame:mdrun [0x76a1]
> =========
>
> snip the usual error

That doesn't tell much, could you add -g to the CXX flags?

> > - Try running with the GMX_EMULATE_GPU env. var. set. This will run
> >   the GPU acceleration code-path, but will use CPU kernels
> >   (equivalent to the CUDA implementation, but slow).
>
> This seems to run correctly.

Does correctly mean that you've checked the results or that it completed
without a crash?
> > - Run with GMX_EMULATE_GPU using valgrind:
> >   GMX_EMULATE_GPU=1 valgrind mdrun -nsteps 10
>
> Valgrind dies immediately with
>
> nztest@ubuntu:~/rnase_bench/rnase_cubic$ GMX_EMULATE_GPU=YesPlease valgrind mdrun -nsteps 10
> ==13510== Memcheck, a memory error detector
> ==13510== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> ==13510== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> ==13510== Command: mdrun -nsteps 10
> ==13510==
> :-)  G  R  O  M  A  C  S  (-:
> vex amd64->IR: unhandled instruction bytes: 0xC5 0xFA 0x2A 0xC2 0xC5 0xFA 0x59 0xD
> ==13510== valgrind: Unrecognised instruction at address 0x5b5ac9d.
> ==13510==    at 0x5B5AC9D: rando (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB0A4: pukeit (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB420: bromacs (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x5BAB933: CopyRight (in /usr/lib/libgmx.so.8)
> ==13510==    by 0x438E26: cmain (in /usr/bin/mdrun)
> ==13510==    by 0x65D976C: (below main) (libc-start.c:226)

Yeah, your valgrind does not support VEX-encoded instructions (=AVX).
Use SSE4.1 on the CPU, and AFAIK you may need to set
GMX_DISTRIBUTABLE_BINARY=ON. However, I do not expect this to shed more
light on the issue.
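[Editor's note: the two rebuild suggestions in this exchange - adding
-g for a symbolized backtrace, and switching to SSE4.1 kernels so this
valgrind can decode the instructions - map onto GROMACS 4.6 CMake
options roughly as follows. This is a sketch only: the build directory
layout and CUDA path are assumptions, and GMX_DISTRIBUTABLE_BINARY=ON
is taken from the "AFAIK" suggestion above.]

```shell
# Reconfigure GROMACS 4.6 with debug symbols (RelWithDebInfo keeps -g)
# and SSE4.1 kernels, since valgrind 3.7 cannot decode AVX instructions.
# The source/build paths and the CUDA location are assumptions.
cd /home/nztest/src/gromacs-4.6.5/build
cmake .. \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DGMX_CPU_ACCELERATION=SSE4.1 \
  -DGMX_DISTRIBUTABLE_BINARY=ON \
  -DGMX_GPU=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.5
make -j 24 && make install
```

With debug symbols in place, the cuda-memcheck host backtrace should
show file names and line numbers instead of bare offsets.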
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
On Thu, Jan 30, 2014 at 4:19 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> > Well, with a 24k system a single iteration can be done in 2-3 ms, so
> > those 3.3 seconds are mostly initialization and some number of steps
> > - could be one, ten, or even a hundred.
>
> Sure, but it fails even with -nsteps 1.
>
> > That doesn't tell much, could you add -g to the CXX flags?
>
> Same thing:

There should be line numbers below - and perhaps a bit more information
on what's causing the error - at least that's what I'm hoping for.

One other thing you could try is to set coulombtype = reaction-field in
the mdp file and re-generate the tpr. These runs will use a different
CUDA kernel. Just guessing, it may not make much difference at all.

> starting mdrun 'RNASE ZF-1A in water'
> 1 steps,      0.0 ps.
> ========= Program hit error 4 on CUDA API call to cudaStreamSynchronize
> ========= Saved host backtrace up to driver entry point at error
> ========= Host Frame:/usr/lib/nvidia-current/libcuda.so [0x26d660]
> ========= Host Frame:/usr/local/cuda-5.5/lib64/libcudart.so.5.5 (cudaStreamSynchronize + 0x15e) [0x36f5e]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (nbnxn_cuda_wait_gpu + 0x222) [0xd45ab5]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force_cutsVERLET + 0x1d20) [0xc287a5]
> ========= Host Frame:/usr/bin/../lib/libmd.so.8 (do_force + 0x15d) [0xc2a986]
> ========= Host Frame:mdrun (do_md + 0x3cd4) [0x2450e]
> ========= Host Frame:mdrun (mdrunner + 0x1f14) [0x11b50]
> ========= Host Frame:mdrun (cmain + 0x1dee) [0x2a57d]
> ========= Host Frame:mdrun (main + 0x20) [0x31c18]
> ========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
> ========= Host Frame:mdrun [0x75e9]
> =========

> > - Try running with the GMX_EMULATE_GPU env. var. set. This will run
> >   the GPU acceleration code-path, but will use CPU kernels
> >   (equivalent to the CUDA implementation, but slow).
>
> This seems to run correctly.

> > Does correctly mean that you've checked the results or that it
> > completed without a crash?
>
> Just the latter.
> --
> Anders Ossowicki
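[Editor's note: the reaction-field experiment suggested above amounts to
one change in the .mdp followed by regenerating the .tpr. A sketch: the
two-line .mdp written below is a stand-in for the thread's real
pme_verlet.mdp, which has many more settings.]

```shell
# A minimal stand-in .mdp (the real pme_verlet.mdp from the thread has
# many more settings; these two lines are only for illustration).
printf 'coulombtype              = pme\nrlist                    = 0.9\n' > pme_verlet.mdp

# Flip the electrostatics to reaction-field, as suggested above.
sed -i 's/^coulombtype.*/coulombtype              = reaction-field/' pme_verlet.mdp

# Regenerate the run input and retry, reusing the thread's own commands:
#   grompp -f pme_verlet.mdp -c conf.gro -p topol.top
#   mdrun -nsteps 10
```

If the reaction-field run survives while the PME run crashes, the
problem is narrowed to the Ewald/PME kernel variant; if both crash the
same way, the kernel flavor is probably not the issue.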
Re: [gmx-users] [gmx-developers] Fatal error: cudaStreamSynchronize failed in cu_blockwait_nb
Hi Anders,

This mail belongs to the users' list.

This type of error is typically a sign of the CUDA kernel failing due to
a nasty bug in the code or a hardware error. The dmesg message is
suspicious and may be a hint of a hardware error (see
https://www.kubuntuforums.net/showthread.php?64133-kwin-crashes-repeatedly).

I would not make any assumptions though, but rather try a few things
first:
- Does the card pass a memtest
  (sourceforge.net/projects/cudagpumemtest/)?
- Does the installation pass the regressiontests?
- Is the error reproducible with other inputs?

Also note that with the default invocation of mdrun you are attempting
to use all cores/hardware threads in your machine (I assume a 2x12-core
IVB-E node with HT on). This requires a huge number of OpenMP threads,
which will lead to pretty bad performance in the CPU code. Typically a
one-to-one CPU-GPU ratio is decent, especially with fast Intel Xeons.
Use only one socket for your tests or, if you plan to use a single GPU
per node, at least use only 1 thread/core - 24 threads in total.

Cheers,
--
Szilárd

On Wed, Jan 29, 2014 at 7:07 PM, AOWI (Anders Ossowicki)
a...@novozymes.com wrote:
> Hello,
>
> We are testing out Gromacs 4.6.5 with an Nvidia K20 card. We keep
> running into the error message below, no matter which setup we're
> trying. In the included case, it was the RNAse example from
> http://www.gromacs.org/GPU_acceleration.
>
> Furthermore, we get the following lines in dmesg as well:
>
> NVRM: GPU at :42:00: GPU-d0b07804-027a-5a02-43bc-fd7dc9064637
> NVRM: Xid (:42:00): 31, Ch 0003, engmask 0101, intr 1000
>
> Are we just completely out of luck with this card, or have we done
> something wrong? We've built Gromacs from source against the CUDA 5.5
> libraries straight from Nvidia. The system is Ubuntu 12.04. Gromacs
> works fine when it's not using the GPU.
> The card identifies itself as NVIDIA Corporation GK110GL [Tesla K20m]
> (rev a1)
>
> This is what we've done to trigger the error:
>
> $ grompp -f pme_verlet.mdp -c conf.gro -p topol.top
> $ mdrun
>
> Here is the output from mdrun. The error message tells me absolutely
> nothing, so any advice on how to proceed with debugging this would be
> much appreciated.
>
> Reading file topol.tpr, VERSION 4.6.5 (single precision)
> Changing nstlist from 10 to 40, rlist from 0.9 to 0.996
> Using 1 MPI thread
> Using 48 OpenMP threads
>
> 1 GPU detected:
>   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>
> 1 GPU auto-selected for this run.
> Mapping of GPU to the 1 PP rank in this node: #0
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> starting mdrun 'RNASE ZF-1A in water'
> 1 steps,     20.0 ps.
>
> ---
> Program mdrun, VERSION 4.6.5
> Source code file:
> /home/nztest/src/gromacs-4.6.5/src/mdlib/nbnxn_cuda/nbnxn_cuda.cu,
> line: 591
>
> Fatal error:
> cudaStreamSynchronize failed in cu_blockwait_nb: unspecified launch
> failure
>
> For more information and tips for troubleshooting, please check the
> GROMACS website at http://www.gromacs.org/Documentation/Errors
> ---
>
> Thanks in advance!
> --
> Best Regards
> Anders Ossowicki
--
Gromacs Developers mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
or send a mail to gmx-developers-requ...@gromacs.org.
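[Editor's note: the thread-count advice above (1 thread/core, i.e. 24
threads on this 2x12-core node instead of all 48 hardware threads) can
be expressed with mdrun's own options. A sketch, assuming the GROMACS
4.6 mdrun flags and the same working directory as the thread's example.]

```shell
# Limit mdrun to one OpenMP thread per physical core (24 on the
# 2x12-core IVB-E node described above) and pin the threads, instead
# of letting it spawn 48 threads across all hardware threads.
mdrun -ntomp 24 -pin on
```

For the eventual one-GPU-per-socket benchmarks, -ntomp 12 per rank
would match the one-socket-per-GPU suggestion above.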