If it's not too much hassle, you could try uninstalling all CUDA5-related system packages to ensure that PyCUDA is linking to the appropriate CUDA6 library, headers, etc., but I doubt that's actually your problem.
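For what it's worth, here is one quick way to check which CUDA libraries the compiled PyCUDA extension actually resolves at load time. This is only a sketch: it assumes a Linux box with `ldd` on the PATH, and it simply returns an empty list if PyCUDA isn't importable in the current environment.

```python
# Sketch: list the CUDA libraries that pycuda's compiled extension
# (pycuda._driver) links against. Assumes Linux + ldd; returns [] if
# pycuda is not installed, so it is safe to run anywhere.
import subprocess

def pycuda_cuda_linkage():
    """Return ldd output lines mentioning CUDA libraries for pycuda._driver."""
    try:
        import pycuda._driver as drv
    except ImportError:
        return []  # pycuda not installed in this environment
    out = subprocess.check_output(["ldd", drv.__file__]).decode()
    return [line.strip() for line in out.splitlines() if "libcu" in line]

if __name__ == "__main__":
    for line in pycuda_cuda_linkage() or ["pycuda not importable here"]:
        print(line)
```

If the output still mentions a CUDA 5 path after the recompile, the old packages are the likely culprit.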
Eric

On Thu, Nov 6, 2014 at 8:10 AM, kjs <[email protected]> wrote:
> In the routine I describe below, I am beginning to see the following
> error. Please note, I was able to successfully run this routine all the
> way through when PyCUDA was linked to system CUDA5. The errors started
> popping up after I installed CUDA6 system-wide and thus recompiled
> PyCUDA. I am running Debian Testing.
>
> Traceback (most recent call last):
>   File "feature_extractor.py", line 475, in <module>
>     main()
>   File "feature_extractor.py", line 467, in main
>     fe.set_features(fname[0])
>   File "feature_extractor.py", line 51, in set_features
>     self.apply_filters()
>   File "feature_extractor.py", line 99, in apply_filters
>     n_jobs='cuda', copy=False, verbose=False)
>   File "<string>", line 2, in band_stop_filter
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/utils.py", line 509, in verbose
>     return function(*args, **kwargs)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 742, in band_stop_filter
>     xf = _filter(x, Fs, freq, gain, filter_length, picks, n_jobs, copy)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 345, in _filter
>     n_jobs=n_jobs)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 141, in _overlap_add_filter
>     n_segments, n_seg, cuda_dict)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 173, in _1d_overlap_filter
>     prod = fft_multiply_repeated(h_fft, seg, cuda_dict)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/cuda.py", line 196, in fft_multiply_repeated
>     x = np.array(cuda_dict['x'].get(), dtype=x.dtype, subok=True,
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 264, in get
>     drv.memcpy_dtoh(ary, self.gpudata)
> pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuModuleUnload failed: launch timeout
>
> Thanks,
> Kevin
>
>
> kjs wrote:
> >
> > Eric Larson wrote:
> >> Hey Kevin,
> >>
> >> Not sure about the CUDA limitations, I'll let others speak to that...
> >>
> >> But in developing the mne-python CUDA filtering code, IIRC the primary
> >> limitation was (by far) transferring the data to and from the GPU. The FFT
> >> computations themselves were a fraction of the total time. I suspect using
> >> multiple jobs won't help CUDA filtering very much, since the jobs would
> >> presumably compete for the same memory bandwidth, but I would love to be
> >> wrong about this. If it works better, it would be great to open an
> >> mne-python issue for it, as we are always looking for speedups :)
> >>
> >> Cheers,
> >> Eric
> >>
> >> On Nov 1, 2014 7:21 PM, "kjs" <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I have written an MPI routine in Python that sends jobs to N worker
> >>> processes. The root process handles file IO and the workers do
> >>> computation. In the worker processes, calls are made to the CUDA-enabled
> >>> GPU to do FFTs.
> >>>
> >>> Is it safe to have N processes potentially making calls to the same GPU
> >>> at the same time? I have not made any amendments to the CUDA code[0],
> >>> and have little knowledge of what could possibly go wrong.
> >>>
> >>> Thanks much,
> >>> Kevin
> >>>
> >>> [0] I am using python-mne with CUDA enabled to call scikits.cuda.fft
> >>> https://github.com/mne-tools/mne-python/blob/master/mne/cuda.py
> >>
> >
> > Thanks Andreas, this is good to know. I noticed that even though PyCUDA
> > is currently only using one of two GPUs, that GPU is only ever at ~35%
> > memory and ~22% processing utilization. This could be related to Eric's
> > observation that the PCIe x16 bus bandwidth reaches capacity while the
> > GPU is pushing out fast FFT'ed arrays, allowing for only one or two
> > arrays in the GPU at the same time.
> >
> > From what I have seen, using CUDA speeds up my FFTs ~2x, though the
> > workers do many other computations on the CPU. It's a worst-case
> > scenario that all N workers are trying to send data to the GPU at the
> > same time.
> >
> > -Kevin
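On Kevin's question above about N worker processes hitting one GPU: the usual safe pattern is for each worker to create (and tear down) its own CUDA context rather than inheriting one across a fork. A minimal, modern-Python sketch of that pattern follows; it assumes pycuda and a GPU are present, imports `pycuda.autoinit` inside the worker so the context is created per process, and falls back to a CPU FFT when pycuda is unavailable, so the structure is illustrative either way.

```python
# Sketch: one CUDA context per worker process. pycuda.autoinit is imported
# inside the worker function, after the process starts, so each worker gets
# its own context; sharing a parent's context across fork is not safe.
import multiprocessing as mp
import numpy as np

def worker_fft(seed):
    rng = np.random.RandomState(seed)
    x = rng.rand(1024).astype(np.float32)
    try:
        import pycuda.autoinit            # per-process context (assumes a GPU)
        import pycuda.gpuarray as gpuarray
        # Real code would run the FFT on the GPU (e.g. via scikits.cuda.fft);
        # the round trip here just exercises the per-process context.
        x = gpuarray.to_gpu(x).get()
    except ImportError:
        pass                              # no pycuda available: stay on the CPU
    return float(np.abs(np.fft.rfft(x)).sum())

if __name__ == "__main__":
    pool = mp.Pool(2)
    print(pool.map(worker_fft, range(4)))
    pool.close()
    pool.join()
```

With this layout, each worker's allocations are freed when its context dies, so a crashed worker cannot leave another worker with a dead context.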
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
