If it's not too much hassle, you could try uninstalling all CUDA5-related system packages to ensure that PyCUDA is linking to the appropriate CUDA6 library, headers, etc., but I doubt that's actually your problem.
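For what it's worth, here is one quick way to check which CUDA libraries the compiled PyCUDA extension actually resolves at load time. This is only a sketch: it assumes a Linux box with `ldd` on the PATH, and it simply returns an empty list if PyCUDA isn't importable in the current environment.

```python
# Sketch: list the CUDA libraries that pycuda's compiled extension
# (pycuda._driver) links against. Assumes Linux + ldd; returns [] if
# pycuda is not installed, so it is safe to run anywhere.
import subprocess

def pycuda_cuda_linkage():
    """Return ldd output lines mentioning CUDA libraries for pycuda._driver."""
    try:
        import pycuda._driver as drv
    except ImportError:
        return []  # pycuda not installed in this environment
    out = subprocess.check_output(["ldd", drv.__file__]).decode()
    return [line.strip() for line in out.splitlines() if "libcu" in line]

if __name__ == "__main__":
    for line in pycuda_cuda_linkage() or ["pycuda not importable here"]:
        print(line)
```

If the output still mentions a CUDA 5 path after the recompile, the old packages are the likely culprit.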
Eric

On Thu, Nov 6, 2014 at 8:10 AM, kjs <[email protected]> wrote:
> In the routine I describe below, I am beginning to see the following
> error. Please note, I was able to successfully run this routine all the
> way through when PyCUDA was linked to system CUDA5. The errors started
> popping up after I installed CUDA6 system-wide and thus recompiled
> PyCUDA. I am running Debian Testing.
>
> Traceback (most recent call last):
>   File "feature_extractor.py", line 475, in <module>
>     main()
>   File "feature_extractor.py", line 467, in main
>     fe.set_features(fname[0])
>   File "feature_extractor.py", line 51, in set_features
>     self.apply_filters()
>   File "feature_extractor.py", line 99, in apply_filters
>     n_jobs='cuda', copy=False, verbose=False)
>   File "<string>", line 2, in band_stop_filter
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/utils.py", line 509, in verbose
>     return function(*args, **kwargs)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 742, in band_stop_filter
>     xf = _filter(x, Fs, freq, gain, filter_length, picks, n_jobs, copy)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 345, in _filter
>     n_jobs=n_jobs)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 141, in _overlap_add_filter
>     n_segments, n_seg, cuda_dict)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 173, in _1d_overlap_filter
>     prod = fft_multiply_repeated(h_fft, seg, cuda_dict)
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/cuda.py", line 196, in fft_multiply_repeated
>     x = np.array(cuda_dict['x'].get(), dtype=x.dtype, subok=True,
>   File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 264, in get
>     drv.memcpy_dtoh(ary, self.gpudata)
> pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuModuleUnload failed: launch timeout
>
> Thanks,
> Kevin
>
>
> kjs wrote:
> >
> > Eric Larson wrote:
> >> Hey Kevin,
> >>
> >> Not sure about the CUDA limitations, I'll let others speak to that...
> >>
> >> But in developing the mne-python CUDA filtering code, IIRC the primary
> >> limitation was (by far) transferring the data to and from the GPU. The FFT
> >> computations themselves were a fraction of the total time. I suspect using
> >> multiple jobs won't help CUDA filtering very much, since the jobs would
> >> presumably compete for the same memory bandwidth, but I would love to be
> >> wrong about this. If it works better, it would be great to open an
> >> mne-python issue for it, as we are always looking for speedups :)
> >>
> >> Cheers,
> >> Eric
> >>
> >> On Nov 1, 2014 7:21 PM, "kjs" <[email protected]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I have written an MPI routine in Python that sends jobs to N worker
> >>> processes. The root process handles file IO and the workers do
> >>> computation. In the worker processes, calls are made to the CUDA-enabled
> >>> GPU to do FFTs.
> >>>
> >>> Is it safe to have N processes potentially making calls to the same GPU
> >>> at the same time? I have not made any amendments to the CUDA code[0],
> >>> and have little knowledge of what could possibly go wrong.
> >>>
> >>> Thanks much,
> >>> Kevin
> >>>
> >>> [0] I am using python-mne with CUDA enabled to call scikits.cuda.fft
> >>> https://github.com/mne-tools/mne-python/blob/master/mne/cuda.py
> >>
> >
> > Thanks Andreas, this is good to know. I noticed that even though PyCUDA
> > is currently only using one of two GPUs, that GPU is only ever at ~35%
> > memory and ~22% processing utilization. This could be related to Eric's
> > observation that the PCIe x16 bus bandwidth reaches capacity while the
> > GPU is pushing out fast FFT'ed arrays, allowing for only one or two
> > arrays in the GPU at the same time.
> >
> > From what I have seen, using CUDA speeds up my FFTs ~2x, though the
> > workers do many other computations on the CPU. It's a worst-case
> > scenario that all N workers are trying to send data to the GPU at the
> > same time.
> >
> > -Kevin
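On Kevin's question above about N worker processes hitting one GPU: the usual safe pattern is for each worker to create (and tear down) its own CUDA context rather than inheriting one across a fork. A minimal, modern-Python sketch of that pattern follows; it assumes pycuda and a GPU are present, imports `pycuda.autoinit` inside the worker so the context is created per process, and falls back to a CPU FFT when pycuda is unavailable, so the structure is illustrative either way.

```python
# Sketch: one CUDA context per worker process. pycuda.autoinit is imported
# inside the worker function, after the process starts, so each worker gets
# its own context; sharing a parent's context across fork is not safe.
import multiprocessing as mp
import numpy as np

def worker_fft(seed):
    rng = np.random.RandomState(seed)
    x = rng.rand(1024).astype(np.float32)
    try:
        import pycuda.autoinit            # per-process context (assumes a GPU)
        import pycuda.gpuarray as gpuarray
        # Real code would run the FFT on the GPU (e.g. via scikits.cuda.fft);
        # the round trip here just exercises the per-process context.
        x = gpuarray.to_gpu(x).get()
    except ImportError:
        pass                              # no pycuda available: stay on the CPU
    return float(np.abs(np.fft.rfft(x)).sum())

if __name__ == "__main__":
    pool = mp.Pool(2)
    print(pool.map(worker_fft, range(4)))
    pool.close()
    pool.join()
```

With this layout, each worker's allocations are freed when its context dies, so a crashed worker cannot leave another worker with a dead context.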
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
