Re: [PyCUDA] Context being sporadically destroyed when using multiple threads and contexts

2018-11-08 Thread Andreas Kloeckner
Noah Young  writes:
> I'm trying to run jobs on several GPUs at the same time using multiple
> threads, each with its own context. Sometimes this works flawlessly, but
> ~75% of the time I get a cuModuleLoadDataEx error telling me the context
> has been destroyed. What's frustrating is that nothing changes between
> failed and successful runs of the code. From what I can tell it's down to
> luck whether or not the error comes up:

"Context destroyed" is akin to a segmentation fault on the CPU. You
should find evidence that your code performed an illegal access, e.g.,
using 'dmesg' in the kernel log. (If you see a message "NVRM Xid ...",
that points to the problem) My first suspicion would be a bug in your
code.
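
A rough sketch of that check from Python (assuming 'dmesg' is readable
by your user; a plain "dmesg | grep Xid" in a shell works just as well):

import subprocess

# Print any NVIDIA Xid lines from the kernel ring buffer; their presence
# usually means the driver tore a context down after an illegal access.
log = subprocess.check_output(["dmesg"]).decode(errors="replace")
for line in log.splitlines():
    if "NVRM" in line and "Xid" in line:
        print(line)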

Andreas




[PyCUDA] Context being sporadically destroyed when using multiple threads and contexts

2018-11-08 Thread Noah Young
I'm trying to run jobs on several GPUs at the same time using multiple
threads, each with its own context. Sometimes this works flawlessly, but
~75% of the time I get a cuModuleLoadDataEx error telling me the context
has been destroyed. What's frustrating is that nothing changes between
failed and successful runs of the code. From what I can tell it's down to
luck whether or not the error comes up:

~/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py in __init__(self, source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs)
    292
    293         from pycuda.driver import module_from_buffer
--> 294         self.module = module_from_buffer(cubin)
    295
    296         self._bind_module()

LogicError: cuModuleLoadDataEx failed: context is destroyed


I start by making the contexts

from pycuda import driver as cuda

cuda.init()
contexts = []
for i in range(cuda.Device.count()):
    # make_context() makes the new context current on this thread,
    # so pop it right away to leave the context stack empty.
    c = cuda.Device(i).make_context()
    c.pop()
    contexts.append(c)

... and setting up a function to use each context, i.e.

import numpy as np
from pycuda import gpuarray

def do_work(ctx):
    with Acquire(ctx):
        a = gpuarray.to_gpu(np.random.rand(100, 400, 400))
        b = gpuarray.to_gpu(np.random.rand(100, 400, 400))
        for _ in range(10):
            c = (a + b) / 2
        out = c.get()
    return out

where `Acquire` is a context manager that handles pushing and popping:

class Acquire:
    def __init__(self, context):
        self.ctx = context

    def __enter__(self):
        self.ctx.push()
        return self.ctx

    def __exit__(self, type, value, traceback):
        self.ctx.pop()
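
An equivalent, generator-based way to write the same push/pop pairing
(a sketch with the same semantics, using contextlib):

from contextlib import contextmanager

@contextmanager
def acquire(context):
    # Make the context current on this thread, and pop it again on exit,
    # even if the body raised.
    context.push()
    try:
        yield context
    finally:
        context.pop()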

Finally, I run the code in parallel using a pool of threaded workers via
joblib:

from joblib import Parallel, delayed

pool = Parallel(n_jobs=len(contexts), verbose=8, prefer='threads')
with pool:
    # Pass 1
    sum(pool(delayed(do_work)(ctx) for ctx in contexts))
    # Pass 2
    sum(pool(delayed(do_work)(ctx) for ctx in contexts))

Note that I do several "passes" of work (I'll need to do 50 or so in my
real application) with the same thread pool. It seems like the crash always
happens somewhere in the second pass, or not at all. Any ideas about how to
keep my contexts from getting destroyed?

*System info*
Ubuntu 16.04 (Amazon Deep Learning AMI)
CUDA driver version 396.44
4x V100 GPUs
Python 3.6
pycuda version 2018.1.1