What is the output of "nvidia-smi"?

On Thu, Apr 27, 2017 at 3:53 PM anurag kumar <[email protected]> wrote:
> By the way, I am aware of the other, similar question
> (https://groups.google.com/forum/#!topic/theano-users/l9FlhYIiWMo), but I
> could not find a definite answer in it.
>
>
> On Thursday, April 27, 2017 at 3:51:31 PM UTC-4, anurag kumar wrote:
>>
>> Hi,
>> I have 4 Tesla K80 GPUs on my system. I want to run a different process on
>> each of them at the same time: the same network with different parameters,
>> so that I can train 4 different networks simultaneously.
>>
>> I have libgpuarray and pygpu installed. The first job, run as
>> THEANO_FLAGS=device=cuda0 python training_run.py, runs just fine and uses
>> the first GPU.
>>
>> But when I try to use the second GPU as
>> THEANO_FLAGS=device=cuda1 python training_run.py, it gives the error below
>> and falls back to the CPU.
>>
>> --------------------------
>> ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 164, in <module>
>>     use(config.device)
>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 151, in use
>>     init_dev(device)
>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 60, in init_dev
>>     sched=config.gpuarray.sched)
>>   File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init (pygpu/gpuarray.c:9417)
>>   File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9108)
>>   File "pygpu/gpuarray.pyx", line 1060, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13470)
>> GpuArrayException: cuMemAllocHost: CUDA_ERROR_MAP_FAILED: mapping of buffer object failed: 1
>> ---------------------------
>>
>>
>> *Is there a solution for this?*
>>
>> I tried *using the older cuda backend* as
>> THEANO_FLAGS=device=gpu0 python training_run.py. In this case I am able to
>> successfully use the first (gpu0) and second (gpu1) GPUs, but when I try
>> running the third job on gpu2 as
>> THEANO_FLAGS=device=gpu2 python training_run.py, it gives the following
>> error:
>>
>> Traceback (most recent call last):
>>   File "training_run.py", line 1, in <module>
>>     import lasagne.layers as layers
>>   File "/usr/local/lib/python2.7/dist-packages/lasagne/__init__.py", line 12, in <module>
>>     import theano
>>   File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 108, in <module>
>>     import theano.sandbox.cuda
>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 728, in <module>
>>     use(device=config.device, force=config.force_device, test_driver=False)
>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 518, in use
>>     cuda_initialization_error_message))
>> EnvironmentError: You forced the use of gpu device gpu0, but CUDA initialization failed with error:
>> Unable to get the number of gpus available: OS call failed or operation not supported on this OS
>>
>> This is very strange. By the way, all my GPUs work just fine, so there
>> is no problem with gpu2. In fact, if I run the first 2 jobs on gpu1 and
>> gpu2 and then try using gpu0, it gives the same error.
>>
>>
>> Any help is much appreciated.
>>
>> Thanks,
>> Anurag
>>
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
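
To make the failure easier to reproduce, the per-GPU launches described in the quoted message could be scripted roughly as below. This is only a minimal sketch, not a fix for the CUDA_ERROR_MAP_FAILED error: the helper names theano_env and launch_jobs are hypothetical, and it assumes training_run.py is in the current directory and that the devices are named cuda0 to cuda3 as in the commands above. It overwrites any THEANO_FLAGS already set in the environment.

```python
import os
import subprocess
import sys


def theano_env(gpu_index, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS pinned to one GPU.

    Hypothetical helper: it only sets the same device=cudaN flag used in the
    commands quoted above and replaces any existing THEANO_FLAGS value.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = "device=cuda{}".format(gpu_index)
    return env


def launch_jobs(script, num_gpus):
    """Start one Python process per GPU and return the Popen handles.

    Hypothetical helper: each child runs `script` with its own THEANO_FLAGS,
    which mirrors typing the four commands by hand in separate shells.
    """
    procs = []
    for gpu_index in range(num_gpus):
        cmd = [sys.executable, script]
        procs.append(subprocess.Popen(cmd, env=theano_env(gpu_index)))
    return procs
```

One possible way to use it: call subprocess.call(["nvidia-smi"]) first to record the GPU state, then jobs = launch_jobs("training_run.py", 4), and finally [p.wait() for p in jobs] to collect the exit codes. Whether the second process still fails in the same way would need to be checked on the affected machine.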
