What is the output of "nvidia-smi"?
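
In case it helps to collect that in a compact form, here is a minimal sketch that asks nvidia-smi for a few per-device fields and turns the CSV reply into dictionaries. The compute mode and per-device memory use are the sort of details that tend to matter when a second process fails to initialize a device, but that is only a guess at what is relevant here. The helper names (parse_gpu_csv, query_gpus) and the choice of fields are mine, not anything from Theano or pygpu.

```python
import subprocess

# Fields requested through nvidia-smi's --query-gpu option (my choice of fields).
QUERY_FIELDS = ["index", "name", "compute_mode", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = [value.strip() for value in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per device (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    output = subprocess.check_output(cmd).decode("utf-8")
    return parse_gpu_csv(output)
```

Calling query_gpus() on the affected machine and printing each entry should give one line per device; pasting the plain nvidia-smi output works just as well.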

On Thu, Apr 27, 2017 at 3:53 PM anurag kumar <[email protected]> wrote:

> By the way, I am aware of the other, similar question (
> https://groups.google.com/forum/#!topic/theano-users/l9FlhYIiWMo), but I
> could not find a definite answer there.
>
>
> On Thursday, April 27, 2017 at 3:51:31 PM UTC-4, anurag kumar wrote:
>>
>> Hi,
>> I have 4 Tesla K80 GPUs on my system. I want to run a different process on
>> each of them at the same time, basically a network with different
>> parameters per GPU, so that I can train 4 different networks at once.
>>
>> I have libgpuarray and pygpu installed. The first job, launched
>> as THEANO_FLAGS=device=cuda0 python training_run.py, runs just fine and uses
>> the first GPU.
>>
>> But when I try to use the second GPU with THEANO_FLAGS=device=cuda1 python
>> training_run.py, it gives the error below and falls back to the CPU.
>>
>> --------------------------
>> ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
>> Traceback (most recent call last):
>>   File
>> "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line
>> 164, in <module>
>>     use(config.device)
>>   File
>> "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line
>> 151, in use
>>     init_dev(device)
>>   File
>> "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line
>> 60, in init_dev
>>     sched=config.gpuarray.sched)
>>   File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init
>> (pygpu/gpuarray.c:9417)
>>   File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init
>> (pygpu/gpuarray.c:9108)
>>   File "pygpu/gpuarray.pyx", line 1060, in
>> pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13470)
>> GpuArrayException: cuMemAllocHost: CUDA_ERROR_MAP_FAILED: mapping of
>> buffer object failed: 1
>> ---------------------------
>>
>>
>> *Is there a solution for this?*
>>
>> I tried *using the older cuda backend* as THEANO_FLAGS=device=gpu0
>> python training_run.py. In this case I am able to successfully use the first
>> (gpu0) and second (gpu1) GPUs, but when I try running the 3rd job on gpu2
>> as THEANO_FLAGS=device=gpu2 python training_run.py, it gives the following
>> error:
>>
>> Traceback (most recent call last):
>>   File "training_run.py", line 1, in <module>
>>     import lasagne.layers as layers
>>   File "/usr/local/lib/python2.7/dist-packages/lasagne/__init__.py", line
>> 12, in <module>
>>     import theano
>>   File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line
>> 108, in <module>
>>     import theano.sandbox.cuda
>>   File
>> "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py",
>> line 728, in <module>
>>     use(device=config.device, force=config.force_device,
>> test_driver=False)
>>   File
>> "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py",
>> line 518, in use
>>     cuda_initialization_error_message))
>> *EnvironmentError: You forced the use of gpu device gpu0, but CUDA
>> initialization failed with error:*
>> *Unable to get the number of gpus available: OS call failed or operation
>> not supported on this OS*
>>
>> This is very strange. By the way, all my GPUs work just fine, so there
>> is no problem with gpu2. In fact, if I run the first 2 jobs on gpu1 and gpu2
>> and then try using gpu0, it gives the same error.
>>
>>
>> Any help is much appreciated.
>>
>> Thanks,
>> Anurag
>>
>>
>> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
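
For reference, the per-GPU launches described in the quoted message can be put in one place with a small launcher. This is only a sketch: theano_env, launch_jobs and wait_all are hypothetical helper names, the device strings are the ones from the message, and it assumes any THEANO_FLAGS already set in the shell can be overwritten. It reproduces the commands as given and does not by itself address the initialization error.

```python
import os
import subprocess


def theano_env(device, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS selecting `device`.

    Note: this overwrites any THEANO_FLAGS value already present.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = "device=" + device
    return env


def launch_jobs(script, devices):
    """Start one `python script` process per device; return the Popen objects."""
    return [
        subprocess.Popen(["python", script], env=theano_env(device))
        for device in devices
    ]


def wait_all(procs):
    """Block until every job exits; return exit codes in launch order."""
    return [proc.wait() for proc in procs]
```

A run over the four devices would then look like wait_all(launch_jobs("training_run.py", ["cuda0", "cuda1", "cuda2", "cuda3"])), with the old backend using "gpu0" through "gpu3" instead.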
