This is the one while using 2 GPUs. Please ignore the previous one.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   68C    P0   134W / 149W |   3320MiB / 11439MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   51C    P0   148W / 149W |   5188MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
| N/A   26C    P8    26W / 149W |     82MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   24C    P8    29W / 149W |     82MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18956    C   nvidia-cuda-mps-server                        3318MiB |
|    1     18956    C   nvidia-cuda-mps-server                        5696MiB |
|    2     18956    C   nvidia-cuda-mps-server                          80MiB |
|    3     18956    C   nvidia-cuda-mps-server                          80MiB |
+-----------------------------------------------------------------------------+

On Thursday, April 27, 2017 at 10:14:29 PM UTC-4, anurag kumar wrote:
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
> | N/A   68C    P0   134W / 149W |     82MiB / 11439MiB |     93%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
> | N/A   51C    P0   148W / 149W |     82MiB / 11439MiB |    100%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
> | N/A   26C    P8    26W / 149W |     82MiB / 11439MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
> | N/A   24C    P8    29W / 149W |     82MiB / 11439MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
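For reference, the two busy GPUs in the table above came from launches following the pattern described later in the thread (`THEANO_FLAGS=device=cudaN python training_run.py`, one process per GPU). This is only a sketch of that pattern; the `DRY_RUN` guard and the loop are my additions for illustration, since `training_run.py` and the GPUs are specific to the original poster's machine:

```shell
#!/bin/sh
# Launch one training job per GPU by pinning Theano's device per process.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
for dev in cuda0 cuda1; do
  cmd="THEANO_FLAGS=device=$dev python training_run.py"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # Real launch: set the flag in the child's environment and background it.
    env THEANO_FLAGS="device=$dev" python training_run.py &
  fi
done
[ "$DRY_RUN" = "1" ] || wait   # with real launches, wait for both jobs
```

With the old backend the devices would be named `gpu0`/`gpu1` instead of `cuda0`/`cuda1`.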
> |=============================================================================|
> |    0     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    1     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    2     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    3     18956    C   nvidia-cuda-mps-server                          80MiB |
> +-----------------------------------------------------------------------------+
>
> On Thursday, April 27, 2017 at 5:48:15 PM UTC-4, nouiz wrote:
>>
>> What is the output of "nvidia-smi"?
>>
>> On Thu, Apr 27, 2017 at 3:53 PM anurag kumar <[email protected]> wrote:
>>
>>> By the way, I am aware of the other, similar question
>>> (https://groups.google.com/forum/#!topic/theano-users/l9FlhYIiWMo), but
>>> I could not see a definite answer in it.
>>>
>>> On Thursday, April 27, 2017 at 3:51:31 PM UTC-4, anurag kumar wrote:
>>>>
>>>> Hi,
>>>> I have 4 Tesla K80 GPUs on my system. I want to run a different process
>>>> on each of them at the same time: basically the same network with
>>>> different parameters, so that I can train 4 different networks at once.
>>>>
>>>> I have libgpuarray and pygpu installed. The first job, launched as
>>>> THEANO_FLAGS=device=cuda0 python training_run.py, runs just fine and
>>>> uses the first GPU.
>>>>
>>>> But when I try to use the second GPU with THEANO_FLAGS=device=cuda1
>>>> python training_run.py, it gives the error below and falls back to the CPU.
>>>> --------------------------
>>>> ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 164, in <module>
>>>>     use(config.device)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 151, in use
>>>>     init_dev(device)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 60, in init_dev
>>>>     sched=config.gpuarray.sched)
>>>>   File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init (pygpu/gpuarray.c:9417)
>>>>   File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9108)
>>>>   File "pygpu/gpuarray.pyx", line 1060, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13470)
>>>> GpuArrayException: cuMemAllocHost: CUDA_ERROR_MAP_FAILED: mapping of buffer object failed: 1
>>>> ---------------------------
>>>>
>>>> Is there a solution for this?
>>>>
>>>> I tried using the older CUDA backend with THEANO_FLAGS=device=gpu0 python training_run.py.
>>>> In this case I am able to successfully use the first (gpu0) and second
>>>> (gpu1) GPUs, but when I try running the third job on gpu2 with
>>>> THEANO_FLAGS=device=gpu2 python training_run.py, it gives the following
>>>> error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "training_run.py", line 1, in <module>
>>>>     import lasagne.layers as layers
>>>>   File "/usr/local/lib/python2.7/dist-packages/lasagne/__init__.py", line 12, in <module>
>>>>     import theano
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 108, in <module>
>>>>     import theano.sandbox.cuda
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 728, in <module>
>>>>     use(device=config.device, force=config.force_device, test_driver=False)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 518, in use
>>>>     cuda_initialization_error_message))
>>>> EnvironmentError: You forced the use of gpu device gpu0, but CUDA initialization failed with error:
>>>> Unable to get the number of gpus available: OS call failed or operation not supported on this OS
>>>>
>>>> This is very strange. By the way, all my GPUs work just fine, so there
>>>> is no problem with gpu2. In fact, if I run the first 2 jobs on gpu1 and
>>>> gpu2 and then try using gpu0, it gives the same error.
>>>>
>>>> Any help is much appreciated.
>>>>
>>>> Thanks,
>>>> Anurag

-- 
--- 
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
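One detail worth noting from the process tables in the reply above: every GPU is held by nvidia-cuda-mps-server, i.e. the CUDA Multi-Process Service daemon is active on this machine. The thread never establishes whether MPS is actually the cause of the allocation failures, but checking whether the daemon is running is a cheap first step. A sketch (the `echo quit | nvidia-cuda-mps-control` shutdown command is the standard one from the CUDA MPS documentation and requires permission over the daemon):

```shell
#!/bin/sh
# Check whether the MPS control daemon is running on this machine.
if pgrep -x nvidia-cuda-mps-control >/dev/null 2>&1; then
  mps_status="running"
else
  mps_status="not running"
fi
echo "MPS control daemon: $mps_status"
# To shut MPS down (standard command from the CUDA MPS docs):
#   echo quit | nvidia-cuda-mps-control
```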
