This is the one while using 2 GPUs. Please ignore the previous one.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   68C    P0   134W / 149W |   3320MiB / 11439MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   51C    P0   148W / 149W |   5188MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
| N/A   26C    P8    26W / 149W |     82MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   24C    P8    29W / 149W |     82MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     18956    C   nvidia-cuda-mps-server                        3318MiB |
|    1     18956    C   nvidia-cuda-mps-server                        5696MiB |
|    2     18956    C   nvidia-cuda-mps-server                          80MiB |
|    3     18956    C   nvidia-cuda-mps-server                          80MiB |
+-----------------------------------------------------------------------------+

On Thursday, April 27, 2017 at 10:14:29 PM UTC-4, anurag kumar wrote:
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
> | N/A   68C    P0   134W / 149W |     82MiB / 11439MiB |     93%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
> | N/A   51C    P0   148W / 149W |     82MiB / 11439MiB |    100%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
> | N/A   26C    P8    26W / 149W |     82MiB / 11439MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
> | N/A   24C    P8    29W / 149W |     82MiB / 11439MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
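For reference, the two busy GPUs in the table above came from launches following the pattern described later in the thread (`THEANO_FLAGS=device=cudaN python training_run.py`, one process per GPU). This is only a sketch of that pattern; the `DRY_RUN` guard and the loop are my additions for illustration, since `training_run.py` and the GPUs are specific to the original poster's machine:

```shell
#!/bin/sh
# Launch one training job per GPU by pinning Theano's device per process.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
for dev in cuda0 cuda1; do
  cmd="THEANO_FLAGS=device=$dev python training_run.py"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # Real launch: set the flag in the child's environment and background it.
    env THEANO_FLAGS="device=$dev" python training_run.py &
  fi
done
[ "$DRY_RUN" = "1" ] || wait   # with real launches, wait for both jobs
```

With the old backend the devices would be named `gpu0`/`gpu1` instead of `cuda0`/`cuda1`.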
> |=============================================================================|
> |    0     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    1     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    2     18956    C   nvidia-cuda-mps-server                          80MiB |
> |    3     18956    C   nvidia-cuda-mps-server                          80MiB |
> +-----------------------------------------------------------------------------+
>
> On Thursday, April 27, 2017 at 5:48:15 PM UTC-4, nouiz wrote:
>>
>> What is the output of "nvidia-smi"?
>>
>> On Thu, Apr 27, 2017 at 3:53 PM anurag kumar <[email protected]> wrote:
>>
>>> By the way, I am aware of the other, similar question
>>> (https://groups.google.com/forum/#!topic/theano-users/l9FlhYIiWMo), but
>>> I could not see a definite answer in it.
>>>
>>> On Thursday, April 27, 2017 at 3:51:31 PM UTC-4, anurag kumar wrote:
>>>>
>>>> Hi,
>>>> I have 4 Tesla K80 GPUs on my system. I want to run a different process
>>>> on each of them at the same time: basically the same network with
>>>> different parameters, so that I can train 4 different networks at once.
>>>>
>>>> I have libgpuarray and pygpu installed. The first job, launched as
>>>> THEANO_FLAGS=device=cuda0 python training_run.py, runs just fine and
>>>> uses the first GPU.
>>>>
>>>> But when I try to use the second GPU with THEANO_FLAGS=device=cuda1
>>>> python training_run.py, it gives the error below and falls back to the CPU.
>>>> --------------------------
>>>> ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 164, in <module>
>>>>     use(config.device)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 151, in use
>>>>     init_dev(device)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gpuarray/__init__.py", line 60, in init_dev
>>>>     sched=config.gpuarray.sched)
>>>>   File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init (pygpu/gpuarray.c:9417)
>>>>   File "pygpu/gpuarray.pyx", line 584, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9108)
>>>>   File "pygpu/gpuarray.pyx", line 1060, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13470)
>>>> GpuArrayException: cuMemAllocHost: CUDA_ERROR_MAP_FAILED: mapping of buffer object failed: 1
>>>> ---------------------------
>>>>
>>>> Is there a solution for this?
>>>>
>>>> I tried using the older CUDA backend with THEANO_FLAGS=device=gpu0 python training_run.py.
>>>> In this case I am able to successfully use the first (gpu0) and second
>>>> (gpu1) GPUs, but when I try running the third job on gpu2 with
>>>> THEANO_FLAGS=device=gpu2 python training_run.py, it gives the following
>>>> error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "training_run.py", line 1, in <module>
>>>>     import lasagne.layers as layers
>>>>   File "/usr/local/lib/python2.7/dist-packages/lasagne/__init__.py", line 12, in <module>
>>>>     import theano
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 108, in <module>
>>>>     import theano.sandbox.cuda
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 728, in <module>
>>>>     use(device=config.device, force=config.force_device, test_driver=False)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py", line 518, in use
>>>>     cuda_initialization_error_message))
>>>> EnvironmentError: You forced the use of gpu device gpu0, but CUDA initialization failed with error:
>>>> Unable to get the number of gpus available: OS call failed or operation not supported on this OS
>>>>
>>>> This is very strange. By the way, all my GPUs work just fine, so there
>>>> is no problem with gpu2. In fact, if I run the first 2 jobs on gpu1 and
>>>> gpu2 and then try using gpu0, it gives the same error.
>>>>
>>>> Any help is much appreciated.
>>>>
>>>> Thanks,
>>>> Anurag

-- 
--- 
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
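One detail worth noting from the process tables in the reply above: every GPU is held by nvidia-cuda-mps-server, i.e. the CUDA Multi-Process Service daemon is active on this machine. The thread never establishes whether MPS is actually the cause of the allocation failures, but checking whether the daemon is running is a cheap first step. A sketch (the `echo quit | nvidia-cuda-mps-control` shutdown command is the standard one from the CUDA MPS documentation and requires permission over the daemon):

```shell
#!/bin/sh
# Check whether the MPS control daemon is running on this machine.
if pgrep -x nvidia-cuda-mps-control >/dev/null 2>&1; then
  mps_status="running"
else
  mps_status="not running"
fi
echo "MPS control daemon: $mps_status"
# To shut MPS down (standard command from the CUDA MPS docs):
#   echo quit | nvidia-cuda-mps-control
```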
