The bug where a single process ended up using multiple GPUs has been fixed. I don't remember whether the fix is in Theano 0.9 or only in the development version.
I would recommend using the new gpu back-end:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

You can use theano.gpuarray.use('cuda0'), similar to the old back-end.

You can also use this Linux shell trick:

THEANO_FLAGS=device=cuda0 python ...

This sets the flag only for the newly created process.

On Thu, May 4, 2017 at 5:05 AM Richard Hankins <[email protected]> wrote:

> Hi,
>
> Sorry for the late reply. Haven't looked at this in a while. But just
> checked my setup and I've got device = cpu in .theanorc. But I'm not
> forcing the device.
> Checking nvidia-smi, both my GPUs are in default compute mode. To select
> different devices I use the following:
>
> import theano.sandbox.cuda
> theano.sandbox.cuda.use("gpuX")
>
> Hope this helps,
>
> Richard
>
> On Thu, Apr 27, 2017 at 8:33 PM, anurag kumar <[email protected]> wrote:
>
>> Is there a final solution to this problem? I am having a similar problem.
>>
>> Best,
>> Anurag
>>
>> On Sunday, May 1, 2016 at 9:24:59 AM UTC-4, RHankins wrote:
>>>
>>> Sorry I meant "One on gpu0 and one on gpu1 (It begins by running a
>>> process of gpu1 then starts another on gpu0)".
>>>
>>> On Sunday, May 1, 2016 at 2:19:58 PM UTC+1, RHankins wrote:
>>>>
>>>> Right. I know why it was throwing an error when I had device=cpu
>>>> because I also had force_device=True. Setting device=cpu and not setting
>>>> force_device allows me to select different gpus using
>>>> theano.sandbox.cuda.use().
>>>>
>>>> But I'm still having the problem with it running on multiple gpus. If I
>>>> select gpu0 (Titan X) it runs a single process on the correct gpu. If I
>>>> select gpu1 (GTX 980) to run exactly the same code it runs 2 processes. One
>>>> on gpu0 and one on gpu1 (It begins by running a process of gpu0 then starts
>>>> another on gpu1). It doesn't matter if they are run simultaneously or not.
>>>> Or if when they are run simultaneously, something was already running on
>>>> gpu0 or gpu1.
>>>> Should I use nvidia-smi to force the code to run on a single
>>>> gpu using
>>>>
>>>> nvidia-smi --compute-mode=EXCLUSIVE_PROCESS?
>>>>
>>>>
>>>> My only concern is if I want to run different programs at the same time
>>>> I will end up having multiple processes running on the same gpu. So will
>>>> they interfere with each other if they are importing the same modules? Is it
>>>> okay to run multiple processes on the same gpu? Will it affect the results?
>>>> Or does it not matter?
>>>>
>>>> Cheers,
>>>>
>>>> R
>>>>
>>>> On Friday, April 29, 2016 at 9:55:33 PM UTC+1, nouiz wrote:
>>>>>
>>>>> You can ignore those 2 errors. It is just that those tests seem too
>>>>> sensitive.
>>>>>
>>>>> If you set a device in your theanorc file that isn't 'cpu' and call
>>>>> use() on another one, it is normal that Theano doesn't like this, as only 1
>>>>> GPU is supported in the current back-end. The new one supports multiple
>>>>> GPUs.
>>>>>
>>>>> Does it work if you try to use gpu0? Was something already
>>>>> running on gpu1?
>>>>>
>>>>> Fred
>>>>>
>>>>> On Fri, Apr 29, 2016 at 11:52 AM, RHankins <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Fred,
>>>>>>
>>>>>> Thanks for your response. I'm using cuda back-end so I didn't install
>>>>>> libgpuarray. Or am I supposed to install libgpuarray as well? When I say
>>>>>> tests, I just mean testing some new code out not running theano tests (as
>>>>>> in nose tests).
>>>>>>
>>>>>> I'm using Lasagne as I saw that you suggested to someone else to use
>>>>>>
>>>>>> import theano.sandbox.cuda
>>>>>>
>>>>>> theano.sandbox.cuda.use("gpu1")
>>>>>>
>>>>>>
>>>>>> If in .theanorc device = gpu0 I get the following message
>>>>>>
>>>>>> WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0
>>>>>> is already in use.
>>>>>>
>>>>>>
>>>>>> If in .theanorc device = cpu I get the following message
>>>>>>
>>>>>> WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu1 is
>>>>>> not available (error: cuda unavailable)
>>>>>>
>>>>>>
>>>>>> I updated Theano and Lasagne to the latest versions - (0.9.0dev0)
>>>>>> and (0.2.dev1) respectively but I've still got the same problem. But in
>>>>>> addition to this now when I run theano.test() it won't pass. I get the
>>>>>> following errors
>>>>>>
>>>>>>
>>>>>> ======================================================================
>>>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhInplaceTester)
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py",
>>>>>> line 483, in test_grad
>>>>>> eps=_grad_eps)
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py",
>>>>>> line 91, in verify_grad
>>>>>> T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py",
>>>>>> line 1709, in verify_grad
>>>>>> abs_tol, rel_tol)
>>>>>> GradientError: GradientError: numeric gradient and analytic gradient
>>>>>> exceed tolerance:
>>>>>> At position 4 of argument 0,
>>>>>> abs. error = 3.537018, abs. tolerance = 0.010000
>>>>>> rel. error = 0.013429, rel.
>>>>>> tolerance = 0.010000
>>>>>> Exception args:
>>>>>> The error happened with the following inputs:, [array([[ 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>>>> The value of eps is:, None,
>>>>>> The out_type is:, None, Test arctanh_inplace::normal: Error occurred
>>>>>> while computing the gradient on the following inputs: [array([[
>>>>>> 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>>>
>>>>>> ======================================================================
>>>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhTester)
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py",
>>>>>> line 483, in test_grad
>>>>>> eps=_grad_eps)
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py",
>>>>>> line 91, in verify_grad
>>>>>> T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py",
>>>>>> line 1709, in verify_grad
>>>>>> abs_tol, rel_tol)
>>>>>> GradientError: GradientError: numeric gradient and analytic gradient
>>>>>> exceed tolerance:
>>>>>> At position 4 of argument 0,
>>>>>> abs. error = 3.537018, abs. tolerance = 0.010000
>>>>>> rel. error = 0.013429, rel.
>>>>>> tolerance = 0.010000
>>>>>> Exception args:
>>>>>> The error happened with the following inputs:, [array([[ 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>>>> The value of eps is:, None,
>>>>>> The out_type is:, None, Test Elemwise{arctanh,no_inplace}::normal:
>>>>>> Error occurred while computing the gradient on the following inputs:
>>>>>> [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 3028 tests in 1688.020s
>>>>>>
>>>>>> FAILED (SKIP=108, errors=2)
>>>>>>
>>>>>>
>>>>>> On Thursday, April 28, 2016 at 2:54:36 AM UTC+1, nouiz wrote:
>>>>>>>
>>>>>>> Did you install the new gpu back-end libgpuarray? If so, we know
>>>>>>> there is a problem that you describe like this, but I only saw it in
>>>>>>> Theano tests. When you say tests, do you mean running your own job
>>>>>>> test or Theano tests?
>>>>>>>
>>>>>>> Fred
>>>>>>>
>>>>>>> On Wed, Apr 27, 2016 at 12:34 PM, RHankins <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Update: Even with no experiment running on gpu0, running some test
>>>>>>>> code with gpu1 selected as the default device in .theanorc, according
>>>>>>>> to nvidia-smi, it still launches two separate processes on gpu0 and
>>>>>>>> gpu1?
>>>>>>>>
>>>>>>>> Any thoughts? Appreciate everyone's help.
>>>>>>>>
>>>>>>>> Richard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday, April 25, 2016 at 9:54:59 PM UTC+1, RHankins wrote:
>>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I have two gpus and want to be able to run different processes in
>>>>>>>>> each one so I can experiment with different model parameters etc. I am
>>>>>>>>> currently running experiments on gpu0 whilst testing out new code
>>>>>>>>> using gpu1. Gpu1 is selected as the default device in .theanorc.
>>>>>>>>> When I want to
>>>>>>>>> run experiments on gpu0 I've been using the following code in my
>>>>>>>>> programs.
>>>>>>>>>
>>>>>>>>> import os
>>>>>>>>> os.environ["THEANO_FLAGS"]="device=gpu0"
>>>>>>>>> import theano
>>>>>>>>>
>>>>>>>>> I thought this was working. However, whilst inspecting nvidia-smi
>>>>>>>>> recently I noticed that when I started testing some new code on gpu0
>>>>>>>>> it started running processes on both gpu0 and gpu1. An experiment was
>>>>>>>>> already running on gpu0. And both the code for the experiment and the
>>>>>>>>> test code import shared modules which also import theano.
>>>>>>>>>
>>>>>>>>> Am I selecting the gpus in the wrong manner? Also since it appears
>>>>>>>>> that the test code was running on both gpus would it invalidate the
>>>>>>>>> results of the experiment? Would they interfere with each other?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>
>>>>>>>> ---
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "theano-users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
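For anyone who wants the per-process THEANO_FLAGS approach from this thread in script form, here is a minimal sketch. It assumes you launch each job as a separate child process; the helper names theano_env and run_on_device are made up for illustration and are not part of Theano. The device string is whatever your back-end expects (cuda0 for the new gpuarray back-end, gpu0 for the old one). The flag has to be in the environment before theano is imported, which is why it is set on the child process rather than changed after import.

```python
import os
import subprocess
import sys


def theano_env(device, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS pinned to one device.

    Hypothetical helper: any existing device= entry is replaced,
    all other comma-separated flags are kept as they were.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("THEANO_FLAGS", "").split(",")
             if f and not f.startswith("device=")]
    flags.append("device=" + device)
    env["THEANO_FLAGS"] = ",".join(flags)
    return env


def run_on_device(script, device):
    """Run a script in a child process; only that process gets the flag."""
    return subprocess.call([sys.executable, script], env=theano_env(device))
```

Calling run_on_device("train.py", "cuda0") (train.py being a placeholder script name) should behave like the shell form THEANO_FLAGS=device=cuda0 python train.py, and the parent process's environment is left untouched.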
