Is there a final solution to this problem? I am having a similar problem.

Best,
Anurag

On Sunday, May 1, 2016 at 9:24:59 AM UTC-4, RHankins wrote:
>
> Sorry, I meant "One on gpu0 and one on gpu1 (It begins by running a process
> on gpu1 then starts another on gpu0)".
>
> On Sunday, May 1, 2016 at 2:19:58 PM UTC+1, RHankins wrote:
>>
>> Right. I know why it was throwing an error when I had device=cpu: I
>> also had force_device=True. Setting device=cpu and not setting
>> force_device allows me to select different GPUs using
>> theano.sandbox.cuda.use().
>>
>> But I'm still having the problem with it running on multiple GPUs. If I
>> select gpu0 (Titan X) it runs a single process on the correct GPU. If I
>> select gpu1 (GTX 980) to run exactly the same code it runs 2 processes. One
>> on gpu0 and one on gpu1 (It begins by running a process on gpu0 then starts
>> another on gpu1). It doesn't matter whether they are run simultaneously or
>> not, or whether, when they are run simultaneously, something was already
>> running on gpu0 or gpu1. Should I use nvidia-smi to force the code to run
>> on a single GPU using
>>
>> nvidia-smi --compute-mode=EXCLUSIVE_PROCESS?
>>
>>
>> My only concern is that if I want to run different programs at the same
>> time I will end up having multiple processes running on the same GPU. So
>> will they interfere with each other if they are importing the same modules?
>> Is it okay to run multiple processes on the same GPU? Will it affect the
>> results? Or does it not matter?
>>
>> Cheers,
>>
>> R
>>
>> On Friday, April 29, 2016 at 9:55:33 PM UTC+1, nouiz wrote:
>>>
>>> You can ignore those 2 errors. It is just that those tests seem too
>>> sensitive.
>>>
>>> If you set a device in your theanorc file that isn't 'cpu' and call
>>> use() on another one, it is normal that Theano doesn't like this, as only 1
>>> GPU is supported in the current back-end. The new one supports multiple GPUs.
>>>
>>> Does it work if you try to use gpu0? Was something already running
>>> on gpu1?
>>>
>>> Fred
>>>
>>> On Fri, Apr 29, 2016 at 11:52 AM, RHankins <[email protected]>
>>> wrote:
>>>
>>>> Hi Fred,
>>>>
>>>> Thanks for your response. I'm using the CUDA back-end so I didn't install
>>>> libgpuarray. Or am I supposed to install libgpuarray as well? When I say
>>>> tests, I just mean testing some new code out, not running Theano tests (as
>>>> in nose tests).
>>>>
>>>> I'm using Lasagne, as I saw that you suggested to someone else to use
>>>>
>>>> import theano.sandbox.cuda
>>>>
>>>> theano.sandbox.cuda.use("gpu1")
>>>>
>>>>
>>>> If in .theanorc device = gpu0 I get the following message
>>>>
>>>> WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0 is
>>>> already in use.
>>>>
>>>>
>>>> If in .theanorc device = cpu I get the following message
>>>>
>>>> WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu1 is
>>>> not available (error: cuda unavailable)
>>>>
>>>>
>>>> I updated Theano and Lasagne to the latest versions - (0.9.0dev0) and
>>>> (0.2.dev1) respectively but I've still got the same problem. But in
>>>> addition to this now when I run theano.test() it won't pass.
>>>> I get the following errors
>>>>
>>>>
>>>> ======================================================================
>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhInplaceTester)
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py", line 483, in test_grad
>>>>     eps=_grad_eps)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py", line 91, in verify_grad
>>>>     T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1709, in verify_grad
>>>>     abs_tol, rel_tol)
>>>> GradientError: GradientError: numeric gradient and analytic gradient exceed tolerance:
>>>>     At position 4 of argument 0,
>>>>         abs. error = 3.537018, abs. tolerance = 0.010000
>>>>         rel. error = 0.013429, rel. tolerance = 0.010000
>>>> Exception args:
>>>> The error happened with the following inputs:, [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>        [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>> The value of eps is:, None,
>>>> The out_type is:, None, Test arctanh_inplace::normal: Error occurred
>>>> while computing the gradient on the following inputs:
>>>> [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>        [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>
>>>> ======================================================================
>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhTester)
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py", line 483, in test_grad
>>>>     eps=_grad_eps)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py", line 91, in verify_grad
>>>>     T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>   File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py", line 1709, in verify_grad
>>>>     abs_tol, rel_tol)
>>>> GradientError: GradientError: numeric gradient and analytic gradient exceed tolerance:
>>>>     At position 4 of argument 0,
>>>>         abs. error = 3.537018, abs. tolerance = 0.010000
>>>>         rel. error = 0.013429, rel. tolerance = 0.010000
>>>> Exception args:
>>>> The error happened with the following inputs:, [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>        [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>> The value of eps is:, None,
>>>> The out_type is:, None, Test Elemwise{arctanh,no_inplace}::normal:
>>>> Error occurred while computing the gradient on the following inputs:
>>>> [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>        [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>
>>>> ----------------------------------------------------------------------
>>>> Ran 3028 tests in 1688.020s
>>>>
>>>> FAILED (SKIP=108, errors=2)
>>>>
>>>>
>>>> On Thursday, April 28, 2016 at 2:54:36 AM UTC+1, nouiz wrote:
>>>>>
>>>>> Did you install the new GPU back-end, libgpuarray? If so, we know there
>>>>> is a problem like the one you describe, but I only saw it in Theano
>>>>> tests. When you say tests, do you mean running your own job test or
>>>>> Theano tests?
>>>>>
>>>>> Fred
>>>>>
>>>>> On Wed, Apr 27, 2016 at 12:34 PM, RHankins <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Update: Even with no experiment running on gpu0, running some test
>>>>>> code with gpu1 selected as the default device in .theanorc still,
>>>>>> according to nvidia-smi, launches two separate processes on gpu0 and
>>>>>> gpu1.
>>>>>>
>>>>>> Any thoughts? Appreciate everyone's help.
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Monday, April 25, 2016 at 9:54:59 PM UTC+1, RHankins wrote:
>>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I have two GPUs and want to be able to run different processes on
>>>>>>> each one so I can experiment with different model parameters etc. I am
>>>>>>> currently running experiments on gpu0 whilst testing out new code using
>>>>>>> gpu1. Gpu1 is selected as the default device in .theanorc. When I want
>>>>>>> to run experiments on gpu0 I've been using the following code in my
>>>>>>> programs.
>>>>>>>
>>>>>>> import os
>>>>>>> os.environ["THEANO_FLAGS"]="device=gpu0"
>>>>>>> import theano
>>>>>>>
>>>>>>> I thought this was working. However, whilst inspecting nvidia-smi
>>>>>>> recently I noticed that when I started testing some new code on gpu0 it
>>>>>>> started running processes on both gpu0 and gpu1. An experiment was
>>>>>>> already running on gpu0. And both the code for the experiment and the
>>>>>>> test code import shared modules which also import theano.
>>>>>>>
>>>>>>> Am I selecting the GPUs in the wrong manner? Also, since it appears
>>>>>>> that the test code was running on both GPUs, would it invalidate the
>>>>>>> results of the experiment? Would they interfere with each other?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> --
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "theano-users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
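
For anyone landing on this thread with the same symptom, here is a minimal sketch of one possible workaround. It is not something proposed by the posters above and not an official Theano recipe. It relies on two points: THEANO_FLAGS is read when theano is first imported, so it has to be set before any shared module imports theano; and CUDA_VISIBLE_DEVICES (a CUDA runtime environment variable, not mentioned in the thread) can hide the other GPU from the process entirely. The helper name `select_gpu` and its parameters are hypothetical.

```python
import os
import sys


def select_gpu(device, physical_index=None):
    """Hypothetical helper: pick the Theano device before Theano is imported.

    device          -- value for Theano's `device` flag, e.g. "gpu0" or "cpu"
    physical_index  -- optional CUDA device index to expose through
                       CUDA_VISIBLE_DEVICES (assumption, not from the thread)
    """
    # THEANO_FLAGS is only read on the first import of theano, so changing
    # it after that point would not move the process to another device.
    if "theano" in sys.modules:
        raise RuntimeError("theano is already imported; select the device earlier")

    if physical_index is not None:
        # Hide every other GPU from the CUDA runtime in this process.
        # The one remaining GPU is then seen as device 0 by the process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)

    # Keep any other flags that are already set and replace only `device`.
    existing = os.environ.get("THEANO_FLAGS", "")
    flags = [f for f in existing.split(",") if f and not f.startswith("device=")]
    flags.append("device=" + device)
    os.environ["THEANO_FLAGS"] = ",".join(flags)
    return os.environ["THEANO_FLAGS"]
```

The call would go at the very top of the entry script, before `import theano` and before importing any shared module that imports theano, for example `select_gpu("gpu0", physical_index=1)`. When `physical_index` is given, the device flag should be "gpu0", because the single visible GPU is renumbered to 0 inside the process. Whether this removes the second process that shows up in nvidia-smi would still need to be checked on the affected machine.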
