Re: [theano-users] Re: Different processes on different gpus

Frédéric Bastien Fri, 05 May 2017 09:50:29 -0700

The problem of a single process using multiple GPUs has been fixed. I forget
whether the fix is in Theano 0.9 or only in the development version.


I would recommend using the new GPU back-end:

https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

You can call theano.gpuarray.use('cuda0'), similar to the old back-end.

You can also use this Linux shell trick:

THEANO_FLAGS=device=cuda0 python ...

This sets the flag only for the newly created process.
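
If you would rather start the jobs from Python than from the shell, the same
idea can be sketched with the standard subprocess module. This is only a rough
sketch: the helper names theano_env and launch and the script name train.py are
made up for illustration, and it assumes the new back-end's cuda0/cuda1 device
names.

```python
import os
import subprocess
import sys


def theano_env(device, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS selecting `device`.

    Hypothetical helper. Note that it overwrites any THEANO_FLAGS value
    already present in the copied environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = "device=" + device
    return env


def launch(script, device):
    """Start `script` in a new Python process that sees only the given device flag."""
    return subprocess.Popen([sys.executable, script], env=theano_env(device))


# Example use ("train.py" is a placeholder for your own script):
#   jobs = [launch("train.py", dev) for dev in ("cuda0", "cuda1")]
#   for job in jobs:
#       job.wait()
```

The flag is set only in the copied environment handed to each child, so the
parent process keeps whatever THEANO_FLAGS it already had.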

On Thu, May 4, 2017 at 5:05 AM Richard Hankins <[email protected]>
wrote:

> Hi,
>
> Sorry for the late reply; I haven't looked at this in a while. I just
> checked my setup and I've got device = cpu in .theanorc, but I'm not
> forcing the device.
> According to nvidia-smi, both my GPUs are in default compute mode. To select
> different devices I use the following:
>
> import theano.sandbox.cuda
> theano.sandbox.cuda.use("gpuX")
>
> Hope this helps,
>
> Richard
>
> On Thu, Apr 27, 2017 at 8:33 PM, anurag kumar <[email protected]> wrote:
>
>> Is there a final solution to this problem? I am having a similar problem.
>>
>> Best,
>> Anurag
>>
>> On Sunday, May 1, 2016 at 9:24:59 AM UTC-4, RHankins wrote:
>>>
>>> Sorry, I meant "One on gpu0 and one on gpu1 (It begins by running a
>>> process on gpu1 then starts another on gpu0)".
>>>
>>> On Sunday, May 1, 2016 at 2:19:58 PM UTC+1, RHankins wrote:
>>>>
>>>> Right. I know why it was throwing an error when I had device=cpu:
>>>> I also had force_device=True. Setting device=cpu and not setting
>>>> force_device allows me to select different GPUs using
>>>> theano.sandbox.cuda.use().
>>>>
>>>> But I'm still having the problem with it running on multiple GPUs. If I
>>>> select gpu0 (Titan X) it runs a single process on the correct GPU. If I
>>>> select gpu1 (GTX 980) to run exactly the same code it runs 2 processes. One
>>>> on gpu0 and one on gpu1 (It begins by running a process on gpu0 then starts
>>>> another on gpu1). It doesn't matter whether they are run simultaneously or
>>>> not, or whether something was already running on gpu0 or gpu1. Should I use
>>>> nvidia-smi to force the code to run on a single GPU using
>>>>
>>>> nvidia-smi --compute-mode=EXCLUSIVE_PROCESS?
>>>>
>>>>
>>>> My only concern is that if I want to run different programs at the same
>>>> time I will end up having multiple processes running on the same GPU. Will
>>>> they interfere with each other if they are importing the same modules? Is it
>>>> okay to run multiple processes on the same GPU? Will it affect the results?
>>>> Or does it not matter?
>>>>
>>>> Cheers,
>>>>
>>>> R
>>>>
>>>> On Friday, April 29, 2016 at 9:55:33 PM UTC+1, nouiz wrote:
>>>>>
>>>>> You can ignore those 2 errors. It is just that those tests seem too
>>>>> sensitive.
>>>>>
>>>>> If you set a device in your theanorc file that isn't 'cpu' and call
>>>>> use() on another one, it is normal that Theano doesn't like this, as only 1
>>>>> GPU is supported in the current back-end. The new one supports multiple
>>>>> GPUs.
>>>>>
>>>>> Does it work if you try to use gpu0? Was something already
>>>>> running on gpu1?
>>>>>
>>>>> Fred
>>>>>
>>>>> On Fri, Apr 29, 2016 at 11:52 AM, RHankins <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Fred,
>>>>>>
>>>>>> Thanks for your response. I'm using the CUDA back-end so I didn't install
>>>>>> libgpuarray. Or am I supposed to install libgpuarray as well? When I say
>>>>>> tests, I just mean testing some new code out, not running the Theano tests
>>>>>> (as in nose tests).
>>>>>>
>>>>>> I'm using Lasagne as I saw that you suggested to someone else to use
>>>>>>
>>>>>> import theano.sandbox.cuda
>>>>>>
>>>>>> theano.sandbox.cuda.use("gpu1")
>>>>>>
>>>>>>
>>>>>> If in .theanorc device = gpu0 I get the following message
>>>>>>
>>>>>> WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0
>>>>>> is already in use.
>>>>>>
>>>>>>
>>>>>> If in .theanorc device = cpu I get the following message
>>>>>>
>>>>>> WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu1 is
>>>>>> not available (error: cuda unavailable)
>>>>>>
>>>>>>
>>>>>> I updated Theano and Lasagne to the latest versions, (0.9.0dev0)
>>>>>> and (0.2.dev1) respectively, but I've still got the same problem. In
>>>>>> addition, theano.test() now fails. I get the
>>>>>> following errors
>>>>>>
>>>>>>
>>>>>> ======================================================================
>>>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhInplaceTester)
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py",
>>>>>> line 483, in test_grad
>>>>>> eps=_grad_eps)
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py",
>>>>>> line 91, in verify_grad
>>>>>> T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py",
>>>>>> line 1709, in verify_grad
>>>>>> abs_tol, rel_tol)
>>>>>> GradientError: GradientError: numeric gradient and analytic gradient
>>>>>> exceed tolerance:
>>>>>> At position 4 of argument 0,
>>>>>> abs. error = 3.537018, abs. tolerance = 0.010000
>>>>>> rel. error = 0.013429, rel. tolerance = 0.010000
>>>>>> Exception args:
>>>>>> The error happened with the following inputs:, [array([[ 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>>>> The value of eps is:, None,
>>>>>> The out_type is:, None, Test arctanh_inplace::normal: Error occurred
>>>>>> while computing the gradient on the following inputs: [array([[ 
>>>>>> 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>>>
>>>>>> ======================================================================
>>>>>> ERROR: test_grad (theano.tensor.tests.test_basic.ArctanhTester)
>>>>>> ----------------------------------------------------------------------
>>>>>> Traceback (most recent call last):
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tensor/tests/test_basic.py",
>>>>>> line 483, in test_grad
>>>>>> eps=_grad_eps)
>>>>>> File
>>>>>> "/usr/local/lib/python2.7/dist-packages/theano/tests/unittest_tools.py",
>>>>>> line 91, in verify_grad
>>>>>> T.verify_grad(op, pt, n_tests, rng, *args, **kwargs)
>>>>>> File "/usr/local/lib/python2.7/dist-packages/theano/gradient.py",
>>>>>> line 1709, in verify_grad
>>>>>> abs_tol, rel_tol)
>>>>>> GradientError: GradientError: numeric gradient and analytic gradient
>>>>>> exceed tolerance:
>>>>>> At position 4 of argument 0,
>>>>>> abs. error = 3.537018, abs. tolerance = 0.010000
>>>>>> rel. error = 0.013429, rel. tolerance = 0.010000
>>>>>> Exception args:
>>>>>> The error happened with the following inputs:, [array([[ 0.28898013,
>>>>>> 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)],
>>>>>> The value of eps is:, None,
>>>>>> The out_type is:, None, Test Elemwise{arctanh,no_inplace}::normal:
>>>>>> Error occurred while computing the gradient on the following inputs:
>>>>>> [array([[ 0.28898013, 0.98691875, -0.37341487],
>>>>>> [-0.83661169, -0.99454761, -0.57619613]], dtype=float32)]
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> Ran 3028 tests in 1688.020s
>>>>>>
>>>>>> FAILED (SKIP=108, errors=2)
>>>>>>
>>>>>>
>>>>>> On Thursday, April 28, 2016 at 2:54:36 AM UTC+1, nouiz wrote:
>>>>>>>
>>>>>>> Did you install the new GPU back-end, libgpuarray? If so, we know of
>>>>>>> a problem like the one you describe, but I have only seen it in the
>>>>>>> Theano tests. When you say tests, do you mean running your own job or
>>>>>>> the Theano tests?
>>>>>>>
>>>>>>> Fred
>>>>>>>
>>>>>>> On Wed, Apr 27, 2016 at 12:34 PM, RHankins <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Update: Even with no experiment running on gpu0, running some test
>>>>>>>> code with gpu1 selected as the default device in .theanorc still
>>>>>>>> launches two separate processes, on gpu0 and gpu1, according to
>>>>>>>> nvidia-smi.
>>>>>>>>
>>>>>>>> Any thoughts? I appreciate everyone's help.
>>>>>>>>
>>>>>>>> Richard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday, April 25, 2016 at 9:54:59 PM UTC+1, RHankins wrote:
>>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I have two gpus and want to be able to run different processes in
>>>>>>>>> each one so I can experiment with different model parameters etc. I am
>>>>>>>>> currently running experiments on gpu0 whilst testing out new code
>>>>>>>>> using gpu1. Gpu1 is selected as the default device in .theanorc. When
>>>>>>>>> I want to run experiments on gpu0 I've been using the following code
>>>>>>>>> in my programs.
>>>>>>>>>
>>>>>>>>> import os
>>>>>>>>> os.environ["THEANO_FLAGS"] = "device=gpu0"  # set before theano is first imported
>>>>>>>>> import theano
>>>>>>>>>
>>>>>>>>> I thought this was working. However, whilst inspecting nvidia-smi
>>>>>>>>> recently I noticed that when I started testing some new code on gpu0
>>>>>>>>> it started running processes on both gpu0 and gpu1. An experiment was
>>>>>>>>> already running on gpu0. And both the code for the experiment and the
>>>>>>>>> test code import shared modules which also import theano.
>>>>>>>>>
>>>>>>>>> Am I selecting the gpus in the wrong manner? Also, since it appears
>>>>>>>>> that the test code was running on both gpus, would it invalidate the
>>>>>>>>> results of the experiment? Would they interfere with each other?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>
>>>>>>>> ---
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "theano-users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>

