Hi, 
From what I know, I can think of two ways. You can change the context name 
to something other than cuda, so that the graph will not be moved to the 
GPU. Otherwise, you can change the dtype to complex; complex is not supported 
on the GPU, so those operations stay on the CPU. I am aware of only these two 
ways, and both are hacks. When I know more, I'll post a better way, or Fred will.
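
Roughly, the first hack could look like the sketch below. It assumes the new 
gpuarray (libgpuarray/pygpu) back-end with the multi-GPU style flags, i.e. 
THEANO_FLAGS="contexts=dev0->cuda0" instead of device=gpu, so the GPU is only 
reachable under the context name 'dev0' and nothing is moved there 
automatically. The 'dev0' target for transfer() is my reading of how context 
names work; I have not run this on your exact graph, so check debugprint(f) 
afterwards to make sure the big tensor is not lifted onto the GPU anyway.

import os
os.environ["THEANO_FLAGS"] = "contexts=dev0->cuda0"  #GPU reachable only as context 'dev0'

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

rng = RandomStreams(123415)

#big data stays in an ordinary CPU shared variable (no target given);
#a smaller stand-in for your big array
large_shared = theano.shared(
    np.random.randn(1000, 20, 110, 70).astype('float32'))

#indices and the indexing itself are computed on the CPU
sample_ix = rng.random_integers([100], 0, large_shared.shape[0] - 1)  #stay strictly below shape[0]
sample = large_shared[sample_ix]

#only the small sample is sent to the GPU context, by name
sample_gpu = sample.transfer('dev0')

#the heavy computation runs on 'dev0'; the big array never leaves RAM
f = theano.function([], T.sum(sample_gpu ** 2))
print theano.printing.debugprint(f)
print f()

If the debugprint still shows a GpuFromHost feeding GpuAdvancedSubtensor1, as 
in your output, the transfer has been lifted above the indexing and this 
sketch won't help as-is.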

Ramana

On Wednesday, December 28, 2016 at 7:20:37 PM UTC+1, justHeuristic wrote:
>
> Hello!
>
> I happen to have a large shared variable where the data is stored; I then 
> take a sample of that data and do some heavy computation on it on the GPU.
>
> Is it possible to make Theano run some operations (sampling from the shared 
> variable in RAM) on the CPU and others (the heavy computations) on the GPU?
>
> By default, Theano just sends everything to the GPU, which causes a MemoryError.
>
> The simplified example looks like this:
>
> #use gpu nvidia geforce 1080 gtx
> import os
> os.environ["THEANO_FLAGS"]="device=gpu"
>
> import theano
> import theano.tensor as T
> import numpy as np
>
> rng = T.shared_randomstreams.RandomStreams(123415)
>
> #initialize shared with a LOT of data (since we are on CPU).
> #If you have >8gb GPU memory, just add 1/3 more.
> large_shared = T._shared(
>     np.repeat(np.random.randn(10,20,110,70).astype('float32'), 5000, axis=0))
>
>
> #transfer to cpu again just to make sure
> large_shared = large_shared.transfer('cpu')
>
> #sample random integers for indices. Transfer them to cpu to make sure
> sample_ix = rng.random_integers(
>     [100], 0, large_shared.shape[0]).transfer('cpu')
>
> #draw sample
> sample = large_shared[sample_ix]
>
> #compile function that operates on sample.
> #for simplicity, function is just a sum of squares
> f = theano.function([],T.sum(sample**2),)
>
> print theano.printing.debugprint(f)
>
> #and run it
> f()
>
>
> and the output:
>
> HostFromGpu [id A] ''   5
>  |GpuCAReduce{pre=sqr,red=add}{1,1,1,1} [id B] ''   4
>    |GpuAdvancedSubtensor1 [id C] ''   3
>      |GpuFromHost [id D] ''   1
>      | |<TensorType(float32, 4D)> [id E]
>      |RandomFunction{random_integers_helper}.1 [id F] ''   2
>        |<RandomStateType> [id G]
>        |TensorConstant{(1,) of 100} [id H]
>        |TensorConstant{0} [id I]
>        |Shape_i{0} [id J] ''   0
>          |<TensorType(float32, 4D)> [id E]
> RandomFunction{random_integers_helper}.0 [id F] ''   2
> None
>
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> <ipython-input-1-5e2da2a1b28a> in <module>()
>      28 
>      29 #and run it
> ---> 30 f()
>
> /anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
>     884                     node=self.fn.nodes[self.fn.position_of_error],
>     885                     thunk=thunk,
> --> 886                     storage_map=getattr(self.fn, 'storage_map', None))
>     887             else:
>     888                 # old-style linkers raise their own exceptions
>
> /anaconda3/envs/py27/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
>     323         # extra long error message in that case.
>     324         pass
> --> 325     reraise(exc_type, exc_value, exc_trace)
>     326 
>     327 
>
> /anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
>     871         try:
>     872             outputs =\
> --> 873                 self.fn() if output_subset is None else\
>     874                 self.fn(output_subset=output_subset)
>     875         except Exception:
>
> MemoryError: Error allocating 30800000000 bytes of device memory 
> (CNMEM_STATUS_OUT_OF_MEMORY).
> Apply node that caused the error: GpuFromHost(<TensorType(float32, 4D)>)
> Toposort index: 1
> Inputs types: [TensorType(float32, 4D)]
> Inputs shapes: [(50000, 20, 110, 70)]
> Inputs strides: [(616000, 30800, 280, 4)]
> Inputs values: ['not shown']
> Outputs clients: [[GpuAdvancedSubtensor1(GpuFromHost.0, 
> RandomFunction{random_integers_helper}.1)]]
>
> HINT: Re-running with most Theano optimization disabled could give you a 
> back-trace of when this node was created. This can be done with by setting 
> the Theano flag 'optimizer=fast_compile'. If that does not work, Theano 
> optimizations can be disabled with 'optimizer=None'.
> HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and 
> storage map footprint of this apply node.
>
> ______________
>
> The large number (30,800,000,000) is the size of the full data tensor: 
> 10*20*110*70*5000 float32 values * 4 bytes each.
>
> The question, again: can I do the sampling on the CPU and explicitly move 
> only the heavy computation to the GPU manually?
>
>

