Hello!
I have a large shared variable holding my data; I take a sample of that data
and run some heavy computation on it on the GPU.
Is it possible to make Theano run some operations (sampling from the shared
variable in RAM) on the CPU and others (the heavy computation) on the GPU?
By default, Theano just sends everything to the GPU, which causes a MemoryError.
The simplified example looks like this:
#use gpu: nvidia geforce gtx 1080
import os
os.environ["THEANO_FLAGS"] = "device=gpu"
import theano
import theano.tensor as T
import numpy as np

rng = T.shared_randomstreams.RandomStreams(123415)

#initialize shared with a LOT of data (it lives in CPU RAM).
#if you have >8gb of GPU memory, just add 1/3 more.
large_shared = T._shared(np.repeat(np.random.randn(10,20,110,70).astype('float32'),5000,axis=0))
#transfer to cpu again just to make sure
large_shared = large_shared.transfer('cpu')

#sample random integers for indices; transfer them to cpu to make sure.
#(side note: random_integers is inclusive of the high bound, so strictly
#this should be shape[0]-1; it doesn't matter for the error below)
sample_ix = rng.random_integers([100],0,large_shared.shape[0]).transfer('cpu')
#draw the sample
sample = large_shared[sample_ix]

#compile a function that operates on the sample.
#for simplicity, the function is just a sum of squares
f = theano.function([],T.sum(sample**2),)
print theano.printing.debugprint(f)
#and run it
f()
and the output (note the GpuFromHost node [id D]: Theano moves the entire
shared tensor to the GPU before taking the subtensor):
HostFromGpu [id A] ''   5
 |GpuCAReduce{pre=sqr,red=add}{1,1,1,1} [id B] ''   4
 | |GpuAdvancedSubtensor1 [id C] ''   3
 | | |GpuFromHost [id D] ''   1
 | | | |<TensorType(float32, 4D)> [id E]
 | | |RandomFunction{random_integers_helper}.1 [id F] ''   2
 | | | |<RandomStateType> [id G]
 | | | |TensorConstant{(1,) of 100} [id H]
 | | | |TensorConstant{0} [id I]
 | | | |Shape_i{0} [id J] ''   0
 | | | | |<TensorType(float32, 4D)> [id E]
RandomFunction{random_integers_helper}.0 [id F] ''   2
None
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-1-5e2da2a1b28a> in <module>()
     28
     29 #and run it
---> 30 f()

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    884                     node=self.fn.nodes[self.fn.position_of_error],
    885                     thunk=thunk,
--> 886                     storage_map=getattr(self.fn, 'storage_map', None))
    887             else:
    888                 # old-style linkers raise their own exceptions

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326
    327

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    871         try:
    872             outputs =\
--> 873                 self.fn() if output_subset is None else\
    874                 self.fn(output_subset=output_subset)
    875         except Exception:

MemoryError: Error allocating 30800000000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuFromHost(<TensorType(float32, 4D)>)
Toposort index: 1
Inputs types: [TensorType(float32, 4D)]
Inputs shapes: [(50000, 20, 110, 70)]
Inputs strides: [(616000, 30800, 280, 4)]
Inputs values: ['not shown']
Outputs clients: [[GpuAdvancedSubtensor1(GpuFromHost.0, RandomFunction{random_integers_helper}.1)]]
HINT: Re-running with most Theano optimization disabled could give you a
back-trace of when this node was created. This can be done with by setting the
Theano flag 'optimizer=fast_compile'. If that does not work, Theano
optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and
storage map footprint of this apply node.
______________
The large number (30800000000) is the size of the full data tensor:
10*20*110*70*5000 float32 elements at 4 bytes each.
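Quick sanity check of that figure in a Python shell:

>>> 10 * 20 * 110 * 70 * 5000 * 4
30800000000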
The question, again: can I do the sampling on the CPU and explicitly hand
only the computation over to the GPU?
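Something like this is what I have in mind (an untested sketch; I am guessing
at the init_gpu_device flag and at transfer('gpu') as the way to push the
sample across, so treat those as assumptions):

#untested sketch: keep the default device on cpu so that the shared data
#and the sampling stay in RAM, and push only the small sample to the GPU
import os
#init_gpu_device is my guess for having the GPU available without making it the default
os.environ["THEANO_FLAGS"] = "device=cpu,init_gpu_device=gpu"
import theano
import theano.tensor as T
import numpy as np

rng = T.shared_randomstreams.RandomStreams(123415)
#smaller data here, just to illustrate the layout
large_shared = T._shared(np.random.randn(1000,20,110,70).astype('float32'))

sample_ix = rng.random_integers([100],0,large_shared.shape[0]-1)
sample = large_shared[sample_ix]       #indexing would happen on the cpu
sample_gpu = sample.transfer('gpu')    #assumption: explicit host->gpu transfer of the sample only
f = theano.function([],T.sum(sample_gpu**2))
f()

If transfer('gpu') is not the right call for the old backend, is
theano.sandbox.cuda.basic_ops.gpu_from_host the intended way to do this?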