Hello!
I have a large shared variable holding my data; I take a sample of that data
and run some heavy computation on it on the GPU.
Is it possible to make Theano run some operations (sampling from the shared
variable in RAM) on the CPU and others (the heavy computation) on the GPU?
By default, Theano just sends everything to the GPU, which causes a MemoryError.
The simplified example looks like this:
#use gpu: nvidia geforce gtx 1080
import os
os.environ["THEANO_FLAGS"] = "device=gpu"
import theano
import theano.tensor as T
import numpy as np

rng = T.shared_randomstreams.RandomStreams(123415)

#initialize shared with a LOT of data (it lives in CPU RAM).
#if you have >8gb of GPU memory, just add 1/3 more.
large_shared = T._shared(np.repeat(np.random.randn(10,20,110,70).astype('float32'),5000,axis=0))
#transfer to cpu again just to make sure
large_shared = large_shared.transfer('cpu')

#sample random integers for indices; transfer them to cpu to make sure.
#(side note: random_integers is inclusive of the high bound, so strictly
#this should be shape[0]-1; it doesn't matter for the error below)
sample_ix = rng.random_integers([100],0,large_shared.shape[0]).transfer('cpu')
#draw the sample
sample = large_shared[sample_ix]

#compile a function that operates on the sample.
#for simplicity, the function is just a sum of squares
f = theano.function([],T.sum(sample**2),)
print theano.printing.debugprint(f)
#and run it
f()
and the output (note the GpuFromHost node [id D]: Theano moves the entire
shared tensor to the GPU before taking the subtensor):
HostFromGpu [id A] ''   5
 |GpuCAReduce{pre=sqr,red=add}{1,1,1,1} [id B] ''   4
 | |GpuAdvancedSubtensor1 [id C] ''   3
 | | |GpuFromHost [id D] ''   1
 | | | |<TensorType(float32, 4D)> [id E]
 | | |RandomFunction{random_integers_helper}.1 [id F] ''   2
 | | | |<RandomStateType> [id G]
 | | | |TensorConstant{(1,) of 100} [id H]
 | | | |TensorConstant{0} [id I]
 | | | |Shape_i{0} [id J] ''   0
 | | | | |<TensorType(float32, 4D)> [id E]
RandomFunction{random_integers_helper}.0 [id F] ''   2
None
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-1-5e2da2a1b28a> in <module>()
     28
     29 #and run it
---> 30 f()

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    884                     node=self.fn.nodes[self.fn.position_of_error],
    885                     thunk=thunk,
--> 886                     storage_map=getattr(self.fn, 'storage_map', None))
    887             else:
    888                 # old-style linkers raise their own exceptions

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326
    327

/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    871         try:
    872             outputs =\
--> 873                 self.fn() if output_subset is None else\
    874                 self.fn(output_subset=output_subset)
    875         except Exception:

MemoryError: Error allocating 30800000000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuFromHost(<TensorType(float32, 4D)>)
Toposort index: 1
Inputs types: [TensorType(float32, 4D)]
Inputs shapes: [(50000, 20, 110, 70)]
Inputs strides: [(616000, 30800, 280, 4)]
Inputs values: ['not shown']
Outputs clients: [[GpuAdvancedSubtensor1(GpuFromHost.0, RandomFunction{random_integers_helper}.1)]]
HINT: Re-running with most Theano optimization disabled could give you a
back-trace of when this node was created. This can be done with by setting the
Theano flag 'optimizer=fast_compile'. If that does not work, Theano
optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and
storage map footprint of this apply node.
______________
The large number (30800000000) is the size of the full data tensor:
10*20*110*70*5000 float32 elements at 4 bytes each.
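Quick sanity check of that figure in a Python shell:

>>> 10 * 20 * 110 * 70 * 5000 * 4
30800000000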
The question, again: can I do the sampling on the CPU and explicitly hand
only the computation over to the GPU?
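Something like this is what I have in mind (an untested sketch; I am guessing
at the init_gpu_device flag and at transfer('gpu') as the way to push the
sample across, so treat those as assumptions):

#untested sketch: keep the default device on cpu so that the shared data
#and the sampling stay in RAM, and push only the small sample to the GPU
import os
#init_gpu_device is my guess for having the GPU available without making it the default
os.environ["THEANO_FLAGS"] = "device=cpu,init_gpu_device=gpu"
import theano
import theano.tensor as T
import numpy as np

rng = T.shared_randomstreams.RandomStreams(123415)
#smaller data here, just to illustrate the layout
large_shared = T._shared(np.random.randn(1000,20,110,70).astype('float32'))

sample_ix = rng.random_integers([100],0,large_shared.shape[0]-1)
sample = large_shared[sample_ix]       #indexing would happen on the cpu
sample_gpu = sample.transfer('gpu')    #assumption: explicit host->gpu transfer of the sample only
f = theano.function([],T.sum(sample_gpu**2))
f()

If transfer('gpu') is not the right call for the old backend, is
theano.sandbox.cuda.basic_ops.gpu_from_host the intended way to do this?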