What happens if you set gpuarray.preallocate to something much smaller, or 
even to -1?
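One way to try this without editing ~/.theanorc is to set THEANO_FLAGS, which Theano parses when it is first imported and which takes precedence over the config file. As I understand the Theano config docs, a preallocate value between 0 and 1 is a fraction of total GPU memory (clipped to 0.95, which matches the "Preallocating 11576/12186 Mb (0.950000)" line in your log), a value above 1 is a size in MB, and a negative value disables the allocation cache. A minimal sketch; `set_theano_flags` is a hypothetical helper, not part of Theano:

```python
import os


def set_theano_flags(flags):
    """Hypothetical helper: write a {name: value} dict into THEANO_FLAGS.

    Theano reads THEANO_FLAGS when it is first imported, and values given
    there take precedence over ~/.theanorc, so call this before
    `import theano`. Any THEANO_FLAGS already in the environment is replaced.
    """
    value = ",".join("{}={}".format(k, v) for k, v in sorted(flags.items()))
    os.environ["THEANO_FLAGS"] = value
    return value


flags = set_theano_flags({
    "device": "cuda",
    "floatX": "float32",
    # -1 disables the allocation cache; a small fraction such as 0.1
    # (10% of GPU memory) keeps the cache but preallocates far less.
    "gpuarray.preallocate": -1,
})
# import theano  # only after THEANO_FLAGS has been set
```

Setting the same variable on the command line (`THEANO_FLAGS=gpuarray.preallocate=-1 python scripts/run_rl_mj.py ...`) should have the same effect.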

Also, I see the script uses multiprocessing. Weird things happen if new 
Python processes are spawned after the GPU has been initialized; I believe 
this is a limitation of how CUDA handles GPU contexts.
The solution would be to use `device=cpu` instead of `device=cuda`, and to 
call `theano.gpuarray.use('cuda')` manually in the subprocess, or after all 
processes have been launched.

On Sunday, July 2, 2017 at 3:59:31 PM UTC-4, Daniel Seita wrote:
>
> I am attempting to run some reinforcement learning code on the GPU. (The 
> code is https://github.com/openai/imitation if it matters, running 
> `scripts/run_rl_mj.py`.)
>
> I converted the code to run on float32 by changing the way the data is 
> supplied via numpy. Unfortunately, with the new GPU backend, I am getting 
> an out of memory error, despite having 12GB of memory on my Titan X Pascal 
> GPU. Here are my settings:
>
> $ cat ~/.theanorc 
> [global] 
> device = cuda 
> floatX = float32 
>
> [gpuarray] 
> preallocate = 1 
>
> [cuda] 
> root = /usr/local/cuda-8.0
>
>
> Theano seems to be importing correctly:
>
> $ ipython
> Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15) 
>  
> Type "copyright", "credits" or "license" for more information. 
> IPython 5.3.0 -- An enhanced Interactive Python. 
> ?         -> Introduction and overview of IPython's features. 
> %quickref -> Quick reference. 
> help      -> Python's own help system. 
> object?   -> Details about 'object', use 'object??' for extra details. 
>
> In [1]: import theano 
> Using cuDNN version 5105 on context None 
> Preallocating 11576/12186 Mb (0.950000) on cuda 
> Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0) 
>
> In [2]: 
>
>
>
> Unfortunately, running `python scripts/run_rl_mj.py --env_name 
> CartPole-v0 --log trpo_logs/CartPole-v0` on the very low-dimensional 
> CartPole setting (state space is just four numbers, actions are just one 
> number) gives me (after a bit of a setup):
>
> Traceback (most recent call last):
>   File "scripts/run_rl_mj.py", line 116, in <module>
>     main()
>   File "scripts/run_rl_mj.py", line 109, in main
>     iter_info = opt.step()
>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>     cfg=self.sim_cfg)
>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>     traj = job.get()
>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>     raise self._value
> pygpu.gpuarray.GpuArrayException: Out of memory
> Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
> Toposort index: 4
> Inputs types: [TensorType(float32, matrix)]
> Inputs shapes: [(1, 4)]
> Inputs strides: [(16, 4)]
> Inputs values: [array([[ 0.04058,  0.00428,  0.03311, -0.02898]], dtype=float32)]
> Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]
>
> HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
> HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
>
> Closing remaining open files:trpo_logs/CartPole-v0...done
>
> What I'm confused about is that
>
>    - This happens right at the beginning of the reinforcement learning, 
>    so it's not as if the algorithm has been running a long time and then ran 
>    out of memory.
>    - The input shape is tiny: (1, 4), with strides (16, 4). In addition, 
>    the output node only does normalization and a few other element-wise 
>    operations. None of this suggests high memory usage.
>
> I tried `optimizer = fast_compile` and re-ran this, but the error message 
> was actually less informative (it contains a subset of the above error 
> message). Running with `exception_verbosity = high` results in a different 
> error message:
>
> Max traj len: 200
> Traceback (most recent call last):
>   File "scripts/run_rl_mj.py", line 116, in <module>
>     main()
>   File "scripts/run_rl_mj.py", line 109, in main
>     iter_info = opt.step()
>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>     cfg=self.sim_cfg)
>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>     traj = job.get()
>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>     raise self._value
> pygpu.gpuarray.GpuArrayException: initialization error
>
> Closing remaining open files:trpo_logs/CartPole-v0...done
>
> It somehow didn't even reach the correct point in the code??
>
> I noticed a similar issue here: 
> https://github.com/costapt/vess2ret/issues/5 which seems to suggest that 
> the problem is not limited to just this script. What do you suggest I do? 
> Thanks.
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"theano-users" group.