[theano-users] Re: Getting "pygpu.gpuarray.GpuArrayException: Out of memory" for a small application
Thanks Pascal. I tried using gpuarray.preallocate 0.01 and 0.1. The run with 0.1, for instance, starts like this:

$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/CartPole-v0
Using cuDNN version 5105 on context None
Preallocating 1218/12186 Mb (0.10) on cuda
Mapped name None to device cuda: TITAN X (Pascal) (:01:00.0)

But the same error message results:

Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: Out of memory
Apply node that caused the error: GpuFromHost(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.02563, -0.03082,  0.01663, -0.00558]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)](GpuFromHost.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))}}[].0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
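For reference, the runs above correspond to a `~/.theanorc` along these lines (a sketch; only the `preallocate` value changed between runs):

```
[global]
device = cuda
floatX = float32

[gpuarray]
# Tried 0.01 and 0.1 here; the original run used preallocate = 1.
preallocate = 0.1
```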
With -1 as the preallocate, I get this to start:

$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/CartPole-v0
Using cuDNN version 5105 on context None
Disabling allocation cache on cuda

I get a similar error message. It is slightly different, with an initialization error, but the same part of the code runs into problems:

Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: initialization error
Apply node that caused the error: GpuFromHost(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.01357, -0.02611,  0.0341 ,  0.0162 ]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)](GpuFromHost.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))}}[].0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Yes, the code seems to be using multiprocessing. I will try to find out how to deal with the multiprocessing, or perhaps just disable it.
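For what it's worth, a minimal sketch of the worker-side initialization Pascal suggests (keep the parent process on `device=cpu` and bind the GPU inside each worker), using a `multiprocessing.Pool` initializer. The `GPU_BOUND` flag is a hypothetical stand-in for the real `theano.gpuarray.use('cuda')` call, just to show that the initializer runs once per worker before any job:

```python
import multiprocessing

def init_worker():
    # In the real script this is where the GPU would be attached, e.g.:
    #     import theano.gpuarray
    #     theano.gpuarray.use('cuda')
    # A stand-in flag marks that the call ran in this worker process.
    global GPU_BOUND
    GPU_BOUND = True

def simulate(args):
    # Every job observes the initializer's effect.
    return GPU_BOUND

def run_demo():
    # Workers are created (and initialized) before any GPU work is dispatched.
    pool = multiprocessing.Pool(processes=2, initializer=init_worker)
    try:
        results = pool.map(simulate, range(4))
    finally:
        pool.close()
        pool.join()
    return results

if __name__ == '__main__':
    print(all(run_demo()))  # prints True
```

The point of the ordering is that CUDA is never initialized in the parent before the `fork()`, which is what Pascal identifies as the source of the initialization error.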
On Monday, July 3, 2017 at 3:08:44 PM UTC-7, Pascal Lamblin wrote:
>
> What happens if you set gpuarray.preallocate to something much smaller, or
> even to -1?
>
> Also, I see the script uses multiprocessing. Weird things happen if new
> Python processes are spawned after the GPU has been initialized. This is a
> limitation of how cuda handles GPU contexts, I believe.
> The solution would be not to use `device=cuda`, but `device=cpu`, and call
> `theano.gpuarray.use('cuda')` manually in the subprocess, or after all
> processes have been launched.
>
> On Sunday, July 2, 2017 at 3:59:31 PM UTC-4, Daniel Seita wrote:
>>
>> I am attempting to run some reinforcement learning code on the GPU. (The
>> code is https://github.com/openai/imitation if it matters, running
>> `scripts/run_rl_mj.py`.)
>>
>> I converted the code to run on float32 by changing the way the data is
>> supplied via numpy. Unfortunately, with the new GPU backend, I am getting
>> an out of memory error, despite having 12GB of memory on my Titan X Pascal
>> GPU. Here are my settings:
>>
>> $ cat ~/.theanorc
>> [global]
>> device = cuda
>> floatX = float32
>>
>> [gpuarray]
>> preallocate = 1
>>
>> [cuda]
>> root =
What happens if you set gpuarray.preallocate to something much smaller, or even to -1?

Also, I see the script uses multiprocessing. Weird things happen if new Python processes are spawned after the GPU has been initialized. This is a limitation of how cuda handles GPU contexts, I believe. The solution would be not to use `device=cuda`, but `device=cpu`, and call `theano.gpuarray.use('cuda')` manually in the subprocess, or after all processes have been launched.

On Sunday, July 2, 2017 at 3:59:31 PM UTC-4, Daniel Seita wrote:
>
> I am attempting to run some reinforcement learning code on the GPU. (The
> code is https://github.com/openai/imitation if it matters, running
> `scripts/run_rl_mj.py`.)
>
> I converted the code to run on float32 by changing the way the data is
> supplied via numpy. Unfortunately, with the new GPU backend, I am getting
> an out of memory error, despite having 12GB of memory on my Titan X Pascal
> GPU. Here are my settings:
>
> $ cat ~/.theanorc
> [global]
> device = cuda
> floatX = float32
>
> [gpuarray]
> preallocate = 1
>
> [cuda]
> root = /usr/local/cuda-8.0
>
> Theano seems to be importing correctly:
>
> $ ipython
> Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
> Type "copyright", "credits" or "license" for more information.
> IPython 5.3.0 -- An enhanced Interactive Python.
> ?         -> Introduction and overview of IPython's features.
> %quickref -> Quick reference.
> help      -> Python's own help system.
> object?   -> Details about 'object', use 'object??' for extra details.
>
> In [1]: import theano
> Using cuDNN version 5105 on context None
> Preallocating 11576/12186 Mb (0.95) on cuda
> Mapped name None to device cuda: TITAN X (Pascal) (:01:00.0)
>
> In [2]:
>
> Unfortunately, running `python scripts/run_rl_mj.py --env_name
> CartPole-v0 --log trpo_logs/CartPole-v0` on the very low-dimensional
> CartPole setting (the state space is just four numbers, actions are just
> one number) gives me (after a bit of a setup):
>
> Traceback (most recent call last):
>   File "scripts/run_rl_mj.py", line 116, in <module>
>     main()
>   File "scripts/run_rl_mj.py", line 109, in main
>     iter_info = opt.step()
>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>     cfg=self.sim_cfg)
>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>     traj = job.get()
>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>     raise self._value
> pygpu.gpuarray.GpuArrayException: Out of memory
> Apply node that caused the error: GpuFromHost(obsfeat_B_Df)
> Toposort index: 4
> Inputs types: [TensorType(float32, matrix)]
> Inputs shapes: [(1, 4)]
> Inputs strides: [(16, 4)]
> Inputs values: [array([[ 0.04058,  0.00428,  0.03311, -0.02898]], dtype=float32)]
> Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[](GpuFromHost.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))}}[].0)]]
>
> HINT: Re-running with most Theano optimization disabled could give you a
> back-trace of when this node was created. This can be done with by setting
> the Theano flag 'optimizer=fast_compile'. If that does not work, Theano
> optimizations can be disabled with 'optimizer=None'.
> HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and
> storage map footprint of this apply node.
>
> Closing remaining open files:trpo_logs/CartPole-v0...done
>
> What I'm confused about is that:
>
> - This happens right at the beginning of the reinforcement learning, so
>   it's not as if the algorithm has been running a long time and then ran
>   out of memory.
> - The input shapes are quite small, (1, 4) with strides (16, 4). In
>   addition, the output is supposed to do normalization and several other
>   element-wise operations. None of this suggests high memory usage.
>
> I tried `optimizer = fast_compile` and re-ran this, but the error message
> was actually less informative (it contains a subset of the above error
> message). Running with `exception_verbosity = high` results in a different
> error message:
>
> Max traj len: 200
> Traceback (most recent call last):
>   File "scripts/run_rl_mj.py", line 116, in <module>
>     main()
>   File "scripts/run_rl_mj.py", line 109, in main
>     iter_info = opt.step()
>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>     cfg=self.sim_cfg)
>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>     traj = job.get()
>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>     raise self._value
> pygpu.gpuarray.GpuArrayException: initialization error
> Closing remaining open
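As an aside, the two HINTs in the tracebacks above map to Theano config settings; a sketch of the `.theanorc` additions they describe (equivalently passed via the `THEANO_FLAGS` environment variable):

```
[global]
# Back-trace of where the failing node was created (most optimizations off):
optimizer = fast_compile
# Debugprint and storage-map footprint of the failing apply node:
exception_verbosity = high
```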