[theano-users] Re: Getting "pygpu.gpuarray.GpuArrayException: Out of memory" for a small application

Daniel Seita Mon, 03 Jul 2017 21:52:04 -0700

 

Thanks Pascal.



I tried setting gpuarray.preallocate to 0.01 and 0.1. The run with 0.1, for 
instance, starts like this:


$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/
CartPole-v0
Using cuDNN version 5105 on context None 
Preallocating 1218/12186 Mb (0.100000) on cuda 
Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0)


But the same error message results:
Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: Out of memory

Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.02563, -0.03082,  0.01663, -0.00558]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]

HINT: Re-running with most Theano optimization disabled could give you a 
back-trace of when this node was created. This can be done with by setting 
the Theano flag 'optimizer=fast_compile'. If that does not work, Theano 
optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and 
storage map footprint of this apply node.


With preallocate set to -1, the run starts like this:


$ python scripts/run_rl_mj.py --env_name CartPole-v0 --log trpo_logs/
CartPole-v0
Using cuDNN version 5105 on context None
Disabling allocation cache on cuda


The error is similar, except that it is now an initialization error rather 
than out of memory. It is raised from the same place in the code:


Traceback (most recent call last):
  File "scripts/run_rl_mj.py", line 116, in <module>
    main()
  File "scripts/run_rl_mj.py", line 109, in main
    iter_info = opt.step()
  File "/home/daniel/imitation_noise/policyopt/rl.py", line 283, in step
    cfg=self.sim_cfg)
  File "/home/daniel/imitation_noise/policyopt/__init__.py", line 425, in sim_mp
    traj = job.get()
  File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
pygpu.gpuarray.GpuArrayException: initialization error

Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
Toposort index: 1
Inputs types: [TensorType(float32, matrix)]
Inputs shapes: [(1, 4)]
Inputs strides: [(16, 4)]
Inputs values: [array([[ 0.01357, -0.02611,  0.0341 ,  0.0162 ]], dtype=float32)]
Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[(0, 0)]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]

HINT: Re-running with most Theano optimization disabled could give you a 
back-trace of when this node was created. This can be done with by setting 
the Theano flag 'optimizer=fast_compile'. If that does not work, Theano 
optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and 
storage map footprint of this apply node. 


Yes, the code does use multiprocessing. I will look into how to handle the 
multiprocessing, or perhaps just disable it.
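
For reference, Pascal's suggestion quoted below could be sketched roughly as 
follows. This is only a sketch under assumptions, not what the imitation code 
actually does: `cpu_parent_flags`, `init_worker` and `make_pool` are 
hypothetical names, and the real pool is created somewhere inside policyopt. 
The idea is to keep the parent process on `device=cpu` (THEANO_FLAGS takes 
precedence over ~/.theanorc) and to call `theano.gpuarray.use('cuda')` only 
inside each worker, so no CUDA context exists before the workers are started.

```python
import os
import multiprocessing


def cpu_parent_flags(existing=""):
    """Return a THEANO_FLAGS string that forces device=cpu, keeping other flags."""
    kept = [f for f in existing.split(",") if f and not f.startswith("device=")]
    return ",".join(kept + ["device=cpu"])


# Must run before the first `import theano` in the parent process.
os.environ["THEANO_FLAGS"] = cpu_parent_flags(os.environ.get("THEANO_FLAGS", ""))


def init_worker():
    """Pool initializer: runs once inside each worker process."""
    # Deferred import so the CUDA context is created in the worker, not the parent.
    import theano.gpuarray
    theano.gpuarray.use("cuda")


def make_pool(num_workers):
    """Hypothetical helper; the real pool is created inside policyopt."""
    return multiprocessing.Pool(processes=num_workers, initializer=init_worker)
```

Two caveats I would expect with this approach: Theano functions compiled in 
the parent before `use` is called will probably stay on the CPU, and each 
worker preallocates its own share of GPU memory, so gpuarray.preallocate 
likely needs to stay small when several workers share one card.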

On Monday, July 3, 2017 at 3:08:44 PM UTC-7, Pascal Lamblin wrote:
>
> What happens if you set gpuarray.preallocate to something much smaller, or 
> even to -1?
>
> Also, I see the script uses multiprocessing. Weird things happen if new 
> Python processes are spawned after the GPU has been initialized. This is a 
> limitation of how cuda handles GPU contexts I believe.
> The solution would be not to use `device=cuda`, but `device=cpu`, and call 
> `theano.gpuarray.use('cuda')` manually in the subprocess, or after all 
> processes have been launched.
>
> On Sunday, July 2, 2017 at 3:59:31 PM UTC-4, Daniel Seita wrote:
>>
>> I am attempting to run some reinforcement learning code on the GPU. (The 
>> code is https://github.com/openai/imitation if it matters, running 
>> `scripts/run_rl_mj.py`.)
>>
>> I converted the code to run on float32 by changing the way the data is 
>> supplied via numpy. Unfortunately, with the new GPU backend, I am getting 
>> an out of memory error, despite having 12GB of memory on my Titan X Pascal 
>> GPU. Here are my settings:
>>
>> $ cat ~/.theanorc 
>> [global] 
>> device = cuda 
>> floatX = float32 
>>
>> [gpuarray] 
>> preallocate = 1 
>>
>> [cuda] 
>> root = /usr/local/cuda-8.0
>>
>>
>> Theano seems to be importing correctly:
>>
>> $ ipython
>> Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15) 
>>  
>> Type "copyright", "credits" or "license" for more information. 
>> IPython 5.3.0 -- An enhanced Interactive Python. 
>> ?         -> Introduction and overview of IPython's features. 
>> %quickref -> Quick reference. 
>> help      -> Python's own help system. 
>> object?   -> Details about 'object', use 'object??' for extra details. 
>>
>> In [1]: import theano 
>> Using cuDNN version 5105 on context None 
>> Preallocating 11576/12186 Mb (0.950000) on cuda 
>> Mapped name None to device cuda: TITAN X (Pascal) (0000:01:00.0) 
>>
>> In [2]: 
>>
>>
>>
>> Unfortunately, running `python scripts/run_rl_mj.py --env_name 
>> CartPole-v0 --log trpo_logs/CartPole-v0` on the very low-dimensional 
>> CartPole setting (state space is just four numbers, actions are just one 
>> number) gives me (after a bit of a setup):
>>
>>
>> Traceback (most recent call last):
>>   File "scripts/run_rl_mj.py", line 116, in <module>
>>     main()
>>   File "scripts/run_rl_mj.py", line 109, in main
>>     iter_info = opt.step()
>>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>>     cfg=self.sim_cfg)
>>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>>     traj = job.get()
>>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>>     raise self._value
>> pygpu.gpuarray.GpuArrayException: Out of memory
>>
>> Apply node that caused the error: GpuFromHost<None>(obsfeat_B_Df)
>> Toposort index: 4
>> Inputs types: [TensorType(float32, matrix)]
>> Inputs shapes: [(1, 4)]
>> Inputs strides: [(16, 4)]
>> Inputs values: [array([[ 0.04058,  0.00428,  0.03311, -0.02898]], dtype=float32)]
>> Outputs clients: [[GpuElemwise{Composite{((i0 - i1) / i2)}}[]<gpuarray>(GpuFromHost<None>.0, /GibbsPolicy/obsnorm/Standardizer/mean_1_D, GpuElemwise{Composite{(i0 + sqrt((i1 * (Composite{(i0 - sqr(i1))}(i2, i3) + Abs(Composite{(i0 - sqr(i1))}(i2, i3))))))}}[]<gpuarray>.0)]]
>>
>> HINT: Re-running with most Theano optimization disabled could give you a 
>> back-trace of when this node was created. This can be done with by setting 
>> the Theano flag 'optimizer=fast_compile'. If that does not work, Theano 
>> optimizations can be disabled with 'optimizer=None'.
>>
>> HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and 
>> storage map footprint of this apply node.
>>
>> Closing remaining open files:trpo_logs/CartPole-v0...done
>>
>>
>> What I'm confused about is that
>>
>>    - This happens right at the beginning of the reinforcement learning, 
>>    so it's not as if the algorithm has been running a long time and then ran 
>>    out of memory.
>>    - The input is tiny: its shape is (1, 4), and (16, 4) is just its 
>>    strides. In addition, the output only involves normalization and several 
>>    other element-wise operations. None of this suggests high memory usage.
>>
>> I tried `optimizer = fast_compile` and re-ran this, but the error message 
>> was actually less informative (it contains a subset of the above error 
>> message). Running with `exception_verbosity = high` results in a different 
>> error message:
>>
>>
>> Max traj len: 200
>> Traceback (most recent call last):
>>   File "scripts/run_rl_mj.py", line 116, in <module>
>>     main()
>>   File "scripts/run_rl_mj.py", line 109, in main
>>     iter_info = opt.step()
>>   File "/home/daniel/imitation_noise/policyopt/rl.py", line 280, in step
>>     cfg=self.sim_cfg)
>>   File "/home/daniel/imitation_noise/policyopt/__init__.py", line 411, in sim_mp
>>     traj = job.get()
>>   File "/home/daniel/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
>>     raise self._value
>> pygpu.gpuarray.GpuArrayException: initialization error
>> Closing remaining open files:trpo_logs/CartPole-v0...done
>>
>> It somehow didn't even reach the correct point in the code??
>>
>> I noticed a similar issue here: 
>> https://github.com/costapt/vess2ret/issues/5 which seems to suggest that 
>> the problem is not limited to just this script. What do you suggest I do? 
>> Thanks.
>>
>
