Re: [theano-users] Error using floatX = float16 to save memory

Frédéric Bastien Thu, 13 Oct 2016 08:42:20 -0700

The memory error happens because you are trying to allocate about 14 GB for
a single shared variable on a 12 GB GPU. That is probably not what you want
to do.
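
To make the numbers concrete, here is a rough back-of-the-envelope check in
plain Python. The byte counts are copied from the error log quoted below; the
helper bytes_per_element is hypothetical and only exists for this sketch, and
I am assuming the array was float32 when the allocation failed.

```python
# Figures copied from the error log quoted below.
requested_bytes = 14224896000   # "Error allocating 14224896000 bytes"
free_bytes = 11961581568        # "Driver report ... bytes free"
total_bytes = 12079136768       # "... bytes total"

GIB = float(1024 ** 3)


def bytes_per_element(dtype_name):
    """Size in bytes of one element (hypothetical helper for this sketch)."""
    return {"float16": 2, "float32": 4, "float64": 8}[dtype_name]


# Assumption: the shared variable was float32 when the allocation failed.
n_elements = requested_bytes // bytes_per_element("float32")
float16_bytes = n_elements * bytes_per_element("float16")

fits_float32 = requested_bytes <= free_bytes
fits_float16 = float16_bytes <= free_bytes

print("elements in the shared variable: %d" % n_elements)
print("float32: %.2f GiB, fits: %s" % (requested_bytes / GIB, fits_float32))
print("float16: %.2f GiB, fits: %s" % (float16_bytes / GIB, fits_float16))
print("GPU total: %.2f GiB" % (total_bytes / GIB))
```

Under that assumption, the same variable in float16 would fit, but it would
still take more than half of the card before any activations, gradients or
cuDNN workspace are allocated, so shrinking that layer is likely the real fix.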


Use theano.tensor.nnet.conv3d now (not conv3d2d.conv3d() or dnn_conv3d).
The memory problem still needs to be fixed, though. conv3d2d.conv3d probably
causes an upcast to float32 in the computation. That would explain the last
error (the TypeError saying the update is float32 while the shared variable
is float16).
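
As an illustration of that kind of upcast, here is a minimal NumPy sketch
(not Theano code). The array names, shapes and learning rate are made up for
the example. It shows how mixing float16 and float32 operands yields a
float32 result, and how casting back to the dtype of the variable restores
the match.

```python
import numpy as np

# Stand-ins for the float16 shared variable and a gradient that was
# upcast to float32 somewhere in the graph (shapes are arbitrary).
W = np.zeros((3, 2), dtype=np.float16)
grad = np.ones((3, 2), dtype=np.float32)
learning_rate = 0.01

# float16 - float32 promotes to float32, so the update no longer
# matches the type of W.
update = W - learning_rate * grad

# Casting the update back to the dtype of W restores the match.
# In a Theano graph the analogous call would be
# theano.tensor.cast(update, W.dtype).
fixed_update = update.astype(W.dtype)

print(update.dtype, fixed_update.dtype)
```

Casting the update only hides the symptom. If the convolution itself runs in
float32, the intermediate results will still use float32 memory.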

On Thu, Oct 13, 2016 at 6:14 AM, <[email protected]> wrote:

> Hi,
> I'm running tests on a Tesla K40 with Theano==0.9.0.dev3.
> Whether I use float16 or float32, and whether I use
> theano.gpuarray.dnn.dnn_conv or theano.tensor.nnet.conv3d2d.conv3d, it
> works only for small images; if I increase the image size, I run into
> memory problems.
> I did not have this issue when I was using float32 with the previous Theano
> version and the same parameters and image size.
>
> I tried:
>
> floatX = float16
> device = cuda
>
>  #theano.gpuarray.dnn.dnn_conv
>  out = dnn_conv(img= input,
>                         kerns= self.W,
>                         border_mode='valid',
>                         subsample=(1,1,1),
>                         conv_mode='conv',
>                         direction_hint=None,
>                         workmem=None,
>                         algo=None,
>                         precision=None)
>
>
> This is the output for small 3D images, where the convnet works:
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py',
> wdir='/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Disabling C code for Elemwise{mul,no_inplace} due to unsupported float16
> Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
> Disabling C code for MaxAndArgmax due to unsupported float16
>
>
> start time:
> 13/10/2016
> 11:43:00
>
>
> Image_dim_1: 30
> Image_dim_2: 30
> Image_dim_3: 30
>
>
> training @ iter =  0
> training @ iter =  400
> training cost 0.701
> epoch 1, training batch 574/574, validation error 48.04 %
> -----------
>
>
> If I increase the size of the images, I get
> pygpu.gpuarray.GpuArrayException: an illegal memory access was encountered
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py',
> wdir='/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Disabling C code for Elemwise{mul,no_inplace} due to unsupported float16
> Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
> Disabling C code for MaxAndArgmax due to unsupported float16
>
>
> start time:
> 13/10/2016
> 11:52:45
>
>
> Image_dim_1: 90
> Image_dim_2: 90
> Image_dim_3: 90
>
>
> training @ iter =  0
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in
> run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 322, in __init__
>     training_cost_ij=train_model(a, b)
>   File "/home/luca/data/Theano-master/theano/compile/function_module.py",
> line 879, in __call__
>     storage_map=getattr(self.fn, 'storage_map', None))
>   File "/home/luca/data/Theano-master/theano/gof/link.py", line 167, in
> raise_with_op
>     "\nInputs values: %s" % scalar_values)
>   File "pygpu/gpuarray.pyx", line 1941, in pygpu.gpuarray.GpuArray.__repr__
> (pygpu/gpuarray.c:24742)
>   File 
> "/home/luca/anaconda2/lib/python2.7/site-packages/numpy/core/numeric.py",
> line 482, in asarray
>     return array(a, dtype, copy=False, order=order)
>   File "pygpu/gpuarray.pyx", line 1572, in pygpu.gpuarray.GpuArray.__array__
> (pygpu/gpuarray.c:20224)
>   File "pygpu/gpuarray.pyx", line 1320, in pygpu.gpuarray.pygpu_as_ndarray
> (pygpu/gpuarray.c:17346)
>   File "pygpu/gpuarray.pyx", line 347, in pygpu.gpuarray.array_read
> (pygpu/gpuarray.c:6114)
> pygpu.gpuarray.GpuArrayException: an illegal memory access was encountered
>
> -----------------
> If I try float32 with theano.gpuarray.dnn.dnn_conv, I also have memory
> problems: pygpu.gpuarray.GpuArrayException: out of memory
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py',
> wdir='/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in
> run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 242, in __init__
>     b)
>   File "mlp.py", line 199, in __init__
>     borrow=True,
>   File "mlp.py", line 138, in __init__
>     self.W = shared(value=W_val, borrow=borrow, name=layer_name+'_W')
>   File "/home/luca/data/Theano-master/theano/compile/sharedvalue.py",
> line 247, in shared
>     allow_downcast=allow_downcast, **kwargs)
>   File "/home/luca/data/Theano-master/theano/gpuarray/type.py", line 624,
> in gpuarray_shared_constructor
>     context=type.context)
>   File "pygpu/gpuarray.pyx", line 935, in pygpu.gpuarray.array
> (pygpu/gpuarray.c:12296)
>   File "pygpu/gpuarray.pyx", line 633, in pygpu.gpuarray.pygpu_fromhostdata
> (pygpu/gpuarray.c:9119)
>   File "pygpu/gpuarray.pyx", line 264, in pygpu.gpuarray.array_copy_from_host
> (pygpu/gpuarray.c:5008)
> pygpu.gpuarray.GpuArrayException: out of memory
>
> ------
> If I try
> theano.tensor.nnet.conv3d2d.conv3d,
> floatX = float32,
> device = gpu
>
> I also have memory problems: MemoryError: ('Error allocating 14224896000
> bytes of device memory (out of memory).', "you might consider using
> 'theano.shared(..., borrow=True)'")
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py',
> wdir='/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core')
> Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5103)
> Error allocating 14224896000 bytes of device memory (out of memory).
> Driver report 11961581568 bytes free and 12079136768 bytes total
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in
> run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 242, in __init__
>     b)
>   File "mlp.py", line 199, in __init__
>     borrow=True,
>   File "mlp.py", line 138, in __init__
>     self.W = shared(value=W_val, borrow=borrow, name=layer_name+'_W')
>   File "/home/luca/data/Theano-master/theano/compile/sharedvalue.py",
> line 247, in shared
>     allow_downcast=allow_downcast, **kwargs)
>   File "/home/luca/data/Theano-master/theano/sandbox/cuda/var.py", line
> 242, in float32_shared_constructor
>     deviceval = type_support_filter(value, type.broadcastable, False, None)
> MemoryError: ('Error allocating 14224896000 bytes of device memory (out of
> memory).', "you might consider using 'theano.shared(..., borrow=True)'")
>
> ----------------
> If I try
> theano.tensor.nnet.conv3d2d.conv3d,
> floatX = float16,
> device = gpu
>
> I get a TypeError:
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py',
> wdir='/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core')
> Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5103)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/
> spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-
> 3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in
> run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 269, in __init__
>     train_model = theano.function([x,y],cost, updates=updates)
>   File "/home/luca/data/Theano-master/theano/compile/function.py", line
> 326, in function
>     output_keys=output_keys)
>   File "/home/luca/data/Theano-master/theano/compile/pfunc.py", line 449,
> in pfunc
>     no_default_updates=no_default_updates)
>   File "/home/luca/data/Theano-master/theano/compile/pfunc.py", line 208,
> in rebuild_collect_shared
>     raise TypeError(err_msg, err_sug)
> TypeError: ('An update must have the same type as the original shared
> variable (shared_var=DropoutLogisticRegression_W,
> shared_var.type=TensorType(float16, matrix), 
> update_val=Elemwise{sub,no_inplace}.0,
> update_val.type=TensorType(float32, matrix)).', 'If the difference is
> related to the broadcast pattern, you can call the tensor.unbroadcast(var,
> axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
>
> Many thanks
> Luca
>
>
>
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
