For the memory error, the problem is that you are trying to allocate 14 GB for a single shared variable on a 12 GB GPU. This is probably not what you want to do.
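To make the numbers concrete, here is a small sketch that only uses the figures from the error output quoted below. The helper name `tensor_nbytes` and the example shape are made up for illustration; I don't know the real shape of `W_val`.

```python
# Figures copied from the quoted error output.
REQUESTED_BYTES = 14224896000   # "Error allocating 14224896000 bytes"
FREE_BYTES = 11961581568        # "Driver report 11961581568 bytes free"
TOTAL_BYTES = 12079136768       # "... and 12079136768 bytes total"

GIB = 1024 ** 3


def tensor_nbytes(shape, itemsize):
    """Bytes needed for a dense array of this shape (hypothetical helper)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


# float32 takes 4 bytes per element, so the request holds this many weights.
n_float32 = REQUESTED_BYTES // 4

print("requested: %.2f GiB" % (REQUESTED_BYTES / float(GIB)))
print("free:      %.2f GiB" % (FREE_BYTES / float(GIB)))
print("total:     %.2f GiB" % (TOTAL_BYTES / float(GIB)))
print("float32 elements requested: %d" % n_float32)
print("fits on the device: %s" % (REQUESTED_BYTES <= FREE_BYTES))

# Made-up shape, only to show how to size a layer before calling shared().
example_shape = (1000, 500)
print("example W: %d bytes" % tensor_nbytes(example_shape, 4))
```

The traceback points at `self.W = shared(value=W_val, ...)` in mlp.py, so printing `W_val.shape` and `W_val.dtype` just before that line (assuming `W_val` is a NumPy array) should show which layer asks for this much memory.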
Use theano.tensor.nnet.conv3d now (not conv3d2d.conv3d() or dnn_conv3d). But we need to fix the memory problem. conv3d2d.conv3d probably causes an upcast to float32 in the computation. That would explain the last error (the TypeError about a float32 update for a float16 shared variable).

On Thu, Oct 13, 2016 at 6:14 AM, <[email protected]> wrote:

> Hi,
> I'm doing tests on Tesla K40 and Theano==0.9.0.dev3:
> whether I use float 16 or float32, theano.gpuarray.dnn.dnn_conv or
> theano.tensor.nnet.conv3d2d.conv3d it works only for small images but if
> I increase the images size there are problems with memory.
> I had not this issue when I was using float32 with the previous Theano
> version and same parameters and images size.
>
> I try:
>
> floatX = float16
> device = cuda
>
> #theano.gpuarray.dnn.dnn_conv
> out = dnn_conv(img= input,
>                kerns= self.W,
>                border_mode='valid',
>                subsample=(1,1,1),
>                conv_mode='conv',
>                direction_hint=None,
>                workmem=None,
>                algo=None,
>                precision=None)
>
> This is the output for small 3d images, the convnet is working:
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py', wdir='/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Disabling C code for Elemwise{mul,no_inplace} due to unsupported float16
> Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
> Disabling C code for MaxAndArgmax due to unsupported float16
>
> start time:
> 13/10/2016
> 11:43:00
>
> Image_dim_1: 30
> Image_dim_2: 30
> Image_dim_3: 30
>
> training @ iter = 0
> training @ iter = 400
> training cost 0.701
> epoch 1, training batch 574/574, validation error 48.04 %
> -----------
>
> if I increase the size of the images: pygpu.gpuarray.GpuArrayException:
> an illegal memory access was encountered
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py', wdir='/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Disabling C code for Elemwise{mul,no_inplace} due to unsupported float16
> Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
> Disabling C code for MaxAndArgmax due to unsupported float16
>
> start time:
> 13/10/2016
> 11:52:45
>
> Image_dim_1: 90
> Image_dim_2: 90
> Image_dim_3: 90
>
> training @ iter = 0
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 322, in __init__
>     training_cost_ij=train_model(a, b)
>   File "/home/luca/data/Theano-master/theano/compile/function_module.py", line 879, in __call__
>     storage_map=getattr(self.fn, 'storage_map', None))
>   File "/home/luca/data/Theano-master/theano/gof/link.py", line 167, in raise_with_op
>     "\nInputs values: %s" % scalar_values)
>   File "pygpu/gpuarray.pyx", line 1941, in pygpu.gpuarray.GpuArray.__repr__ (pygpu/gpuarray.c:24742)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 482, in asarray
>     return array(a, dtype, copy=False, order=order)
>   File "pygpu/gpuarray.pyx", line 1572, in pygpu.gpuarray.GpuArray.__array__ (pygpu/gpuarray.c:20224)
>   File "pygpu/gpuarray.pyx", line 1320, in pygpu.gpuarray.pygpu_as_ndarray (pygpu/gpuarray.c:17346)
>   File "pygpu/gpuarray.pyx", line 347, in pygpu.gpuarray.array_read (pygpu/gpuarray.c:6114)
> pygpu.gpuarray.GpuArrayException: an illegal memory access was encountered
>
> -----------------
> If I try float 32 and theano.gpuarray.dnn.dnn_conv I also have memory
> problems: pygpu.gpuarray.GpuArrayException: out of memory
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py', wdir='/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core')
> Mapped name None to device cuda: Tesla K40c
> Using cuDNN version 5103 on context None
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 242, in __init__
>     b)
>   File "mlp.py", line 199, in __init__
>     borrow=True,
>   File "mlp.py", line 138, in __init__
>     self.W = shared(value=W_val, borrow=borrow, name=layer_name+'_W')
>   File "/home/luca/data/Theano-master/theano/compile/sharedvalue.py", line 247, in shared
>     allow_downcast=allow_downcast, **kwargs)
>   File "/home/luca/data/Theano-master/theano/gpuarray/type.py", line 624, in gpuarray_shared_constructor
>     context=type.context)
>   File "pygpu/gpuarray.pyx", line 935, in pygpu.gpuarray.array (pygpu/gpuarray.c:12296)
>   File "pygpu/gpuarray.pyx", line 633, in pygpu.gpuarray.pygpu_fromhostdata (pygpu/gpuarray.c:9119)
>   File "pygpu/gpuarray.pyx", line 264, in pygpu.gpuarray.array_copy_from_host (pygpu/gpuarray.c:5008)
> pygpu.gpuarray.GpuArrayException: out of memory
>
> ------
> If I try
> theano.tensor.nnet.conv3d2d.conv3d,
> floatX = float32,
> device = gpu
>
> I also have memory problems: MemoryError: ('Error allocating 14224896000
> bytes of device memory (out of memory).', "you might consider using
> 'theano.shared(..., borrow=True)'")
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py', wdir='/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core')
> Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5103)
> Error allocating 14224896000 bytes of device memory (out of memory).
> Driver report 11961581568 bytes free and 12079136768 bytes total
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 242, in __init__
>     b)
>   File "mlp.py", line 199, in __init__
>     borrow=True,
>   File "mlp.py", line 138, in __init__
>     self.W = shared(value=W_val, borrow=borrow, name=layer_name+'_W')
>   File "/home/luca/data/Theano-master/theano/compile/sharedvalue.py", line 247, in shared
>     allow_downcast=allow_downcast, **kwargs)
>   File "/home/luca/data/Theano-master/theano/sandbox/cuda/var.py", line 242, in float32_shared_constructor
>     deviceval = type_support_filter(value, type.broadcastable, False, None)
> MemoryError: ('Error allocating 14224896000 bytes of device memory (out of memory).', "you might consider using 'theano.shared(..., borrow=True)'")
>
> ----------------
> If I try
> theano.tensor.nnet.conv3d2d.conv3d,
> floatX = float16,
> device = gpu
>
> I have TypeError
>
> Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> runfile('/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py', wdir='/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core')
> Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5103)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
>     execfile(filename, namespace)
>   File "/home/luca/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
>     builtins.execfile(filename, *where)
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 42, in <module>
>     run_experiments()
>   File "/home/luca/data/DeepLearningTutorials/Theano-3D-Convnet-master/convnet3d/core/run_multi_conv.py", line 33, in run_experiments
>     Zoom = 0.0
>   File "mpr_convnet_class.py", line 269, in __init__
>     train_model = theano.function([x,y],cost, updates=updates)
>   File "/home/luca/data/Theano-master/theano/compile/function.py", line 326, in function
>     output_keys=output_keys)
>   File "/home/luca/data/Theano-master/theano/compile/pfunc.py", line 449, in pfunc
>     no_default_updates=no_default_updates)
>   File "/home/luca/data/Theano-master/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
>     raise TypeError(err_msg, err_sug)
> TypeError: ('An update must have the same type as the original shared variable (shared_var=DropoutLogisticRegression_W, shared_var.type=TensorType(float16, matrix), update_val=Elemwise{sub,no_inplace}.0, update_val.type=TensorType(float32, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
>
> Many thanks
> Luca
>
> --
> ---
> You received this message because you are subscribed to the Google Groups
> "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
