Frederic,
in ops.py I can't find shape_i_op.
Thanks

On Thursday, July 21, 2016 at 11:50:51 AM UTC+2, [email protected] wrote:
>
> Frederic,
> this is the feedback after the upgrade about float16.
>
> Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec 6 2015, 18:08:32)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Anaconda is brought to you by Continuum Analytics.
> Please check out: http://continuum.io/thanks and https://anaconda.org
> >>> import run_multi_conv
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "run_multi_conv.py", line 1, in <module>
>     import mpr_convnet_class as conv
>   File "mpr_convnet_class.py", line 2, in <module>
>     from convnet3d import ConvLayer, PoolLayer
>   File "convnet3d.py", line 3, in <module>
>     from theano.tensor.nnet.conv3d2d import conv3d
>   File "/home/luca/data/Theano-master/theano/__init__.py", line 125, in <module>
>     import theano.gpuarray
>   File "/home/luca/data/Theano-master/theano/gpuarray/__init__.py", line 31, in <module>
>     from . import fft, dnn, opt, nerv, extra_ops
>   File "/home/luca/data/Theano-master/theano/gpuarray/dnn.py", line 17, in <module>
>     from theano.compile.ops import shape_i, shape_i_op
> ImportError: cannot import name shape_i_op
> >>>
>
> On Thursday, July 21, 2016 at 11:15:06 AM UTC+2, [email protected] wrote:
>>
>> Frederic,
>> I'll do it and give you feedback,
>> many thanks
>> Luca
>>
>> On Tuesday, July 19, 2016 at 10:09:21 PM UTC+2, nouiz wrote:
>>>
>>> We have a PR that upgrades some stuff about float16:
>>>
>>> https://github.com/Theano/Theano/pull/4764/files
>>>
>>> It probably fixes your problem. Can you try it to confirm that you don't
>>> have a different problem?
>>>
>>> thanks
>>>
>>> Frédéric
>>>
>>> On Fri, Jul 15, 2016 at 4:55 AM, <[email protected]> wrote:
>>>
>>>> ok, I'll try.
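[Editor's aside on the `ImportError: cannot import name shape_i_op` above: a frequent cause of this kind of error is an older installed Theano shadowing the freshly upgraded git checkout, so Python loads a module that predates the new name. Below is a generic diagnostic sketch, not code from this thread; `where_is` is a hypothetical helper, demonstrated on a stdlib module since the Theano install is not reproducible here.]

```python
import importlib

def where_is(module_name, attr):
    """Return the file a module was actually loaded from and whether it
    defines `attr`. Useful when an ImportError suggests a stale install
    is shadowing the source tree you think you are running."""
    mod = importlib.import_module(module_name)
    return getattr(mod, "__file__", "<builtin>"), hasattr(mod, attr)

# For the thread's case this would be:
#   where_is("theano.compile.ops", "shape_i_op")
# Demonstrated on a stdlib package instead:
path, found = where_is("json", "loads")
print(path, found)
```

If `path` points at an old site-packages install rather than the checkout, that alone explains the ImportError.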
>>>> thanks
>>>>
>>>> On Thursday, July 14, 2016 at 11:44:41 PM UTC+2, Arnaud Bergeron wrote:
>>>>>
>>>>> I can't reproduce your problem using a simple convolution in float16.
>>>>>
>>>>> Either this is because your code is doing something unexpected or
>>>>> because the problem has been fixed in the development version.
>>>>>
>>>>> In any case the development version is a much better option for the
>>>>> new backend and float16, so I encourage you to upgrade and try again:
>>>>> http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
>>>>>
>>>>> 2016-07-14 4:22 GMT-04:00 <[email protected]>:
>>>>>
>>>>>> Here is .theanorc:
>>>>>>
>>>>>> [global]
>>>>>> floatX = float16
>>>>>> device=cuda
>>>>>>
>>>>>> [cuda]
>>>>>> root = /usr/local/cuda-7.5
>>>>>>
>>>>>> [nvcc]
>>>>>> fastmath=True
>>>>>>
>>>>>> optimizer = fast_compile
>>>>>>
>>>>>> On Thursday, July 14, 2016 at 10:19:56 AM UTC+2, [email protected] wrote:
>>>>>>>
>>>>>>> Hi Arnaud,
>>>>>>> I put _f16_ok = True in dnn.py (attached).
>>>>>>>
>>>>>>> This is the error I received:
>>>>>>>
>>>>>>> Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec 6 2015, 18:08:32)
>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
>>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>> Anaconda is brought to you by Continuum Analytics.
>>>>>>> Please check out: http://continuum.io/thanks and https://anaconda.org
>>>>>>> >>> import run_multi_conv
>>>>>>>
>>>>>>> Mapped name None to device cuda: GeForce 840M
>>>>>>> WARNING (theano.gof.compilelock): Overriding existing lock by dead
>>>>>>> process '3202' (I am process '3351')
>>>>>>> Using cuDNN version 5005 on context None
>>>>>>> /home/luca/data/Theano-master/theano/tensor/signal/downsample.py:6:
>>>>>>> UserWarning: downsample module has been moved to the
>>>>>>> theano.tensor.signal.pool module.
>>>>>>> "downsample module has been moved to the theano.tensor.signal.pool >>>>>>> module.") >>>>>>> >>> >>>>>>> >>> run_multi_conv.run_experiments() >>>>>>> Disabling C code for Elemwise{mul,no_inplace} due to unsupported >>>>>>> float16 >>>>>>> Disabling C code for Elemwise{Cast{float32}} due to unsupported >>>>>>> float16 >>>>>>> Disabling C code for Elemwise{Cast{float16}} due to unsupported >>>>>>> float16 >>>>>>> Disabling C code for Elemwise{Cast{float16}} due to unsupported >>>>>>> float16 >>>>>>> Disabling C code for Alloc due to unsupported float16 >>>>>>> Disabling C code for Cast{float16} due to unsupported float16 >>>>>>> Disabling C code for Cast{float16} due to unsupported float16 >>>>>>> Disabling C code for Cast{float16} due to unsupported float16 >>>>>>> Disabling C code for Cast{float16} due to unsupported float16 >>>>>>> Disabling C code for RandomFunction{binomial} due to unsupported >>>>>>> float16 >>>>>>> Disabling C code for RandomFunction{binomial} due to unsupported >>>>>>> float16 >>>>>>> =============================== >>>>>>> 00001 #include <Python.h> >>>>>>> 00002 #include <iostream> >>>>>>> 00003 #include "theano_mod_helper.h" >>>>>>> 00004 #include <gpuarray/array.h> >>>>>>> 00005 #include <gpuarray/kernel.h> >>>>>>> 00006 #include <gpuarray/error.h> >>>>>>> 00007 #include <gpuarray/buffer.h> >>>>>>> 00008 #include <gpuarray/buffer_blas.h> >>>>>>> 00009 #include <numpy/arrayobject.h> >>>>>>> 00010 #include <gpuarray_api.h> >>>>>>> 00011 #include <math.h> >>>>>>> 00012 #include <numpy/arrayscalars.h> >>>>>>> 00013 #include "cudnn.h" >>>>>>> 00014 #include "cudnn_helper.h" >>>>>>> 00015 #include "gpuarray_helper.h" >>>>>>> 00016 #include "gpuarray/types.h" >>>>>>> 00017 #include "gpuarray/array.h" >>>>>>> 00018 #include "gpuarray/util.h" >>>>>>> 00019 #include "gpuarray/ext_cuda.h" >>>>>>> 00020 #include "gpuarray_api.h" >>>>>>> 00021 #include "numpy_compat.h" >>>>>>> 00022 ////////////////////// >>>>>>> 00023 //// Support Code >>>>>>> 
00024 ////////////////////// >>>>>>> 00025 >>>>>>> 00026 >>>>>>> 00027 >>>>>>> 00028 static int >>>>>>> 00029 c_set_tensorNd(PyGpuArrayObject *var, >>>>>>> cudnnTensorDescriptor_t desc) { >>>>>>> 00030 cudnnDataType_t dt; >>>>>>> 00031 size_t ds; >>>>>>> 00032 switch (var->ga.typecode) { >>>>>>> 00033 case GA_FLOAT: >>>>>>> 00034 dt = CUDNN_DATA_FLOAT; >>>>>>> 00035 break; >>>>>>> 00036 case GA_DOUBLE: >>>>>>> 00037 dt = CUDNN_DATA_DOUBLE; >>>>>>> 00038 break; >>>>>>> 00039 #if CUDNN_VERSION > 3000 >>>>>>> 00040 case GA_HALF: >>>>>>> 00041 dt = CUDNN_DATA_HALF; >>>>>>> 00042 break; >>>>>>> 00043 #endif >>>>>>> 00044 default: >>>>>>> 00045 PyErr_SetString(PyExc_TypeError, "Non-float datatype in >>>>>>> c_set_tensorNd"); >>>>>>> 00046 return -1; >>>>>>> 00047 } >>>>>>> 00048 ds = gpuarray_get_elsize(var->ga.typecode); >>>>>>> 00049 >>>>>>> 00050 int strs[5], dims[5], default_stride = 1; >>>>>>> 00051 unsigned int nd = PyGpuArray_NDIM(var); >>>>>>> 00052 >>>>>>> 00053 if (nd > 5) { >>>>>>> 00054 PyErr_SetString(PyExc_TypeError, "Tensor of more than >>>>>>> 5d"); >>>>>>> 00055 return -1; >>>>>>> 00056 } >>>>>>> 00057 >>>>>>> 00058 for (unsigned int _i = nd; _i > 0; _i--) { >>>>>>> 00059 unsigned int i = _i - 1; >>>>>>> 00060 strs[i] = PyGpuArray_STRIDE(var, i) ? 
>>>>>>> 00061 PyGpuArray_STRIDE(var, i)/ds : default_stride; >>>>>>> 00062 default_stride *= PyGpuArray_DIM(var, i); >>>>>>> 00063 dims[i] = PyGpuArray_DIM(var, i); >>>>>>> 00064 } >>>>>>> 00065 >>>>>>> 00066 cudnnStatus_t err = cudnnSetTensorNdDescriptor(desc, dt, >>>>>>> nd, dims, strs); >>>>>>> 00067 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00068 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00069 "Could not set tensorNd descriptor: %s", >>>>>>> 00070 cudnnGetErrorString(err)); >>>>>>> 00071 return -1; >>>>>>> 00072 } >>>>>>> 00073 return 0; >>>>>>> 00074 } >>>>>>> 00075 >>>>>>> 00076 static int >>>>>>> 00077 c_set_filter(PyGpuArrayObject *var, cudnnFilterDescriptor_t >>>>>>> desc) { >>>>>>> 00078 cudnnDataType_t dt; >>>>>>> 00079 cudnnStatus_t err; >>>>>>> 00080 >>>>>>> 00081 if (!GpuArray_IS_C_CONTIGUOUS(&var->ga)) { >>>>>>> 00082 PyErr_SetString(PyExc_ValueError, >>>>>>> 00083 "Only contiguous filters (kernels) are >>>>>>> supported."); >>>>>>> 00084 return -1; >>>>>>> 00085 } >>>>>>> 00086 switch (var->ga.typecode) { >>>>>>> 00087 case GA_FLOAT: >>>>>>> 00088 dt = CUDNN_DATA_FLOAT; >>>>>>> 00089 break; >>>>>>> 00090 case GA_DOUBLE: >>>>>>> 00091 dt = CUDNN_DATA_DOUBLE; >>>>>>> 00092 break; >>>>>>> 00093 #if CUDNN_VERSION > 3000 >>>>>>> 00094 case GA_HALF: >>>>>>> 00095 dt = CUDNN_DATA_HALF; >>>>>>> 00096 break; >>>>>>> 00097 #endif >>>>>>> 00098 default: >>>>>>> 00099 PyErr_SetString(PyExc_TypeError, "Non-float datatype in >>>>>>> c_set_filter"); >>>>>>> 00100 return -1; >>>>>>> 00101 } >>>>>>> 00102 >>>>>>> 00103 int dims[5]; >>>>>>> 00104 unsigned int nd = PyGpuArray_NDIM(var); >>>>>>> 00105 >>>>>>> 00106 if (nd > 5) { >>>>>>> 00107 PyErr_SetString(PyExc_TypeError, "Tensor of more than >>>>>>> 5d"); >>>>>>> 00108 return -1; >>>>>>> 00109 } >>>>>>> 00110 >>>>>>> 00111 for (unsigned int _i = nd; _i > 0; _i--) { >>>>>>> 00112 unsigned int i = _i - 1; >>>>>>> 00113 dims[i] = PyGpuArray_DIM(var, i); >>>>>>> 00114 } >>>>>>> 00115 >>>>>>> 00116 #if CUDNN_VERSION >= 
5000 >>>>>>> 00117 err = cudnnSetFilterNdDescriptor(desc, dt, >>>>>>> CUDNN_TENSOR_NCHW, nd, dims); >>>>>>> 00118 #else >>>>>>> 00119 err = cudnnSetFilterNdDescriptor(desc, dt, nd, dims); >>>>>>> 00120 #endif >>>>>>> 00121 >>>>>>> 00122 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00123 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00124 "Could not set filter descriptor: %s.", >>>>>>> 00125 cudnnGetErrorString(err)); >>>>>>> 00126 return -1; >>>>>>> 00127 } >>>>>>> 00128 return 0; >>>>>>> 00129 } >>>>>>> 00130 >>>>>>> 00131 >>>>>>> 00132 >>>>>>> 00133 namespace { >>>>>>> 00134 struct >>>>>>> __struct_compiled_op_86feacd077d8749f42b5d82709a80ba3 { >>>>>>> 00135 PyObject* __ERROR; >>>>>>> 00136 >>>>>>> 00137 PyObject* storage_V3; >>>>>>> 00138 PyObject* storage_V5; >>>>>>> 00139 PyObject* storage_V7; >>>>>>> 00140 PyObject* storage_V9; >>>>>>> 00141 PyObject* storage_V11; >>>>>>> 00142 PyObject* storage_V13; >>>>>>> 00143 PyObject* storage_V1; >>>>>>> 00144 PyObject* storage_V15; >>>>>>> 00145 >>>>>>> 00146 PyObject* py_V15; >>>>>>> 00147 PyGpuContextObject *V15; >>>>>>> 00148 #define DTYPE_INPUT_0 npy_float16 >>>>>>> 00149 #define TYPENUM_INPUT_0 23 >>>>>>> 00150 #define ITEMSIZE_INPUT_0 2 >>>>>>> 00151 #define DTYPE_INPUT_1 npy_float16 >>>>>>> 00152 #define TYPENUM_INPUT_1 23 >>>>>>> 00153 #define ITEMSIZE_INPUT_1 2 >>>>>>> 00154 #define DTYPE_INPUT_2 npy_float16 >>>>>>> 00155 #define TYPENUM_INPUT_2 23 >>>>>>> 00156 #define ITEMSIZE_INPUT_2 2 >>>>>>> 00157 #define DTYPE_INPUT_4 npy_float16 >>>>>>> 00158 #define TYPENUM_INPUT_4 23 >>>>>>> 00159 #define ITEMSIZE_INPUT_4 2 >>>>>>> 00160 #define DTYPE_INPUT_5 npy_float16 >>>>>>> 00161 #define TYPENUM_INPUT_5 23 >>>>>>> 00162 #define ITEMSIZE_INPUT_5 2 >>>>>>> 00163 #define DTYPE_OUTPUT_0 npy_float16 >>>>>>> 00164 #define TYPENUM_OUTPUT_0 23 >>>>>>> 00165 #define ITEMSIZE_OUTPUT_0 2 >>>>>>> 00166 #define APPLY_SPECIFIC(str) >>>>>>> str##_node_86feacd077d8749f42b5d82709a80ba3_0 >>>>>>> 00167 #define CONV_INPLACE 1 >>>>>>> 
>>>>>>> 00168 #define CONV_ALGO CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
>>>>>>> 00169
>>>>>>> 00170
>>>>>>> 00171 PyGpuContextObject *ctx;
>>>>>>> 00172 cudnnHandle_t APPLY_SPECIFIC(_handle);
>>>>>>> 00173
>>>>>>> 00174
>>>>>>> 00175 cudnnTensorDescriptor_t APPLY_SPECIFIC(input);
>>>>>>> 00176 cudnnTensorDescriptor_t APPLY_SPECIFIC(output);
>>>>>>> 00177 cudnnFilterDescriptor_t APPLY_SPECIFIC(kerns);
>>>>>>> 00178
>>>>>>> 00179
>>>>>>> 00180
>>>>>>> 00181 #ifdef CHOOSE_ALGO
>>>>>>> 00182 int reuse_algo;
>>>>>>> 00183 cudnnConvolutionFwdAlgo_t prev_algo;
>>>>>>> 00184 #ifndef CHOOSE_ONCE
>>>>>>> 00185 size_t prev_img_dims[5];
>>>>>>> 00186 size_t prev_kern_dims[5];
>>>>>>> 00187 #endif
>>>>>>> 00188 #endif
>>>>>>> 00189
>>>>>>> 00190 int
>>>>>>> 00191 APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
>>>>>>> 00192                          PyGpuArrayObject *om,
>>>>>>> 00193                          cudnnConvolutionDescriptor_t desc,
>>>>>>> 00194                          double alpha, double beta,
>>>>>>> 00195                          PyGpuArrayObject **output,
>>>>>>> 00196                          PyGpuContextObject *c) {
>>>>>>> 00197   cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
>>>>>>> 00198   float af = alpha, bf = beta;
>>>>>>> 00199   void *alpha_p;
>>>>>>> 00200   void *beta_p;
>>>>>>> 00201
>>>>>>> 00202   if (PyGpuArray_DIMS(input)[1] != PyGpuArray_DIMS(kerns)[1]) {
>>>>>>> 00203     PyErr_SetString(PyExc_ValueError,
>>>>>>> 00204                     "images and kernel must have the same stack size");
>>>>>>> 00205     return 1;
>>>>>>> 00206   }
>>>>>>> 00207
>>>>>>> 00208   if (c_set_tensorNd(input, APPLY_SPECIFIC(input)) == -1)
>>>>>>> 00209     return 1;
>>>>>>> 00210   if (c_set_filter(kerns, APPLY_SPECIFIC(kerns)) == -1)
>>>>>>> 00211     return 1;
>>>>>>> 00212
>>>>>>> 00213   switch (input->ga.typecode) {
>>>>>>> 00214   case GA_DOUBLE:
>>>>>>> 00215     alpha_p = (void *)&alpha;
>>>>>>> 00216     beta_p = (void *)&beta;
>>>>>>> 00217     break;
>>>>>>> 00218   case GA_FLOAT:
>>>>>>> 00219   case GA_HALF:
>>>>>>> 00220     alpha_p = (void *)&af;
>>>>>>> 00221     beta_p = (void *)&bf;
>>>>>>> 00222     break;
>>>>>>> 00223 default: >>>>>>> 00224 PyErr_SetString(PyExc_TypeError, "Unsupported type in >>>>>>> convolution"); >>>>>>> 00225 return 1; >>>>>>> 00226 } >>>>>>> 00227 >>>>>>> 00228 #ifdef CONV_INPLACE >>>>>>> 00229 Py_XDECREF(*output); >>>>>>> 00230 *output = om; >>>>>>> 00231 Py_INCREF(*output); >>>>>>> 00232 #else >>>>>>> 00233 if (theano_prep_output(output, PyGpuArray_NDIM(om), >>>>>>> PyGpuArray_DIMS(om), >>>>>>> 00234 om->ga.typecode, GA_C_ORDER, c) != >>>>>>> 0) >>>>>>> 00235 return 1; >>>>>>> 00236 if (beta != 0.0 && pygpu_move(*output, om)) >>>>>>> 00237 return 1; >>>>>>> 00238 #endif >>>>>>> 00239 >>>>>>> 00240 if (c_set_tensorNd(*output, APPLY_SPECIFIC(output)) == -1) >>>>>>> 00241 return 1; >>>>>>> 00242 >>>>>>> 00243 cudnnConvolutionFwdAlgo_t algo = CONV_ALGO; >>>>>>> 00244 >>>>>>> 00245 cuda_enter(c->ctx); >>>>>>> 00246 #ifdef CHOOSE_ALGO >>>>>>> 00247 #ifndef CHOOSE_ONCE >>>>>>> 00248 reuse_algo = 1; >>>>>>> 00249 for (unsigned int i = 0; i < PyGpuArray_NDIM(input); i++) >>>>>>> { >>>>>>> 00250 reuse_algo = (reuse_algo && >>>>>>> 00251 PyGpuArray_DIM(input, i) == >>>>>>> prev_img_dims[i]); >>>>>>> 00252 reuse_algo = (reuse_algo && >>>>>>> 00253 PyGpuArray_DIM(kerns, i) == >>>>>>> prev_kern_dims[i]); >>>>>>> 00254 } >>>>>>> 00255 #endif >>>>>>> 00256 >>>>>>> 00257 if (!reuse_algo) { >>>>>>> 00258 #ifdef CHOOSE_TIME >>>>>>> 00259 int count; >>>>>>> 00260 cudnnConvolutionFwdAlgoPerf_t choice; >>>>>>> 00261 err = cudnnFindConvolutionForwardAlgorithm( >>>>>>> 00262 APPLY_SPECIFIC(_handle), APPLY_SPECIFIC(input), >>>>>>> APPLY_SPECIFIC(kerns), >>>>>>> 00263 desc, APPLY_SPECIFIC(output), 1, &count, &choice); >>>>>>> 00264 >>>>>>> 00265 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00266 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00267 "error selecting convolution algo: %s", >>>>>>> 00268 cudnnGetErrorString(err)); >>>>>>> 00269 cuda_exit(c->ctx); >>>>>>> 00270 return 1; >>>>>>> 00271 } >>>>>>> 00272 algo = choice.algo; >>>>>>> 00273 #else >>>>>>> 00274 size_t 
free; >>>>>>> 00275 int err2 = gpucontext_property(c->ctx, >>>>>>> GA_CTX_PROP_FREE_GMEM, &free); >>>>>>> 00276 >>>>>>> 00277 if (err2 != GA_NO_ERROR) { >>>>>>> 00278 PyErr_Format(PyExc_RuntimeError, "Error when trying >>>>>>> to find the " >>>>>>> 00279 "memory information on the GPU"); >>>>>>> 00280 cuda_exit(c->ctx); >>>>>>> 00281 return 1; >>>>>>> 00282 } >>>>>>> 00283 >>>>>>> 00284 err = cudnnGetConvolutionForwardAlgorithm( >>>>>>> 00285 APPLY_SPECIFIC(_handle), APPLY_SPECIFIC(input), >>>>>>> APPLY_SPECIFIC(kerns), >>>>>>> 00286 desc, APPLY_SPECIFIC(output), >>>>>>> 00287 CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT, free, >>>>>>> &algo); >>>>>>> 00288 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00289 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00290 "error selecting convolution algo: %s", >>>>>>> 00291 cudnnGetErrorString(err)); >>>>>>> 00292 cuda_exit(c->ctx); >>>>>>> 00293 return 1; >>>>>>> 00294 } >>>>>>> 00295 #endif >>>>>>> 00296 prev_algo = algo; >>>>>>> 00297 } else { >>>>>>> 00298 algo = prev_algo; >>>>>>> 00299 } >>>>>>> 00300 >>>>>>> 00301 #ifdef CHOOSE_ONCE >>>>>>> 00302 reuse_algo = 1; >>>>>>> 00303 #else >>>>>>> 00304 for (unsigned int i = 0; i < PyGpuArray_NDIM(input); i++) >>>>>>> { >>>>>>> 00305 prev_img_dims[i] = PyGpuArray_DIM(input, i); >>>>>>> 00306 prev_kern_dims[i] = PyGpuArray_DIM(kerns, i); >>>>>>> 00307 } >>>>>>> 00308 #endif >>>>>>> 00309 >>>>>>> 00310 #endif >>>>>>> 00311 >>>>>>> 00312 /* These two algos are not supported for 3d conv */ >>>>>>> 00313 if (PyGpuArray_NDIM(input) == 5 && >>>>>>> 00314 (algo == >>>>>>> CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM || >>>>>>> 00315 algo == CUDNN_CONVOLUTION_FWD_ALGO_GEMM)) >>>>>>> 00316 algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM; >>>>>>> 00317 >>>>>>> 00318 // The FFT implementation does not support strides, 1x1 >>>>>>> filters or inputs >>>>>>> 00319 // with a spatial dimension larger than 1024. The >>>>>>> tiled-FFT implementation >>>>>>> 00320 // does not support strides. 
>>>>>>> 00321 // If the chosen implementation is FFT or tiled-FFT, >>>>>>> validate that it can >>>>>>> 00322 // be used on the current data and default to a safe >>>>>>> implementation if it >>>>>>> 00323 // can't. >>>>>>> 00324 // The following code is 2d-specific but it is fine as >>>>>>> FFT and tiled-FFT are >>>>>>> 00325 // defined only for 2d filters >>>>>>> 00326 if ((algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT || >>>>>>> 00327 algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING) && >>>>>>> PyGpuArray_NDIM(input) == 4) { >>>>>>> 00328 >>>>>>> 00329 // Extract the properties of the convolution descriptor >>>>>>> 00330 int nd; >>>>>>> 00331 int pad[2]; >>>>>>> 00332 int stride[2]; >>>>>>> 00333 int upscale[2]; >>>>>>> 00334 cudnnConvolutionMode_t mode; >>>>>>> 00335 cudnnDataType_t data_type; >>>>>>> 00336 err = cudnnGetConvolutionNdDescriptor(desc, 2, &nd, >>>>>>> pad, stride, >>>>>>> 00337 upscale, >>>>>>> &mode, &data_type); >>>>>>> 00338 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00339 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00340 "error getting convolution properties: >>>>>>> %s", >>>>>>> 00341 cudnnGetErrorString(err)); >>>>>>> 00342 cuda_exit(c->ctx); >>>>>>> 00343 return 1; >>>>>>> 00344 } >>>>>>> 00345 >>>>>>> 00346 if (algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT) >>>>>>> 00347 { >>>>>>> 00348 if (stride[0] != 1 || stride[1] != 1 || >>>>>>> 00349 PyGpuArray_DIM(input, 2) > 1024 || >>>>>>> PyGpuArray_DIM(input, 3) > 1024 || >>>>>>> 00350 (PyGpuArray_DIM(kerns, 2) == 1 && >>>>>>> PyGpuArray_DIM(kerns, 3) == 1)) >>>>>>> 00351 { >>>>>>> 00352 algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM; >>>>>>> 00353 } >>>>>>> 00354 } >>>>>>> 00355 else >>>>>>> 00356 { >>>>>>> 00357 // algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING >>>>>>> 00358 if (stride[0] != 1 || stride[1] != 1) >>>>>>> 00359 { >>>>>>> 00360 algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM; >>>>>>> 00361 } >>>>>>> 00362 } >>>>>>> 00363 } >>>>>>> 00364 >>>>>>> 00365 { >>>>>>> 00366 size_t worksize; >>>>>>> 00367 
gpudata *workspace; >>>>>>> 00368 err = >>>>>>> cudnnGetConvolutionForwardWorkspaceSize(APPLY_SPECIFIC(_handle), >>>>>>> 00369 >>>>>>> APPLY_SPECIFIC(input), >>>>>>> 00370 >>>>>>> APPLY_SPECIFIC(kerns), >>>>>>> 00371 desc, >>>>>>> 00372 >>>>>>> APPLY_SPECIFIC(output), >>>>>>> 00373 algo, >>>>>>> 00374 >>>>>>> &worksize); >>>>>>> 00375 >>>>>>> 00376 if (err == CUDNN_STATUS_NOT_SUPPORTED) { >>>>>>> 00377 // Fallback to none algo if not supported >>>>>>> 00378 // TODO: Print a warning >>>>>>> 00379 algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM; >>>>>>> 00380 >>>>>>> 00381 err = >>>>>>> cudnnGetConvolutionForwardWorkspaceSize(APPLY_SPECIFIC(_handle), >>>>>>> 00382 >>>>>>> APPLY_SPECIFIC(input), >>>>>>> 00383 >>>>>>> APPLY_SPECIFIC(kerns), >>>>>>> 00384 desc, >>>>>>> 00385 >>>>>>> APPLY_SPECIFIC(output), >>>>>>> 00386 algo, >>>>>>> 00387 >>>>>>> &worksize); >>>>>>> 00388 } >>>>>>> 00389 >>>>>>> 00390 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00391 PyErr_Format(PyExc_RuntimeError, >>>>>>> 00392 "error getting worksize: %s", >>>>>>> 00393 cudnnGetErrorString(err)); >>>>>>> 00394 cuda_exit(c->ctx); >>>>>>> 00395 return 1; >>>>>>> 00396 } >>>>>>> 00397 >>>>>>> 00398 /* >>>>>>> 00399 * This is less than ideal since we need to free it >>>>>>> after (which >>>>>>> 00400 * introduces a synchronization point. But we don't >>>>>>> have a module >>>>>>> 00401 * to place a nice get_work_mem() function in. 
>>>>>>> 00402 */ >>>>>>> 00403 if (worksize != 0) { >>>>>>> 00404 workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, >>>>>>> NULL); >>>>>>> 00405 if (workspace == NULL) { >>>>>>> 00406 PyErr_SetString(PyExc_RuntimeError, >>>>>>> 00407 "Could not allocate working >>>>>>> memory"); >>>>>>> 00408 cuda_exit(c->ctx); >>>>>>> 00409 return 1; >>>>>>> 00410 } >>>>>>> 00411 } >>>>>>> 00412 >>>>>>> 00413 cuda_wait(input->ga.data, GPUARRAY_CUDA_WAIT_READ); >>>>>>> 00414 cuda_wait(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ); >>>>>>> 00415 cuda_wait((*output)->ga.data, GPUARRAY_CUDA_WAIT_WRITE); >>>>>>> 00416 >>>>>>> 00417 err = cudnnConvolutionForward( >>>>>>> 00418 APPLY_SPECIFIC(_handle), >>>>>>> 00419 alpha_p, >>>>>>> 00420 APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(input), >>>>>>> 00421 APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(kerns), >>>>>>> 00422 desc, algo, >>>>>>> 00423 worksize == 0 ? NULL : *(void **)workspace, worksize, >>>>>>> 00424 beta_p, >>>>>>> 00425 APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output)); >>>>>>> 00426 >>>>>>> 00427 if (worksize != 0) >>>>>>> 00428 gpudata_release(workspace); >>>>>>> 00429 >>>>>>> 00430 cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ); >>>>>>> 00431 cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ); >>>>>>> 00432 cuda_record((*output)->ga.data, >>>>>>> GPUARRAY_CUDA_WAIT_WRITE); >>>>>>> 00433 } >>>>>>> 00434 cuda_exit(c->ctx); >>>>>>> 00435 >>>>>>> 00436 if (err != CUDNN_STATUS_SUCCESS) { >>>>>>> 00437 PyErr_Format(PyExc_RuntimeError, "error doing >>>>>>> operation: %s", >>>>>>> 00438 cudnnGetErrorString(err)); >>>>>>> 00439 return 1; >>>>>>> 00440 } >>>>>>> 00441 return 0; >>>>>>> 00442 } >>>>>>> 00443 >>>>>>> 00444 #undef DTYPE_INPUT_0 >>>>>>> 00445 #undef TYPENUM_INPUT_0 >>>>>>> 00446 #undef ITEMSIZE_INPUT_0 >>>>>>> 00447 #undef DTYPE_INPUT_1 >>>>>>> 00448 #undef TYPENUM_INPUT_1 >>>>>>> 00449 #undef ITEMSIZE_INPUT_1 >>>>>>> 00450 #undef DTYPE_INPUT_2 >>>>>>> 00451 #undef TYPENUM_INPUT_2 >>>>>>> 00452 #undef 
ITEMSIZE_INPUT_2 >>>>>>> 00453 #undef DTYPE_INPUT_4 >>>>>>> 00454 #undef TYPENUM_INPUT_4 >>>>>>> 00455 #undef ITEMSIZE_INPUT_4 >>>>>>> 00456 #undef DTYPE_INPUT_5 >>>>>>> 00457 #undef TYPENUM_INPUT_5 >>>>>>> 00458 #undef ITEMSIZE_INPUT_5 >>>>>>> 00459 #undef DTYPE_OUTPUT_0 >>>>>>> 00460 #undef TYPENUM_OUTPUT_0 >>>>>>> 00461 #undef ITEMSIZE_OUTPUT_0 >>>>>>> 00462 #undef APPLY_SPECIFIC >>>>>>> 00463 #undef CONV_INPLACE >>>>>>> 00464 #undef CONV_ALGO >>>>>>> 00465 >>>>>>> 00466 >>>>>>> __struct_compiled_op_86feacd077d8749f42b5d82709a80ba3() { >>>>>>> 00467 // This is only somewhat safe because we: >>>>>>> 00468 // 1) Are not a virtual class >>>>>>> 00469 // 2) Do not use any virtual classes in the >>>>>>> members >>>>>>> 00470 // 3) Deal with mostly POD and pointers >>>>>>> 00471 >>>>>>> 00472 // If this changes, we would have to revise >>>>>>> this, but for >>>>>>> 00473 // now I am tired of chasing segfaults because >>>>>>> 00474 // initialization code had an error and some >>>>>>> pointer has >>>>>>> 00475 // a junk value. 
>>>>>>> 00476 memset(this, 0, sizeof(*this)); >>>>>>> 00477 } >>>>>>> 00478 >>>>>>> ~__struct_compiled_op_86feacd077d8749f42b5d82709a80ba3(void) { >>>>>>> 00479 cleanup(); >>>>>>> 00480 } >>>>>>> 00481 >>>>>>> 00482 int init(PyObject* __ERROR, PyObject* storage_V3, >>>>>>> PyObject* storage_V5, PyObject* storage_V7, PyObject* storage_V9, >>>>>>> PyObject* >>>>>>> storage_V11, PyObject* storage_V13, PyObject* storage_V1, PyObject* >>>>>>> storage_V15) { >>>>>>> 00483 Py_XINCREF(storage_V3); >>>>>>> 00484 Py_XINCREF(storage_V5); >>>>>>> 00485 Py_XINCREF(storage_V7); >>>>>>> 00486 Py_XINCREF(storage_V9); >>>>>>> 00487 Py_XINCREF(storage_V11); >>>>>>> 00488 Py_XINCREF(storage_V13); >>>>>>> 00489 Py_XINCREF(storage_V1); >>>>>>> 00490 Py_XINCREF(storage_V15); >>>>>>> 00491 this->storage_V3 = storage_V3; >>>>>>> 00492 this->storage_V5 = storage_V5; >>>>>>> 00493 this->storage_V7 = storage_V7; >>>>>>> 00494 this->storage_V9 = storage_V9; >>>>>>> 00495 this->storage_V11 = storage_V11; >>>>>>> 00496 this->storage_V13 = storage_V13; >>>>>>> 00497 this->storage_V1 = storage_V1; >>>>>>> 00498 this->storage_V15 = storage_V15; >>>>>>> 00499 >>>>>>> 00500 >>>>>>> 00501 >>>>>>> 00502 >>>>>>> 00503 >>>>>>> 00504 >>>>>>> 00505 >>>>>>> 00506 >>>>>>> 00507 >>>>>>> 00508 py_V15 = PyList_GET_ITEM(storage_V15, 0); >>>>>>> 00509 {Py_XINCREF(py_V15);} >>>>>>> 00510 >>>>>>> 00511 if (!PyObject_TypeCheck(py_V15, &PyGpuContextType)) { >>>>>>> 00512 PyErr_SetString(PyExc_TypeError, "expected a GpuContext"); >>>>>>> 00513 { >>>>>>> 00514 if (!PyErr_Occurred()) { >>>>>>> 00515 PyErr_SetString(PyExc_RuntimeError, >>>>>>> 00516 "Unexpected error in an Op's C code. 
" >>>>>>> 00517 "No Python exception was set."); >>>>>>> 00518 } >>>>>>> 00519 return 15; >>>>>>> 00520 } >>>>>>> 00521 } >>>>>>> 00522 >>>>>>> 00523 V15 = (PyGpuContextObject *)py_V15; >>>>>>> 00524 Py_INCREF(V15); >>>>>>> 00525 >>>>>>> 00526 >>>>>>> 00527 #define DTYPE_INPUT_0 npy_float16 >>>>>>> 00528 #define TYPENUM_INPUT_0 23 >>>>>>> 00529 #define ITEMSIZE_INPUT_0 2 >>>>>>> 00530 #define DTYPE_INPUT_1 npy_float16 >>>>>>> 00531 #define TYPENUM_INPUT_1 23 >>>>>>> 00532 #define ITEMSIZE_INPUT_1 2 >>>>>>> 00533 #define DTYPE_INPUT_2 npy_float16 >>>>>>> 00534 #define TYPENUM_INPUT_2 23 >>>>>>> 00535 #define ITEMSIZE_INPUT_2 2 >>>>>>> 00536 #define DTYPE_INPUT_4 npy_float16 >>>>>>> 00537 #define TYPENUM_INPUT_4 23 >>>>>>> 00538 #define ITEMSIZE_INPUT_4 2 >>>>>>> 00539 #define DTYPE_INPUT_5 npy_float16 >>>>>>> 00540 #define TYPENUM_INPUT_5 23 >>>>>>> 00541 #define ITEMSIZE_INPUT_5 2 >>>>>>> 00542 #define DTYPE_OUTPUT_0 npy_float16 >>>>>>> 00543 #define TYPENUM_OUTPUT_0 23 >>>>>>> 00544 #define ITEMSIZE_OUTPUT_0 2 >>>>>>> 00545 #define APPLY_SPECIFIC(str) >>>>>>> str##_node_86feacd077d8749f42b5d82709a80ba3_0 >>>>>>> 00546 #define CONV_INPLACE 1 >>>>>>> 00547 #define CONV_ALGO >>>>>>> CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM >>>>>>> 00548 #define FAIL { \ >>>>>>> 00549 if (!PyErr_Occurred()) { \ >>>>>>> 00550 PyErr_SetString(PyExc_RuntimeError, \ >>>>>>> 00551 "Unexpected error in an Op's C code. " \ >>>>>>> 00552 "No Python exception was set."); \ >>>>>>> 00553 } \ >>>>>>> 00554 return 17; \ >>>>>>> 00555 } >>>>>>> 00556 #define PARAMS V15 >>>>>>> 00557 >>>>>>> 00558 >>>>>>> 00559 { >>>>>>> 00560 // We need to keep a reference here to have it available >>>>>>> in the destructor. 
>>>>>>> 00561 ctx = PARAMS; >>>>>>> 00562 Py_INCREF(ctx); >>>>>>> 00563 >>>>>>> 00564 cuda_enter(PARAMS->ctx); >>>>>>> 00565 cudnnStatus_t err; >>>>>>> 00566 APPLY_SPECIFIC(_handle) = NULL; >>>>>>> 00567 if ((err = cudnnCreate(&APPLY_SPECIFIC(_handle))) != >>>>>>> CUDNN_STATUS_SUCCESS) { >>>>>>> 00568 PyErr_Format(PyExc_RuntimeError, "could not create >>>>>>> cuDNN handle: %s", >>>>>>> 00569 cudnnGetErrorString(err)); >>>>>>> 00570 cuda_exit(PARAMS->ctx); >>>>>>> 00571 FAIL; >>>>>>> 00572 } >>>>>>> 00573 if ((err = cudnnSetStream(APPLY_SPECIFIC(_handle), >>>>>>> 00574 cuda_get_stream(PARAMS->ctx))) >>>>>>> != CUDNN_STATUS_SUCCESS) { >>>>>>> 00575 PyErr_Format(PyExc_RuntimeError, "Could not set cudnn >>>>>>> stream: %s", >>>>>>> 00576 cudnnGetErrorString(err)); >>>>>>> 00577 cuda_exit(PARAMS->ctx); >>>>>>> 00578 FAIL; >>>>>>> 00579 } >>>>>>> 00580 cuda_exit(PARAMS->ctx); >>>>>>> 00581 } >>>>>>> 00582 >>>>>>> 00583 >>>>>>> 00584 >>>>>>> 00585 cudnnStatus_t APPLY_SPECIFIC(err); >>>>>>> 00586 APPLY_SPECIFIC(input) = NULL; >>>>>>> 00587 APPLY_SPECIFIC(output) = NULL; >>>>>>> 00588 APPLY_SPECIFIC(kerns) = NULL; >>>>>>> 00589 if ((APPLY_SPECIFIC(err) = >>>>>>> cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(input))) != >>>>>>> CUDNN_STATUS_SUCCESS) { >>>>>>> 00590 PyErr_Format(PyExc_MemoryError, "could not allocate >>>>>>> tensor descriptor " >>>>>>> 00591 "(inp): %s", >>>>>>> cudnnGetErrorString(APPLY_SPECIFIC(err))); >>>>>>> 00592 FAIL; >>>>>>> 00593 } >>>>>>> 00594 if ((APPLY_SPECIFIC(err) = >>>>>>> cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(output))) != >>>>>>> CUDNN_STATUS_SUCCESS) { >>>>>>> 00595 PyErr_Format(PyExc_MemoryError, "could not allocate >>>>>>> tensor descriptor " >>>>>>> 00596 "(out): %s", >>>>>>> cudnnGetErrorString(APPLY_SPECIFIC(err))); >>>>>>> 00597 FAIL; >>>>>>> 00598 } >>>>>>> 00599 if ((APPLY_SPECIFIC(err) = >>>>>>> cudnnCreateFilterDescriptor(&APPLY_SPECIFIC(kerns))) != >>>>>>> CUDNN_STATUS_SUCCESS) { >>>>>>> 00600 PyErr_Format(PyExc_MemoryError, 
"could not allocate >>>>>>> filter descriptor: %s", >>>>>>> 00601 cudnnGetErrorString(APPLY_SPECIFIC(err))); >>>>>>> 00602 FAIL; >>>>>>> 00603 } >>>>>>> 00604 >>>>>>> 00605 >>>>>>> 00606 >>>>>>> 00607 #ifdef CHOOSE_ALGO >>>>>>> 00608 reuse_algo = 0; >>>>>>> 00609 prev_algo = CONV_ALGO; >>>>>>> 00610 #ifndef CHOOSE_ONCE >>>>>>> 00611 memset(prev_img_dims, 0, sizeof(prev_img_dims)); >>>>>>> 00612 memset(prev_kern_dims, 0, sizeof(prev_kern_dims)); >>>>>>> 00613 #endif >>>>>>> 00614 #endif >>>>>>> 00615 >>>>>>> 00616 >>>>>>> 00617 #undef FAIL >>>>>>> 00618 #undef PARAMS >>>>>>> 00619 #undef DTYPE_INPUT_0 >>>>>>> 00620 #undef TYPENUM_INPUT_0 >>>>>>> 00621 #undef ITEMSIZE_INPUT_0 >>>>>>> 00622 #undef DTYPE_INPUT_1 >>>>>>> 00623 #undef TYPENUM_INPUT_1 >>>>>>> 00624 #undef ITEMSIZE_INPUT_1 >>>>>>> 00625 #undef DTYPE_INPUT_2 >>>>>>> 00626 #undef TYPENUM_INPUT_2 >>>>>>> 00627 #undef ITEMSIZE_INPUT_2 >>>>>>> 00628 #undef DTYPE_INPUT_4 >>>>>>> 00629 #undef TYPENUM_INPUT_4 >>>>>>> 00630 #undef ITEMSIZE_INPUT_4 >>>>>>> 00631 #undef DTYPE_INPUT_5 >>>>>>> 00632 #undef TYPENUM_INPUT_5 >>>>>>> 00633 #undef ITEMSIZE_INPUT_5 >>>>>>> 00634 #undef DTYPE_OUTPUT_0 >>>>>>> 00635 #undef TYPENUM_OUTPUT_0 >>>>>>> 00636 #undef ITEMSIZE_OUTPUT_0 >>>>>>> 00637 #undef APPLY_SPECIFIC >>>>>>> 00638 #undef CONV_INPLACE >>>>>>> 00639 #undef CONV_ALGO >>>>>>> 00640 this->__ERROR = __ERROR; >>>>>>> 00641 return 0; >>>>>>> 00642 } >>>>>>> 00643 void cleanup(void) { >>>>>>> 00644 __label_1: >>>>>>> 00645 >>>>>>> 00646 double __DUMMY_1; >>>>>>> 00647 __label_3: >>>>>>> 00648 >>>>>>> 00649 double __DUMMY_3; >>>>>>> 00650 __label_5: >>>>>>> 00651 >>>>>>> 00652 double __DUMMY_5; >>>>>>> 00653 __label_7: >>>>>>> 00654 >>>>>>> 00655 double __DUMMY_7; >>>>>>> 00656 __label_9: >>>>>>> 00657 >>>>>>> 00658 double __DUMMY_9; >>>>>>> 00659 __label_11: >>>>>>> 00660 >>>>>>> 00661 double __DUMMY_11; >>>>>>> 00662 __label_13: >>>>>>> 00663 >>>>>>> 00664 double __DUMMY_13; >>>>>>> 00665 __label_15: >>>>>>> 00666 
Py_XDECREF(V15); V15 = NULL; >>>>>>> 00667 {Py_XDECREF(py_V15);} >>>>>>> 00668 >>>>>>> 00669 double __DUMMY_15; >>>>>>> 00670 __label_18: >>>>>>> 00671 >>>>>>> 00672 #define DTYPE_INPUT_0 npy_float16 >>>>>>> 00673 #define TYPENUM_INPUT_0 23 >>>>>>> 00674 #define ITEMSIZE_INPUT_0 2 >>>>>>> 00675 #define DTYPE_INPUT_1 npy_float16 >>>>>>> 00676 #define TYPENUM_INPUT_1 23 >>>>>>> 00677 #define ITEMSIZE_INPUT_1 2 >>>>>>> 00678 #define DTYPE_INPUT_2 npy_float16 >>>>>>> 00679 #define TYPENUM_INPUT_2 23 >>>>>>> 00680 #define ITEMSIZE_INPUT_2 2 >>>>>>> 00681 #define DTYPE_INPUT_4 npy_float16 >>>>>>> 00682 #define TYPENUM_INPUT_4 23 >>>>>>> 00683 #define ITEMSIZE_INPUT_4 2 >>>>>>> 00684 #define DTYPE_INPUT_5 npy_float16 >>>>>>> 00685 #define TYPENUM_INPUT_5 23 >>>>>>> 00686 #define ITEMSIZE_INPUT_5 2 >>>>>>> 00687 #define DTYPE_OUTPUT_0 npy_float16 >>>>>>> 00688 #define TYPENUM_OUTPUT_0 23 >>>>>>> 00689 #define ITEMSIZE_OUTPUT_0 2 >>>>>>> 00690 #define APPLY_SPECIFIC(str) >>>>>>> str##_node_86feacd077d8749f42b5d82709a80ba3_0 >>>>>>> 00691 #define CONV_INPLACE 1 >>>>>>> 00692 #define CONV_ALGO >>>>>>> CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM >>>>>>> 00693 >>>>>>> 00694 >>>>>>> 00695 cuda_enter(ctx->ctx); >>>>>>> 00696 cudnnDestroy(APPLY_SPECIFIC(_handle)); >>>>>>> 00697 cuda_exit(ctx->ctx); >>>>>>> 00698 Py_DECREF((PyObject *)ctx); >>>>>>> 00699 >>>>>>> 00700 >>>>>>> 00701 if (APPLY_SPECIFIC(input) != NULL) >>>>>>> 00702 cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(input)); >>>>>>> 00703 if (APPLY_SPECIFIC(output) != NULL) >>>>>>> 00704 cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(output)); >>>>>>> 00705 if (APPLY_SPECIFIC(kerns) != NULL) >>>>>>> 00706 cudnnDestroyFilterDescriptor(APPLY_SPECIFIC(kerns)); >>>>>>> 00707 >>>>>>> 00708 #undef DTYPE_INPUT_0 >>>>>>> 00709 #undef TYPENUM_INPUT_0 >>>>>>> 00710 #undef ITEMSIZE_INPUT_0 >>>>>>> 00711 #undef DTYPE_INPUT_1 >>>>>>> 00712 #undef TYPENUM_INPUT_1 >>>>>>> 00713 #undef ITEMSIZE_INPUT_1 >>>>>>> 00714 #undef DTYPE_INPUT_2 
>>>>>>> 00715 #undef TYPENUM_INPUT_2 >>>>>>> 00716 #undef ITEMSIZE_INPUT_2 >>>>>>> 00717 #undef DTYPE_INPUT_4 >>>>>>> 00718 #undef TYPENUM_INPUT_4 >>>>>>> 00719 #undef ITEMSIZE_INPUT_4 >>>>>>> 00720 #undef DTYPE_INPUT_5 >>>>>>> 00721 #undef TYPENUM_INPUT_5 >>>>>>> 00722 #undef ITEMSIZE_INPUT_5 >>>>>>> 00723 #undef DTYPE_OUTPUT_0 >>>>>>> 00724 #undef TYPENUM_OUTPUT_0 >>>>>>> 00725 #undef ITEMSIZE_OUTPUT_0 >>>>>>> 00726 #undef APPLY_SPECIFIC >>>>>>> 00727 #undef CONV_INPLACE >>>>>>> 00728 #undef CONV_ALGO >>>>>>> 00729 double __DUMMY_18; >>>>>>> 00730 >>>>>>> 00731 Py_XDECREF(this->storage_V3); >>>>>>> 00732 Py_XDECREF(this->storage_V5); >>>>>>> 00733 Py_XDECREF(this->storage_V7); >>>>>>> 00734 Py_XDECREF(this->storage_V9); >>>>>>> 00735 Py_XDECREF(this->storage_V11); >>>>>>> 00736 Py_XDECREF(this->storage_V13); >>>>>>> 00737 Py_XDECREF(this->storage_V1); >>>>>>> 00738 Py_XDECREF(this->storage_V15); >>>>>>> 00739 } >>>>>>> 00740 int run(void) { >>>>>>> 00741 int __failure = 0; >>>>>>> 00742 >>>>>>> 00743 PyObject* py_V1; >>>>>>> 00744 >>>>>>> 00745 PyGpuArrayObject *V1; >>>>>>> 00746 >>>>>>> 00747 PyObject* py_V3; >>>>>>> 00748 >>>>>>> 00749 PyGpuArrayObject *V3; >>>>>>> 00750 >>>>>>> 00751 PyObject* py_V5; >>>>>>> 00752 >>>>>>> 00753 PyGpuArrayObject *V5; >>>>>>> 00754 >>>>>>> 00755 PyObject* py_V7; >>>>>>> 00756 >>>>>>> 00757 PyGpuArrayObject *V7; >>>>>>> 00758 >>>>>>> 00759 PyObject* py_V9; >>>>>>> 00760 >>>>>>> 00761 cudnnConvolutionDescriptor_t V9; >>>>>>> 00762 >>>>>>> 00763 PyObject* py_V11; >>>>>>> 00764 >>>>>>> 00765 typedef npy_float16 V11_dtype; // >>>>>>> Deprecated use dtype_V11 instead. >>>>>>> 00766 typedef npy_float16 dtype_V11; >>>>>>> 00767 >>>>>>> 00768 npy_float16 V11; >>>>>>> 00769 >>>>>>> 00770 PyObject* py_V13; >>>>>>> 00771 >>>>>>> 00772 typedef npy_float16 V13_dtype; // >>>>>>> Deprecated use dtype_V13 instead. 
>>>>>>> 00773 typedef npy_float16 dtype_V13; >>>>>>> 00774 >>>>>>> 00775 npy_float16 V13; >>>>>>> 00776 >>>>>>> 00777 { >>>>>>> 00778 >>>>>>> 00779 py_V1 = PyList_GET_ITEM(storage_V1, 0); >>>>>>> 00780 {Py_XINCREF(py_V1);} >>>>>>> 00781 >>>>>>> 00782 if (py_V1 == Py_None) >>>>>>> 00783 { >>>>>>> 00784 V1 = NULL; >>>>>>> 00785 } >>>>>>> 00786 else >>>>>>> 00787 { >>>>>>> 00788 >>>>>>> 00789 V1 = NULL; >>>>>>> 00790 if (py_V1 == Py_None) { >>>>>>> 00791 PyErr_SetString(PyExc_ValueError, "expected a >>>>>>> GpuArray, not None"); >>>>>>> 00792 { >>>>>>> 00793 __failure = 2; >>>>>>> 00794 if (!PyErr_Occurred()) { >>>>>>> 00795 PyErr_SetString(PyExc_RuntimeError, >>>>>>> 00796 "Unexpected error in an Op's C code. " >>>>>>> 00797 "No Python exception was set."); >>>>>>> 00798 } >>>>>>> 00799 goto __label_2;} >>>>>>> 00800 } >>>>>>> 00801 /* First check if we are the base type exactly (the >>>>>>> most common case), >>>>>>> 00802 then do the full subclass check if needed. */ >>>>>>> 00803 if (py_V1->ob_type != &PyGpuArrayType && >>>>>>> 00804 !PyObject_TypeCheck(py_V1, &PyGpuArrayType)) { >>>>>>> 00805 PyE >>>>>>> >>>>>> -- >>>>>> >>>>>> --- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "theano-users" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "theano-users" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>>
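Side note on the dump itself: every `#define TYPENUM_*` in the generated code is 23 with `ITEMSIZE_*` 2, which is NumPy's half-precision (float16) type number, so the op really was specialized for float16 inputs and outputs. A quick sanity check (my own snippet, not part of the original error output) confirms those constants:

```python
import numpy as np

# TYPENUM 23 in the generated C code is NumPy's NPY_HALF enum value,
# and ITEMSIZE 2 is the byte size of a half-precision scalar.
f16 = np.dtype(np.float16)
print("typenum:", f16.num)       # 23
print("itemsize:", f16.itemsize) # 2
```

So the `#define` block matches what NumPy reports for `npy_float16`, and the failure is not a dtype mismatch in the generated code.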
--

---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
