Frederic, this is the feedback after the upgrade about float16.

Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec  6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import run_multi_conv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "run_multi_conv.py", line 1, in <module>
    import mpr_convnet_class as conv
  File "mpr_convnet_class.py", line 2, in <module>
    from convnet3d import ConvLayer, PoolLayer
  File "convnet3d.py", line 3, in <module>
    from theano.tensor.nnet.conv3d2d import conv3d
  File "/home/luca/data/Theano-master/theano/__init__.py", line 125, in <module>
    import theano.gpuarray
  File "/home/luca/data/Theano-master/theano/gpuarray/__init__.py", line 31, in <module>
    from . import fft, dnn, opt, nerv, extra_ops
  File "/home/luca/data/Theano-master/theano/gpuarray/dnn.py", line 17, in <module>
    from theano.compile.ops import shape_i, shape_i_op
ImportError: cannot import name shape_i_op
>>>
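For context, an ImportError of this shape means the `theano.compile.ops` module that Python actually loaded does not define `shape_i_op` — often a sign that an older copy of the sources is being picked up instead of the upgraded ones. A minimal stand-in (not using Theano itself; the `fake_ops` module below is hypothetical) that reproduces the failure mode:

```python
import sys
import types

# Hypothetical stand-in for a stale module: like the old
# theano.compile.ops here, it defines shape_i but not shape_i_op.
fake_ops = types.ModuleType("fake_ops")
fake_ops.shape_i = lambda var, i: None
sys.modules["fake_ops"] = fake_ops

try:
    # Mirrors the failing line: from theano.compile.ops import shape_i, shape_i_op
    from fake_ops import shape_i, shape_i_op
except ImportError as exc:
    # The loaded module simply lacks the requested name.
    msg = str(exc)

print(msg)
```

If the upgraded sources do define the name, the usual fix is making sure no older install (site-packages copy, stale `.pyc`, or a different entry on `sys.path`) shadows them.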
On Thursday, July 21, 2016 at 11:15:06 AM UTC+2, [email protected] wrote:
>
> Frederic,
> I'll do it and give you feedback,
> many thanks
> Luca
>
> On Tuesday, July 19, 2016 at 10:09:21 PM UTC+2, nouiz wrote:
>>
>> We have a PR that upgrades some stuff about float16:
>>
>> https://github.com/Theano/Theano/pull/4764/files
>>
>> It probably fixes your problem. Can you try it to confirm that you don't
>> have a different problem?
>>
>> thanks
>>
>> Frédéric
>>
>> On Fri, Jul 15, 2016 at 4:55 AM, <[email protected]> wrote:
>>
>>> ok, I'll try.
>>> thanks
>>>
>>> On Thursday, July 14, 2016 at 11:44:41 PM UTC+2, Arnaud Bergeron wrote:
>>>>
>>>> I can't reproduce your problem using a simple convolution in float16.
>>>>
>>>> Either this is because your code is doing something unexpected or
>>>> because the problem has been fixed in the development version.
>>>>
>>>> In any case the development version is a much better option for the new
>>>> backend and float16, so I encourage you to upgrade and try again:
>>>> http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
>>>>
>>>> 2016-07-14 4:22 GMT-04:00 <[email protected]>:
>>>>
>>>>> Here is .theanorc:
>>>>>
>>>>> [global]
>>>>> floatX = float16
>>>>> device = cuda
>>>>>
>>>>> [cuda]
>>>>> root = /usr/local/cuda-7.5
>>>>>
>>>>> [nvcc]
>>>>> fastmath = True
>>>>>
>>>>> optimizer = fast_compile
>>>>>
>>>>> On Thursday, July 14, 2016 at 10:19:56 AM UTC+2, [email protected] wrote:
>>>>>>
>>>>>> Hi Arnaud,
>>>>>> I put _f16_ok = True in dnn.py (attached).
>>>>>>
>>>>>> This is the error I received:
>>>>>>
>>>>>> Python 2.7.11 |Anaconda custom (64-bit)| (default, Dec  6 2015, 18:08:32)
>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> Anaconda is brought to you by Continuum Analytics.
>>>>>> Please check out: http://continuum.io/thanks and https://anaconda.org
>>>>>> >>> import run_multi_conv
>>>>>>
>>>>>> Mapped name None to device cuda: GeForce 840M
>>>>>> WARNING (theano.gof.compilelock): Overriding existing lock by dead
>>>>>> process '3202' (I am process '3351')
>>>>>> Using cuDNN version 5005 on context None
>>>>>> /home/luca/data/Theano-master/theano/tensor/signal/downsample.py:6:
>>>>>> UserWarning: downsample module has been moved to the
>>>>>> theano.tensor.signal.pool module.
>>>>>>   "downsample module has been moved to the theano.tensor.signal.pool
>>>>>>   module.")
>>>>>> >>> run_multi_conv.run_experiments()
>>>>>> Disabling C code for Elemwise{mul,no_inplace} due to unsupported float16
>>>>>> Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
>>>>>> Disabling C code for Elemwise{Cast{float16}} due to unsupported float16
>>>>>> Disabling C code for Elemwise{Cast{float16}} due to unsupported float16
>>>>>> Disabling C code for Alloc due to unsupported float16
>>>>>> Disabling C code for Cast{float16} due to unsupported float16
>>>>>> Disabling C code for Cast{float16} due to unsupported float16
>>>>>> Disabling C code for Cast{float16} due to unsupported float16
>>>>>> Disabling C code for Cast{float16} due to unsupported float16
>>>>>> Disabling C code for RandomFunction{binomial} due to unsupported float16
>>>>>> Disabling C code for RandomFunction{binomial} due to unsupported float16
>>>>>> ===============================
>>>>>> #include <Python.h>
>>>>>> #include <iostream>
>>>>>> #include "theano_mod_helper.h"
>>>>>> #include <gpuarray/array.h>
>>>>>> #include <gpuarray/kernel.h>
>>>>>> #include <gpuarray/error.h>
>>>>>> #include <gpuarray/buffer.h>
>>>>>> #include <gpuarray/buffer_blas.h>
>>>>>> #include <numpy/arrayobject.h>
>>>>>> #include <gpuarray_api.h>
>>>>>> #include <math.h>
>>>>>> #include <numpy/arrayscalars.h>
>>>>>> #include "cudnn.h"
>>>>>> #include "cudnn_helper.h"
>>>>>> #include "gpuarray_helper.h"
>>>>>> #include "gpuarray/types.h"
>>>>>> #include "gpuarray/array.h"
>>>>>> #include "gpuarray/util.h"
>>>>>> #include "gpuarray/ext_cuda.h"
>>>>>> #include "gpuarray_api.h"
>>>>>> #include "numpy_compat.h"
>>>>>> //////////////////////
>>>>>> ////  Support Code
>>>>>> //////////////////////
>>>>>>
>>>>>> static int
>>>>>> c_set_tensorNd(PyGpuArrayObject *var, cudnnTensorDescriptor_t desc) {
>>>>>>   cudnnDataType_t dt;
>>>>>>   size_t ds;
>>>>>>   switch (var->ga.typecode) {
>>>>>>   case GA_FLOAT:
>>>>>>     dt = CUDNN_DATA_FLOAT;
>>>>>>     break;
>>>>>>   case GA_DOUBLE:
>>>>>>     dt = CUDNN_DATA_DOUBLE;
>>>>>>     break;
>>>>>> #if CUDNN_VERSION > 3000
>>>>>>   case GA_HALF:
>>>>>>     dt = CUDNN_DATA_HALF;
>>>>>>     break;
>>>>>> #endif
>>>>>>   default:
>>>>>>     PyErr_SetString(PyExc_TypeError, "Non-float datatype in c_set_tensorNd");
>>>>>>     return -1;
>>>>>>   }
>>>>>>   ds = gpuarray_get_elsize(var->ga.typecode);
>>>>>>
>>>>>>   int strs[5], dims[5], default_stride = 1;
>>>>>>   unsigned int nd = PyGpuArray_NDIM(var);
>>>>>>
>>>>>>   if (nd > 5) {
>>>>>>     PyErr_SetString(PyExc_TypeError, "Tensor of more than 5d");
>>>>>>     return -1;
>>>>>>   }
>>>>>>
>>>>>>   for (unsigned int _i = nd; _i > 0; _i--) {
>>>>>>     unsigned int i = _i - 1;
>>>>>>     strs[i] = PyGpuArray_STRIDE(var, i) ?
>>>>>>       PyGpuArray_STRIDE(var, i)/ds : default_stride;
>>>>>>     default_stride *= PyGpuArray_DIM(var, i);
>>>>>>     dims[i] = PyGpuArray_DIM(var, i);
>>>>>>   }
>>>>>>
>>>>>>   cudnnStatus_t err = cudnnSetTensorNdDescriptor(desc, dt, nd, dims, strs);
>>>>>>   if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>     PyErr_Format(PyExc_RuntimeError,
>>>>>>                  "Could not set tensorNd descriptor: %s",
>>>>>>                  cudnnGetErrorString(err));
>>>>>>     return -1;
>>>>>>   }
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> static int
>>>>>> c_set_filter(PyGpuArrayObject *var, cudnnFilterDescriptor_t desc) {
>>>>>>   cudnnDataType_t dt;
>>>>>>   cudnnStatus_t err;
>>>>>>
>>>>>>   if (!GpuArray_IS_C_CONTIGUOUS(&var->ga)) {
>>>>>>     PyErr_SetString(PyExc_ValueError,
>>>>>>                     "Only contiguous filters (kernels) are supported.");
>>>>>>     return -1;
>>>>>>   }
>>>>>>   switch (var->ga.typecode) {
>>>>>>   case GA_FLOAT:
>>>>>>     dt = CUDNN_DATA_FLOAT;
>>>>>>     break;
>>>>>>   case GA_DOUBLE:
>>>>>>     dt = CUDNN_DATA_DOUBLE;
>>>>>>     break;
>>>>>> #if CUDNN_VERSION > 3000
>>>>>>   case GA_HALF:
>>>>>>     dt = CUDNN_DATA_HALF;
>>>>>>     break;
>>>>>> #endif
>>>>>>   default:
>>>>>>     PyErr_SetString(PyExc_TypeError, "Non-float datatype in c_set_filter");
>>>>>>     return -1;
>>>>>>   }
>>>>>>
>>>>>>   int dims[5];
>>>>>>   unsigned int nd = PyGpuArray_NDIM(var);
>>>>>>
>>>>>>   if (nd > 5) {
>>>>>>     PyErr_SetString(PyExc_TypeError, "Tensor of more than 5d");
>>>>>>     return -1;
>>>>>>   }
>>>>>>
>>>>>>   for (unsigned int _i = nd; _i > 0; _i--) {
>>>>>>     unsigned int i = _i - 1;
>>>>>>     dims[i] = PyGpuArray_DIM(var, i);
>>>>>>   }
>>>>>>
>>>>>> #if CUDNN_VERSION >= 5000
>>>>>>   err = cudnnSetFilterNdDescriptor(desc, dt, CUDNN_TENSOR_NCHW, nd, dims);
>>>>>> #else
>>>>>>   err = cudnnSetFilterNdDescriptor(desc, dt, nd, dims);
>>>>>> #endif
>>>>>>
>>>>>>   if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>     PyErr_Format(PyExc_RuntimeError,
>>>>>>                  "Could not set filter descriptor: %s.",
>>>>>>                  cudnnGetErrorString(err));
>>>>>>     return -1;
>>>>>>   }
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> namespace {
>>>>>> struct __struct_compiled_op_86feacd077d8749f42b5d82709a80ba3 {
>>>>>>   PyObject* __ERROR;
>>>>>>
>>>>>>   PyObject* storage_V3;
>>>>>>   PyObject* storage_V5;
>>>>>>   PyObject* storage_V7;
>>>>>>   PyObject* storage_V9;
>>>>>>   PyObject* storage_V11;
>>>>>>   PyObject* storage_V13;
>>>>>>   PyObject* storage_V1;
>>>>>>   PyObject* storage_V15;
>>>>>>
>>>>>>   PyObject* py_V15;
>>>>>>   PyGpuContextObject *V15;
>>>>>> #define DTYPE_INPUT_0 npy_float16
>>>>>> #define TYPENUM_INPUT_0 23
>>>>>> #define ITEMSIZE_INPUT_0 2
>>>>>> #define DTYPE_INPUT_1 npy_float16
>>>>>> #define TYPENUM_INPUT_1 23
>>>>>> #define ITEMSIZE_INPUT_1 2
>>>>>> #define DTYPE_INPUT_2 npy_float16
>>>>>> #define TYPENUM_INPUT_2 23
>>>>>> #define ITEMSIZE_INPUT_2 2
>>>>>> #define DTYPE_INPUT_4 npy_float16
>>>>>> #define TYPENUM_INPUT_4 23
>>>>>> #define ITEMSIZE_INPUT_4 2
>>>>>> #define DTYPE_INPUT_5 npy_float16
>>>>>> #define TYPENUM_INPUT_5 23
>>>>>> #define ITEMSIZE_INPUT_5 2
>>>>>> #define DTYPE_OUTPUT_0 npy_float16
>>>>>> #define TYPENUM_OUTPUT_0 23
>>>>>> #define ITEMSIZE_OUTPUT_0 2
>>>>>> #define APPLY_SPECIFIC(str) str##_node_86feacd077d8749f42b5d82709a80ba3_0
>>>>>> #define CONV_INPLACE 1
>>>>>> #define CONV_ALGO CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
>>>>>>
>>>>>> PyGpuContextObject *ctx;
>>>>>> cudnnHandle_t APPLY_SPECIFIC(_handle);
>>>>>>
>>>>>> cudnnTensorDescriptor_t APPLY_SPECIFIC(input);
>>>>>> cudnnTensorDescriptor_t APPLY_SPECIFIC(output);
>>>>>> cudnnFilterDescriptor_t APPLY_SPECIFIC(kerns);
>>>>>>
>>>>>> #ifdef CHOOSE_ALGO
>>>>>> int reuse_algo;
>>>>>> cudnnConvolutionFwdAlgo_t prev_algo;
>>>>>> #ifndef CHOOSE_ONCE
>>>>>> size_t prev_img_dims[5];
>>>>>> size_t prev_kern_dims[5];
>>>>>> #endif
>>>>>> #endif
>>>>>>
>>>>>> int
>>>>>> APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
>>>>>>                          PyGpuArrayObject *om,
>>>>>>                          cudnnConvolutionDescriptor_t desc,
>>>>>>                          double alpha, double beta,
>>>>>>                          PyGpuArrayObject **output,
>>>>>>                          PyGpuContextObject *c) {
>>>>>>   cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
>>>>>>   float af = alpha, bf = beta;
>>>>>>   void *alpha_p;
>>>>>>   void *beta_p;
>>>>>>
>>>>>>   if (PyGpuArray_DIMS(input)[1] != PyGpuArray_DIMS(kerns)[1]) {
>>>>>>     PyErr_SetString(PyExc_ValueError,
>>>>>>                     "images and kernel must have the same stack size");
>>>>>>     return 1;
>>>>>>   }
>>>>>>
>>>>>>   if (c_set_tensorNd(input, APPLY_SPECIFIC(input)) == -1)
>>>>>>     return 1;
>>>>>>   if (c_set_filter(kerns, APPLY_SPECIFIC(kerns)) == -1)
>>>>>>     return 1;
>>>>>>
>>>>>>   switch (input->ga.typecode) {
>>>>>>   case GA_DOUBLE:
>>>>>>     alpha_p = (void *)&alpha;
>>>>>>     beta_p = (void *)&beta;
>>>>>>     break;
>>>>>>   case GA_FLOAT:
>>>>>>   case GA_HALF:
>>>>>>     alpha_p = (void *)&af;
>>>>>>     beta_p = (void *)&bf;
>>>>>>     break;
>>>>>>   default:
>>>>>>     PyErr_SetString(PyExc_TypeError, "Unsupported type in convolution");
>>>>>>     return 1;
>>>>>>   }
>>>>>>
>>>>>> #ifdef CONV_INPLACE
>>>>>>   Py_XDECREF(*output);
>>>>>>   *output = om;
>>>>>>   Py_INCREF(*output);
>>>>>> #else
>>>>>>   if (theano_prep_output(output, PyGpuArray_NDIM(om), PyGpuArray_DIMS(om),
>>>>>>                          om->ga.typecode, GA_C_ORDER, c) != 0)
>>>>>>     return 1;
>>>>>>   if (beta != 0.0 && pygpu_move(*output, om))
>>>>>>     return 1;
>>>>>> #endif
>>>>>>
>>>>>>   if (c_set_tensorNd(*output, APPLY_SPECIFIC(output)) == -1)
>>>>>>     return 1;
>>>>>>
>>>>>>   cudnnConvolutionFwdAlgo_t algo = CONV_ALGO;
>>>>>>
>>>>>>   cuda_enter(c->ctx);
>>>>>> #ifdef CHOOSE_ALGO
>>>>>> #ifndef CHOOSE_ONCE
>>>>>>   reuse_algo = 1;
>>>>>>   for (unsigned int i = 0; i < PyGpuArray_NDIM(input); i++) {
>>>>>>     reuse_algo = (reuse_algo &&
>>>>>>                   PyGpuArray_DIM(input, i) == prev_img_dims[i]);
>>>>>>     reuse_algo = (reuse_algo &&
>>>>>>                   PyGpuArray_DIM(kerns, i) == prev_kern_dims[i]);
>>>>>>   }
>>>>>> #endif
>>>>>>
>>>>>>   if (!reuse_algo) {
>>>>>> #ifdef CHOOSE_TIME
>>>>>>     int count;
>>>>>>     cudnnConvolutionFwdAlgoPerf_t choice;
>>>>>>     err = cudnnFindConvolutionForwardAlgorithm(
>>>>>>       APPLY_SPECIFIC(_handle), APPLY_SPECIFIC(input), APPLY_SPECIFIC(kerns),
>>>>>>       desc, APPLY_SPECIFIC(output), 1, &count, &choice);
>>>>>>
>>>>>>     if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>       PyErr_Format(PyExc_RuntimeError,
>>>>>>                    "error selecting convolution algo: %s",
>>>>>>                    cudnnGetErrorString(err));
>>>>>>       cuda_exit(c->ctx);
>>>>>>       return 1;
>>>>>>     }
>>>>>>     algo = choice.algo;
>>>>>> #else
>>>>>>     size_t free;
>>>>>>     int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
>>>>>>
>>>>>>     if (err2 != GA_NO_ERROR) {
>>>>>>       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
>>>>>>                    "memory information on the GPU");
>>>>>>       cuda_exit(c->ctx);
>>>>>>       return 1;
>>>>>>     }
>>>>>>
>>>>>>     err = cudnnGetConvolutionForwardAlgorithm(
>>>>>>       APPLY_SPECIFIC(_handle), APPLY_SPECIFIC(input), APPLY_SPECIFIC(kerns),
>>>>>>       desc, APPLY_SPECIFIC(output),
>>>>>>       CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT, free, &algo);
>>>>>>     if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>       PyErr_Format(PyExc_RuntimeError,
>>>>>>                    "error selecting convolution algo: %s",
>>>>>>                    cudnnGetErrorString(err));
>>>>>>       cuda_exit(c->ctx);
>>>>>>       return 1;
>>>>>>     }
>>>>>> #endif
>>>>>>     prev_algo = algo;
>>>>>>   } else {
>>>>>>     algo = prev_algo;
>>>>>>   }
>>>>>>
>>>>>> #ifdef CHOOSE_ONCE
>>>>>>   reuse_algo = 1;
>>>>>> #else
>>>>>>   for (unsigned int i = 0; i < PyGpuArray_NDIM(input); i++) {
>>>>>>     prev_img_dims[i] = PyGpuArray_DIM(input, i);
>>>>>>     prev_kern_dims[i] = PyGpuArray_DIM(kerns, i);
>>>>>>   }
>>>>>> #endif
>>>>>>
>>>>>> #endif
>>>>>>
>>>>>>   /* These two algos are not supported for 3d conv */
>>>>>>   if (PyGpuArray_NDIM(input) == 5 &&
>>>>>>       (algo == CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM ||
>>>>>>        algo == CUDNN_CONVOLUTION_FWD_ALGO_GEMM))
>>>>>>     algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
>>>>>>
>>>>>>   // The FFT implementation does not support strides, 1x1 filters or inputs
>>>>>>   // with a spatial dimension larger than 1024. The tiled-FFT implementation
>>>>>>   // does not support strides.
>>>>>>   // If the chosen implementation is FFT or tiled-FFT, validate that it can
>>>>>>   // be used on the current data and default to a safe implementation if it
>>>>>>   // can't.
>>>>>>   // The following code is 2d-specific but it is fine as FFT and tiled-FFT
>>>>>>   // are defined only for 2d filters
>>>>>>   if ((algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT ||
>>>>>>        algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING) &&
>>>>>>       PyGpuArray_NDIM(input) == 4) {
>>>>>>
>>>>>>     // Extract the properties of the convolution descriptor
>>>>>>     int nd;
>>>>>>     int pad[2];
>>>>>>     int stride[2];
>>>>>>     int upscale[2];
>>>>>>     cudnnConvolutionMode_t mode;
>>>>>>     cudnnDataType_t data_type;
>>>>>>     err = cudnnGetConvolutionNdDescriptor(desc, 2, &nd, pad, stride,
>>>>>>                                           upscale, &mode, &data_type);
>>>>>>     if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>       PyErr_Format(PyExc_RuntimeError,
>>>>>>                    "error getting convolution properties: %s",
>>>>>>                    cudnnGetErrorString(err));
>>>>>>       cuda_exit(c->ctx);
>>>>>>       return 1;
>>>>>>     }
>>>>>>
>>>>>>     if (algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT)
>>>>>>     {
>>>>>>       if (stride[0] != 1 || stride[1] != 1 ||
>>>>>>           PyGpuArray_DIM(input, 2) > 1024 || PyGpuArray_DIM(input, 3) > 1024 ||
>>>>>>           (PyGpuArray_DIM(kerns, 2) == 1 && PyGpuArray_DIM(kerns, 3) == 1))
>>>>>>       {
>>>>>>         algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
>>>>>>       }
>>>>>>     }
>>>>>>     else
>>>>>>     {
>>>>>>       // algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING
>>>>>>       if (stride[0] != 1 || stride[1] != 1)
>>>>>>       {
>>>>>>         algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>>   {
>>>>>>     size_t worksize;
>>>>>>     gpudata *workspace;
>>>>>>     err = cudnnGetConvolutionForwardWorkspaceSize(APPLY_SPECIFIC(_handle),
>>>>>>                                                   APPLY_SPECIFIC(input),
>>>>>>                                                   APPLY_SPECIFIC(kerns),
>>>>>>                                                   desc,
>>>>>>                                                   APPLY_SPECIFIC(output),
>>>>>>                                                   algo,
>>>>>>                                                   &worksize);
>>>>>>
>>>>>>     if (err == CUDNN_STATUS_NOT_SUPPORTED) {
>>>>>>       // Fallback to none algo if not supported
>>>>>>       // TODO: Print a warning
>>>>>>       algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
>>>>>>
>>>>>>       err = cudnnGetConvolutionForwardWorkspaceSize(APPLY_SPECIFIC(_handle),
>>>>>>                                                     APPLY_SPECIFIC(input),
>>>>>>                                                     APPLY_SPECIFIC(kerns),
>>>>>>                                                     desc,
>>>>>>                                                     APPLY_SPECIFIC(output),
>>>>>>                                                     algo,
>>>>>>                                                     &worksize);
>>>>>>     }
>>>>>>
>>>>>>     if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>       PyErr_Format(PyExc_RuntimeError,
>>>>>>                    "error getting worksize: %s",
>>>>>>                    cudnnGetErrorString(err));
>>>>>>       cuda_exit(c->ctx);
>>>>>>       return 1;
>>>>>>     }
>>>>>>
>>>>>>     /*
>>>>>>      * This is less than ideal since we need to free it after (which
>>>>>>      * introduces a synchronization point. But we don't have a module
>>>>>>      * to place a nice get_work_mem() function in.
>>>>>>      */
>>>>>>     if (worksize != 0) {
>>>>>>       workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
>>>>>>       if (workspace == NULL) {
>>>>>>         PyErr_SetString(PyExc_RuntimeError,
>>>>>>                         "Could not allocate working memory");
>>>>>>         cuda_exit(c->ctx);
>>>>>>         return 1;
>>>>>>       }
>>>>>>     }
>>>>>>
>>>>>>     cuda_wait(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
>>>>>>     cuda_wait(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
>>>>>>     cuda_wait((*output)->ga.data, GPUARRAY_CUDA_WAIT_WRITE);
>>>>>>
>>>>>>     err = cudnnConvolutionForward(
>>>>>>       APPLY_SPECIFIC(_handle),
>>>>>>       alpha_p,
>>>>>>       APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(input),
>>>>>>       APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(kerns),
>>>>>>       desc, algo,
>>>>>>       worksize == 0 ? NULL : *(void **)workspace, worksize,
>>>>>>       beta_p,
>>>>>>       APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output));
>>>>>>
>>>>>>     if (worksize != 0)
>>>>>>       gpudata_release(workspace);
>>>>>>
>>>>>>     cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
>>>>>>     cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
>>>>>>     cuda_record((*output)->ga.data, GPUARRAY_CUDA_WAIT_WRITE);
>>>>>>   }
>>>>>>   cuda_exit(c->ctx);
>>>>>>
>>>>>>   if (err != CUDNN_STATUS_SUCCESS) {
>>>>>>     PyErr_Format(PyExc_RuntimeError, "error doing operation: %s",
>>>>>>                  cudnnGetErrorString(err));
>>>>>>     return 1;
>>>>>>   }
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> #undef DTYPE_INPUT_0
>>>>>> #undef TYPENUM_INPUT_0
>>>>>> #undef ITEMSIZE_INPUT_0
>>>>>> #undef DTYPE_INPUT_1
>>>>>> #undef TYPENUM_INPUT_1
>>>>>> #undef ITEMSIZE_INPUT_1
>>>>>> #undef DTYPE_INPUT_2
>>>>>> #undef TYPENUM_INPUT_2
>>>>>> #undef ITEMSIZE_INPUT_2
>>>>>> #undef DTYPE_INPUT_4
>>>>>> #undef TYPENUM_INPUT_4
>>>>>> #undef ITEMSIZE_INPUT_4
>>>>>> #undef DTYPE_INPUT_5
>>>>>> #undef TYPENUM_INPUT_5
>>>>>> #undef ITEMSIZE_INPUT_5
>>>>>> #undef DTYPE_OUTPUT_0
>>>>>> #undef TYPENUM_OUTPUT_0
>>>>>> #undef ITEMSIZE_OUTPUT_0
>>>>>> #undef APPLY_SPECIFIC
>>>>>> #undef CONV_INPLACE
>>>>>> #undef CONV_ALGO
>>>>>>
>>>>>> __struct_compiled_op_86feacd077d8749f42b5d82709a80ba3() {
>>>>>>   // This is only somewhat safe because we:
>>>>>>   // 1) Are not a virtual class
>>>>>>   // 2) Do not use any virtual classes in the members
>>>>>>   // 3) Deal with mostly POD and pointers
>>>>>>
>>>>>>   // If this changes, we would have to revise this, but for
>>>>>>   // now I am tired of chasing segfaults because
>>>>>>   // initialization code had an error and some pointer has
>>>>>>   // a junk value.
>>>>>>   memset(this, 0, sizeof(*this));
>>>>>> }
>>>>>>
>>>>>> ~__struct_compiled_op_86feacd077d8749f42b5d82709a80ba3(void) {
>>>>>>   cleanup();
>>>>>> }
>>>>>>
>>>>>> int init(PyObject* __ERROR, PyObject* storage_V3, PyObject* storage_V5,
>>>>>>          PyObject* storage_V7, PyObject* storage_V9, PyObject* storage_V11,
>>>>>>          PyObject* storage_V13, PyObject* storage_V1, PyObject* storage_V15) {
>>>>>>   Py_XINCREF(storage_V3);
>>>>>>   Py_XINCREF(storage_V5);
>>>>>>   Py_XINCREF(storage_V7);
>>>>>>   Py_XINCREF(storage_V9);
>>>>>>   Py_XINCREF(storage_V11);
>>>>>>   Py_XINCREF(storage_V13);
>>>>>>   Py_XINCREF(storage_V1);
>>>>>>   Py_XINCREF(storage_V15);
>>>>>>   this->storage_V3 = storage_V3;
>>>>>>   this->storage_V5 = storage_V5;
>>>>>>   this->storage_V7 = storage_V7;
>>>>>>   this->storage_V9 = storage_V9;
>>>>>>   this->storage_V11 = storage_V11;
>>>>>>   this->storage_V13 = storage_V13;
>>>>>>   this->storage_V1 = storage_V1;
>>>>>>   this->storage_V15 = storage_V15;
>>>>>>
>>>>>>   py_V15 = PyList_GET_ITEM(storage_V15, 0);
>>>>>>   {Py_XINCREF(py_V15);}
>>>>>>
>>>>>>   if (!PyObject_TypeCheck(py_V15, &PyGpuContextType)) {
>>>>>>     PyErr_SetString(PyExc_TypeError, "expected a GpuContext");
>>>>>>     {
>>>>>>       if (!PyErr_Occurred()) {
>>>>>>         PyErr_SetString(PyExc_RuntimeError,
>>>>>>                         "Unexpected error in an Op's C code. "
>>>>>>                         "No Python exception was set.");
>>>>>>       }
>>>>>>       return 15;
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>>   V15 = (PyGpuContextObject *)py_V15;
>>>>>>   Py_INCREF(V15);
>>>>>>
>>>>>> #define DTYPE_INPUT_0 npy_float16
>>>>>> #define TYPENUM_INPUT_0 23
>>>>>> #define ITEMSIZE_INPUT_0 2
>>>>>> #define DTYPE_INPUT_1 npy_float16
>>>>>> #define TYPENUM_INPUT_1 23
>>>>>> #define ITEMSIZE_INPUT_1 2
>>>>>> #define DTYPE_INPUT_2 npy_float16
>>>>>> #define TYPENUM_INPUT_2 23
>>>>>> #define ITEMSIZE_INPUT_2 2
>>>>>> #define DTYPE_INPUT_4 npy_float16
>>>>>> #define TYPENUM_INPUT_4 23
>>>>>> #define ITEMSIZE_INPUT_4 2
>>>>>> #define DTYPE_INPUT_5 npy_float16
>>>>>> #define TYPENUM_INPUT_5 23
>>>>>> #define ITEMSIZE_INPUT_5 2
>>>>>> #define DTYPE_OUTPUT_0 npy_float16
>>>>>> #define TYPENUM_OUTPUT_0 23
>>>>>> #define ITEMSIZE_OUTPUT_0 2
>>>>>> #define APPLY_SPECIFIC(str) str##_node_86feacd077d8749f42b5d82709a80ba3_0
>>>>>> #define CONV_INPLACE 1
>>>>>> #define CONV_ALGO CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
>>>>>> #define FAIL { \
>>>>>>   if (!PyErr_Occurred()) { \
>>>>>>     PyErr_SetString(PyExc_RuntimeError, \
>>>>>>                     "Unexpected error in an Op's C code. " \
>>>>>>                     "No Python exception was set."); \
>>>>>>   } \
>>>>>>   return 17; \
>>>>>> }
>>>>>> #define PARAMS V15
>>>>>>
>>>>>> {
>>>>>>   // We need to keep a reference here to have it available in the destructor.
>>>>>>   ctx = PARAMS;
>>>>>>   Py_INCREF(ctx);
>>>>>>
>>>>>>   cuda_enter(PARAMS->ctx);
>>>>>>   cudnnStatus_t err;
>>>>>>   APPLY_SPECIFIC(_handle) = NULL;
>>>>>>   if ((err = cudnnCreate(&APPLY_SPECIFIC(_handle))) != CUDNN_STATUS_SUCCESS) {
>>>>>>     PyErr_Format(PyExc_RuntimeError, "could not create cuDNN handle: %s",
>>>>>>                  cudnnGetErrorString(err));
>>>>>>     cuda_exit(PARAMS->ctx);
>>>>>>     FAIL;
>>>>>>   }
>>>>>>   if ((err = cudnnSetStream(APPLY_SPECIFIC(_handle),
>>>>>>                             cuda_get_stream(PARAMS->ctx))) != CUDNN_STATUS_SUCCESS) {
>>>>>>     PyErr_Format(PyExc_RuntimeError, "Could not set cudnn stream: %s",
>>>>>>                  cudnnGetErrorString(err));
>>>>>>     cuda_exit(PARAMS->ctx);
>>>>>>     FAIL;
>>>>>>   }
>>>>>>   cuda_exit(PARAMS->ctx);
>>>>>> }
>>>>>>
>>>>>> cudnnStatus_t APPLY_SPECIFIC(err);
>>>>>> APPLY_SPECIFIC(input) = NULL;
>>>>>> APPLY_SPECIFIC(output) = NULL;
>>>>>> APPLY_SPECIFIC(kerns) = NULL;
>>>>>> if ((APPLY_SPECIFIC(err) =
>>>>>>      cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(input))) != CUDNN_STATUS_SUCCESS) {
>>>>>>   PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
>>>>>>                "(inp): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
>>>>>>   FAIL;
>>>>>> }
>>>>>> if ((APPLY_SPECIFIC(err) =
>>>>>>      cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(output))) != CUDNN_STATUS_SUCCESS) {
>>>>>>   PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
>>>>>>                "(out): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
>>>>>>   FAIL;
>>>>>> }
>>>>>> if ((APPLY_SPECIFIC(err) =
>>>>>>      cudnnCreateFilterDescriptor(&APPLY_SPECIFIC(kerns))) != CUDNN_STATUS_SUCCESS) {
>>>>>>   PyErr_Format(PyExc_MemoryError, "could not allocate filter descriptor: %s",
>>>>>>                cudnnGetErrorString(APPLY_SPECIFIC(err)));
>>>>>>   FAIL;
>>>>>> }
>>>>>>
>>>>>> #ifdef CHOOSE_ALGO
>>>>>> reuse_algo = 0;
>>>>>> prev_algo = CONV_ALGO;
>>>>>> #ifndef CHOOSE_ONCE
>>>>>> memset(prev_img_dims, 0, sizeof(prev_img_dims));
>>>>>> memset(prev_kern_dims, 0, sizeof(prev_kern_dims));
>>>>>> #endif
>>>>>> #endif
>>>>>>
>>>>>> #undef FAIL
>>>>>> #undef PARAMS
>>>>>> #undef DTYPE_INPUT_0
>>>>>> #undef TYPENUM_INPUT_0
>>>>>> #undef ITEMSIZE_INPUT_0
>>>>>> #undef DTYPE_INPUT_1
>>>>>> #undef TYPENUM_INPUT_1
>>>>>> #undef ITEMSIZE_INPUT_1
>>>>>> #undef DTYPE_INPUT_2
>>>>>> #undef TYPENUM_INPUT_2
>>>>>> #undef ITEMSIZE_INPUT_2
>>>>>> #undef DTYPE_INPUT_4
>>>>>> #undef TYPENUM_INPUT_4
>>>>>> #undef ITEMSIZE_INPUT_4
>>>>>> #undef DTYPE_INPUT_5
>>>>>> #undef TYPENUM_INPUT_5
>>>>>> #undef ITEMSIZE_INPUT_5
>>>>>> #undef DTYPE_OUTPUT_0
>>>>>> #undef TYPENUM_OUTPUT_0
>>>>>> #undef ITEMSIZE_OUTPUT_0
>>>>>> #undef APPLY_SPECIFIC
>>>>>> #undef CONV_INPLACE
>>>>>> #undef CONV_ALGO
>>>>>> this->__ERROR = __ERROR;
>>>>>> return 0;
>>>>>> }
>>>>>> void cleanup(void) {
>>>>>> __label_1:
>>>>>>
>>>>>> double __DUMMY_1;
>>>>>> __label_3:
>>>>>>
>>>>>> double __DUMMY_3;
>>>>>> __label_5:
>>>>>>
>>>>>> double __DUMMY_5;
>>>>>> __label_7:
>>>>>>
>>>>>> double __DUMMY_7;
>>>>>> __label_9:
>>>>>>
>>>>>> double __DUMMY_9;
>>>>>> __label_11:
>>>>>>
>>>>>> double __DUMMY_11;
>>>>>> __label_13:
>>>>>>
>>>>>> double __DUMMY_13;
>>>>>> __label_15:
>>>>>> Py_XDECREF(V15); V15 = NULL;
>>>>>> {Py_XDECREF(py_V15);}
>>>>>>
>>>>>> double __DUMMY_15;
>>>>>> __label_18:
>>>>>>
>>>>>> #define DTYPE_INPUT_0 npy_float16
>>>>>> #define TYPENUM_INPUT_0 23
>>>>>> #define ITEMSIZE_INPUT_0 2
>>>>>> #define DTYPE_INPUT_1 npy_float16
>>>>>> #define TYPENUM_INPUT_1 23
>>>>>> #define ITEMSIZE_INPUT_1 2
>>>>>> #define DTYPE_INPUT_2 npy_float16
>>>>>> #define TYPENUM_INPUT_2 23
>>>>>> #define ITEMSIZE_INPUT_2 2
>>>>>> #define DTYPE_INPUT_4 npy_float16
>>>>>> #define TYPENUM_INPUT_4 23
>>>>>> #define ITEMSIZE_INPUT_4 2
>>>>>> #define DTYPE_INPUT_5 npy_float16
>>>>>> #define TYPENUM_INPUT_5 23
>>>>>> #define ITEMSIZE_INPUT_5 2
>>>>>> #define DTYPE_OUTPUT_0 npy_float16
>>>>>> #define TYPENUM_OUTPUT_0 23
>>>>>> #define ITEMSIZE_OUTPUT_0 2
>>>>>> #define APPLY_SPECIFIC(str) str##_node_86feacd077d8749f42b5d82709a80ba3_0
>>>>>> #define CONV_INPLACE 1
>>>>>> #define CONV_ALGO CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
>>>>>>
>>>>>> cuda_enter(ctx->ctx);
>>>>>> cudnnDestroy(APPLY_SPECIFIC(_handle));
>>>>>> cuda_exit(ctx->ctx);
>>>>>> Py_DECREF((PyObject *)ctx);
>>>>>>
>>>>>> if (APPLY_SPECIFIC(input) != NULL)
>>>>>>   cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(input));
>>>>>> if (APPLY_SPECIFIC(output) != NULL)
>>>>>>   cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(output));
>>>>>> if (APPLY_SPECIFIC(kerns) != NULL)
>>>>>>   cudnnDestroyFilterDescriptor(APPLY_SPECIFIC(kerns));
>>>>>>
>>>>>> #undef DTYPE_INPUT_0
>>>>>> #undef TYPENUM_INPUT_0
>>>>>> #undef ITEMSIZE_INPUT_0
>>>>>> #undef DTYPE_INPUT_1
>>>>>> #undef TYPENUM_INPUT_1
>>>>>> #undef ITEMSIZE_INPUT_1
>>>>>> #undef DTYPE_INPUT_2
>>>>>> #undef TYPENUM_INPUT_2
>>>>>> #undef ITEMSIZE_INPUT_2
>>>>>> #undef DTYPE_INPUT_4
>>>>>> #undef TYPENUM_INPUT_4
>>>>>> #undef ITEMSIZE_INPUT_4
>>>>>> #undef DTYPE_INPUT_5
>>>>>> #undef TYPENUM_INPUT_5
>>>>>> #undef ITEMSIZE_INPUT_5
>>>>>> #undef DTYPE_OUTPUT_0
>>>>>> #undef TYPENUM_OUTPUT_0
>>>>>> #undef ITEMSIZE_OUTPUT_0
>>>>>> #undef APPLY_SPECIFIC
>>>>>> #undef CONV_INPLACE
>>>>>> #undef CONV_ALGO
>>>>>> double __DUMMY_18;
>>>>>>
>>>>>> Py_XDECREF(this->storage_V3);
>>>>>> Py_XDECREF(this->storage_V5);
>>>>>> Py_XDECREF(this->storage_V7);
>>>>>> Py_XDECREF(this->storage_V9);
>>>>>> Py_XDECREF(this->storage_V11);
>>>>>> Py_XDECREF(this->storage_V13);
>>>>>> Py_XDECREF(this->storage_V1);
>>>>>> Py_XDECREF(this->storage_V15);
>>>>>> }
>>>>>> int run(void) {
>>>>>> int __failure = 0;
>>>>>>
>>>>>> PyObject* py_V1;
>>>>>>
>>>>>> PyGpuArrayObject *V1;
>>>>>>
>>>>>> PyObject* py_V3;
>>>>>>
>>>>>> PyGpuArrayObject *V3;
>>>>>>
>>>>>> PyObject* py_V5;
>>>>>>
>>>>>> PyGpuArrayObject *V5;
>>>>>>
>>>>>> PyObject* py_V7;
>>>>>>
>>>>>> PyGpuArrayObject *V7;
>>>>>>
>>>>>> PyObject* py_V9;
>>>>>>
>>>>>> cudnnConvolutionDescriptor_t V9;
>>>>>>
>>>>>> PyObject* py_V11;
>>>>>>
>>>>>> typedef npy_float16 V11_dtype; // Deprecated use dtype_V11 instead.
>>>>>> typedef npy_float16 dtype_V11;
>>>>>>
>>>>>> npy_float16 V11;
>>>>>>
>>>>>> PyObject* py_V13;
>>>>>>
>>>>>> typedef npy_float16 V13_dtype; // Deprecated use dtype_V13 instead.
>>>>>> typedef npy_float16 dtype_V13;
>>>>>>
>>>>>> npy_float16 V13;
>>>>>>
>>>>>> {
>>>>>>
>>>>>> py_V1 = PyList_GET_ITEM(storage_V1, 0);
>>>>>> {Py_XINCREF(py_V1);}
>>>>>>
>>>>>> if (py_V1 == Py_None)
>>>>>> {
>>>>>>   V1 = NULL;
>>>>>> }
>>>>>> else
>>>>>> {
>>>>>>
>>>>>> V1 = NULL;
>>>>>> if (py_V1 == Py_None) {
>>>>>>   PyErr_SetString(PyExc_ValueError, "expected a GpuArray, not None");
>>>>>>   {
>>>>>>     __failure = 2;
>>>>>>     if (!PyErr_Occurred()) {
>>>>>>       PyErr_SetString(PyExc_RuntimeError,
>>>>>>                       "Unexpected error in an Op's C code. "
>>>>>>                       "No Python exception was set.");
>>>>>>     }
>>>>>>     goto __label_2;}
>>>>>> }
>>>>>> /* First check if we are the base type exactly (the most common case),
>>>>>>    then do the full subclass check if needed. */
>>>>>> if (py_V1->ob_type != &PyGpuArrayType &&
>>>>>>     !PyObject_TypeCheck(py_V1, &PyGpuArrayType)) {
>>>>>>   PyE
>>>>>
>>>>> --
>>>>>
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "theano-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
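A note on the generated `c_set_tensorNd` above: cuDNN expects strides in elements while gpuarray stores them in bytes, so the loop divides each byte stride by the element size and substitutes a dense (C-contiguous) default wherever the byte stride is 0, as for broadcast dimensions. A rough Python mirror of just that loop (the helper name is made up for illustration):

```python
def strides_for_cudnn(dims, byte_strides, elsize):
    """Mirror of the stride fix-up in the generated c_set_tensorNd:
    convert byte strides to element strides, substituting a dense
    (C-contiguous) default wherever the byte stride is 0."""
    nd = len(dims)
    strs = [0] * nd
    default_stride = 1
    for i in range(nd - 1, -1, -1):  # walk dimensions right to left
        strs[i] = byte_strides[i] // elsize if byte_strides[i] else default_stride
        default_stride *= dims[i]
    return strs

# A contiguous 2x3 float16 array (elsize=2) has byte strides (6, 2):
print(strides_for_cudnn([2, 3], [6, 2], 2))  # [3, 1]
```

This is why the float16 path above works with the same descriptor-setup code as float32: only the element size (2 instead of 4) changes in the stride computation.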
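The algorithm fall-back rules in the generated `conv_fwd` above are easy to state on their own: FFT requires unit strides, spatial dimensions of at most 1024 and a non-1x1 filter; tiled-FFT only requires unit strides; when a precondition fails, the code demotes the choice to IMPLICIT_GEMM. A sketch of those checks (constant names copied from the dump; the function itself is hypothetical):

```python
# Algorithm names as they appear in the generated code above.
FFT = "CUDNN_CONVOLUTION_FWD_ALGO_FFT"
FFT_TILING = "CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING"
IMPLICIT_GEMM = "CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM"

def validate_fwd_algo(algo, stride, input_spatial, kern_spatial):
    """Re-statement of the 2d validity checks in the generated conv_fwd:
    demote FFT / tiled-FFT to IMPLICIT_GEMM when their preconditions fail."""
    strided = stride[0] != 1 or stride[1] != 1
    if algo == FFT:
        too_big = input_spatial[0] > 1024 or input_spatial[1] > 1024
        one_by_one = kern_spatial[0] == 1 and kern_spatial[1] == 1
        if strided or too_big or one_by_one:
            return IMPLICIT_GEMM
    elif algo == FFT_TILING:
        if strided:
            return IMPLICIT_GEMM
    return algo

print(validate_fwd_algo(FFT, (1, 1), (512, 512), (3, 3)))  # FFT stays
print(validate_fwd_algo(FFT, (2, 2), (512, 512), (3, 3)))  # falls back
```

IMPLICIT_GEMM is the safe default because, per the comments in the dump, it has no stride, size, or filter-shape restrictions.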
