Hello,

I wanted to reduce the parameter count by switching from the regular 2D 
convolution below:
input: (b, ch_in, h, w)
kernel: (ch_out, ch_in, k_h, k_w)
output: (b, ch_out, h, w) ('same' convolution)
fn used: theano.tensor.nnet.conv2d
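
For concreteness, a minimal sketch of the baseline (sizes are made up for 
illustration; border_mode='half' gives 'same'-sized output for odd kernels):

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

b, ch_in, ch_out, k = 8, 32, 64, 3  # hypothetical sizes
floatX = theano.config.floatX

x = T.tensor4('x')  # (b, ch_in, h, w)
W = theano.shared(np.random.randn(ch_out, ch_in, k, k).astype(floatX))

# border_mode='half' pads by k // 2 per side, so for odd k the output
# keeps the input's spatial size ('same' convolution)
y = conv2d(x, W, border_mode='half')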

to a depthwise separable convolution followed by a pointwise 1x1 convolution:
input: (b, ch_in, h, w)
kernel_spatial: (ch_in, 1, k_h, k_w)
intermediate: (b, ch_in, h, w) ('same' convolution)
kernel_1x1: (ch_out, ch_in, 1, 1)
output: (b, ch_out, h, w)
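
(Per layer this replaces ch_out * ch_in * k_h * k_w weights with 
ch_in * k_h * k_w + ch_out * ch_in. For example, with ch_in = ch_out = 64 
and a 3x3 kernel that is 36,864 weights versus 576 + 4,096 = 4,672, 
roughly 8x fewer.)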

I implemented the latter in two ways (a sketch of both follows the list):

1. theano.tensor.nnet.conv2d with num_groups=ch_in for the spatial conv, 
followed by theano.tensor.nnet.conv2d again for the 1x1 conv

2. theano.tensor.nnet.abstract_conv.separable_conv2d, which performs both 
convolutions in one call
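
A minimal sketch of both variants, again with made-up sizes, and assuming 
a Theano version recent enough to have num_groups and separable_conv2d:

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
from theano.tensor.nnet.abstract_conv import separable_conv2d

b, ch_in, ch_out, k = 8, 32, 64, 3  # hypothetical sizes
floatX = theano.config.floatX

x = T.tensor4('x')  # (b, ch_in, h, w)
W_dw = theano.shared(np.random.randn(ch_in, 1, k, k).astype(floatX))       # depthwise
W_pw = theano.shared(np.random.randn(ch_out, ch_in, 1, 1).astype(floatX))  # pointwise

# 1. grouped conv2d (one group per input channel), then a 1x1 conv2d
mid = conv2d(x, W_dw, border_mode='half', num_groups=ch_in)
y1 = conv2d(mid, W_pw)

# 2. the fused helper, doing the same two steps in one call
y2 = separable_conv2d(x, W_dw, W_pw, num_channels=ch_in, border_mode='half')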

In both cases, *separable convolution was slower than regular convolution 
by ~3x*. I tried with both cuDNN 5 and cuDNN 6; the results were even 
slower with cuDNN 6.

Thinking that non-cuDNN ops might be slowing things down, I profiled 
approach #2 above; the snippet below suggests the slowdown is within the 
cuDNN ops themselves:

<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  42.0%    42.0%     192.444s       1.52e-02s     C    12680      40   theano.gpuarray.dnn.GpuDnnConvGradW
  20.7%    62.7%      94.781s       7.67e-03s     C    12363      39   theano.gpuarray.dnn.GpuDnnConvGradI
  15.8%    78.5%      72.418s       5.71e-03s     C    12680      40   theano.gpuarray.dnn.GpuDnnConv
   9.0%    87.5%      41.199s       2.89e-03s     C    14265      45   theano.gpuarray.dnn.GpuDnnReduction
   4.7%    92.2%      21.666s       1.29e-04s     C   167693     529   theano.gpuarray.elemwise.GpuElemwise
   1.9%    94.1%       8.878s       1.27e-03s     C     6974      22   theano.gpuarray.dnn.GpuDnnBatchNormGrad
   1.9%    96.0%       8.556s       1.23e-03s     C     6974      22   theano.gpuarray.dnn.GpuDnnBatchNorm
   1.4%    97.4%       6.295s       1.99e-02s     Py     317       1   theano.tensor.subtensor.AdvancedIncSubtensor
   0.6%    98.0%       2.620s       4.13e-03s     C      634       2   theano.gpuarray.elemwise.GpuCAReduceCuda
   0.4%    98.4%       1.987s       2.85e-04s     C     6974      22   theano.gpuarray.dnn.GpuDnnBatchNormInference
   0.4%    98.8%       1.728s       5.45e-03s     C      317       1   theano.tensor.basic.Alloc
   0.2%    99.0%       1.090s       3.44e-03s     Py     317       1   theano.tensor.basic.ARange
   0.2%    99.2%       0.769s       6.07e-04s     C     1268       4   theano.gpuarray.basic_ops.GpuFromHost
   0.2%    99.3%       0.754s       1.19e-03s     C      634       2   theano.gpuarray.basic_ops.HostFromGpu
   0.2%    99.5%       0.713s       5.62e-04s     C     1268       4   theano.gpuarray.basic_ops.GpuJoin
   0.1%    99.6%       0.633s       4.99e-04s     C     1268       4   theano.gpuarray.dnn.GpuDnnPoolGrad
   0.1%    99.7%       0.528s       5.91e-06s     C    89394     282   theano.gpuarray.basic_ops.GpuContiguous
   0.1%    99.8%       0.283s       2.23e-04s     C     1268       4   theano.gpuarray.dnn.GpuDnnPool
   0.0%    99.8%       0.142s       8.15e-07s     C   174350     550   theano.tensor.subtensor.Subtensor
   0.0%    99.9%       0.142s       2.04e-05s     C     6974      22   theano.gpuarray.rng_mrg.GPUA_mrg_uniform
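
In case it helps to reproduce: the breakdown above is the per-Op summary 
from Theano's profiler. Continuing from the sketch above (same x, W_dw, 
W_pw, y2), something like this should print a table of the same form 
(profiling can also be enabled globally with THEANO_FLAGS=profile=True):

# include the gradient so the GpuDnnConvGradW/GradI ops show up too
loss = y2.mean()
grads = T.grad(loss, [W_dw, W_pw])
f = theano.function([x], grads, profile=True)

xv = np.random.randn(b, ch_in, 64, 64).astype(floatX)
for _ in range(100):
    f(xv)
f.profile.summary()  # per-Op breakdown like the one pasted above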

Is this a cuDNN issue or a Theano issue?

Thank you!
