Ni Hui created MXNET-97:
---------------------------

             Summary: implement DepthwiseConv2dBackwardFilterKernel from tensorflow codebase
                 Key: MXNET-97
                 URL: https://issues.apache.org/jira/browse/MXNET-97
             Project: Apache MXNet
          Issue Type: Improvement
            Reporter: Ni Hui


The current MXNet implementation calls __syncthreads() too often, which makes 
it extremely slow.
The new code is adapted from TensorFlow, with the variable names adjusted for 
consistency.
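
For readers unfamiliar with the operation, here is a minimal pure-Python 
reference of what a depthwise convolution backward-filter pass computes. It is 
only a sketch under stated assumptions (stride 1, no padding, channel 
multiplier 1, nested lists in N/C/H/W order); the function name and layout are 
hypothetical and are not taken from the MXNet or TensorFlow kernels. Each 
filter-weight gradient is a sum over every batch item and output position, so 
a GPU kernel has to perform a reduction, and how that reduction is 
synchronized can plausibly dominate the run time.

```python
def depthwise_conv2d_backward_filter(x, dy, kh, kw):
    """Reference filter gradient for a depthwise 2-D convolution.

    x  : input,           nested lists shaped [N][C][H][W]
    dy : output gradient, nested lists shaped [N][C][OH][OW]
    Returns dw shaped [C][kh][kw].
    Assumes stride 1, no padding, channel multiplier 1 (illustrative only).
    """
    n_batch = len(x)
    channels = len(x[0])
    out_h = len(dy[0][0])
    out_w = len(dy[0][0][0])

    dw = [[[0.0] * kw for _ in range(kh)] for _ in range(channels)]
    for n in range(n_batch):
        for c in range(channels):          # each channel has its own filter
            for i in range(kh):
                for j in range(kw):
                    # Reduction over all output positions for one weight.
                    acc = 0.0
                    for oy in range(out_h):
                        for ox in range(out_w):
                            acc += x[n][c][oy + i][ox + j] * dy[n][c][oy][ox]
                    dw[c][i][j] += acc
    return dw


# Tiny example: one image, one channel, 3x3 input, 2x2 filter.
x = [[[[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]]]
dy = [[[[1, 1],
        [1, 1]]]]
dw = depthwise_conv2d_backward_filter(x, dy, 2, 2)
```

With an all-ones output gradient, each entry of dw is simply the sum of the 
2x2 input window that the corresponding weight touches.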

My model uses depthwise convolution heavily, and its training time per 
iteration is now over 5x faster on a single P40 GPU (old: 92 s, new: 18 s).



