Ni Hui created MXNET-97:
---------------------------

             Summary: implement DepthwiseConv2dBackwardFilterKernel from tensorflow codebase
                 Key: MXNET-97
                 URL: https://issues.apache.org/jira/browse/MXNET-97
             Project: Apache MXNet
          Issue Type: Improvement
            Reporter: Ni Hui

The current MXNet implementation calls __syncthreads() too often, which is extremely slow. The new code comes from TensorFlow, with variable names adjusted for consistency. My model uses depthwise convolution heavily, and its training time per iteration is now over 5x faster on a single P40 GPU (old 92s vs. new 18s).
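
For readers unfamiliar with the operation, here is a minimal NumPy sketch of what a depthwise conv2d backward-filter pass computes. It is a CPU reference only, not the MXNet or TensorFlow kernel. The function names, the NCHW layout, stride 1, no padding, and a channel multiplier of 1 are all assumptions made for illustration. Each filter gradient element is a sum over the batch and every output position, and a reduction of that kind is what a GPU kernel has to combine across threads.

```python
import numpy as np


def depthwise_conv2d(x, w):
    """Forward pass (hypothetical helper): x is (N, C, H, W), w is (C, KH, KW).

    Assumes stride 1, no padding, channel multiplier 1.
    """
    n, c, h, wd = x.shape
    _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, c, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Each channel is filtered only by its own (KH, KW) filter.
            out += x[:, :, i:i + oh, j:j + ow] * w[None, :, i, j, None, None]
    return out


def depthwise_conv2d_backward_filter(x, grad_out, kh, kw):
    """Filter gradient (hypothetical helper): returns an array of shape (C, KH, KW)."""
    c = x.shape[1]
    oh, ow = grad_out.shape[2:]
    grad_w = np.zeros((c, kh, kw), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Reduce over batch and all output positions for this filter tap.
            window = x[:, :, i:i + oh, j:j + ow]
            grad_w[:, i, j] = np.sum(window * grad_out, axis=(0, 2, 3))
    return grad_w
```

The result has one value per channel and filter tap, so for a 3x3 depthwise filter over C channels only 9 * C numbers are produced, each from a large reduction.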