pgplus1628 opened a new issue #7407: concat operator implementation
URL: https://github.com/apache/incubator-mxnet/issues/7407

It seems the current implementation of the `concat` operator is based on mshadow. When `concat` receives multiple NDArrays as input on the GPU, it launches a kernel multiple times. TensorFlow has a custom kernel for its concat operator that performs only a single kernel launch. Is there any plan to optimize this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
With regards, Apache Git Services
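
For readers following the issue, the single-launch idea it describes can be illustrated with a minimal CPU-side sketch in Python. This is not MXNet's or TensorFlow's actual kernel. The function name `concat_single_pass` is hypothetical, and the sketch assumes 1-D inputs. Each loop iteration stands in for one GPU thread: it uses prefix-sum offsets to find which input owns its output slot, so a single pass over the output covers every input.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_single_pass(inputs):
    """Concatenate 1-D lists in one pass over the output index space.

    Hypothetical illustration only: each loop iteration plays the role
    of one GPU thread in a single fused concat kernel.
    """
    # offsets[i] is the output position where inputs[i] starts;
    # offsets[-1] is the total output length.
    offsets = [0] + list(accumulate(len(x) for x in inputs))
    total = offsets[-1]
    out = [None] * total

    for tid in range(total):  # one "thread" per output element
        # Binary search for the input whose range contains tid.
        src = bisect_right(offsets, tid) - 1
        out[tid] = inputs[src][tid - offsets[src]]
    return out


if __name__ == "__main__":
    print(concat_single_pass([[1, 2], [], [3]]))
```

By contrast, a per-input approach would run one such loop (one launch) for each input array, which is the overhead the issue asks about.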