DickJC123 opened a new pull request #7447: Tensorcore fullyconnected support2
URL: https://github.com/apache/incubator-mxnet/pull/7447
 
 
   Consider this an alternative approach to getting TensorCore working with 
FullyConnected.  It is far simpler than my first PR for this functionality. 
 If anything, this PR is proof that one can invoke TensorCore algorithms by 
manipulating the cuBLAS handle in combination with the existing dot function's 
use of Hgemm and SgemmEx.  This PR also shows the kind of per-instance handle 
manipulation that is necessary: blindly setting the handle globally to 
enable TensorCore has the unfortunate side effect of introducing 
fp16 casts on the inputs of fp32-I/O gemms.  Bottom line: I wouldn't expect you 
to accept this PR without a discussion.
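   
   To illustrate the per-instance approach, here is a minimal sketch (not the 
PR's actual code) of scoping TensorCore math to a single call site via the 
standard cuBLAS math-mode API, so that a handle is never left globally in 
TensorCore mode where it could silently fp16-cast the inputs of fp32-I/O gemms. 
The guard class name is hypothetical; the cuBLAS calls are the real CUDA 9+ API.

```cpp
// Hypothetical sketch: restrict CUBLAS_TENSOR_OP_MATH to one gemm call.
#include <cublas_v2.h>

// RAII guard: enable the requested math mode on construction and
// restore the handle's previous mode on destruction.
class CublasMathModeGuard {
 public:
  CublasMathModeGuard(cublasHandle_t handle, cublasMath_t new_mode)
      : handle_(handle) {
    cublasGetMathMode(handle_, &saved_mode_);  // remember current mode
    cublasSetMathMode(handle_, new_mode);      // e.g. CUBLAS_TENSOR_OP_MATH
  }
  ~CublasMathModeGuard() { cublasSetMathMode(handle_, saved_mode_); }

 private:
  cublasHandle_t handle_;
  cublasMath_t saved_mode_;
};

// Usage at a gemm call site known to be TensorCore-safe:
// {
//   CublasMathModeGuard guard(handle, CUBLAS_TENSOR_OP_MATH);
//   cublasSgemmEx(handle, /* ... gemm arguments ... */);
// }  // previous math mode restored here; fp32-I/O gemms elsewhere unaffected
```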
   
   I have begun studying the new linear algebra code with the idea of producing 
an enable-TensorCore PR for that new approach.  I notice the new LA code 
doesn't support fp16-I/O gemms yet, and the solution there will not fit the 
mold of the existing function templates.  Also, what is the plan for switching 
MXNet's use of dot() over to the new functions?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
