DickJC123 commented on a change in pull request #7347: Tensorcore conv deconv support
URL: https://github.com/apache/incubator-mxnet/pull/7347#discussion_r132093608

 ##########
 File path: src/operator/cudnn_algoreg-inl.h
 ##########
 @@ -40,12 +65,17 @@ class CuDNNAlgoReg {
      oss << "cudnn_data_type=" << cudnn_data_type << ";";
      oss << "cudnn_forward_compute_type=" << cudnn_forward_compute_type << ";";
      oss << "cudnn_backward_compute_type=" << cudnn_backward_compute_type << ";";
 +    // A system could be heterogeneous and thus have different algo choices for different
 +    // device ids. 'device_id' could possibly be replaced with gpu compute capability,
 +    // but identical GPUs could technically have different clock settings.
 +    oss << "device_id=" << device_id << ";";

 Review comment:
   I'll update the PR tomorrow by substituting compute capability for device_id in the key. This will ensure proper operation for a workstation with both a PASCAL and a VOLTA brick, yet will improve the algo selection speed for an 8-way homogeneous system, since identical GPUs will then share one registry entry.

   Regarding key compaction, I'll point out that all key look-ups are performed during the graph construction phase, never during inference or training. Also, the key look-up times are probably dwarfed by any Find() calls that are run during this same phase. I'm not sure how the look-up times compare to the Get() calls performed when auto-tuning is turned off.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
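
To illustrate the keying trade-off discussed in the review comment, here is a minimal, CUDA-free sketch. It is not the PR's code: `DeviceInfo`, `MakeKey`, `MakeSystem`, `CountSearches`, and the `compute_capability=` field name are all hypothetical. In a real build the major/minor values would presumably come from the CUDA runtime's device properties. The sketch only counts how many distinct registry keys (and thus how many algo searches) each keying scheme produces.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-device facts a key might use.
struct DeviceInfo {
  int device_id;
  int sm_major;  // compute capability major, e.g. 6 for Pascal, 7 for Volta
  int sm_minor;  // compute capability minor
};

// Build a registry key from a parameter string plus either the device id
// or the compute capability (assumed field names, for illustration only).
std::string MakeKey(const std::string& params, const DeviceInfo& dev,
                    bool key_on_device_id) {
  std::ostringstream oss;
  oss << params;
  if (key_on_device_id) {
    oss << "device_id=" << dev.device_id << ";";
  } else {
    oss << "compute_capability=" << dev.sm_major << "." << dev.sm_minor << ";";
  }
  return oss.str();
}

// Build n identical devices with consecutive ids.
std::vector<DeviceInfo> MakeSystem(int n, int sm_major, int sm_minor) {
  std::vector<DeviceInfo> devices;
  for (int i = 0; i < n; ++i) devices.push_back({i, sm_major, sm_minor});
  return devices;
}

// Count registry misses: each miss stands for one (expensive) algo search.
int CountSearches(const std::vector<DeviceInfo>& devices,
                  bool key_on_device_id) {
  std::map<std::string, int> registry;  // key -> placeholder algo id
  int searches = 0;
  for (const DeviceInfo& dev : devices) {
    const std::string key =
        MakeKey("cudnn_data_type=0;", dev, key_on_device_id);
    if (registry.find(key) == registry.end()) {
      registry[key] = 0;  // pretend a search ran and stored its result
      ++searches;
    }
  }
  return searches;
}
```

Under these assumptions, an 8-way homogeneous system needs eight searches per configuration when keyed on device id but only one when keyed on compute capability, while a mixed Pascal/Volta workstation still gets a separate entry per architecture.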