sandeep-krishnamurthy opened a new issue #15757: [Discussion] Unified 
performance tests and dashboard
URL: https://github.com/apache/incubator-mxnet/issues/15757
 
 
   **Problem Statement**
   
   1. Performance tests are not integrated with CI. We do not run any performance tests during PR validation or in nightly tests, so we cannot catch performance leaks early; degradations and regressions surface only during or after a release.
   2. Without performance tests in CI, we are unable to track performance improvements and degradations, or to focus the community's attention on performance-related projects.
   3. With new projects such as NumPy, Large Tensor Support, MKLDNN 1.0 integration, and MShadow deprecation, tracking changes in performance is critical. Having these tools integrated with CI will let us move faster and handle regressions swiftly.
   4. Current performance/benchmark tests are too diverse, distributed and maintained across many teams and repos.
       1. We have a few performance tests under [benchmark/python](https://github.com/apache/incubator-mxnet/tree/master/benchmark/python)
       2. Recently added operator performance tests under [opperf](https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf)
       3. MXNet contributors at AWS maintain a suite of performance tests in [awslabs/deeplearning-benchmark](https://github.com/awslabs/deeplearning-benchmark)
       4. MXNet contributors at Intel maintain a suite of performance tests. 
(repo - ??)
       5. MXNet contributors at NVIDIA maintain a suite of performance tests. (repo - ??)
   5. MXNet currently does not have a common dashboard for viewing performance benchmarks.
   
   **Proposal**
   
   1. At a high level, we can divide all performance tests into 3 categories:
       1. Kernel level tests - Ex: Conv MKLDNN/cuDNN kernels.
       2. Operator level tests - Ex: the OpPerf utility we have in MXNet. These tests exercise the MXNet engine and the other critical paths involved in executing an operator (a minimal usage sketch follows this list).
       3. End-to-end topology/model tests - Ex: ResNet50-v1 on ImageNet (see the second sketch below), covering both:
           1. Training
           2. Inference
   2. We will unify all performance tests distributed across the MXNet repo and the repos maintained by contributors at AWS, NVIDIA, Intel, and others under one single umbrella of MXNet performance tests and benchmarks.
   3. We will integrate these performance tests with the MXNet CI system. We need to divide the tests between PR validation and nightly/weekly runs.
   4. We will have a unified dashboard with results from nightly builds, so the community can see the status of MXNet at any given point.
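   
   As a reference for category 2, the existing opperf utilities can already be driven from Python. A minimal sketch, assuming `benchmark/opperf` is on the `PYTHONPATH` and `run_performance_test` keeps its current signature:
   
   ```python
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   
   # Benchmark forward and backward passes of nd.add on 1024x1024 inputs.
   add_res = run_performance_test(nd.add, run_backward=True, dtype='float32',
                                  ctx=mx.cpu(),
                                  inputs=[{"lhs": (1024, 1024),
                                           "rhs": (1024, 1024)}],
                                  warmup=10, runs=25)
   
   # A list of per-operator results with average forward/backward time and memory.
   print(add_res)
   ```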
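   For category 3, an end-to-end inference test can be as simple as timing hybridized forward passes through a Gluon model zoo network. A minimal sketch; the batch size, warmup and run counts here are illustrative, not a proposed standard:
   
   ```python
   import time
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision
   
   ctx = mx.cpu()                      # mx.gpu(0) on a GPU worker
   net = vision.resnet50_v1()
   net.initialize(ctx=ctx)
   net.hybridize(static_alloc=True)
   
   data = mx.nd.random.uniform(shape=(32, 3, 224, 224), ctx=ctx)
   
   # Warmup so graph construction and kernel caching do not skew the numbers.
   for _ in range(5):
       net(data).wait_to_read()
   
   runs = 25
   start = time.time()
   for _ in range(runs):
       net(data).wait_to_read()
   mx.nd.waitall()
   print("ResNet50-v1 inference, batch 32: %.2f ms/batch"
         % ((time.time() - start) / runs * 1000))
   ```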
   
   This topic is open for discussion. Please comment with your suggestions and feedback.
   
   CC: @apeforest @ChaiBapchya @access2rohit @PatricZhao @TaoLv @ptrendx 
   
   
