sandeep-krishnamurthy opened a new issue #15757: [Discussion] Unified 
performance tests and dashboard
URL: https://github.com/apache/incubator-mxnet/issues/15757
 
 
   **Problem Statement**
   
   1. Performance tests are not integrated with CI. We do not run any performance tests during PR validation or in nightly tests, so we cannot catch performance leaks early; degradations and regressions surface only during or after a release.
   2. Without performance tests in CI, we are unable to track performance improvements and degradations, or to focus the community's attention on performance-related projects.
   3. With new projects such as NumPy, Large Tensor Support, MKLDNN 1.0 integration, and MShadow deprecation, tracking changes in performance is critical. Having these tools integrated with CI will let us move faster and handle regressions swiftly.
   4. Current performance/benchmark tests are too diverse, distributed and maintained across many teams and repos.
       1. We have a few performance tests under [benchmark/python](https://github.com/apache/incubator-mxnet/tree/master/benchmark/python)
       2. Recently added operator performance tests under [opperf](https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf)
       3. MXNet contributors at AWS maintain a suite of performance tests in [awslabs/deeplearning-benchmark](https://github.com/awslabs/deeplearning-benchmark)
       4. MXNet contributors at Intel maintain a suite of performance tests. 
(repo - ??)
       5. MXNet contributors at NVIDIA maintain a suite of performance tests. (repo - ??)
   5. MXNet currently does not have a common dashboard for viewing performance benchmarks.
   
   **Proposal**
   
   1. At a high level, we can divide all performance tests into 3 categories:
       1. Kernel level tests - Ex: Conv MKLDNN/cuDNN kernels.
       2. Operator level tests - Ex: the OpPerf utility we have in MXNet. These tests exercise the MXNet engine and the other critical paths involved in executing an operator (a minimal usage sketch follows this list).
       3. End-to-end topology/model tests - Ex: ResNet50-v1 on ImageNet (see the second sketch below), covering both:
           1. Training
           2. Inference
   2. We will unify all performance tests distributed across the MXNet repo and the repos maintained by contributors at AWS, NVIDIA, Intel, and others under one single umbrella of MXNet performance tests and benchmarks.
   3. We will integrate these performance tests with the MXNet CI system. We need to divide the tests between PR validation and nightly/weekly runs.
   4. We will have a unified dashboard with results from nightly builds, so the community can see the status of MXNet at any given point.
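   
   As a reference for category 2, the existing opperf utilities can already be driven from Python. A minimal sketch, assuming `benchmark/opperf` is on the `PYTHONPATH` and `run_performance_test` keeps its current signature:
   
   ```python
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   
   # Benchmark forward and backward passes of nd.add on 1024x1024 inputs.
   add_res = run_performance_test(nd.add, run_backward=True, dtype='float32',
                                  ctx=mx.cpu(),
                                  inputs=[{"lhs": (1024, 1024),
                                           "rhs": (1024, 1024)}],
                                  warmup=10, runs=25)
   
   # A list of per-operator results with average forward/backward time and memory.
   print(add_res)
   ```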
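   For category 3, an end-to-end inference test can be as simple as timing hybridized forward passes through a Gluon model zoo network. A minimal sketch; the batch size, warmup and run counts here are illustrative, not a proposed standard:
   
   ```python
   import time
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision
   
   ctx = mx.cpu()                      # mx.gpu(0) on a GPU worker
   net = vision.resnet50_v1()
   net.initialize(ctx=ctx)
   net.hybridize(static_alloc=True)
   
   data = mx.nd.random.uniform(shape=(32, 3, 224, 224), ctx=ctx)
   
   # Warmup so graph construction and kernel caching do not skew the numbers.
   for _ in range(5):
       net(data).wait_to_read()
   
   runs = 25
   start = time.time()
   for _ in range(runs):
       net(data).wait_to_read()
   mx.nd.waitall()
   print("ResNet50-v1 inference, batch 32: %.2f ms/batch"
         % ((time.time() - start) / runs * 1000))
   ```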
   
   This topic is open for discussion. Please comment with your suggestions and feedback.
   
   CC: @apeforest @ChaiBapchya @access2rohit @PatricZhao @TaoLv @ptrendx 
   
   
