[apache/incubator-mxnet] [RFC] Apache MXNet 2.0 Roadmap (#16167)

Sheng Zha Fri, 13 Sep 2019 14:48:53 -0700

# Overview

The purpose of this RFC is to organize and present the roadmap towards 2.0. As 
2.0 will be a major release, changes that would break backward compatibility 
are permissible.

The proposed changes in this RFC are either collected from past roadmap
discussions such as #9686, or are based on various common issues from the past.
This RFC organizes these changes into self-contained projects to facilitate
clear project definitions, and captures the risks and status quo to the best of
our knowledge. To help navigate, the projects are further divided into several
high-level areas. Some of the listed projects are already in progress, and are
included to provide a clear overview.

The objectives of Apache MXNet 2.0 include:
- Improve expressiveness and usability of user-facing API.
- Improve expressiveness and usability of the technical stack for lower
development cost and maintainability.

In terms of frontend, this roadmap focuses mostly on the Python frontend since
MXNet has been taking a Python-first approach. The expectation with respect to
other language bindings is that they would evolve along with the backend
evolution and make use of the improvements. Given that breaking changes can
occur, maintainers of different language bindings are expected to participate
in related interface definition discussions.

## N1. NumPy

NumPy has long been established as the standard math library in Python, the
most prevalent language for the deep learning community. With this library as
the cornerstone, there is now the largest ecosystem and community for
scientific computing. The popularity of NumPy comes from its flexibility and
generality.

In #14253, the MXNet community reached consensus on moving towards a
NumPy-compatible programming experience and committed to a major endeavor on
providing NumPy-compatible operators.

The primary goal of the projects below is to provide the equivalent usability
and expressiveness of NumPy in MXNet to facilitate Deep Learning model
development, which not only helps existing deep learning practitioners but also
provides people in the existing NumPy community with a shortcut for getting
started in Deep Learning. The efforts towards this goal would also help a
secondary goal, which is to enable the existing NumPy ecosystem to utilize GPUs
and accelerators to speed up large scale computation.

cc @apache/mxnet-committers

### NumPy Operator Testing

Scope:
1. adopt `__array_function__` and existing NumPy tests.
2. extend testing to GPU
3. investigate numpy testing strategies
4. decide correctness criteria for acceptance

### NumPy Operator performance profiling

Scope:
1. Automatically profile the performance of NumPy operators

### NumPy operator coverage

Scope:
1. improve operator coverage until full NumPy coverage is reached, with
prioritization towards operators used in the ecosystem and deep learning in
general

Operator coverage as of 07/03/2019

```
| module | NumPy | deepNumPy | jax | cupy |
|-----------|-----------|-----------|-----------|-----------|
| np | 603 | 89 | 445 | 321 |
| ndarray | 71 | 32 | 71 | 56 |
| random | 63 | 5 | 15 | 49 |
| linalg | 31 | 2 | 8 | 15 |
```
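As a rough reading of the table above, the share of each NumPy module that each library covers can be computed directly from the listed counts. The sketch below only restates the table's numbers; the dictionary layout and the `coverage` helper are illustrative and not part of any MXNet tooling.

```python
# Operator counts copied from the table above (as of 07/03/2019).
COUNTS = {
    "np":      {"NumPy": 603, "deepNumPy": 89, "jax": 445, "cupy": 321},
    "ndarray": {"NumPy": 71,  "deepNumPy": 32, "jax": 71,  "cupy": 56},
    "random":  {"NumPy": 63,  "deepNumPy": 5,  "jax": 15,  "cupy": 49},
    "linalg":  {"NumPy": 31,  "deepNumPy": 2,  "jax": 8,   "cupy": 15},
}


def coverage(module, library):
    """Return the fraction of NumPy operators in `module` that `library` provides."""
    row = COUNTS[module]
    return row[library] / row["NumPy"]


for module in COUNTS:
    shares = ", ".join(
        f"{lib}: {coverage(module, lib):.1%}" for lib in ("deepNumPy", "jax", "cupy")
    )
    print(f"{module:8s} {shares}")
```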

### NumPy Extension Operator Reorganization and Renaming

Scope:
1. consistent type usage for index input and return values from sort, topk
(#11031, #11134, #12197)
2. array creation operators with flexible dtype definition #12290. (dtype=None)
3. moving_mean/moving_var in batchnorm
4. consistent usage of axis vs dim
5. promote or deprecate contrib operators

### NumPy ndarray type extension

Scope:
1. bfloat16 support (not in NumPy yet but useful for deep learning) (low
priority — Intel)
2. boolean type support
3. complex (for FFT)

### NumPy ndarray boolean indexing

Scope:
1. allow boolean masks in NumPy ndarray indexing by adding the operator,
potentially through extending op.where

### Hybridizable basic (and advanced) indexing

Scope:

1. Allow operations such as y = x[1:3, 2, ...] to be hybridizable

Note: Preliminary work: https://github.com/apache/incubator-mxnet/pull/15663
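To make the target expression concrete: Python hands `x[1:3, 2, ...]` to `__getitem__` as a single tuple holding a `slice`, an `int`, and `Ellipsis`. The sketch below uses a hypothetical `IndexRecorder` class (not an MXNet type) to show the key that a hybridizable implementation would have to translate into graph operators.

```python
class IndexRecorder:
    """Hypothetical stand-in for a symbolic array; it only records the index key."""

    def __getitem__(self, key):
        # Python passes multi-dimensional indices as one tuple.
        return key if isinstance(key, tuple) else (key,)


x = IndexRecorder()
key = x[1:3, 2, ...]
print(key)
```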

## Graph Enhancement and 3rdparty support

The objective of the following projects is to enable easier development of
third-party extensions without requiring changes to be checked into the MXNet
project. Examples of such extensions include third-party operator libraries and
accelerators.

### Graph Partitioning for Dynamic Shape Operators

Scope:
1. partition inside control flow operators (and all cached ops)
2. partition on operators with dynamic shapes for partial memory planning and
caching.

### Improved Third-party Operator Support

Scope:
1. allow registering custom operators by exposing C API (and frontend API) to
register NNVM op at runtime.
2. verify serialization, deserialization, and graph passes for graphs with
these operators are working properly.

### Improved Third-party Backend Support (subgraph property)

Scope:
1. expose a graph pass for standard graph partitioning with back-end-specific
criteria as a C API and frontend API.

### Large tensor support by default

Scope:
1. enable default support for tensors with int64 dimension sizes
2. make sure there’s no significant performance regression in operators

Risks:
1. performance regression may happen in a subset of operators, which can
disproportionately affect certain models.
2. compatibility and silent behavior change.

Notes: in progress (RFC:
https://lists.apache.org/thread.html/df53b8c26e9e0433378dd803baba9fec4dd922728a5ce9135dc164b3@%3Cdev.mxnet.apache.org%3E)
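As a rough illustration of why int64 sizes matter, the arithmetic below checks when a tensor's element count no longer fits in a signed 32-bit integer. The helper name `needs_int64` is hypothetical and the check is a simplification; it does not reflect how the MXNet backend decides which index type to use.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def needs_int64(shape):
    """Return True if any dimension or the total element count overflows int32."""
    total = math.prod(shape)
    return total > INT32_MAX or any(dim > INT32_MAX for dim in shape)


print(needs_int64((10_000, 10_000)))   # 1e8 elements
print(needs_int64((50_000, 50_000)))   # 2.5e9 elements
```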

## API Changes

The objective of the following projects is to address the technical debts
accumulated during the development of MXNet 0.x and 1.x with respect to the API
definition.

### C-API Clean-up

C-API is the foundational API in MXNet that all language bindings depend on.

Scope:
1. use packed function for flexibility (and potentially efficiency through
avoiding string parsing)
2. do not expose backend accelerator-specific types such as mkldnn::memory in
C-API
3. do not rely on topological ordering for argument passing (#15362).
4. verification of thread-safety and performance for C API

Risks:
1. backend integration may require refactoring or even redesign
2. existing use cases such as other frontends may be broken without substitute
3. feedback is scattered and we may miss the opportunity to change some APIs in
2.0

### Unify Executor

Scope:
1. SymbolBlock equivalent in C/C++, unify the executor implementation for
symbol/module and the one for gluon blocks
2. migrate other versions of inference API

### Gradient of Gradient support

Scope:
1. higher order gradient support for a subset of operators

Risks:
1. large number of backward operators could introduce significant technical
debt if not properly verified.
2. ill-informed prioritization may result in usability issues (e.g. common GAN
models not supported)
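The core requirement behind higher-order gradients is that each operator's derivative rule must itself be differentiable. The sketch below illustrates this with nested dual numbers (forward-mode differentiation) in plain Python. This is a swapped-in technique chosen for brevity, not MXNet's reverse-mode autograd, and the `Dual` and `derivative` names are hypothetical.

```python
class Dual:
    """Number of the form val + eps * d, where d * d == 0 (forward-mode AD)."""

    def __init__(self, val, eps=0.0):
        self.val = val
        self.eps = eps

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule; val and eps may themselves be Dual when nested.
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __radd__ = __add__
    __rmul__ = __mul__


def derivative(f):
    """Return a function computing df/dx by seeding eps = 1."""
    def df(x):
        return f(Dual(x, 1.0)).eps
    return df


def cube(x):
    return x * x * x


first = derivative(cube)(3.0)               # 3 * x**2 evaluated at x = 3
second = derivative(derivative(cube))(3.0)  # 6 * x evaluated at x = 3
print(first, second)
```

The second derivative only works because the product rule in `__mul__` is written in terms of operations that are themselves overloaded; an operator whose backward pass is opaque would break the chain, which is the verification risk noted above.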

### Autograd Extension

Scope:
1. improve interface to support specifying intermediate output grad nodes
2. improve interface for better usability. (retain_graph → something not
involving graph)
3. update graph pass for correctness

### NNVM-backend Operator Interface Changes

Scope:
1. support more than one temporary space
2. split forward shape/type inference and reverse shape/type inference for
better error messaging.
3. deferred initialization removal (or improve error/info message)
4. accompanying operator implementation changes

Risks:
1. some changes may make operator implementation less error-prone but also less
flexible, and thus require some reworking.

## Gluon 2.0

Since the introduction of the Gluon API, it has superseded other APIs for model
development such as symbolic API and model API. Conceptually, Gluon is the
first attempt in the deep learning community to unify the flexibility of
imperative programming with the performance benefits of symbolic programming,
through trace-based just-in-time compilation.

The objectives of the following projects are:
- address usability issues as a result of the divergence in the behavior of
NDArray and Symbol.
- extend the JIT to improve the coverage of hybridization.
- introduce new functionality to facilitate more areas of research such as
Bayesian methods and AutoML.
- improve the usability and performance of the utility in Gluon.

### Unifying symbolic and imperative mode for tensor library

Scope:
1. unify the operator implementation and behaviors of symbolic and imperative
execution modes (#10875)
2. allow naming for ndarray similar to symbol
3. address the necessary changes in shape/type inference.

### Unifying Block and HybridBlock

Scope:
1. move hybridization logic to a JIT decorator
2. extend parameter management to Block
3. user-friendly warning for native control flow in JIT code.
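A minimal sketch of what a tracing JIT decorator with a control-flow warning could look like, in plain Python. All names here (`Tracer`, `jit`, the graph tuple layout) are hypothetical and do not describe the Gluon design; the point is only that a Python `if` on a traced value calls `__bool__`, which gives the tracer a place to warn the user.

```python
import warnings


class Tracer:
    """Hypothetical symbolic value that records operations instead of computing."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        out = Tracer(f"t{len(self.graph)}", self.graph)
        rhs = other.name if isinstance(other, Tracer) else repr(other)
        self.graph.append((out.name, op, self.name, rhs))
        return out

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

    def __gt__(self, other):
        return self._record("gt", other)

    def __bool__(self):
        # A Python `if` or `while` asks for a concrete truth value here.
        warnings.warn(
            "native control flow on a traced value; only one branch is recorded",
            RuntimeWarning,
        )
        return True


def jit(fn):
    """Hypothetical decorator: trace `fn` once, keep the graph, run eagerly."""
    def wrapper(*args):
        graph = []
        fn(*[Tracer(f"x{i}", graph) for i in range(len(args))])
        wrapper.graph = graph
        return fn(*args)  # a real JIT would execute the compiled graph instead
    wrapper.graph = None
    return wrapper


@jit
def affine(x, w):
    return x * w + 1


@jit
def gated(x):
    if x > 0:          # triggers the warning during tracing
        return x * 2
    return x


print(affine(2, 3), affine.graph)
```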

### Gluon Block Enhancement

Scope:
1. inspection of graph internals similar to monitor for Module ([PR
15839](https://github.com/apache/incubator-mxnet/pull/15839))
2. support additional types in argument such as dict, kwargs, None
3. fused parameters and gradients respectively
4. register custom parameter

### Enable Symbolic Shape (& Dtype) for Array Creation in NNVM-backend

Scope:
1. allow flexible creation of array based on shapes of other arrays that are
only known at runtime
2. add constant symbol type as the return value of symbol.shape (?)
3. support constant symbol as operator arguments (?)
4. constant folding for constant symbols

### Gluon Distributions Module

Scope:
1. sampling and pdf definition for distributions. See
https://github.com/amzn/MXFusion and
https://github.com/apache/incubator-mxnet/pull/14617.
2. wrap operators into more usable classes.
3. reproducible global seed
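The scope items above can be sketched as a small class that pairs a density definition with sampling. The `Normal` class and its method names are hypothetical and are not the API from the linked pull request. Reproducibility is shown here by passing an explicitly seeded generator, which is one possible design and an assumption of this sketch rather than the proposed global-seed mechanism.

```python
import math
import random


class Normal:
    """Hypothetical distribution class; not the API proposed in the linked PR."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def log_prob(self, x):
        """Log of the Gaussian probability density at x."""
        z = (x - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2 * math.pi)

    def sample(self, rng):
        """Draw one sample using an explicitly passed random generator."""
        return rng.gauss(self.loc, self.scale)


dist = Normal(loc=0.0, scale=1.0)
a = dist.sample(random.Random(42))
b = dist.sample(random.Random(42))  # same seed, same draw
print(a == b, dist.log_prob(0.0))
```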

### Gluon Metrics Module

Scope:
1. address usability and performance issues in mxnet.metric using hybridizable
NumPy op

### Gluon Optimizer Module

Scope:
1. API changes such as consistent weight decay (#9881), change default value to
not apply wd on bias terms (#11953)
2. hybridizable optimizers
3. new optimizers (#9182)
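To illustrate the proposed default of not applying weight decay to bias terms, here is a minimal SGD step in plain Python. The function name, the scalar parameters, and the rule of detecting bias parameters by a `bias` name suffix are assumptions made for this sketch, not the MXNet optimizer API.

```python
def sgd_step(params, grads, lr=0.1, wd=0.01, decay_bias=False):
    """One SGD step with L2 weight decay; bias terms are skipped by default."""
    updated = {}
    for name, value in params.items():
        # Assumption for this sketch: bias parameters are identified by name.
        apply_wd = decay_bias or not name.endswith("bias")
        grad = grads[name] + (wd * value if apply_wd else 0.0)
        updated[name] = value - lr * grad
    return updated


params = {"dense0_weight": 1.0, "dense0_bias": 1.0}
grads = {"dense0_weight": 0.5, "dense0_bias": 0.5}
print(sgd_step(params, grads))
```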

### Gluon Data API Extension and Fixes

Scope:
1. address diverging interfaces and remove transform= constructor arg (#11141).
2. reorganize io/image modules and provide data loader instead.
3. lowering dataloader to backend for efficiency (#13593)
4. shared memory propagation?

### Gluon Estimator Extension for Experimenting Utilities

Scope:

1. logging of configuration (DeepNLU), state, and performance for checkpointing
for easier resume
2. pre-defined estimators for common problems

### Gluon Estimator Refactoring for Examples and Tutorials

Scope:
1. modularize and refactor unstructured scripts and examples into estimator
class utilities

### Gluon Distributed Training Usability Enhancement

Scope:
1. more flexibility for communication with kvstore UDFs
2. add distribution strategies to estimator
3. plugin for communication backends (horovod, byteps, parameter server) for
data parallel training
4. data sharding/sampling/streaming enhancement for distributed training
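As one simple instance of data sharding for data parallel training, each worker can take a strided slice of the sample indices so that shards are disjoint and together cover the dataset. The `shard_indices` helper is hypothetical; actual sharding in MXNet may be organized differently.

```python
def shard_indices(num_samples, num_workers, rank):
    """Return the sample indices assigned to worker `rank` (strided split)."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    return list(range(rank, num_samples, num_workers))


shards = [shard_indices(10, 3, r) for r in range(3)]
print(shards)
```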

### NNVM-Graph optimization

Scope:
1. fix mirror for memory optimization (Bojian)

## Documentation

Documentation is the most important factor for new adoption of a library. The
following projects aim to:
- address the usability and discoverability issues in the current MXNet website
- improve the quality of documentation to make it correct, clear, and concise.
- help existing users adopt the changes in MXNet 2.0.

### MXNet 2.0 Migration Guide

Scope:
1. document high-level mapping from old functionality to new API for data
pipeline, modeling, optimization, training loop, metric, inspection and
logging, debugging.

Risks:
1. parallel development of the doc may result in outdated doc.
2. auto doc verification is needed.

### MXNet 2.0 Developer Guide

Scope:
1. carefully document the design and contribution guide for features with low
entry bar such as operator, gluon block, doc, optimizer, metric, examples and
tutorials.
2. clear and up-to-date system design overview.
3. clear roadmap

### Adopt beta.mxnet.io as official website

Scope:

1. infrastructure change for new doc build
2. merge into master with [NumPy.mxnet.io](http://NumPy.mxnet.io/)
3. improve load time and browsing experience
4. CDN in popular regions such as China, with automated validation and testing.

Note: https://github.com/ThomasDelteil/mxnet.io-v2

## Profiling and Debugging

Profiling and debugging are common steps in the development of deep learning
models, and proper tools can significantly improve developer productivity. The
objective of these projects is to provide such tools to make it easier to
discover issues in the correctness and performance of models.

### Memory Profiler

Scope:
1. memory profiler logging support in backend
2. automatic array naming tool based on scope
3. tree-map visualization tool for inspecting profiler dump

### Enhanced Debugging Tool

Scope:
1. Enable user-specified error handling
2. Improve error message
3. Stacktrace inspection in debug API
4. Automatic error reporting tool
5. Runtime API for turning off asynchronous execution

## Advanced Operators

The objective of these projects is to extend the tensor library and operators
for better performance and for advanced use.

### Strided ndarray support

Scope:
1. support strided array in a subset of operators
2. support auto-transpose of strided array in graph pass and executor

### Ragged ndarray and operators

Scope:
1. introduce ragged (variable-length) tensor as a first-class tensor type.
Support zero-copy from RaggedNDArray to NDArray when no dimension is ragged.
2. Load balancing strategy for operators that take RaggedNDArray as input
3. cover operators for NLP applications (RNN, transformer)
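One common way to represent a ragged tensor is a flat value buffer plus row offsets. The sketch below assumes that layout; the `Ragged` class is hypothetical and not the RaggedNDArray design. When every row has the same length, the flat buffer already matches a dense row-major array, which is the situation where a zero-copy conversion is conceivable.

```python
class Ragged:
    """Hypothetical ragged tensor: flat values plus row offsets."""

    def __init__(self, rows):
        self.values = [v for row in rows for v in row]
        self.offsets = [0]
        for row in rows:
            self.offsets.append(self.offsets[-1] + len(row))

    def row(self, i):
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def is_dense(self):
        """True if all rows have equal length, so `values` can back a dense array."""
        lengths = {b - a for a, b in zip(self.offsets, self.offsets[1:])}
        return len(lengths) <= 1


sentences = Ragged([[1, 2, 3], [4], [5, 6]])
print(sentences.values, sentences.offsets, sentences.row(2), sentences.is_dense())
```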

### Improved Sparse Support

Scope:
1. sparse format and operator support
2. scipy coverage
3. operators for graph neural-networks (e.g. ops in minigun)

Minimum support:

* format: csr
* zerocopy to DLPack
* integration with minigun kernels

Next-level support:

* format: coo and block sparse.
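For readers less familiar with the CSR format listed under minimum support, the sketch below builds the three CSR arrays from a small dense matrix and uses them for a matrix-vector product. It is plain Python for illustration only; the function names are hypothetical and unrelated to MXNet's sparse operators or the minigun kernels.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR matrix by a dense vector, touching only stored entries."""
    return [
        sum(data[k] * vec[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
data, indices, indptr = to_csr(dense)
print(data, indices, indptr, csr_matvec(data, indices, indptr, [1, 1, 1]))
```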

## Building and Configuration

### CMake improvement and Makefile deprecation

Scope:
1. reimplement CMakeLists for DMLC dependencies
2. reimplement CMakeLists for MXNet to support 1) building the best performing
binary on any platform 2) building a portable binary distribution for pip

### MXNet Configurator

Scope:
1. drop environment variables and centralize them as config.
2. define functionalities that support runtime-switch (candidates: memory pool,
engine, worker thread pools) and expose frontend API
3. allow saving and loading of mxnet system config
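A minimal sketch of how environment variables could be centralized into one config object that can be saved and loaded. The `SystemConfig` class and its fields are hypothetical. The two variable names are taken from MXNet 1.x environment variable documentation as I recall it and should be treated as examples only; this is not a proposed API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SystemConfig:
    """Hypothetical central config; field names are illustrative only."""
    engine_type: str = "ThreadedEnginePerDevice"
    cpu_worker_nthreads: int = 1

    @classmethod
    def from_env(cls, env):
        """Build a config from an environment mapping such as os.environ."""
        defaults = cls()
        return cls(
            engine_type=env.get("MXNET_ENGINE_TYPE", defaults.engine_type),
            cpu_worker_nthreads=int(
                env.get("MXNET_CPU_WORKER_NTHREADS", defaults.cpu_worker_nthreads)
            ),
        )

    def dumps(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def loads(cls, text):
        return cls(**json.loads(text))


config = SystemConfig.from_env({"MXNET_ENGINE_TYPE": "NaiveEngine"})
restored = SystemConfig.loads(config.dumps())
print(restored)
```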

## Advanced training and deployment

### Automatic Quantization and Quantized Training for NumPy

Scope:
1. automatic quantization based on heuristic (or learning)
2. BMXNet

Dependency: N1-N5

### Mobile and edge-device deployment

Scope:
1. replace amalgamation with more user-friendly function (TF-lite equivalent).
2. tutorial and example
3. metal support without ONNX

## Performance

### MXNet Execution Overhead

Scope:
1. https://github.com/apache/incubator-mxnet/issues/14883
