Hi Tianqi and Junru.

MXNet as a piece of software is in its teens and needs to mature. The
community needs to have an honest discussion and decide whether MXNet is
a production or a research framework.

If it's a production framework, we need to apply the YAGNI principle
and decide what is and what is not supported, and whether we are
focusing on training or inference. In any case it should be possible to
refactor the code to be solid, easy to maintain, and resilient to bugs.
This includes reducing the surface area for present and future bugs,
saying no to features, and taking advantage of every tool, including
the C++ type system. As ML makes further inroads into products and our
everyday life, it should be held to the same engineering principles as
other pieces of production software; otherwise you end up in bad
situations which could have been avoided with good engineering. It's
not fun to debug a dictionary of string to dmlc::any in C++. It's
basically just one level above having to decode machine instructions
and hexadecimal dumps from memory, and we are in 2019, we have tools.
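
To make this concrete, here is a minimal sketch of the difference,
using std::any as a stand-in for dmlc::any and a hypothetical "shape"
attribute:

    #include <any>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    int main() {
      // Type-erased style: the map's signature says nothing about what
      // is stored under each key, and a wrong cast only fails at runtime.
      std::unordered_map<std::string, std::any> attrs;
      attrs["shape"] = std::vector<int>{2, 3};  // hypothetical attribute
      // A wrong type compiles fine and throws std::bad_any_cast at runtime:
      // auto bad = std::any_cast<std::vector<long>>(attrs.at("shape"));
      auto shape = std::any_cast<std::vector<int>>(attrs.at("shape"));

      // Typed style: the field's type is visible, checked at compile
      // time, and trivial to inspect in a debugger.
      struct NodeAttrs {
        std::vector<int> shape;
      };
      NodeAttrs typed{{2, 3}};

      std::cout << shape[0] << " " << typed.shape[1] << "\n";
      return 0;
    }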

As someone who is supporting MXNet use cases in production as well as
developing new features, I will say that we are spending too much
effort working around deficiencies in these areas, effort which could
be better spent advancing the SOTA in TVM or adding features to MXNet.

Taking a high-level view of the issue, I don't think it is beneficial
right now for either project to be co-dependent. In TVM and NNVM2 you
want to iterate and experiment fast, while in MXNet you want to bias
towards stability and maintainability; the speed and agility are
naturally going to be different.  In an analogy to programming
languages, MXNet would start to become the Java platform and TVM would
be Haskell...  I'm not saying that we should or should not use NNVM2 in
the future. But this is not something that should be sneaked into
MXNet through a sub-repository without discussion, planning and proper
testing.

I have extensively (re)read Relay and the TVM papers, including their
references. As it stands today, the goals of the TVM project are
different from the goals of MXNet, and the design choices and
constraints diverge.

Some of the points you make are surprising to me when I look at the
codebase as a non-PMC member:

Dynamic language support is implemented through the C++ API and
doesn't require dynamic attributes in the graph. Could you come up with
an example where any modification towards a different graph
implementation would affect the bindings of the dynamic languages for
MXNet?
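
For reference, this is the shape of that pattern as I understand it, a
minimal sketch with hypothetical names (the real MXNet C API follows
the same opaque-handle style but is not reproduced here):

    #include <vector>

    // Internal C++ graph type; its layout never crosses the C boundary.
    namespace graph {
    struct Graph {
      std::vector<int> nodes;  // placeholder for real node storage
    };
    }  // namespace graph

    // Bindings for dynamic languages only ever see an opaque handle and
    // plain C types, so the internal Graph representation can change
    // freely without touching the frontends.
    extern "C" {
    typedef void* GraphHandle;  // hypothetical, mirrors the handle style

    int HypotheticalGraphCreate(GraphHandle* out) {
      *out = new graph::Graph();
      return 0;  // 0 == success, as in the MXNet C API convention
    }

    int HypotheticalGraphFree(GraphHandle handle) {
      delete static_cast<graph::Graph*>(handle);
      return 0;
    }
    }  // extern "C"

    int main() {
      GraphHandle h = nullptr;
      HypotheticalGraphCreate(&h);
      HypotheticalGraphFree(h);
      return 0;
    }

Because only the opaque handle is exposed, nothing about the internal
node or attribute representation leaks into the bindings.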

Mental burden of templates: I have never seen as much reliance on
template magic in any other project as in MXNet. I don't think it is
difficult for any of the MXNet developers to understand a Node class
passed as a template argument to a graph, as in the sketch below.
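
A minimal sketch of what that looks like, with hypothetical names:

    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // The node type is a template parameter; the graph itself stays simple.
    template <typename Node>
    class Graph {
     public:
      void AddNode(Node node) { nodes_.push_back(std::move(node)); }
      const std::vector<Node>& nodes() const { return nodes_; }

     private:
      std::vector<Node> nodes_;
    };

    // A concrete, fully typed node: every field is visible and checked
    // by the compiler, with no string-keyed attribute map needed.
    struct OpNode {
      std::string op_name;
      std::vector<int> inputs;  // indices of input nodes
    };

    int main() {
      Graph<OpNode> g;
      g.AddNode({"conv2d", {0, 1}});
      std::cout << g.nodes().front().op_name << "\n";
      return 0;
    }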

TVM is selling typing and a pure functional IR, yet when MXNet
developers raise the same concern it is dismissed as a nit and a matter
of engineering taste.

Also, how relevant will it be to have the graph mutated through a
dynamic language when part of the deep learning community is leaning
towards adding differentiable programming to static languages like
Swift?  When you have the hammer of a dynamic language, everything
looks like a dictionary of strings.

There are ZERO unit tests for those critical code paths and classes in
NNVM. And no, the end-to-end python tests don't count as unit tests
for a C++ class without bindings in my book.
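
For illustration, this is the kind of C++-level test I mean, a minimal
sketch in GoogleTest style against the hypothetical Graph/OpNode from
the earlier sketch:

    #include <gtest/gtest.h>

    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical Graph/OpNode, repeated from the earlier sketch.
    template <typename Node>
    class Graph {
     public:
      void AddNode(Node node) { nodes_.push_back(std::move(node)); }
      const std::vector<Node>& nodes() const { return nodes_; }

     private:
      std::vector<Node> nodes_;
    };

    struct OpNode {
      std::string op_name;
      std::vector<int> inputs;
    };

    // Exercises the class directly at the C++ level, with no Python
    // end-to-end machinery involved.
    TEST(GraphTest, AddNodeStoresNode) {
      Graph<OpNode> g;
      g.AddNode({"relu", {0}});
      ASSERT_EQ(g.nodes().size(), 1u);
      EXPECT_EQ(g.nodes().front().op_name, "relu");
    }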

Happy weekend.

Pedro.



On Tue, May 14, 2019 at 8:03 PM Tianqi Chen <tqc...@cs.washington.edu> wrote:
>
> The core part of the proposal is to move the graph to a much more
> strongly typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons; let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose
>    templates with an explosive number of possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization.
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient,
> etc.
>   - The nature of supporting a growing set of operator attributes requires
> a type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased and typed code can both get to the same feature, except
> that
>   typed code gets more compile-time errors while type-erased code gets some
> of them at runtime.
> - Templatized data structures will likely introduce additional mental
> burdens to developers and are not really suitable as a core data structure,
>    - because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> MXNet were a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that supports python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal,
> given the tradeoffs I listed above.
>
> This proposal is really a drastic change to what NNVM does, as well as the
> optimization passes, and given the scope it is, in your analogy, "a new
> vehicle to solve all the problems"
> rather than a minor patch. It will take a lot of engineering effort to
> bring in new features and adapt the existing ones.
> Because of that, it does merit a discussion about how we should think
> about the future MXNet 2.0.
>
> Technically Relay is a serious candidate. Of course Relay, as well as its
> core, is in C++, but it maintains the multi-language-first principle; that
> is why the example code was in python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet 2.0 should have
> natural support for:
> - Native support of functions, modules, and recursion
> - Control flow
> - The ability to interoperate with multi-language frontends, e.g. being
> able to prototype graph optimizations in python/scala/clojure if needed.
>
> Adding this support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed in a way
> > that would not be possible with what I propose? I don't think my
> > proposal has any impact on language bindings, and the way I describe it
> > doesn't affect having or not having higher-level language bindings.
> > Please elaborate so I can understand your concern.  Maybe code examples
> > where the graph attributes are being changed from Python?  I don't
> > think we have this in MXNet. This is such a core foundation for MXNet
> > that I don't think we should compromise on it because another project
> > not directly related to MXNet might want to expose some untyped graph
> > and Node attributes.  The current status makes maintaining the code
> > very painful and is also preventing desired features, such as higher
> > order gradients, from being developed. I have heard from you many times
> > how speed is critical for us to innovate in this quickly changing field.
> >
> > My proposal is limited to the graph and wouldn't change the way
> > operators are registered and arguments are processed for operators for
> > example.
> >
> >
> > Regarding the second point, the documentation about Relay in the web
> > which I found for example:
> >
> > https://docs.tvm.ai/dev/relay_add_op.html#
> >
> > Is somebody working on making Imperative::Backward use this API? This
> > would be a big change which I'm not aware of. And using an IR is of a
> > much bigger scope than the change I'm proposing here, for example.
> >
> > I think I'm having difficulty understanding what the arguments are
> > here. I'm saying I need to change one piece of my car, and what you are
> > selling me is a new vehicle?  Or is your suggestion that we use
> > Relay for the graph passes in MXNet?
> >
> > I would like to see C++ code examples, Python examples are not
> > sufficient when we talk about the core MXNet.
> >
> > Pedro.
> >
> >
> >
> >
> >
> >
> > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen <tqc...@cs.washington.edu>
> > wrote:
> > >
> > > Thanks for the proposal. Let me share some of my thoughts:
> > >
> > > Specific comments on the proposal
> > > -----------------------------------------------
> > > The heavy use of generics in the Graph type is a huge departure from
> > > the type-erased data structure presented in the previous design.
> > > While we understand the advantages of typed languages (more compile-time
> > > checking) and type-erased types (more dynamism), the heavy use of
> > > templates will actually make the project solely C++ focused, making it
> > > hard to expose intermediate (templatized) data structures to
> > > other languages like python/scala/clojure.
> > >
> > > While I fully understand some of the lessons taught in programming
> > > C++ (reduce shared_ptr, more typing, etc.),
> > > we need to think about the context of the MXNet project and **the need
> > > to support multi-language as first-class**.
> > > Some of the type-erased types are design trade-offs made to support these
> > > features, and we need to think more
> > > carefully instead of just applying "rules for C++", which may bring
> > > problems.
> > >
> > > Future of NNVM
> > > ----------------------
> > > Given that this thread touched upon what we should do for better
> > > computational graph handling, I would recommend also taking a look at
> > > NNVMv2 -- relay.
> > >
> > > Relay already addresses many of the wish-list items in the proposal,
> > > such as operator fusion, higher order gradients, offload to hardware,
> > > isolated compilation, deployment on edge devices and accelerators, etc.
> > > Relay also addresses problems not yet mentioned in the proposal,
> > > including control flow and a dynamic runtime, automatic layout
> > > optimization, etc.
> > >
> > > Tianqi
> > >
> > > On Tue, May 14, 2019 at 5:06 PM Sheng Zha <zhash...@apache.org> wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > Thanks for taking the initiative. Skimming through the design doc, I
> > > > didn't see a comparison with existing solutions such as relay in tvm,
> > > > which is already a dependency of mxnet. Could you elaborate on the
> > > > comparison with existing solutions in the design doc too?
> > > >
> > > > -sz
> > > >
> > > > On 2019/05/14 23:49:30, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > > > Hi dev@
> > > > >
> > > > > As a result of my deep dives on the graph machinery I have created a
> > > > > new proposal to improve the operator graph in MXNet.
> > > > >
> > > > > This would mean superseding the use of NNVM Graph in MXNet and having
> > > > > a new implementation that we can use to simplify a lot of code and do
> > > > > powerful graph manipulation and passes such as operator fusion and
> > > > > other optimizations.
> > > > >
> > > > > As it would be a change with big impact and ramifications, your
> > > > > thoughts and feedback on the document would be highly appreciated,
> > > > > so we can take into account potential future use cases:
> > > > >
> > > > >
> > > >
> > > > > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > > > >
> > > > > Pedro.
> > > > >
> > > >
> >
