Re: [Proposal] New operator graph for MXNet

2019-05-17 Thread Pedro Larroy
Hi Tianqi and Junru.

MXNet as a piece of software is in its teens and needs to mature. The
community needs to have an honest discussion and decide whether MXNet is
a production or a research framework.

If it's a production framework, we need to apply the YAGNI principle
and decide what is and what is not supported: are we focusing on
training or inference? In any case it should be possible to refactor
the code to be solid, easy to maintain, and resilient to bugs. This
includes reducing the surface area for present and future bugs, saying
no to features, and taking advantage of every available tool, including
the C++ type system. As ML makes further inroads into products and our
everyday life, it should be held to the same engineering principles as
other pieces of production software; otherwise you end up in bad
situations which could have been avoided with good engineering. It's
not fun to debug a dictionary of string to dmlc::any in C++. That's
basically just one level above decoding machine instructions and
hexadecimal dumps from memory, and it's 2019; we have tools.

As someone who is supporting MXNet use cases in production as well as
developing new features, I will say that we are spending too much
effort on problems derived from deficiencies in these areas, effort
which could be better spent advancing the SOTA in TVM or adding
features to MXNet.

Taking a high-level view of the issue, I don't think it is beneficial
right now for either project to be co-dependent. In TVM and NNVM2 you
want to iterate and experiment fast, while in MXNet you want to bias
towards stability and maintainability; the speed and agility are
naturally going to be different. In an analogy to programming
languages, MXNet would start to become the Java platform while TVM is
Haskell... I'm not saying that we should or should not use NNVM2 in
the future. But this is not something that should be sneaked into
MXNet through a sub-repository without discussion, planning, and proper
testing.

I have extensively (re)read the Relay and TVM papers, including their
references. As it stands today, the goals of the TVM project are
different from the goals of MXNet, and the design choices and
constraints diverge:

Some of the points you make are surprising to me when I look at the
codebase as a non-PMC member:

Dynamic language support is implemented through the C++ API and
doesn't require dynamic attributes in the graph. Could you come up
with an example where a modification towards a different graph
implementation would affect the bindings of the dynamic languages for
MXNet?

Mental burden of templates: I have never seen as much reliance on
template magic in any other project as in MXNet. I don't think it is
difficult for any of the MXNet developers to understand a Node class
passed as a template argument to a graph.
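
For readers outside the thread, the construct in question can be sketched in a few lines. This is a hypothetical Python analogy (the proposal is a C++ `Graph<Node>` template, which Python's `Generic` only approximates):

```python
from typing import Generic, List, Tuple, TypeVar

NodeT = TypeVar("NodeT")

class Graph(Generic[NodeT]):
    """A graph whose node payload type is a parameter, roughly analogous
    to a C++ ``Graph<Node>`` template."""

    def __init__(self) -> None:
        self.nodes: List[NodeT] = []
        self.edges: List[Tuple[int, int]] = []  # (src_index, dst_index)

    def add_node(self, node: NodeT) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def add_edge(self, src: int, dst: int) -> None:
        self.edges.append((src, dst))

# Usage: the payload type is visible at every call site.
g: Graph[str] = Graph()
a = g.add_node("conv2d")
b = g.add_node("relu")
g.add_edge(a, b)
assert g.nodes[b] == "relu"
```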

TVM is selling typing and a pure functional IR, even though for MXNet
developers this is dismissed as a nit and a matter of engineering
taste.

Also, how relevant will mutating the graph from a dynamic language be
when some of the deep learning community is leaning towards adding
differentiable programming to static languages like Swift? When all
you have is the hammer of a dynamic language, everything looks like a
dictionary of strings.

There are ZERO unit tests for those critical code paths and classes in
NNVM. And no, the end-to-end Python tests don't count as unit tests
for a C++ class without bindings in my book.

Happy weekend.

Pedro.



On Tue, May 14, 2019 at 8:03 PM Tianqi Chen  wrote:
>
> The core part of the proposal is to move the graph to be much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons, let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking, on the
> other hand, it is hard to expose
>template of explosive possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient
> etc..
>   - The nature of supporting a growing set of operator attribute requires a
> type-erased attrs field.
> - In contrast to your argument(typing is a blocker to features),
> type-erased or typed code can both get to the same feature, except
> that
>   typed code gets more compile-time errors while type-erased get some of
> them in runtime.
> - Templatized data structures will likely introduce additional mental
> burdens to developers and are not really suitable as a core data structure
>- Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view(as an MXNet PMC member) on typed vs type-erased style: If MXNet
> is a pure C++ project, I might take more of the 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Zach,

Thank you for raising these points! I am happy to offer more reading
materials about this topic.

*SSA vs ANF.* ANF and SSA are essentially the same thing [1].

*AD in Relay.* Relay is able to do AD through not only control flow, but
also various data structures and higher-order functions [2].

[1] Appel, Andrew W. "SSA is functional programming." *ACM SIGPLAN
Notices* 33.4
(1998): 17-20.
[2] Roesch, Jared, et al. "Relay: a new IR for machine learning
frameworks." *Proceedings of the 2nd ACM SIGPLAN International Workshop on
Machine Learning and Programming Languages*. ACM, 2018.
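
The equivalence Appel describes in [1] is easy to see on straight-line code: every SSA temporary corresponds to a let-binding in ANF. A minimal sketch, with Python standing in for an IR and deliberately ignoring control flow and phi nodes, where the real subtleties live:

```python
# The program (a + b) * (a - b), written two ways.

# SSA style: every intermediate value is assigned exactly once to a fresh name.
def ssa(a, b):
    t1 = a + b
    t2 = a - b
    t3 = t1 * t2
    return t3

# ANF style: the same temporaries become nested let-bindings; here
# "let t = e in body" is mimicked with an immediately-applied lambda.
def anf(a, b):
    return (lambda t1:
            (lambda t2:
             (lambda t3: t3)(t1 * t2)
             )(a - b)
            )(a + b)

assert ssa(3, 2) == anf(3, 2) == 5
```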


On Wed, May 15, 2019 at 12:01 PM Zach Kimberg 
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. On the page discussing Relay IR [1], it
> discusses mainly the difference between a data flow graph like we use now
> and A-normal [2] which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Static Single Assignment
> Form (Wikipedia explanation [3], lecture note explanation [4])? It is used
> almost universally in the compiler community including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] -
>
> https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and very dismissive
> > and it breaks many code of conducts listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind the Apache Code of Conduct :
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understand what he meant
> from
> > > the context whether you prefer to call IR in compilers or data-flow in
> > > distributed systems. You could very well say lets use this terminology
> to
> > > have a common understanding instead of saying go learn the basic
> > concepts.
> > > Before building a cool brand, its important to build a healthy
> community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly
> wants
> > > to
> > > > improve our system, and am very thankful that you have done so much
> for
> > > our
> > > > community. However, I do want to mention some points that I believe I
> > > > should mention.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I
> > > would
> > > > love to emphasize that a *good taste* of system design is to optimize
> > the
> > > > bottleneck, enhance expressiveness (and usability), i.e. to do what
> > needs
> > > > doing, rather than *trivial nits* that are irrelevant to either
> > > performance
> > > > or expressiveness. Generally speaking, typed or untyped, shared_ptr
> or
> > > > unique_ptr, won't affect the overall performance when it comes to
> deep
> > > > learning workload, specially when we have an async scheduler that
> does
> > > good
> > > > latency hiding in MXNet - to me, these are not major issues that are
> > > worth
> > > > re-designing our entire system.
> > > >
> > > > To benefit users - real-world ML practitioners, the most thing I
> would
> > > love
> > > > to mention is that dataflow graph-based representation is
> 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Tianqi Chen
This is a good point. I believe the main question here is not SSA vs
others, but more about CFG vs structured control flow.

SSA is generally equivalent to ANF or dataflow if you ignore the Phi
nodes and CFG blocks. The current Relay IR makes use of more structured
control flow, so it does not have an explicit CFG (aka goto).

I believe that for deep learning, it is a good idea to get the highest-
level information when possible, and a structured control-flow block
is certainly more informative (while eliminating the possibility of goto).
Mutation is something that could be handled in Relay, with explicit
annotation.

Most of the current deep learning programs contain parts that need to be
automatically differentiated, which are usually pure, and parts that need
to update parameters, which can be explicitly marked. The center of the
question is: do we try to represent the pure parts directly in the IR
and maintain the necessary high-level structures, or do we allow the IR
to represent more arbitrary programs while using analysis (e.g. pointer
alias analysis) to recover them? I think the former would be easier,
given that deep learning programs are already pretty high level.
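
The split between a pure, differentiable core and explicitly-marked parameter updates can be sketched as follows. This is a toy example with a numerical gradient standing in for real automatic differentiation; the function names are illustrative, not Relay APIs.

```python
# Pure part: the loss is a side-effect-free function, so it is safe for an
# IR to differentiate it (here approximated with central finite differences).
def loss(w, x, y):
    return (w * x - y) ** 2

def grad_w(w, x, y, eps=1e-6):
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)

# Explicitly-marked effectful part: the parameter update. Keeping mutation
# out of the pure region is what lets the IR keep its high-level structure.
def sgd_step(w, x, y, lr=0.1):
    return w - lr * grad_w(w, x, y)

w = 0.0
for _ in range(100):
    w = sgd_step(w, x=2.0, y=4.0)
assert abs(w - 2.0) < 1e-3  # converges to the w that solves w*x == y
```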

There is now also a discussion about adding a CFG back to Relay to handle
rare cases which do not have to be optimized. But from what I have seen
so far, it seems to fit most of the needs.

Tianqi

On Wed, May 15, 2019 at 12:01 PM Zach Kimberg 
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. On the page discussing Relay IR [1], it
> discusses mainly the difference between a data flow graph like we use now
> and A-normal [2] which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Static Single Assignment
> Form (Wikipedia explanation [3], lecture note explanation [4])? It is used
> almost universally in the compiler community including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] -
>
> https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and very dismissive
> > and it breaks many code of conducts listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind the Apache Code of Conduct :
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understand what he meant
> from
> > > the context whether you prefer to call IR in compilers or data-flow in
> > > distributed systems. You could very well say lets use this terminology
> to
> > > have a common understanding instead of saying go learn the basic
> > concepts.
> > > Before building a cool brand, its important to build a healthy
> community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly
> wants
> > > to
> > > > improve our system, and am very thankful that you have done so much
> for
> > > our
> > > > community. However, I do want to mention some points that I believe I
> > > > should mention.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I
> > > would
> > > > love to emphasize 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Pedro Larroy
Hi

Thanks for all the materials and key points raised. The discussion has
many ramifications; I will think about them and research them very
carefully before replying further. Please also don't quickly dismiss
the points I have raised and reduce them to typed-vs-untyped or
pedantic C++ comments: we have been debugging missing nodes and
pointers in the graph when doing second-order gradients for weeks,
with no success, due to the design of the graph.

There are 60 years of software development learnings and practice behind
some of these concepts, and compiler theory that deep learning
frameworks can also take advantage of instead of rediscovering
everything again until we end up at a typed, pure functional IR.
In some of the materials linked you also point out limitations of the
current architecture. I think it's good that we raise this topic; it
shows that we need to have a deeper and more structured conversation on
how we evolve the dataflow graph in MXNet. Maybe you can help cross-
pollinate this conversation between the TVM and MXNet projects. If
there's an intention to change from NNVM to NNVM2, I think this should
have been communicated or discussed with the community beforehand.

Until then.

Pedro.




On Tue, May 14, 2019 at 8:03 PM Tianqi Chen  wrote:
>
> The core part of the proposal is to move the graph to be much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons, let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking, on the
> other hand, it is hard to expose
>template of explosive possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient
> etc..
>   - The nature of supporting a growing set of operator attribute requires a
> type-erased attrs field.
> - In contrast to your argument(typing is a blocker to features),
> type-erased or typed code can both get to the same feature, except
> that
>   typed code gets more compile-time errors while type-erased get some of
> them in runtime.
> - Templatized data structures will likely introduce additional mental
> burdens to developers and are not really suitable as a core data structure
>- Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view(as an MXNet PMC member) on typed vs type-erased style: If MXNet
> is a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that takes python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal as the
> tradeoffs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as the
> optimization passes, and given the scope, in your analogy, "a new vehicle
> to solve all the problems"
> rather than a minor patch. It will take a lot of engineering effort to
> bring in new features and adapting the existing ones.
> Because of that, it does merit a discussion about how shall we think about
> the future MXNet2.0.
>
> Technically Relay is a serious candidate. Of course relay, as well as its
> core, is in C++ but maintains the multi-language first principle, that is
> why the example code was in python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet2.0 should have
> natural support for:
> - Native support of function, module, and recursions
> - Control flows
> - The ability of interpolation with multi-language frontend, e.g. being
> able to prototype graph optimizations in python/scala/clojure if needed.
>
> Adding these support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy 
> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed which would
> > not be possible with what I propose? I don't think my proposal is
> > having any impact in language bindings, and the way I describe it
> > doesn't affect having or not having higher language bindings. Please
> > elaborate so I can understand your concern.  Maybe code examples where
> > the graph attributes are being changed from Python?  I don't think we
> > have this on MXNet. This is such a core foundation for MXNet, that I
> > don't think we should compromise on it because other project not
> > directly related to MXNet might want to expose some untyped 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Anirudh, Naveen,

Thank you so much for the gentle reminder!

I am not a native speaker, which led to the mistake. I would love to
say a sincere sorry to Pedro. Pedro is working really hard to grow our
community and improve our code base. I sincerely apologize for what I
said in a hurry.

Let’s work hard together to grow a healthy community!

Thanks,
Junru

On Wed, May 15, 2019 at 11:51 Naveen Swamy  wrote:

> Being dismissive and condescending has been exactly what is plaguing this
> project.
>
> I agree the last paragraph sounds very condescending and very dismissive
> and it breaks many code of conducts listed.
>
> On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Hi Junru,
> >
> > Overall, I appreciate the points you made about the proposal.
> >
> > Having said that, I would like to remind the Apache Code of Conduct :
> > https://www.apache.org/foundation/policies/conduct.
> > "Be empathetic, welcoming, friendly and patient".
> >
> > I find your tone condescending. Clearly you understand what he meant from
> > the context whether you prefer to call IR in compilers or data-flow in
> > distributed systems. You could very well say lets use this terminology to
> > have a common understanding instead of saying go learn the basic
> concepts.
> > Before building a cool brand, its important to build a healthy community.
> >
> > Anirudh
> >
> >
> > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > I really appreciate that a diligent and talented engineer eagerly wants
> > to
> > > improve our system, and am very thankful that you have done so much for
> > our
> > > community. However, I do want to mention some points that I believe I
> > > should mention.
> > >
> > > While I agree with Tianqi that every design has its pros and cons, I
> > would
> > > love to emphasize that a *good taste* of system design is to optimize
> the
> > > bottleneck, enhance expressiveness (and usability), i.e. to do what
> needs
> > > doing, rather than *trivial nits* that are irrelevant to either
> > performance
> > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > > unique_ptr, won't affect the overall performance when it comes to deep
> > > learning workload, specially when we have an async scheduler that does
> > good
> > > latency hiding in MXNet - to me, these are not major issues that are
> > worth
> > > re-designing our entire system.
> > >
> > > To benefit users - real-world ML practitioners, the most thing I would
> > love
> > > to mention is that dataflow graph-based representation is increasingly
> > > incapable of modern neural networks, because the increasingly appeared
> > > structures like arbitrary control flow (w/ continue, break, etc),
> > > recursion, type conjunction and disjunction, etc. These issues will be
> > our
> > > priority to address, which is brought by Relay, which addresses all
> these
> > > pain points.
> > >
> > > Another minor thing I would love to humbly mention is that, for sake of
> > our
> > > brand, it is our responsibility to be professional about terminologies
> > when
> > > writing an official proposal on Confluence. As one of the numerous
> > > examples, the title of the proposal really shocks me for a while,
> > something
> > > like "operators graph" blah blah so weird. Educate me if I were wrong,
> > but
> > > compiler community would prefer the term "intermediate representation",
> > and
> > > distributed system community would prefer "dataflow graph". If you
> don't
> > > have knowledge in these fields, a better way for efficient
> communication
> > is
> > > to get yourself first familiarize the most basic concepts and then do
> > > discussion. This is a way to save your own valuable time as well.
> > >
> > > Again, thank you so much for your hard work, and hope that we could
> work
> > > together to win customers in the future :-)
> > >
> > > Thanks,
> > > Junru
> > >
> > >
> > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> > > wrote:
> > >
> > > > The core part of the proposal is to move the graph to be much more
> > > strongly
> > > > typed template class.
> > > > I think this is mainly a point of engineering taste, and both sides
> > have
> > > > pros and cons, let me list them before I share my thoughts on this
> > issue:
> > > >
> > > > - Typed fields certainly enjoy more compile-time type checking, on
> the
> > > > other hand, it is hard to expose
> > > >template of explosive possibilities to frontend languages.
> > > > - More type-erased fields provide runtime flexibility to store
> > > polymorphic
> > > > types as well as extensible attributes for graph optimization
> > > >   - It is hard to use a virtual class to expose every possible
> > attribute
> > > > that an operator might have, such as inlining, storage pattern,
> > gradient
> > > > etc..
> > > >   - The nature of supporting a growing set of operator attribute
> > > requires a
> > > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Zach Kimberg
I would like to raise another option to get back on the topic of changing
the Operator graph structure. On the page discussing Relay IR [1], it
discusses mainly the difference between a data flow graph like we use now
and A-normal [2] which is used in some functional compilers. Is there a
reason we do not want to use a structure based on Static Single Assignment
Form (Wikipedia explanation [3], lecture note explanation [4])? It is used
almost universally in the compiler community including in LLVM (clang),
GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind
its pervasiveness is that it has proven very effective for analysis and
transformations when dealing with control flow.

One possible concern is that it might make automatic differentiation more
difficult [6]. While it certainly is more complicated than a pure
functional approach, the functional approach requires users to use
functional programming. Especially with the languages we support now, that
doesn't seem like a reasonable assumption. Given that the users are already
introducing the complexity inherent in imperative programming, we have to
deal with the increased complexity regardless. I think it might be easier
to have the tools to deal with that rather than attempting to coerce users
into a different programming paradigm or convert code between paradigms.
Furthermore, this may become more important if users are increasingly
making use of control flow like Junru said.

Zach


[1] - https://docs.tvm.ai/dev/relay_intro.html
[2] - https://en.wikipedia.org/wiki/A-normal_form
[3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
[4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
[5] -
https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
[6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2

On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:

> Being dismissive and condescending has been exactly what is plaguing this
> project.
>
> I agree the last paragraph sounds very condescending and very dismissive
> and it breaks many code of conducts listed.
>
> On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Hi Junru,
> >
> > Overall, I appreciate the points you made about the proposal.
> >
> > Having said that, I would like to remind the Apache Code of Conduct :
> > https://www.apache.org/foundation/policies/conduct.
> > "Be empathetic, welcoming, friendly and patient".
> >
> > I find your tone condescending. Clearly you understand what he meant from
> > the context whether you prefer to call IR in compilers or data-flow in
> > distributed systems. You could very well say lets use this terminology to
> > have a common understanding instead of saying go learn the basic
> concepts.
> > Before building a cool brand, its important to build a healthy community.
> >
> > Anirudh
> >
> >
> > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > I really appreciate that a diligent and talented engineer eagerly wants
> > to
> > > improve our system, and am very thankful that you have done so much for
> > our
> > > community. However, I do want to mention some points that I believe I
> > > should mention.
> > >
> > > While I agree with Tianqi that every design has its pros and cons, I
> > would
> > > love to emphasize that a *good taste* of system design is to optimize
> the
> > > bottleneck, enhance expressiveness (and usability), i.e. to do what
> needs
> > > doing, rather than *trivial nits* that are irrelevant to either
> > performance
> > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > > unique_ptr, won't affect the overall performance when it comes to deep
> > > learning workload, specially when we have an async scheduler that does
> > good
> > > latency hiding in MXNet - to me, these are not major issues that are
> > worth
> > > re-designing our entire system.
> > >
> > > To benefit users - real-world ML practitioners, the most thing I would
> > love
> > > to mention is that dataflow graph-based representation is increasingly
> > > incapable of modern neural networks, because the increasingly appeared
> > > structures like arbitrary control flow (w/ continue, break, etc),
> > > recursion, type conjunction and disjunction, etc. These issues will be
> > our
> > > priority to address, which is brought by Relay, which addresses all
> these
> > > pain points.
> > >
> > > Another minor thing I would love to humbly mention is that, for sake of
> > our
> > > brand, it is our responsibility to be professional about terminologies
> > when
> > > writing an official proposal on Confluence. As one of the numerous
> > > examples, the title of the proposal really shocks me for a while,
> > something
> > > like "operators graph" blah blah so weird. Educate me if I were wrong,
> > but
> > > compiler community would prefer the term "intermediate representation",
> > and
> > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Naveen Swamy
Being dismissive and condescending has been exactly what is plaguing this
project.

I agree the last paragraph sounds very condescending and very dismissive,
and it breaks many of the codes of conduct listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian 
wrote:

> Hi Junru,
>
> Overall, I appreciate the points you made about the proposal.
>
> Having said that, I would like to remind the Apache Code of Conduct :
> https://www.apache.org/foundation/policies/conduct.
> "Be empathetic, welcoming, friendly and patient".
>
> I find your tone condescending. Clearly you understand what he meant from
> the context whether you prefer to call IR in compilers or data-flow in
> distributed systems. You could very well say let's use this terminology to
> have a common understanding instead of saying go learn the basic concepts.
> Before building a cool brand, it's important to build a healthy community.
>
> Anirudh
>
>
> On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> wrote:
>
> > Hi Pedro,
> >
> > I really appreciate that a diligent and talented engineer eagerly wants
> to
> > improve our system, and am very thankful that you have done so much for
> our
> > community. However, I do want to mention some points that I believe I
> > should mention.
> >
> > While I agree with Tianqi that every design has its pros and cons, I
> would
> > love to emphasize that a *good taste* of system design is to optimize the
> > bottleneck, enhance expressiveness (and usability), i.e. to do what needs
> > doing, rather than *trivial nits* that are irrelevant to either
> performance
> > or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> > unique_ptr, won't affect the overall performance when it comes to deep
> > learning workloads, especially when we have an async scheduler that does
> good
> > latency hiding in MXNet - to me, these are not major issues that are
> worth
> > re-designing our entire system.
> >
> > To benefit users, i.e. real-world ML practitioners, the main thing I would
> > love to mention is that dataflow graph-based representations are
> > increasingly incapable of expressing modern neural networks, because of
> > increasingly common structures like arbitrary control flow (w/ continue,
> > break, etc), recursion, type conjunction and disjunction, etc. Addressing
> > these issues will be our priority, and Relay addresses all of these pain
> > points.
> >
> > Another minor thing I would love to humbly mention is that, for the sake
> > of our brand, it is our responsibility to be professional about terminology
> > when writing an official proposal on Confluence. As one of numerous
> > examples, the title of the proposal shocked me for a while; something like
> > "operators graph" sounds odd. Educate me if I'm wrong, but the compiler
> > community would prefer the term "intermediate representation", and the
> > distributed systems community would prefer "dataflow graph". If you don't
> > have knowledge of these fields, a better way to communicate efficiently is
> > to first familiarize yourself with the most basic concepts and then have
> > the discussion. This is also a way to save your own valuable time.
> >
> > Again, thank you so much for your hard work, and hope that we could work
> > together to win customers in the future :-)
> >
> > Thanks,
> > Junru
> >
> >
> > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> > wrote:
> >
> > > The core part of the proposal is to move the graph to be much more
> > strongly
> > > typed template class.
> > > I think this is mainly a point of engineering taste, and both sides
> have
> > > pros and cons, let me list them before I share my thoughts on this
> issue:
> > >
> > > - Typed fields certainly enjoy more compile-time type checking, on the
> > > other hand, it is hard to expose
> > >template of explosive possibilities to frontend languages.
> > > - More type-erased fields provide runtime flexibility to store
> > polymorphic
> > > types as well as extensible attributes for graph optimization
> > >   - It is hard to use a virtual class to expose every possible
> attribute
> > > that an operator might have, such as inlining, storage pattern,
> gradient
> > > etc..
> > >   - The nature of supporting a growing set of operator attribute
> > requires a
> > > type-erased attrs field.
> > > - In contrast to your argument (typing is a blocker to features),
> > > type-erased or typed code can both get to the same features, except that
> > > typed code gets more compile-time errors while type-erased code gets
> > > some of them at runtime.
> > > - Templatized data structures will likely introduce additional mental
> > > burdens to developers and are not really suitable as a core data
> > structure
> > >- Because they imply an explosive number of possible data
> structures,
> > > while the core data structure should be a single one.
> > >
> > > Now my view(as an MXNet PMC member) on typed vs type-erased style: If
> > MXNet
> > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Anirudh Subramanian
Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to remind everyone of the Apache Code of Conduct:
https://www.apache.org/foundation/policies/conduct.
"Be empathetic, welcoming, friendly and patient".

I find your tone condescending. Clearly you understand what he meant from
the context, whether you prefer to call it IR as in compilers or dataflow as
in distributed systems. You could very well say "let's use this terminology
to have a common understanding" instead of saying "go learn the basic
concepts". Before building a cool brand, it's important to build a healthy
community.

Anirudh


On Wed, May 15, 2019 at 12:03 AM Junru Shao  wrote:

> Hi Pedro,
>
> I really appreciate that a diligent and talented engineer eagerly wants to
> improve our system, and am very thankful that you have done so much for our
> community. However, I do want to mention some points that I believe are
> important.
>
> While I agree with Tianqi that every design has its pros and cons, I would
> love to emphasize that a *good taste* of system design is to optimize the
> bottleneck, enhance expressiveness (and usability), i.e. to do what needs
> doing, rather than *trivial nits* that are irrelevant to either performance
> or expressiveness. Generally speaking, typed or untyped, shared_ptr or
> unique_ptr, won't affect the overall performance when it comes to deep
> learning workloads, especially when we have an async scheduler that does good
> latency hiding in MXNet - to me, these are not major issues that are worth
> re-designing our entire system.
>
> To benefit users, i.e. real-world ML practitioners, the main thing I would
> love to mention is that dataflow graph-based representations are increasingly
> incapable of expressing modern neural networks, because of increasingly
> common structures like arbitrary control flow (w/ continue, break, etc),
> recursion, type conjunction and disjunction, etc. Addressing these issues
> will be our priority, and Relay addresses all of these pain points.
>
> Another minor thing I would love to humbly mention is that, for the sake of
> our brand, it is our responsibility to be professional about terminology when
> writing an official proposal on Confluence. As one of numerous examples, the
> title of the proposal shocked me for a while; something like "operators
> graph" sounds odd. Educate me if I'm wrong, but the compiler community would
> prefer the term "intermediate representation", and the distributed systems
> community would prefer "dataflow graph". If you don't have knowledge of these
> fields, a better way to communicate efficiently is to first familiarize
> yourself with the most basic concepts and then have the discussion. This is
> also a way to save your own valuable time.
>
> Again, thank you so much for your hard work, and hope that we could work
> together to win customers in the future :-)
>
> Thanks,
> Junru
>
>
> On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> wrote:
>
> > The core part of the proposal is to move the graph to be much more
> strongly
> > typed template class.
> > I think this is mainly a point of engineering taste, and both sides have
> > pros and cons, let me list them before I share my thoughts on this issue:
> >
> > - Typed fields certainly enjoy more compile-time type checking, on the
> > other hand, it is hard to expose
> >template of explosive possibilities to frontend languages.
> > - More type-erased fields provide runtime flexibility to store
> polymorphic
> > types as well as extensible attributes for graph optimization
> >   - It is hard to use a virtual class to expose every possible attribute
> > that an operator might have, such as inlining, storage pattern, gradient
> > etc..
> >   - The nature of supporting a growing set of operator attribute
> requires a
> > type-erased attrs field.
> > - In contrast to your argument (typing is a blocker to features),
> > type-erased or typed code can both get to the same features, except that
> > typed code gets more compile-time errors while type-erased code gets some
> > of them at runtime.
> > - Templatized data structures will likely introduce additional mental
> > burdens to developers and are not really suitable as a core data
> structure
> >- Because they imply an explosive number of possible data structures,
> > while the core data structure should be a single one.
> >
> > Now my view(as an MXNet PMC member) on typed vs type-erased style: If
> MXNet
> > is a pure C++ project, I might take more of the typed approach.
> > However, MXNet itself is a project that takes python/scala/clojure and
> > other frontend languages.
> > The introduction of more typing may not align with the original goal as
> the
> > tradeoffs I listed above.
> >
> > This proposal is really a drastic change of what NNVM does, as well as
> the
> > optimization passes, and given the scope, in your analogy, "a new vehicle
> > to solve all the problems"
> > rather than a 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to
improve our system, and am very thankful that you have done so much for our
community. However, I do want to mention some points that I believe are
important.

While I agree with Tianqi that every design has its pros and cons, I would
love to emphasize that *good taste* in system design means optimizing the
bottleneck and enhancing expressiveness (and usability), i.e. doing what needs
doing, rather than chasing *trivial nits* that are irrelevant to either
performance or expressiveness. Generally speaking, typed or untyped,
shared_ptr or unique_ptr won't affect the overall performance when it comes to
deep learning workloads, especially when we have an async scheduler that does
good latency hiding in MXNet. To me, these are not major issues that are worth
re-designing our entire system over.

To benefit users, i.e. real-world ML practitioners, the main thing I would
love to mention is that dataflow graph-based representations are increasingly
incapable of expressing modern neural networks, because of increasingly common
structures like arbitrary control flow (w/ continue, break, etc), recursion,
type conjunction and disjunction, etc. Addressing these issues will be our
priority, and Relay addresses all of these pain points.

Another minor thing I would love to humbly mention is that, for the sake of
our brand, it is our responsibility to be professional about terminology when
writing an official proposal on Confluence. As one of numerous examples, the
title of the proposal shocked me for a while; something like "operators graph"
sounds odd. Educate me if I'm wrong, but the compiler community would prefer
the term "intermediate representation", and the distributed systems community
would prefer "dataflow graph". If you don't have knowledge of these fields, a
better way to communicate efficiently is to first familiarize yourself with
the most basic concepts and then have the discussion. This is also a way to
save your own valuable time.

Again, thank you so much for your hard work, and hope that we could work
together to win customers in the future :-)

Thanks,
Junru


On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
wrote:

> The core part of the proposal is to move the graph to be much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons, let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking, on the
> other hand, it is hard to expose
>template of explosive possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient
> etc..
>   - The nature of supporting a growing set of operator attribute requires a
> type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased or typed code can both get to the same features, except that
> typed code gets more compile-time errors while type-erased code gets some of
> them at runtime.
> - Templatized data structures will likely introduce additional mental
> burdens to developers and are not really suitable as a core data structure
>- Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view(as an MXNet PMC member) on typed vs type-erased style: If MXNet
> is a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that takes python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal as the
> tradeoffs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as the
> optimization passes, and given the scope, in your analogy, "a new vehicle
> to solve all the problems"
> rather than a minor patch. It will take a lot of engineering effort to
> bring in new features and adapting the existing ones.
> Because of that, it does merit a discussion about how shall we think about
> the future MXNet2.0.
>
> Technically Relay is a serious candidate. Of course relay, as well as its
> core, is in C++ but maintains the multi-language first principle, that is
> why the example code was in python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet2.0 should have
> natural support for:
> - Native support of function, module, and recursions
> - Control flows
> - The ability of interpolation with multi-language frontend, e.g. being
> able to prototype graph optimizations in python/scala/clojure if 

Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Tianqi Chen
The core part of the proposal is to move the graph to be much more strongly
typed template class.
I think this is mainly a point of engineering taste, and both sides have
pros and cons, let me list them before I share my thoughts on this issue:

- Typed fields certainly enjoy more compile-time type checking; on the other
hand, it is hard to expose templates of explosive possibilities to frontend
languages.
- More type-erased fields provide runtime flexibility to store polymorphic
types as well as extensible attributes for graph optimization
  - It is hard to use a virtual class to expose every possible attribute
that an operator might have, such as inlining, storage pattern, gradient
etc..
  - The nature of supporting a growing set of operator attributes requires a
type-erased attrs field.
- In contrast to your argument (typing is a blocker to features), type-erased
or typed code can both get to the same features, except that typed code gets
more compile-time errors while type-erased code gets some of them at runtime.
- Templatized data structures will likely introduce additional mental
burdens to developers and are not really suitable as a core data structure
   - Because they imply an explosive number of possible data structures,
while the core data structure should be a single one.

Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet
were a pure C++ project, I might take more of the typed approach.
However, MXNet itself is a project that takes python/scala/clojure and
other frontend languages.
The introduction of more typing may not align with the original goal, given
the tradeoffs I listed above.

This proposal is really a drastic change of what NNVM does, as well as the
optimization passes, and given the scope, in your analogy, "a new vehicle
to solve all the problems"
rather than a minor patch. It will take a lot of engineering effort to
bring in new features and adapting the existing ones.
Because of that, it does merit a discussion about how shall we think about
the future MXNet2.0.

Technically, Relay is a serious candidate. Of course Relay, as well as its
core, is in C++, but it maintains the multi-language-first principle; that is
why the example code was in Python.
See more related discussion comparing NNVMv1 and relay:
https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5

I think the ideal graph data structure candidate for MXNet2.0 should have
natural support for:
- Native support for functions, modules, and recursion
- Control flow
- The ability to interoperate with multi-language frontends, e.g. being able
to prototype graph optimizations in python/scala/clojure if needed.

Adding these support needs significant engineering effort, and I do hope we
only have to do it once. While I don't want to force any conclusion here,
I do think Relay is one such candidate.

Tianqi


On Tue, May 14, 2019 at 5:58 PM Pedro Larroy 
wrote:

> Hi Tianqi
>
> Thanks for the quick response.
>
> Could you point to examples where graph.h is being exposed which would
> not be possible with what I propose? I don't think my proposal is
> having any impact in language bindings, and the way I describe it
> doesn't affect having or not having higher language bindings. Please
> elaborate so I can understand your concern.  Maybe code examples where
> the graph attributes are being changed from Python?  I don't think we
> have this on MXNet. This is such a core foundation for MXNet, that I
> don't think we should compromise on it because other projects not
> directly related to MXNet might want to expose some untyped graph and
> Node attributes.  The current status makes maintaining the code very
> painful and also is preventing desired features such as higher order
> gradients to be developed. I have heard from you many times how speed
> is critical for us to innovate in this quickly changing field.
>
> My proposal is limited to the graph and wouldn't change the way
> operators are registered and arguments are processed for operators for
> example.
>
>
> Regarding the second point, the documentation about Relay in the web
> which I found for example:
>
> https://docs.tvm.ai/dev/relay_add_op.html#
>
> Is somebody working on making Imperative::Backward use this API? this
> would be a big change which I'm not aware of. And using an IR is of a
> much bigger scope than the change I'm proposing here for example.
>
> I think I'm having difficulty understanding the arguments here. I'm saying
> I need to change one piece of my car, and you are selling me a new vehicle?
> Or is your suggestion that we use Relay for the graph passes in MXNet?
>
> I would like to see C++ code examples, Python examples are not
> sufficient when we talk about the core MXNet.
>
> Pedro.
>
>
>
>
>
>
> On Tue, May 14, 2019 at 5:39 PM Tianqi Chen 
> wrote:
> >
> > Thanks for the proposal. Let me share some of my thoughts:
> >
> > Specific comments on the proposal
> > 

Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Tianqi

I thought a bit more about your comments and I think there is a simple
way to address your concerns that satisfies both needs.

We can have a NodeAttributes template class that defaults to a map of string
to any, as is currently the case, so the graph can still be used in the
highly dynamic scenario that you are concerned about.

Let me know what you think.

Pedro.


On Tue, May 14, 2019 at 5:50 PM Pedro Larroy
 wrote:
>
> Hi Tianqi
>
> Thanks for the quick response.
>
> Could you point to examples where graph.h is being exposed which would
> not be possible with what I propose? I don't think my proposal is
> having any impact in language bindings, and the way I describe it
> doesn't affect having or not having higher language bindings. Please
> elaborate so I can understand your concern.  Maybe code examples where
> the graph attributes are being changed from Python?  I don't think we
> have this on MXNet. This is such a core foundation for MXNet, that I
> don't think we should compromise on it because other projects not
> directly related to MXNet might want to expose some untyped graph and
> Node attributes.  The current status makes maintaining the code very
> painful and also is preventing desired features such as higher order
> gradients to be developed. I have heard from you many times how speed
> is critical for us to innovate in this quickly changing field.
>
> My proposal is limited to the graph and wouldn't change the way
> operators are registered and arguments are processed for operators for
> example.
>
>
> Regarding the second point, the documentation about Relay in the web
> which I found for example:
>
> https://docs.tvm.ai/dev/relay_add_op.html#
>
> Is somebody working on making Imperative::Backward use this API? this
> would be a big change which I'm not aware of. And using an IR is of a
> much bigger scope than the change I'm proposing here for example.
>
> I think I'm having difficulty understanding the arguments here. I'm saying
> I need to change one piece of my car, and you are selling me a new vehicle?
> Or is your suggestion that we use Relay for the graph passes in MXNet?
>
> I would like to see C++ code examples, Python examples are not
> sufficient when we talk about the core MXNet.
>
> Pedro.
>
>
>
>
>
>
> On Tue, May 14, 2019 at 5:39 PM Tianqi Chen  wrote:
> >
> > Thanks for the proposal. Let me share some of my thoughts:
> >
> > Specific comments on the proposal
> > ---
> > The heavy use of generic in the Graph type was a huge departure from
> > type-erased data structure which was presented in the previous design.
> > While we understand the advantage of typed language(more compile-time
> > checking) and type-erased types(more dynamism) the heavy use of
> > the template will actually make the project solely C++ focused, making it
> > hard to expose intermediate(templatized) data structure to
> > other languages like python/scala/clojure.
> >
> > While I fully understand some of the lessons taught in programming
> > C++(reduce shared_ptr, more typing etc.)
> > We need to think about the context of MXNet project and **the need to
> > support multi-language as a first-class**.
> > Some of the type-erased types are design trade-offs made to support these
> > features, and we need to think more
> > carefully instead of just applying "rules for C++" which may bring problems.
> >
> > Future of NNVM
> > --
> > Given that this thread touched upon what we should do for better
> > computational graph handling. I would recommend also to take a look at
> > NNVMv2 -- relay.
> >
> > Relay addresses many of the wish-lists in the proposal already, such as
> > operator fusion, high order gradient, offload to hardware, isolated
> > compilation, deployment on edge and accelerators etc.
> > Relay also addresses problems not yet mentioned in the proposal,
> > including control flow and dynamic runtime, automatic layout optimization
> > etc.
> >
> > Tianqi
> >
> > On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
> >
> > > Hi Pedro,
> > >
> > > Thanks for taking the initiative. Skimming through the design doc, I
> > > didn't see a comparison with existing solutions such as Relay in TVM,
> > > which is already a dependency of MXNet. Could you elaborate on the
> > > comparison with existing solutions in the design doc too?
> > >
> > > -sz
> > >
> > > On 2019/05/14 23:49:30, Pedro Larroy 
> > > wrote:
> > > > Hi dev@
> > > >
> > > > As a result of my deep dives on the graph machinery I have created a
> > > > new proposal to improve the operator graph in MXNet.
> > > >
> > > > This would mean superseding the use of NNVM Graph in MXNet and having
> > > > a new implementation that we can use to simplify a lot of code and do
> > > > powerful graph manipulation and passes such as operator fusion and
> > > > other optimizations.
> > > >
> > > > As it would be a change with big impact and ramifications, your
> 

Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Tianqi

Thanks for the quick response.

Could you point to examples where graph.h is being exposed which would
not be possible with what I propose? I don't think my proposal is
having any impact in language bindings, and the way I describe it
doesn't affect having or not having higher language bindings. Please
elaborate so I can understand your concern.  Maybe code examples where
the graph attributes are being changed from Python?  I don't think we
have this in MXNet. This is such a core foundation for MXNet that I
don't think we should compromise on it because other projects not
directly related to MXNet might want to expose some untyped graph and
Node attributes.  The current status makes maintaining the code very
painful and also is preventing desired features such as higher order
gradients to be developed. I have heard from you many times how speed
is critical for us to innovate in this quickly changing field.

My proposal is limited to the graph and wouldn't change the way
operators are registered and arguments are processed for operators for
example.


Regarding the second point, the documentation about Relay in the web
which I found for example:

https://docs.tvm.ai/dev/relay_add_op.html#

Is somebody working on making Imperative::Backward use this API? That
would be a big change which I'm not aware of. And using an IR is of a
much bigger scope than the change I'm proposing here.

I think I'm having difficulty understanding the arguments here. I'm saying I
need to change one piece of my car, and you are selling me a new vehicle? Or
is your suggestion that we use Relay for the graph passes in MXNet?

I would like to see C++ code examples; Python examples are not
sufficient when we talk about the core of MXNet.

Pedro.






On Tue, May 14, 2019 at 5:39 PM Tianqi Chen  wrote:
>
> Thanks for the proposal. Let me share some of my thoughts:
>
> Specific comments on the proposal
> ---
> The heavy use of generic in the Graph type was a huge departure from
> type-erased data structure which was presented in the previous design.
> While we understand the advantage of typed language(more compile-time
> checking) and type-erased types(more dynamism) the heavy use of
> the template will actually make the project solely C++ focused, making it
> hard to expose intermediate(templatized) data structure to
> other languages like python/scala/clojure.
>
> While I fully understand some of the lessons taught in programming
> C++(reduce shared_ptr, more typing etc.)
> We need to think about the context of MXNet project and **the need to
> support multi-language as a first-class**.
> Some of the type-erased types are design trade-offs made to support these
> features, and we need to think more
> carefully instead of just applying "rules for C++" which may bring problems.
>
> Future of NNVM
> --
> Given that this thread touched upon what we should do for better
> computational graph handling. I would recommend also to take a look at
> NNVMv2 -- relay.
>
> Relay addresses many of the wish-lists in the proposal already, such as
> operator fusion, high order gradient, offload to hardware, isolated
> compilation, deployment on edge and accelerators etc.
> Relay also addresses problems not yet mentioned in the proposal,
> including control flow and dynamic runtime, automatic layout optimization
> etc.
>
> Tianqi
>
> On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
>
> > Hi Pedro,
> >
> > Thanks for taking the initiative. Skimming through the design doc, I
> > didn't see a comparison with existing solutions such as Relay in TVM,
> > which is already a dependency of MXNet. Could you elaborate on the
> > comparison with existing solutions in the design doc too?
> >
> > -sz
> >
> > On 2019/05/14 23:49:30, Pedro Larroy 
> > wrote:
> > > Hi dev@
> > >
> > > As a result of my deep dives on the graph machinery I have created a
> > > new proposal to improve the operator graph in MXNet.
> > >
> > > This would mean superseding the use of NNVM Graph in MXNet and having
> > > a new implementation that we can use to simplify a lot of code and do
> > > powerful graph manipulation and passes such as operator fusion and
> > > other optimizations.
> > >
> > > As it would be a change with big impact and ramifications, your
> > > thoughts and feedback on the document would be highly appreciated so
> > > we can take potential future interesting use cases into account:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > >
> > > Pedro.
> > >
> >


Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Tianqi Chen
Thanks for the proposal. Let me share some of my thoughts:

Specific comments on the proposal
---
The heavy use of generics in the Graph type is a huge departure from the
type-erased data structure presented in the previous design. While we
understand the advantages of typed languages (more compile-time checking) and
type-erased types (more dynamism), the heavy use of templates will actually
make the project solely C++ focused, making it hard to expose intermediate
(templatized) data structures to other languages like python/scala/clojure.

While I fully understand some of the lessons taught in programming C++
(reduce shared_ptr, more typing, etc.), we need to think about the context of
the MXNet project and **the need to support multi-language as a first-class
concern**. Some of the type-erased types are design trade-offs made to support
these features, and we need to think more carefully instead of just applying
"rules for C++", which may bring problems.

Future of NNVM
--
Given that this thread touched upon what we should do for better
computational graph handling. I would recommend also to take a look at
NNVMv2 -- relay.

Relay addresses many of the wish-lists in the proposal already, such as
operator fusion, high order gradient, offload to hardware, isolated
compilation, deployment on edge and accelerators etc.
Relay also addresses problems not yet mentioned in the proposal,
including control flow and dynamic runtime, automatic layout optimization
etc.

Tianqi

On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:

> Hi Pedro,
>
> Thanks for taking the initiative. Skimming through the design doc, I
> didn't see a comparison with existing solutions such as Relay in TVM, which
> is already a dependency of MXNet. Could you elaborate on the comparison
> with existing solutions in the design doc too?
>
> -sz
>
> On 2019/05/14 23:49:30, Pedro Larroy 
> wrote:
> > Hi dev@
> >
> > As a result of my deep dives on the graph machinery I have created a
> > new proposal to improve the operator graph in MXNet.
> >
> > This would mean superseding the use of NNVM Graph in MXNet and having
> > a new implementation that we can use to simplify a lot of code and do
> > powerful graph manipulation and passes such as operator fusion and
> > other optimizations.
> >
> > As it would be a change with big impact and ramifications, your
> > thoughts and feedback on the document would be highly appreciated so
> > we can take potential future interesting use cases into account:
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> >
> > Pedro.
> >
>


Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Sheng

Could you provide relevant links to Relay and what you would recommend
reading, so we can have a focused discussion instead of me potentially
mis-searching? I probably also missed the discussion or vote on the mailing
list regarding including TVM as a dependency, or future plans on using Relay.
As far as I know, we have TVM as a dependency because NNVM was assimilated
into it, but we are not using it directly. Is this correct?

This would help me to add this information to the doc as you request.

Thanks.

Pedro.

On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
>
> Hi Pedro,
>
> Thanks for taking the initiative. Skimming through the design doc, I didn't
> see a comparison with existing solutions such as Relay in TVM, which is
> already a dependency of MXNet. Could you elaborate on the comparison with
> existing solutions in the design doc too?
>
> -sz
>
> On 2019/05/14 23:49:30, Pedro Larroy  wrote:
> > Hi dev@
> >
> > As a result of my deep dives on the graph machinery I have created a
> > new proposal to improve the operator graph in MXNet.
> >
> > This would mean superseding the use of NNVM Graph in MXNet and having
> > a new implementation that we can use to simplify a lot of code and do
> > powerful graph manipulation and passes such as operator fusion and
> > other optimizations.
> >
> > As it would be a change with big impact and ramifications, your
> > thoughts and feedback on the document would be highly appreciated so
> > we can take into account potential future use cases:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> >
> > Pedro.
> >


Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Sheng Zha
Hi Pedro,

Thanks for taking the initiative. Skimming through the design doc, I didn't see 
a comparison with existing solutions such as Relay in TVM, which is already a 
dependency of mxnet. Could you elaborate on a comparison with existing 
solutions in the design doc too?

-sz

On 2019/05/14 23:49:30, Pedro Larroy  wrote: 
> Hi dev@
> 
> As a result of my deep dives on the graph machinery I have created a
> new proposal to improve the operator graph in MXNet.
> 
> This would mean superseding the use of NNVM Graph in MXNet and having
> a new implementation that we can use to simplify a lot of code and do
> powerful graph manipulation and passes such as operator fusion and
> other optimizations.
> 
> As it would be a change with big impact and ramifications, your
> thoughts and feedback on the document would be highly appreciated so
> we can take into account potential future use cases:
> 
> https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> 
> Pedro.
> 


[Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi dev@

As a result of my deep dives on the graph machinery I have created a
new proposal to improve the operator graph in MXNet.

This would mean superseding the use of NNVM Graph in MXNet and having
a new implementation that we can use to simplify a lot of code and do
powerful graph manipulation and passes such as operator fusion and
other optimizations.

As it would be a change with big impact and ramifications, your
thoughts and feedback on the document would be highly appreciated so
we can take into account potential future use cases:

https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0

Pedro.
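For readers skimming the thread, the kind of graph pass the proposal refers to can be sketched as follows. This is a minimal, hypothetical illustration of operator fusion on a toy graph IR; the `Node` class and `fuse_mul_add` pass are invented for this example and are not MXNet or NNVM APIs.

```python
# Hypothetical sketch of an operator-fusion graph pass, not MXNet's actual API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str                                  # operator name, e.g. "mul", "add"
    inputs: List["Node"] = field(default_factory=list)


def fuse_mul_add(root: Node) -> Node:
    """Rewrite add(mul(a, b), c) into a single fused "fma" node."""
    # Recurse first so inner subgraphs are fused before the root is inspected.
    root.inputs = [fuse_mul_add(i) for i in root.inputs]
    if root.op == "add" and root.inputs and root.inputs[0].op == "mul":
        mul = root.inputs[0]
        # Replace the add node and its mul input with one fused node.
        return Node("fma", mul.inputs + root.inputs[1:])
    return root


# Build add(mul(a, b), c) and run the pass over it.
a, b, c = Node("a"), Node("b"), Node("c")
g = Node("add", [Node("mul", [a, b]), c])
fused = fuse_mul_add(g)
print(fused.op)                       # fma
print([n.op for n in fused.inputs])   # ['a', 'b', 'c']
```

The point of the proposal, as described above, is that a strongly typed graph structure makes passes like this easier to write and verify than pattern matching over a dictionary of strings to `dmlc::any`.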