Re: [Proposal] New operator graph for MXNet
Hi Tianqi and Junru. MXNet as a piece of software is in its teens and needs to mature. The community needs to have an honest discussion and decide whether MXNet is a production or a research framework. If it's a production framework, we need to apply the YAGNI principle and decide what is and what is not supported, and whether we are focusing on training or inference. In any case it should be possible to refactor the code to be solid, easy to maintain, and resilient to bugs. This includes reducing the surface area for present and future bugs, saying no to features, and taking advantage of every tool, including the C++ type system. As ML makes further inroads into products and our everyday life, it should be held to the same engineering principles as other pieces of production software; otherwise you end up in bad situations which can be avoided with good engineering.

It's not fun to debug a dictionary of string to dmlc::any in C++. It's basically just one level above having to decode machine instructions and hexadecimal dumps from memory, and we are in 2019, we have tools. As someone who is supporting MXNet use-cases in production as well as developing new features, I will say that we are spending too much effort on problems derived from deficiencies in these areas, effort which could be better spent advancing the SOTA in TVM or adding features to MXNet.

Taking a high-level view of the issue, I don't think it is beneficial right now for either project to be co-dependent. I think in TVM and NNVM2 you want to iterate and experiment fast, while in MXNet you want to bias towards stability and maintainability; the speed and agility are naturally going to be different. In an analogy to programming languages, MXNet would start to become the Java platform and TVM is Haskell... I'm not saying that we should or should not use NNVM2 in the future. But this is not something that should be sneaked into MXNet through a sub-repository without discussion, planning and proper testing.
I have extensively (re)read through the Relay and TVM papers, including their references. As they stand today, the goals of the TVM project are different from the goals of MXNet, and the design choices and constraints diverge. Some of the points you make are surprising to me when I look at the codebase as a non-PMC member:

Dynamic language support is implemented through the C++ API and doesn't require dynamic attributes in the graph. Could you come up with an example where any modification towards a different graph implementation would affect the bindings of the dynamic languages for MXNet?

Mental burden of templates: I have never seen so much reliance on template magic in any other project than MXNet. I don't think it is difficult for any of the MXNet developers to understand a Node class passed as a template argument to a graph. TVM is selling typing and a pure functional IR, yet for MXNet developers this is dismissed as a nit and a matter of engineering taste. Also, how relevant will having the graph mutated through a dynamic language be when some of the deep learning community is leaning towards adding differentiable programming to static languages like Swift? When you have the hammer of a dynamic language, everything looks like a dictionary of strings.

There are ZERO unit tests for those critical code paths and classes in NNVM. And no, the end-to-end Python tests don't count as unit tests for a C++ class without bindings in my book.

Happy weekend. Pedro.

On Tue, May 14, 2019 at 8:03 PM Tianqi Chen wrote: > > The core part of the proposal is to move the graph to be much more strongly > typed template class. > I think this is mainly a point of engineering taste, and both sides have > pros and cons, let me list them before I share my thoughts on this issue: > > - Typed fields certainly enjoy more compile-time type checking, on the > other hand, it is hard to expose >template of explosive possibilities to frontend languages. 
> - More type-erased fields provide runtime flexibility to store polymorphic > types as well as extensible attributes for graph optimization > - It is hard to use a virtual class to expose every possible attribute > that an operator might have, such as inlining, storage pattern, gradient > etc.. > - The nature of supporting a growing set of operator attribute requires a > type-erased attrs field. > - In contrast to your argument(typing is a blocker to features), > type-erased or typed code can both get to the same feature except, except > that > typed code gets more compile-time errors while type-erased get some of > them in runtime. > - Templatized data structures will likely introduce additional metal > burdens to developers and are not really suitable as a core data structure >- Because they imply an explosive number of possible data structures, > while the core data structure should be a single one. > > Now my view(as an MXNet PMC member) on typed vs type-erased style: If MXNet > is a pure C++ project, I might take more of the
Re: [Proposal] New operator graph for MXNet
Hi Zach, Thank you for raising these points! I am happy to offer more reading materials about this topic. *SSA vs ANF.* ANF and SSA are essentially the same thing [1]. *AD in Relay.* Relay is able to do AD through not only control flow, but also various data structures and higher-order functions [2]. [1] Appel, Andrew W. "SSA is functional programming." *ACM SIGPLAN Notices* 33.4 (1998): 17-20. [2] Roesch, Jared, et al. "Relay: a new IR for machine learning frameworks." *Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages*. ACM, 2018. On Wed, May 15, 2019 at 12:01 PM Zach Kimberg wrote: > I would like to raise another option to get back on the topic of changing > the Operator graph structure. On the page discussing Relay IR [1], it > discusses mainly the difference between a data flow graph like we use now > and A-normal [2] which is used in some functional compilers. Is there a > reason we do not want to use a structure based on Single Static Assignment > Form (wikipedia explanation [3], lecture note explanation [4]). It is used > almost universally in the compiler community including in LLVM (clang), > GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind > it's pervasiveness is that it has proven very effective for analysis and > transformations when dealing with control flow. > > One possible concern is that it might make automatic differentiation more > difficult [6]. While it certainly is more complicated than a pure > functional approach, the functional approach requires users to use > functional programming. Especially with the languages we support now, that > doesn't seem like a reasonable assumption. Given that the users are already > introducing the complexity inherent in imperative programming, we have to > deal with the increased complexity regardless. 
I think it might be easier > to have the tools to deal with that rather than attempting to coerce users > into a different programming paradigm or convert code between paradigms. > Furthermore, this may become more important if users are increasingly > making use of control flow like Junru said. > > Zach > > > [1] - https://docs.tvm.ai/dev/relay_intro.html > [2] - https://en.wikipedia.org/wiki/A-normal_form > [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form > [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf > [5] - > > https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form > [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2 > > On Wed, May 15, 2019 at 11:51 AM Naveen Swamy wrote: > > > Being dismissive and condescending has been exactly what is plaguing this > > project. > > > > I agree the last paragraph sounds very condescending and very dismissive > > and it breaks many code of conducts listed. > > > > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian < > > anirudh2...@gmail.com> > > wrote: > > > > > Hi Junru, > > > > > > Overall, I appreciate the points you made about the proposal. > > > > > > Having said that, I would like to remind the Apache Code of Conduct : > > > https://www.apache.org/foundation/policies/conduct. > > > "Be empathetic, welcoming, friendly and patient". > > > > > > I find your tone condescending. Clearly you understand what he meant > from > > > the context whether you prefer to call IR in compilers or data-flow in > > > distributed systems. You could very well say lets use this terminology > to > > > have a common understanding instead of saying go learn the basic > > concepts. > > > Before building a cool brand, its important to build a healthy > community. 
> > > > > > Anirudh > > > > > > > > > On Wed, May 15, 2019 at 12:03 AM Junru Shao > > > wrote: > > > > > > > Hi Pedro, > > > > > > > > I really appreciate that a diligent and talented engineer eagerly > wants > > > to > > > > improve our system, and am very thankful that you have done so much > for > > > our > > > > community. However, I do want to mention some points that I believe I > > > > should mention. > > > > > > > > While I agree with Tianqi that every design has its pros and cons, I > > > would > > > > love to emphasize that a *good taste* of system design is to optimize > > the > > > > bottleneck, enhance expressiveness (and usability), i.e. to do what > > needs > > > > doing, rather than *trivial nits* that are irrelevant to either > > > performance > > > > or expressiveness. Generally speaking, typed or untyped, shared_ptr > or > > > > unique_ptr, won't affect the overall performance when it comes to > deep > > > > learning workload, specially when we have an async scheduler that > does > > > good > > > > latency hiding in MXNet - to me, these are not major issues that are > > > worth > > > > re-designing our entire system. > > > > > > > > To benefit users - real-world ML practitioners, the most thing I > would > > > love > > > > to mention is that dataflow graph-based representation is >
Re: [Proposal] New operator graph for MXNet
This is a good point. I believe the main question here is not SSA vs others, but more about CFG vs structured control flow. SSA is generally equivalent to ANF or dataflow if you ignore the Phi and CFG blocks. The current relay IR makes use of more structured control flow, so it does not have an explicit CFG (aka goto). I believe that for deep learning, it is a good idea to get the highest-level information when possible, and a structured control-flow block is certainly more informative (while eliminating the possibility of goto). Mutation is something that could be handled in Relay, with explicit annotation. Most of the current deep learning programs contain parts that need to be automatically differentiated, which are usually pure, and parts that need to update parameters, which can be explicitly marked. The center of the question is: do we try to represent the parts that are pure directly in the IR, and maintain the necessary high-level structures, or do we allow the IR to represent more arbitrary programs while trying to use analysis (e.g. alias pointer analysis) to recover them? I think the former would be easier, given that deep learning programs are already pretty high level. Now there is also a discussion about again adding CFG to relay to handle rare cases which do not have to be optimized. But from what I have seen so far, it seems to fit most of the needs. Tianqi On Wed, May 15, 2019 at 12:01 PM Zach Kimberg wrote: > I would like to raise another option to get back on the topic of changing > the Operator graph structure. On the page discussing Relay IR [1], it > discusses mainly the difference between a data flow graph like we use now > and A-normal [2] which is used in some functional compilers. Is there a > reason we do not want to use a structure based on Single Static Assignment > Form (wikipedia explanation [3], lecture note explanation [4]). 
It is used > almost universally in the compiler community including in LLVM (clang), > GCC, Oracle JVM, PyPy, Go, Webkit, and Swift [5]. The major reason behind > it's pervasiveness is that it has proven very effective for analysis and > transformations when dealing with control flow. > > One possible concern is that it might make automatic differentiation more > difficult [6]. While it certainly is more complicated than a pure > functional approach, the functional approach requires users to use > functional programming. Especially with the languages we support now, that > doesn't seem like a reasonable assumption. Given that the users are already > introducing the complexity inherent in imperative programming, we have to > deal with the increased complexity regardless. I think it might be easier > to have the tools to deal with that rather than attempting to coerce users > into a different programming paradigm or convert code between paradigms. > Furthermore, this may become more important if users are increasingly > making use of control flow like Junru said. > > Zach > > > [1] - https://docs.tvm.ai/dev/relay_intro.html > [2] - https://en.wikipedia.org/wiki/A-normal_form > [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form > [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf > [5] - > > https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form > [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2 > > On Wed, May 15, 2019 at 11:51 AM Naveen Swamy wrote: > > > Being dismissive and condescending has been exactly what is plaguing this > > project. > > > > I agree the last paragraph sounds very condescending and very dismissive > > and it breaks many code of conducts listed. > > > > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian < > > anirudh2...@gmail.com> > > wrote: > > > > > Hi Junru, > > > > > > Overall, I appreciate the points you made about the proposal. 
> > > > > > Having said that, I would like to remind the Apache Code of Conduct : > > > https://www.apache.org/foundation/policies/conduct. > > > "Be empathetic, welcoming, friendly and patient". > > > > > > I find your tone condescending. Clearly you understand what he meant > from > > > the context whether you prefer to call IR in compilers or data-flow in > > > distributed systems. You could very well say lets use this terminology > to > > > have a common understanding instead of saying go learn the basic > > concepts. > > > Before building a cool brand, its important to build a healthy > community. > > > > > > Anirudh > > > > > > > > > On Wed, May 15, 2019 at 12:03 AM Junru Shao > > > wrote: > > > > > > > Hi Pedro, > > > > > > > > I really appreciate that a diligent and talented engineer eagerly > wants > > > to > > > > improve our system, and am very thankful that you have done so much > for > > > our > > > > community. However, I do want to mention some points that I believe I > > > > should mention. > > > > > > > > While I agree with Tianqi that every design has its pros and cons, I > > > would > > > > love to emphasize
Re: [Proposal] New operator graph for MXNet
Hi, Thanks for all the materials and key points raised. The discussion has many ramifications; I will think about them and research them very carefully before replying further. Please also don't quickly dismiss the points I have raised and reduce them to typed vs untyped or pedantic C++ comments: we have been debugging missing nodes and pointers in the graph when doing second-order gradients for weeks, with no success, due to the design of the graph. There are 60 years of software development learnings and practice behind some concepts, and compiler theory, that deep learning frameworks can also take advantage of instead of rediscovering everything again until we end up in a typed pure functional IR. In some of the materials linked you also point out limitations of the current architecture. I think it's good that we raise this topic, and it shows that we need to have a deeper and more structured conversation on how we evolve the dataflow graph in MXNet. Maybe you can help cross-pollinate this conversation between the TVM and MXNet projects. If there's an intention to change from NNVM to NNVM2, I think this should have been communicated to or discussed with the community before. Until then. Pedro. On Tue, May 14, 2019 at 8:03 PM Tianqi Chen wrote: > > The core part of the proposal is to move the graph to be much more strongly > typed template class. > I think this is mainly a point of engineering taste, and both sides have > pros and cons, let me list them before I share my thoughts on this issue: > > - Typed fields certainly enjoy more compile-time type checking, on the > other hand, it is hard to expose >template of explosive possibilities to frontend languages. > - More type-erased fields provide runtime flexibility to store polymorphic > types as well as extensible attributes for graph optimization > - It is hard to use a virtual class to expose every possible attribute > that an operator might have, such as inlining, storage pattern, gradient > etc.. 
> - The nature of supporting a growing set of operator attribute requires a > type-erased attrs field. > - In contrast to your argument(typing is a blocker to features), > type-erased or typed code can both get to the same feature except, except > that > typed code gets more compile-time errors while type-erased get some of > them in runtime. > - Templatized data structures will likely introduce additional metal > burdens to developers and are not really suitable as a core data structure >- Because they imply an explosive number of possible data structures, > while the core data structure should be a single one. > > Now my view(as an MXNet PMC member) on typed vs type-erased style: If MXNet > is a pure C++ project, I might take more of the typed approach. > However, MXNet itself is a project that takes python/scala/clojure and > other frontend languages. > The introduction of more typing may not align with the original goal as the > tradeoffs I listed above. > > This proposal is really a drastic change of what NNVM does, as well as the > optimization passes, and given the scope, in your analogy, "a new vehicle > to solve all the problems" > rather than a minor patch. It will take a lot of engineering effort to > bring in new features and adapting the existing ones. > Because of that, it does merit a discussion about how shall we think about > the future MXNet2.0. > > Technically Relay is a serious candidate. Of course relay, as well as its > core, is in C++ but maintains the multi-language first principle, that is > why the example code was in python. > See more related discussion comparing NNVMv1 and relay: > https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5 > > I think the ideal graph data structure candidate for MXNet2.0 should have > natural support for: > - Native support of function, module, and recursions > - Control flows > - The ability of interpolation with multi-language frontend, e.g. 
being > able to prototype graph optimizations in python/scala/clojure if needed. > > Adding these support needs significant engineering effort, and I do hope we > only have to do it once. While I don't want to force any conclusion here, > I do think Relay is one such candidate. > > Tianqi > > > On Tue, May 14, 2019 at 5:58 PM Pedro Larroy > wrote: > > > Hi Tianqi > > > > Thanks for the quick response. > > > > Could you point to examples where graph.h is being exposed which would > > not be possible with what I propose? I don't think my proposal is > > having any impact in language bindings, and the way I describe it > > doesn't affect having or not having higher language bindings. Please > > elaborate so I can understand your concern. Maybe code examples where > > the graph attributes are being changed from Python? I don't think we > > have this on MXNet. This is such a core foundation for MXNet, that I > > don't think we should compromise on it because other project not > > directly related to MXNet might want to expose some untyped
Re: [Proposal] New operator graph for MXNet
Hi Anirudh, Naveen, Thank you so much for the gentle reminder! I am not a native speaker, which resulted in this mistake. I would love to say a sincere sorry to Pedro. Pedro is working really hard to grow our community and improve our code base. I sincerely apologize for what I said in a hurry. Let's work hard together to grow a healthy community! Thanks, Junru On Wed, May 15, 2019 at 11:51 Naveen Swamy wrote: > Being dismissive and condescending has been exactly what is plaguing this > project. > > I agree the last paragraph sounds very condescending and very dismissive > and it breaks many code of conducts listed. > > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian < > anirudh2...@gmail.com> > wrote: > > > Hi Junru, > > > > Overall, I appreciate the points you made about the proposal. > > > > Having said that, I would like to remind the Apache Code of Conduct : > > https://www.apache.org/foundation/policies/conduct. > > "Be empathetic, welcoming, friendly and patient". > > > > I find your tone condescending. Clearly you understand what he meant from > > the context whether you prefer to call IR in compilers or data-flow in > > distributed systems. You could very well say lets use this terminology to > > have a common understanding instead of saying go learn the basic > concepts. > > Before building a cool brand, its important to build a healthy community. > > > > Anirudh > > > > > > On Wed, May 15, 2019 at 12:03 AM Junru Shao > > wrote: > > > Hi Pedro, > > > > > > I really appreciate that a diligent and talented engineer eagerly wants > > to > > > improve our system, and am very thankful that you have done so much for > > our > > > community. However, I do want to mention some points that I believe I > > > should mention. 
> > > > > > While I agree with Tianqi that every design has its pros and cons, I > > would > > > love to emphasize that a *good taste* of system design is to optimize > the > > > bottleneck, enhance expressiveness (and usability), i.e. to do what > needs > > > doing, rather than *trivial nits* that are irrelevant to either > > performance > > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or > > > unique_ptr, won't affect the overall performance when it comes to deep > > > learning workload, specially when we have an async scheduler that does > > good > > > latency hiding in MXNet - to me, these are not major issues that are > > worth > > > re-designing our entire system. > > > > > > To benefit users - real-world ML practitioners, the most thing I would > > love > > > to mention is that dataflow graph-based representation is increasingly > > > incapable of modern neural networks, because the increasingly appeared > > > structures like arbitrary control flow (w/ continue, break, etc), > > > recursion, type conjunction and disjunction, etc. These issues will be > > our > > > priority to address, which is brought by Relay, which addresses all > these > > > pain points. > > > > > > Another minor thing I would love to humbly mention is that, for sake of > > our > > > brand, it is our responsibility to be professional about terminologies > > when > > > writing an official proposal on Confluence. As one of the numerous > > > examples, the title of the proposal really shocks me for a while, > > something > > > like "operators graph" blah blah so weird. Educate me if I were wrong, > > but > > > compiler community would prefer the term "intermediate representation", > > and > > > distributed system community would prefer "dataflow graph". If you > don't > > > have knowledge in these fields, a better way for efficient > communication > > is > > > to get yourself first familiarize the most basic concepts and then do > > > discussion. 
This is a way to save your own valuable time as well. > > > > > > Again, thank you so much for your hard work, and hope that we could > work > > > together to win customers in the future :-) > > > > > > Thanks, > > > Junru > > > > > > > > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen > > > wrote: > > > > > > > The core part of the proposal is to move the graph to be much more > > > strongly > > > > typed template class. > > > > I think this is mainly a point of engineering taste, and both sides > > have > > > > pros and cons, let me list them before I share my thoughts on this > > issue: > > > > > > > > - Typed fields certainly enjoy more compile-time type checking, on > the > > > > other hand, it is hard to expose > > > >template of explosive possibilities to frontend languages. > > > > - More type-erased fields provide runtime flexibility to store > > > polymorphic > > > > types as well as extensible attributes for graph optimization > > > > - It is hard to use a virtual class to expose every possible > > attribute > > > > that an operator might have, such as inlining, storage pattern, > > gradient > > > > etc.. > > > > - The nature of supporting a growing set of operator attribute > > > requires a > > > >
Re: [Proposal] New operator graph for MXNet
I would like to raise another option to get back on the topic of changing the Operator graph structure. On the page discussing Relay IR [1], it discusses mainly the difference between a data flow graph like we use now and A-normal form [2], which is used in some functional compilers. Is there a reason we do not want to use a structure based on Static Single Assignment Form (wikipedia explanation [3], lecture note explanation [4])? It is used almost universally in the compiler community, including in LLVM (clang), GCC, Oracle JVM, PyPy, Go, WebKit, and Swift [5]. The major reason behind its pervasiveness is that it has proven very effective for analysis and transformations when dealing with control flow. One possible concern is that it might make automatic differentiation more difficult [6]. While it certainly is more complicated than a pure functional approach, the functional approach requires users to use functional programming. Especially with the languages we support now, that doesn't seem like a reasonable assumption. Given that the users are already introducing the complexity inherent in imperative programming, we have to deal with the increased complexity regardless. I think it might be easier to have the tools to deal with that rather than attempting to coerce users into a different programming paradigm or convert code between paradigms. Furthermore, this may become more important if users are increasingly making use of control flow like Junru said. 
Zach [1] - https://docs.tvm.ai/dev/relay_intro.html [2] - https://en.wikipedia.org/wiki/A-normal_form [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf [5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2 On Wed, May 15, 2019 at 11:51 AM Naveen Swamy wrote: > Being dismissive and condescending has been exactly what is plaguing this > project. > > I agree the last paragraph sounds very condescending and very dismissive > and it breaks many code of conducts listed. > > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian < > anirudh2...@gmail.com> > wrote: > > > Hi Junru, > > > > Overall, I appreciate the points you made about the proposal. > > > > Having said that, I would like to remind the Apache Code of Conduct : > > https://www.apache.org/foundation/policies/conduct. > > "Be empathetic, welcoming, friendly and patient". > > > > I find your tone condescending. Clearly you understand what he meant from > > the context whether you prefer to call IR in compilers or data-flow in > > distributed systems. You could very well say lets use this terminology to > > have a common understanding instead of saying go learn the basic > concepts. > > Before building a cool brand, its important to build a healthy community. > > > > Anirudh > > > > > > On Wed, May 15, 2019 at 12:03 AM Junru Shao > > wrote: > > > > > Hi Pedro, > > > > > > I really appreciate that a diligent and talented engineer eagerly wants > > to > > > improve our system, and am very thankful that you have done so much for > > our > > > community. However, I do want to mention some points that I believe I > > > should mention. 
> > > > > > While I agree with Tianqi that every design has its pros and cons, I > > would > > > love to emphasize that a *good taste* of system design is to optimize > the > > > bottleneck, enhance expressiveness (and usability), i.e. to do what > needs > > > doing, rather than *trivial nits* that are irrelevant to either > > performance > > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or > > > unique_ptr, won't affect the overall performance when it comes to deep > > > learning workload, specially when we have an async scheduler that does > > good > > > latency hiding in MXNet - to me, these are not major issues that are > > worth > > > re-designing our entire system. > > > > > > To benefit users - real-world ML practitioners, the most thing I would > > love > > > to mention is that dataflow graph-based representation is increasingly > > > incapable of modern neural networks, because the increasingly appeared > > > structures like arbitrary control flow (w/ continue, break, etc), > > > recursion, type conjunction and disjunction, etc. These issues will be > > our > > > priority to address, which is brought by Relay, which addresses all > these > > > pain points. > > > > > > Another minor thing I would love to humbly mention is that, for sake of > > our > > > brand, it is our responsibility to be professional about terminologies > > when > > > writing an official proposal on Confluence. As one of the numerous > > > examples, the title of the proposal really shocks me for a while, > > something > > > like "operators graph" blah blah so weird. Educate me if I were wrong, > > but > > > compiler community would prefer the term "intermediate representation", > > and > > >
Re: [Proposal] New operator graph for MXNet
Being dismissive and condescending has been exactly what is plaguing this project. I agree the last paragraph sounds very condescending and very dismissive, and it breaks many of the codes of conduct listed. On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian wrote: > Hi Junru, > > Overall, I appreciate the points you made about the proposal. > > Having said that, I would like to remind the Apache Code of Conduct : > https://www.apache.org/foundation/policies/conduct. > "Be empathetic, welcoming, friendly and patient". > > I find your tone condescending. Clearly you understand what he meant from > the context whether you prefer to call IR in compilers or data-flow in > distributed systems. You could very well say lets use this terminology to > have a common understanding instead of saying go learn the basic concepts. > Before building a cool brand, its important to build a healthy community. > > Anirudh > > > On Wed, May 15, 2019 at 12:03 AM Junru Shao > wrote: > > > Hi Pedro, > > > > I really appreciate that a diligent and talented engineer eagerly wants > to > > improve our system, and am very thankful that you have done so much for > our > > community. However, I do want to mention some points that I believe I > > should mention. > > > > While I agree with Tianqi that every design has its pros and cons, I > would > > love to emphasize that a *good taste* of system design is to optimize the > > bottleneck, enhance expressiveness (and usability), i.e. to do what needs > > doing, rather than *trivial nits* that are irrelevant to either > performance > > or expressiveness. Generally speaking, typed or untyped, shared_ptr or > > unique_ptr, won't affect the overall performance when it comes to deep > > learning workload, specially when we have an async scheduler that does > good > > latency hiding in MXNet - to me, these are not major issues that are > worth > > re-designing our entire system. 
> > > > To benefit users - real-world ML practitioners, the most thing I would > love > > to mention is that dataflow graph-based representation is increasingly > > incapable of modern neural networks, because the increasingly appeared > > structures like arbitrary control flow (w/ continue, break, etc), > > recursion, type conjunction and disjunction, etc. These issues will be > our > > priority to address, which is brought by Relay, which addresses all these > > pain points. > > > > Another minor thing I would love to humbly mention is that, for sake of > our > > brand, it is our responsibility to be professional about terminologies > when > > writing an official proposal on Confluence. As one of the numerous > > examples, the title of the proposal really shocks me for a while, > something > > like "operators graph" blah blah so weird. Educate me if I were wrong, > but > > compiler community would prefer the term "intermediate representation", > and > > distributed system community would prefer "dataflow graph". If you don't > > have knowledge in these fields, a better way for efficient communication > is > > to get yourself first familiarize the most basic concepts and then do > > discussion. This is a way to save your own valuable time as well. > > > > Again, thank you so much for your hard work, and hope that we could work > > together to win customers in the future :-) > > > > Thanks, > > Junru > > > > > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen > > wrote: > > > > > The core part of the proposal is to move the graph to be much more > > strongly > > > typed template class. > > > I think this is mainly a point of engineering taste, and both sides > have > > > pros and cons, let me list them before I share my thoughts on this > issue: > > > > > > - Typed fields certainly enjoy more compile-time type checking, on the > > > other hand, it is hard to expose > > >template of explosive possibilities to frontend languages. 
> > > - More type-erased fields provide runtime flexibility to store > > polymorphic > > > types as well as extensible attributes for graph optimization > > > - It is hard to use a virtual class to expose every possible > attribute > > > that an operator might have, such as inlining, storage pattern, > gradient > > > etc.. > > > - The nature of supporting a growing set of operator attribute > > requires a > > > type-erased attrs field. > > > - In contrast to your argument(typing is a blocker to features), > > > type-erased or typed code can both get to the same feature except, > except > > > that > > > typed code gets more compile-time errors while type-erased get some > of > > > them in runtime. > > > - Templatized data structures will likely introduce additional metal > > > burdens to developers and are not really suitable as a core data > > structure > > >- Because they imply an explosive number of possible data > structures, > > > while the core data structure should be a single one. > > > > > > Now my view(as an MXNet PMC member) on typed vs type-erased style: If > > MXNet > > >
Re: [Proposal] New operator graph for MXNet
Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to point to the Apache Code of Conduct: https://www.apache.org/foundation/policies/conduct. "Be empathetic, welcoming, friendly and patient."

I find your tone condescending. Clearly you understood what he meant from the context, whether you prefer to call it IR as in compilers or data-flow as in distributed systems. You could very well have said "let's use this terminology to have a common understanding" instead of telling him to go learn the basic concepts. Before building a cool brand, it's important to build a healthy community.

Anirudh

On Wed, May 15, 2019 at 12:03 AM Junru Shao wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to improve our system, and I am very thankful that you have done so much for our community. However, I do want to mention some points that I believe I should mention.

While I agree with Tianqi that every design has its pros and cons, I would love to emphasize that *good taste* in system design is to optimize the bottleneck and enhance expressiveness (and usability), i.e. to do what needs doing, rather than *trivial nits* that are irrelevant to either performance or expressiveness. Generally speaking, typed or untyped, shared_ptr or unique_ptr won't affect the overall performance when it comes to deep learning workloads, especially when we have an async scheduler that does good latency hiding in MXNet. To me, these are not major issues that are worth re-designing our entire system for.

To benefit users - real-world ML practitioners - the main thing I would love to mention is that dataflow-graph-based representations are increasingly incapable of expressing modern neural networks, because of increasingly common structures like arbitrary control flow (with continue, break, etc.), recursion, type conjunction and disjunction, etc. Addressing these issues is our priority, and Relay addresses all of these pain points.

Another minor thing I would love to humbly mention is that, for the sake of our brand, it is our responsibility to be professional about terminology when writing an official proposal on Confluence. As one of numerous examples, the title of the proposal shocked me for a while - something like "operators graph" sounds weird. Educate me if I am wrong, but the compiler community would prefer the term "intermediate representation", and the distributed systems community would prefer "dataflow graph". If you don't have knowledge in these fields, a better way to communicate efficiently is to first familiarize yourself with the most basic concepts and then discuss. This is a way to save your own valuable time as well.

Again, thank you so much for your hard work, and I hope that we can work together to win customers in the future :-)

Thanks,
Junru

On Tue, May 14, 2019 at 8:03 PM Tianqi Chen wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
The core part of the proposal is to move the graph to a much more strongly typed template class. I think this is mainly a point of engineering taste, and both sides have pros and cons. Let me list them before I share my thoughts on this issue:

- Typed fields certainly enjoy more compile-time type checking; on the other hand, it is hard to expose a template with an explosive number of possible instantiations to frontend languages.
- More type-erased fields provide runtime flexibility to store polymorphic types as well as extensible attributes for graph optimization.
  - It is hard to use a virtual class to expose every possible attribute that an operator might have, such as inlining, storage pattern, gradient, etc.
  - The nature of supporting a growing set of operator attributes requires a type-erased attrs field.
- In contrast to your argument (that typing is a blocker to features), type-erased and typed code can both reach the same features, except that typed code gets more errors at compile time while type-erased code gets some of them at runtime.
- Templatized data structures will likely introduce additional mental burden for developers and are not really suitable as a core data structure, because they imply an explosive number of possible data structures, while the core data structure should be a single one.

Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet were a pure C++ project, I might take more of the typed approach. However, MXNet itself is a project that takes python/scala/clojure and other frontend languages as first-class citizens. The introduction of more typing may not align with that original goal, given the tradeoffs I listed above.

This proposal is really a drastic change of what NNVM does, as well as of the optimization passes, and given its scope it is, in your analogy, "a new vehicle to solve all the problems" rather than a minor patch. It will take a lot of engineering effort to bring in new features and adapt the existing ones. Because of that, it does merit a discussion about how we should think about the future MXNet 2.0.

Technically, Relay is a serious candidate. Of course Relay, as well as its core, is in C++, but it maintains the multi-language-first principle; that is why the example code was in Python. See more related discussion comparing NNVM v1 and Relay: https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5

I think the ideal graph data structure candidate for MXNet 2.0 should have natural support for:
- Functions, modules, and recursion
- Control flow
- Interoperation with multi-language frontends, e.g. being able to prototype graph optimizations in python/scala/clojure if needed

Adding this support needs significant engineering effort, and I do hope we only have to do it once. While I don't want to force any conclusion here, I do think Relay is one such candidate.

Tianqi

On Tue, May 14, 2019 at 5:58 PM Pedro Larroy wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Hi Tianqi

I thought a bit more about your comments, and I think there is a simple way to address your concerns that satisfies both needs: we can have a NodeAttributes template class which has a map of string to any, as is currently the case, so the graph can still be used in the highly dynamic scenario that you are concerned about.

Let me know what you think.

Pedro.

On Tue, May 14, 2019 at 5:50 PM Pedro Larroy wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Hi Tianqi

Thanks for the quick response.

Could you point to examples where graph.h is being exposed which would not be possible with what I propose? I don't think my proposal has any impact on language bindings, and the way I describe it doesn't affect having or not having higher-level language bindings. Please elaborate so I can understand your concern, maybe with code examples where the graph attributes are being changed from Python; I don't think we have this in MXNet. This is such a core foundation for MXNet that I don't think we should compromise on it because other projects not directly related to MXNet might want to expose some untyped graph and Node attributes. The current status makes maintaining the code very painful and is also preventing desired features, such as higher-order gradients, from being developed. I have heard from you many times how speed is critical for us to innovate in this quickly changing field.

My proposal is limited to the graph and wouldn't change, for example, the way operators are registered and arguments are processed for operators.

Regarding the second point, the documentation about Relay on the web which I found, for example: https://docs.tvm.ai/dev/relay_add_op.html#

Is somebody working on making Imperative::Backward use this API? That would be a big change which I'm not aware of, and using an IR is of a much bigger scope than the change I'm proposing here.

I think I'm having difficulty understanding what the arguments are here. I'm saying I need to change one piece of my car, and what you are selling me is a new vehicle? Or is your suggestion that we use Relay for the graph passes in MXNet?

I would like to see C++ code examples; Python examples are not sufficient when we talk about the core of MXNet.

Pedro.

On Tue, May 14, 2019 at 5:39 PM Tianqi Chen wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Thanks for the proposal. Let me share some of my thoughts:

Specific comments on the proposal
---
The heavy use of generics in the Graph type is a huge departure from the type-erased data structure which was presented in the previous design. While we understand the advantages of typed languages (more compile-time checking) and of type-erased types (more dynamism), the heavy use of templates will actually make the project solely C++ focused, making it hard to expose intermediate (templatized) data structures to other languages like python/scala/clojure.

While I fully understand some of the lessons taught about programming in C++ (reduce shared_ptr, more typing, etc.), we need to think about the context of the MXNet project and **the need to support multi-language as a first-class concern**. Some of the type-erased types are design trade-offs made to support these features, and we need to think more carefully instead of just applying "rules for C++", which may bring problems.

Future of NNVM
--
Given that this thread touches upon what we should do for better computational graph handling, I would recommend also taking a look at NNVM v2 -- Relay.

Relay already addresses many of the wish-list items in the proposal, such as operator fusion, higher-order gradients, offload to hardware, isolated compilation, and deployment on edge devices and accelerators. Relay also addresses problems not yet mentioned in the proposal, including control flow, a dynamic runtime, automatic layout optimization, etc.

Tianqi

On Tue, May 14, 2019 at 5:06 PM Sheng Zha wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Hi Sheng

Could you provide relevant links to Relay and what you would recommend reading, so we have a focused discussion instead of me potentially mis-searching? I probably also missed the discussion or vote on the mailing list regarding including TVM as a dependency, or future plans on using Relay. As far as I know, we have TVM as a dependency because NNVM was assimilated into it, but we are not using it directly. Is this correct?

This would help me add this information to the doc as you request. Thanks.

Pedro.

On Tue, May 14, 2019 at 5:06 PM Sheng Zha wrote:
> [...]
Re: [Proposal] New operator graph for MXNet
Hi Pedro,

Thanks for taking the initiative. Skimming through the design doc, I didn't see a comparison with existing solutions such as Relay in TVM, which is already a dependency of MXNet. Could you elaborate on the comparison with existing solutions in the design doc too?

-sz

On 2019/05/14 23:49:30, Pedro Larroy wrote:
> [...]
[Proposal] New operator graph for MXNet
Hi dev@

As a result of my deep dives on the graph machinery, I have created a new proposal to improve the operator graph in MXNet.

This would mean superseding the use of NNVM Graph in MXNet and having a new implementation that we can use to simplify a lot of code and do powerful graph manipulation and passes, such as operator fusion and other optimizations.

As it would be a change with big impact and ramifications, your thoughts and feedback on the document would be highly appreciated, so we can take into account potentially interesting future use cases:

https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0

Pedro.