Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Junru Shao
Hi folks,

I may have found a release blocker for 1.5.0 in the implementation of the
dynamic shape mechanism, which conflicts with Gluon's deferred
initialization [1].

[1] https://github.com/dmlc/gluon-nlp/issues/706
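
For context, a minimal sketch of how Gluon's deferred initialization works
(standard Gluon API; the interaction with dynamic shape below is my reading
of [1], not a confirmed diagnosis):

    import mxnet as mx
    from mxnet import gluon

    # in_units is left unspecified, so the weight shape is deferred.
    net = gluon.nn.Dense(10)
    net.initialize()               # parameters are not actually allocated yet

    out = net(mx.nd.ones((2, 5)))  # first forward infers in_units = 5
    print(net.weight.shape)        # (10, 5): the shape was fixed on first call

    # A dynamic-shape operator, whose output shape depends on the *values*
    # flowing through it, cannot report a static shape here, so the deferred
    # shape-inference pass has nothing to propagate - hence the conflict.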

On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian 
wrote:

> Hi Lai,
>
> From the discussion I had with Nvidia offline, they are aiming to push
> the required changes today.
> Since this is an important feature for the release, if it gets delayed and
> cannot be merged by 05/17/2019,
> the code freeze date may need to be changed.
>
> Anirudh
>
> On Wed, May 15, 2019 at 1:23 AM Lv, Tao A  wrote:
>
> > Hi dev,
> >
> > We see there are several GitHub issues [1][2][3][4] about the MXNet
> > Windows build experience. The team is working intensively [5][6][7] to
> > fix some problems with the MKL-DNN build on Windows. We hope these fixes
> > can make the code freeze and finally enter the 1.5.0 release.
> >
> > The PR against mshadow (#374) has already been merged, and MXNet PR #14877
> > is under review - many thanks to the CI team for helping with the MKL
> > installation request. PR #14952 is a documentation change that follows the
> > build logic changes in PR #14877, so I think these two PRs should be
> > merged together. Currently #14877 is experiencing a CI responsiveness
> > problem.
> >
> > Please take some time to have a look at these two PRs. Your comments and
> > suggestions are highly appreciated.
> >
> > Thanks,
> > -tao
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > [6] https://github.com/dmlc/mshadow/pull/374
> > [7] https://github.com/apache/incubator-mxnet/pull/14952
> >
> > -Original Message-
> > From: Lai Wei [mailto:roywei...@gmail.com]
> > Sent: Wednesday, May 15, 2019 2:57 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> >
> > Hi Anirudh,
> >
> > I see there was an offline discussion
> > <https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341>
> > and I have updated the AMP feature and your project on the release tracker
> > <https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status>.
> > Please let me know if you have any updates.
> >
> > Hi @dev,
> > This is a gentle reminder that the code freeze for the 1.5.0 release is on
> > 05/17/2019. Please let us know if you have any WIP pull requests aiming
> > for 1.5.0 that need attention.
> > Please understand we already have around 650 commits in master that need
> > to be released in time. We know the TensorRT test in CI is failing and
> > are trying to fix it. Meanwhile, please update the tracker if there is any
> > change:
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> >
> > Thanks!
> >
> > Lai
> >
> >
> > On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian <anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Sheng,
> > >
> > > I had a discussion with Nvidia folks offline today (@ptrendx et al.).
> > > I strongly feel that the AMP feature should be included as part of the
> > > release: https://github.com/apache/incubator-mxnet/pull/14173 .
> > > The PR is aimed for completion next week, but reviews and RFC
> > > discussions may take some time. I would request extending the release
> > > code freeze by 2 weeks.
> > > Also, I would like to include
> > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > > which depends on the AMP PR.
> > > I am also aiming to add a PR by this weekend or early next week,
> > > but reviews will take longer than May 17th.
> > >
> > > Anirudh
> > >
> > >
> > > On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
> > >
> > > > Hi,
> > > >
> > > > While the 1.4.1 vote on general@incubator is still ongoing, I’d like to
> > > > propose that we start preparing the 1.5.0 release.
> > > >
> > > > 1.5.0 will include changes that date back to last year, and there have
> > > > been a lot of new features and improvements in it, so it will likely
> > > > take us more time to prepare than 1.4.1. I propose the following timeline:
> > > > - Cut release branch: release branch already cut. Will sync with
> > > > master branch on 5/15/2019 EOD.
> > > > - Code freeze: 5/17/2019. No more changes unless the release branch
> > > > is in a broken state.
> > > > - Tag and vote: 5/20/2019 onward.
> > > >
> > > > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> > > drive
> > > > this release as release manager, and I’m happy to help again as
> > > committer.
> > > >
> > > > If you have features in progress that you’d like to include in 1.5.0:
> > > > - Add your feature to the scope:
> > > >
> 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Zach,

Thank you for raising these points! I am happy to offer more reading
materials about this topic.

*SSA vs ANF.* ANF and SSA are essentially the same thing [1].

*AD in Relay.* Relay is able to do AD not only through control flow, but
also through various data structures and higher-order functions [2].

[1] Appel, Andrew W. "SSA is functional programming." *ACM SIGPLAN
Notices* 33.4
(1998): 17-20.
[2] Roesch, Jared, et al. "Relay: a new IR for machine learning
frameworks." *Proceedings of the 2nd ACM SIGPLAN International Workshop on
Machine Learning and Programming Languages*. ACM, 2018.
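
To make the SSA/ANF correspondence concrete, a toy sketch in Python
(illustrative only - neither form is real compiler output):

    # SSA style: every temporary is assigned exactly once, straight-line.
    def f_ssa(x):
        t1 = x + 1
        t2 = t1 * t1
        return t2

    # ANF style: every intermediate value is bound by a "let"; here the
    # let-bindings are mimicked with immediately-applied lambdas.
    def f_anf(x):
        return (lambda t1: (lambda t2: t2)(t1 * t1))(x + 1)

    assert f_ssa(3) == f_anf(3) == 16  # each SSA temp maps to one let-binding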


On Wed, May 15, 2019 at 12:01 PM Zach Kimberg 
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. The page discussing Relay IR [1] mainly
> covers the difference between a dataflow graph like we use now
> and A-normal form [2], which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Static Single Assignment
> form (wikipedia explanation [3], lecture note explanation [4])? It is used
> almost universally in the compiler community, including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, WebKit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and dismissive,
> > and it breaks many of the code of conduct points listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind you of the Apache Code of Conduct:
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understood what he meant
> > > from the context, whether you prefer to call it IR as in compilers or
> > > dataflow as in distributed systems. You could very well say "let's use
> > > this terminology to have a common understanding" instead of saying "go
> > > learn the basic concepts".
> > > Before building a cool brand, it's important to build a healthy
> > > community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly
> wants
> > > to
> > > > improve our system, and am very thankful that you have done so much
> for
> > > our
> > > > community. However, I do want to mention some points that I believe
> > > > are worth mentioning.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I
> > > > would love to emphasize that *good taste* in system design means
> > > > optimizing the bottleneck and enhancing expressiveness (and usability),
> > > > i.e. doing what needs doing, rather than polishing *trivial nits* that
> > > > are irrelevant to either performance or expressiveness. Generally
> > > > speaking, typed or untyped, shared_ptr or unique_ptr won't affect the
> > > > overall performance when it comes to deep learning workloads,
> > > > especially when we have an async scheduler that does good latency
> > > > hiding in MXNet - to me, these are not major issues that are worth
> > > > re-designing our entire system for.
> > > >
> > > > To benefit users - real-world ML practitioners - the main thing I
> > > > would love to mention is that dataflow graph-based representation is
> 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Tianqi Chen
This is a good point. I believe the main question here is not SSA vs
others, but more about CFG vs structured control flow.

SSA is generally equivalent to ANF or dataflow if you ignore the Phi and
CFG blocks. The current Relay IR makes use of more structured
control flow, so it does not have an explicit CFG (aka goto).

I believe that for deep learning, it is a good idea to keep the highest-level
information when possible, and structured control-flow blocks
are certainly more informative (while eliminating the possibility of goto).
Mutation is something that could be handled in Relay, with explicit
annotation.

Most current deep learning programs contain parts that need to be
automatically differentiated, which are usually pure, and parts that need to
update parameters, which can be explicitly marked. The central question is:
do we represent the pure parts directly in the IR and maintain the necessary
high-level structures, or do we allow the IR to represent more arbitrary
programs while using analysis (e.g. alias/pointer analysis) to recover them?
I think the former would be easier, given that deep learning programs are
already pretty high level.
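
As a concrete illustration of that split, a minimal sketch using MXNet's
standard autograd API (the recorded region is the pure part an IR could
represent directly; the update is the explicitly marked mutation):

    import mxnet as mx
    from mxnet import autograd, nd

    w = nd.random.normal(shape=(3,))
    w.attach_grad()
    x = nd.ones((3,))

    # Pure, differentiable region: this is what gets automatically
    # differentiated.
    with autograd.record():
        loss = (w * x).sum()
    loss.backward()

    # Explicitly marked, effectful part: an in-place parameter update
    # outside the recorded region.
    w[:] -= 0.1 * w.grad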

Now there is also a discussion about adding a CFG back to Relay to handle
rare cases that do not have to be optimized. But from what I have seen so
far, structured control flow seems to fit most needs.

Tianqi

On Wed, May 15, 2019 at 12:01 PM Zach Kimberg 
wrote:

> I would like to raise another option to get back on the topic of changing
> the Operator graph structure. The page discussing Relay IR [1] mainly
> covers the difference between a dataflow graph like we use now
> and A-normal form [2], which is used in some functional compilers. Is there a
> reason we do not want to use a structure based on Static Single Assignment
> form (wikipedia explanation [3], lecture note explanation [4])? It is used
> almost universally in the compiler community, including in LLVM (clang),
> GCC, Oracle JVM, PyPy, Go, WebKit, and Swift [5]. The major reason behind
> its pervasiveness is that it has proven very effective for analysis and
> transformations when dealing with control flow.
>
> One possible concern is that it might make automatic differentiation more
> difficult [6]. While it certainly is more complicated than a pure
> functional approach, the functional approach requires users to use
> functional programming. Especially with the languages we support now, that
> doesn't seem like a reasonable assumption. Given that the users are already
> introducing the complexity inherent in imperative programming, we have to
> deal with the increased complexity regardless. I think it might be easier
> to have the tools to deal with that rather than attempting to coerce users
> into a different programming paradigm or convert code between paradigms.
> Furthermore, this may become more important if users are increasingly
> making use of control flow like Junru said.
>
> Zach
>
>
> [1] - https://docs.tvm.ai/dev/relay_intro.html
> [2] - https://en.wikipedia.org/wiki/A-normal_form
> [3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
> [4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
> [5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
> [6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
>
> On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:
>
> > Being dismissive and condescending has been exactly what is plaguing this
> > project.
> >
> > I agree the last paragraph sounds very condescending and dismissive,
> > and it breaks many of the code of conduct points listed.
> >
> > On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Junru,
> > >
> > > Overall, I appreciate the points you made about the proposal.
> > >
> > > Having said that, I would like to remind you of the Apache Code of Conduct:
> > > https://www.apache.org/foundation/policies/conduct.
> > > "Be empathetic, welcoming, friendly and patient".
> > >
> > > I find your tone condescending. Clearly you understood what he meant
> > > from the context, whether you prefer to call it IR as in compilers or
> > > dataflow as in distributed systems. You could very well say "let's use
> > > this terminology to have a common understanding" instead of saying "go
> > > learn the basic concepts".
> > > Before building a cool brand, it's important to build a healthy
> > > community.
> > >
> > > Anirudh
> > >
> > >
> > > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I really appreciate that a diligent and talented engineer eagerly
> wants
> > > to
> > > > improve our system, and am very thankful that you have done so much
> for
> > > our
> > > > community. However, I do want to mention some points that I believe
> > > > are worth mentioning.
> > > >
> > > > While I agree with Tianqi that every design has its pros and cons, I
> > > would
> > > > love to emphasize 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Pedro Larroy
Hi

Thanks for all the materials and key points raised. The discussion has
many ramifications; I will think about them and research them very
carefully before replying further. Please also don't quickly dismiss
the points I have raised and reduce them to typed-vs-untyped or
pedantic C++ comments; we have been debugging missing nodes and
pointers in the graph when doing second-order gradients for weeks with
no success, due to the design of the graph.

There are 60 years of software development learning and practice behind
some of these concepts, and compiler theory that deep learning frameworks
can take advantage of instead of rediscovering everything again until we
end up at a typed, pure functional IR.
In some of the materials linked, you also point out limitations of the
current architecture. I think it's good that we raise this topic; it shows
that we need to have a deeper and more structured conversation on how we
evolve the dataflow graph in MXNet. Maybe you can help cross-pollinate this
conversation between the TVM and MXNet projects. If there's an intention to
change from NNVM to NNVM2, I think this should have been communicated to or
discussed with the community beforehand.

Until then.

Pedro.




On Tue, May 14, 2019 at 8:03 PM Tianqi Chen  wrote:
>
> The core part of the proposal is to move the graph to a much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons, let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose
>   templates of explosive possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient
> etc.
>   - The nature of supporting a growing set of operator attributes requires a
> type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased or typed code can both get to the same features, except that
>   typed code gets more compile-time errors while type-erased code gets some
> of them at runtime.
> - Templatized data structures will likely introduce additional mental
> burdens for developers and are not really suitable as a core data structure
>   - Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
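
To make the quoted typed-vs-type-erased trade-off concrete, a toy Python
analogy (not MXNet code; the ErasedNode is only loosely analogous to NNVM's
type-erased attrs field):

    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class TypedNode:
        # Every attribute is declared up front: static checkers catch typos,
        # but adding a new attribute means changing the schema.
        op: str
        inline: bool = False

    @dataclass
    class ErasedNode:
        # Open-ended attribute map: new attributes need no schema change,
        # but mistakes only surface at runtime.
        op: str
        attrs: Dict[str, Any] = field(default_factory=dict)

    n = ErasedNode("conv2d")
    n.attrs["storage_pattern"] = "NHWC"  # extensible, checked only at runtime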
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: if MXNet
> were a pure C++ project, I might lean more toward the typed approach.
> However, MXNet is a project that supports python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with that original goal,
> given the tradeoffs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as the
> optimization passes, and given the scope it is, in your analogy, "a new
> vehicle to solve all the problems" rather than a minor patch. It will take
> a lot of engineering effort to bring in new features and adapt the existing
> ones. Because of that, it does merit a discussion about how we should think
> about the future MXNet 2.0.
>
> Technically, Relay is a serious candidate. Of course Relay, as well as its
> core, is in C++, but it maintains the multi-language-first principle; that
> is why the example code was in Python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet 2.0 should have
> natural support for:
> - Native support for functions, modules, and recursion
> - Control flow
> - The ability to interoperate with multi-language frontends, e.g. being
> able to prototype graph optimizations in python/scala/clojure if needed.
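
For a feel of what native functions, recursion, and control flow in an IR
look like, a toy expression-tree sketch (purely illustrative and far simpler
than Relay):

    from dataclasses import dataclass

    @dataclass
    class Const: value: int
    @dataclass
    class Var:   name: str
    @dataclass
    class Prim:  op: str; args: list           # primitive application
    @dataclass
    class If:    cond: object; then: object; orelse: object  # if is a value
    @dataclass
    class Call:  fn: str; arg: object          # named call enables recursion

    def ev(e, env, fns):
        if isinstance(e, Const): return e.value
        if isinstance(e, Var):   return env[e.name]
        if isinstance(e, Prim):
            a = [ev(x, env, fns) for x in e.args]
            return {"sub": a[0] - a[1], "mul": a[0] * a[1],
                    "le": a[0] <= a[1]}[e.op]
        if isinstance(e, If):
            return ev(e.then if ev(e.cond, env, fns) else e.orelse, env, fns)
        param, body = fns[e.fn]                # Call node
        return ev(body, {param: ev(e.arg, env, fns)}, fns)

    # fact(n) = if n <= 1 then 1 else n * fact(n - 1)
    fact = ("n", If(Prim("le", [Var("n"), Const(1)]), Const(1),
                    Prim("mul", [Var("n"),
                                 Call("fact", Prim("sub", [Var("n"), Const(1)]))])))
    assert ev(Call("fact", Const(5)), {}, {"fact": fact}) == 120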
>
> Adding this support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy 
> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed that would
> > not be possible with what I propose? I don't think my proposal has
> > any impact on language bindings, and the way I describe it
> > doesn't affect having or not having higher-level language bindings. Please
> > elaborate so I can understand your concern. Maybe code examples where
> > the graph attributes are being changed from Python? I don't think we
> > have this in MXNet. This is such a core foundation for MXNet that I
> > don't think we should compromise on it because another project not
> > directly related to MXNet might want to expose some untyped 

Re: Python2 End of Life

2019-05-15 Thread Damien Stanton
+1 Standardizing on Python 3 will make things easier for both MXNet devs
and users.

On Wed, May 15, 2019 at 2:49 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> +1 Thanks for bringing this up, Zach.
> Can we include this intent to deprecate support for Python 2 in the
> upcoming MXNet 1.5 release? This will give the MXNet community enough
> advance notice of the proposed plan.
>
> Best,
> Sandeep
>
> On Wed, May 15, 2019 at 11:29 AM Zach Kimberg 
> wrote:
>
> > The website I listed earlier (https://python3statement.org/) is backed by
> > a git repository
> > (https://github.com/python3statement/python3statement.github.io) so that
> > projects can open a PR to add themselves to the list. Beyond that, they
> > also have a very nice timeline that projects can add themselves to which
> > details when their support ends. This might be a good first place to
> check
> > for knowing which dependencies might affect us. Here are some of the
> > notable projects and their support that are in the timeline:
> >
> > Projects currently Python3 only: pandas, scikit-learn
> > Projects dropping support between now and Jan 1: IPython, XGBoost, rpy2,
> > dateutil
> > Projects dropping support on Jan 1: CPython, Numpy, Pillow, Scipy,
> > matplotlib, Spyder
> >
> > My hope is that following this discussion, we decide on a timeline and
> add
> > ourselves to this site as well. Does anyone disagree with the choice of
> Jan
> > 1?
> >
> > On Wed, May 15, 2019 at 2:40 AM Marco de Abreu 
> > wrote:
> >
> > > +1
> > >
> > > I'd like to point out that one of our dependencies, scikit-learn, has
> > > already dropped support for Python 2. If more dependencies drop support
> > > before 1.1.2020, we might start running into further issues like we
> > > already did. As part of that decision, I'd propose looking at the
> > > detailed timelines of our dependencies and then adjusting our timeline
> > > accordingly.
> > >
> > > -Marco
> > >
> > > Pedro Larroy  schrieb am Mi., 15. Mai
> > 2019,
> > > 00:15:
> > >
> > > > +1  Let Python 2 rest; let's simplify our infrastructure and drop the
> > > > need to support old Python versions.
> > > >
> > > > On Mon, May 13, 2019 at 1:58 PM Jake Lee  wrote:
> > > > >
> > > > > +1 Recently I upgraded the Numpy version and found out that Pylint
> > > > > had a false alarm on it. The Pylint fix is only available on
> > > > > Python3, so I changed the default python version of the 'make
> > > > > pylint' command to python3 (the PR hasn't been merged yet). It's
> > > > > time to drop support for Python2.
> > > > >
> > > > > On Mon, May 13, 2019 at 1:37 PM Junru Shao <
> junrushao1...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Mon, May 13, 2019 at 1:34 PM Aaron Markham <
> > > > aaron.s.mark...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 for the pledge and to start moving things to Python 3.
> > > > > > > I think our installation instructions and tutorials can be
> > updated
> > > to
> > > > > > > default to Python3 and we should update Python2-only
> tutorials. I
> > > > know
> > > > > > > we have a handful of those, and when I spot them, I'll create
> an
> > > > > > > issue.
> > > > > > > I can also look at migrating the docs build to Python 3.
> > > > > > > Should we add a new label for issues relating to migrating to
> > > > Python3?
> > > > > > > Cheers,
> > > > > > > Aaron
> > > > > > >
> > > > > > > On Mon, May 13, 2019 at 12:04 PM Zach Kimberg <
> > > > zachary.kimb...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Right now, the official date for ending support for Python
> 2.7
> > > > (and all
> > > > > > > of
> > > > > > > > python2) is set to January 1 [1]. As part of it, a number of
> > > > projects
> > > > > > > have
> > > > > > > > pledged to drop support for Python2 in or before 2020
> including
> > > > > > > Tensorflow,
> > > > > > > > requests, pandas, ipython, numpy, pillow, and Cython [2]. I
> > > > believe we
> > > > > > > > should also join in this pledge on python3statement.org [2]
> > > > because it
> > > > > > > > would help clean up our project and it would be difficult to
> > > > continue
> > > > > > > > supporting Python2 anyway when some of our dependencies are
> > > > dropping
> > > > > > > > support.
> > > > > > > >
> > > > > > > > As a concrete step, we should decide on a date to remove all
> > > > usages of
> > > > > > > > Python2 from our CI and consider that officially dropping
> > > support.
> > > > > > > > Following that, we can expect PRs will end up breaking
> support
> > > for
> > > > > > > Python2.
> > > > > > > > I suggest just using the same date that Python is dropping
> > > support
> > > > of
> > > > > > > > January 1. We may also need to update some examples or
> scripts
> > > that
> > > > > > were
> > > > > > > > written only for python2 that are around the project. Any
> > > thoughts?
> > > > > > > >
> > > > > > > > Zach
> > > > > > > >
> > > > > > > >
> > > > > > > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Anirudh, Naveen,

Thank you so much for the gentle reminder!

I am not a native speaker, which resulted in this mistake. I would love to
offer my sincere apologies to Pedro. Pedro is working really hard to grow
our community and improve our code base. I sincerely apologize for what I
said in a hurry.

Let’s work hard together to grow a healthy community!

Thanks,
Junru

On Wed, May 15, 2019 at 11:51 Naveen Swamy  wrote:

> Being dismissive and condescending has been exactly what is plaguing this
> project.
>
> I agree the last paragraph sounds very condescending and dismissive,
> and it breaks many of the code of conduct points listed.
>
> On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Hi Junru,
> >
> > Overall, I appreciate the points you made about the proposal.
> >
> > Having said that, I would like to remind you of the Apache Code of Conduct:
> > https://www.apache.org/foundation/policies/conduct.
> > "Be empathetic, welcoming, friendly and patient".
> >
> > I find your tone condescending. Clearly you understood what he meant from
> > the context, whether you prefer to call it IR as in compilers or dataflow
> > as in distributed systems. You could very well say "let's use this
> > terminology to have a common understanding" instead of saying "go learn
> > the basic concepts".
> > Before building a cool brand, it's important to build a healthy community.
> >
> > Anirudh
> >
> >
> > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > I really appreciate that a diligent and talented engineer eagerly wants
> > > to improve our system, and am very thankful that you have done so much
> > > for our community. However, I do want to mention some points that I
> > > believe are worth mentioning.
> > >
> > > While I agree with Tianqi that every design has its pros and cons, I
> > > would love to emphasize that *good taste* in system design means
> > > optimizing the bottleneck and enhancing expressiveness (and usability),
> > > i.e. doing what needs doing, rather than polishing *trivial nits* that
> > > are irrelevant to either performance or expressiveness. Generally
> > > speaking, typed or untyped, shared_ptr or unique_ptr won't affect the
> > > overall performance when it comes to deep learning workloads, especially
> > > when we have an async scheduler that does good latency hiding in MXNet -
> > > to me, these are not major issues that are worth re-designing our entire
> > > system for.
> > >
> > > To benefit users - real-world ML practitioners - the main thing I would
> > > love to mention is that dataflow graph-based representations are
> > > increasingly incapable of expressing modern neural networks, because of
> > > increasingly common structures like arbitrary control flow (w/ continue,
> > > break, etc.), recursion, type conjunction and disjunction, etc.
> > > Addressing these issues will be our priority, and Relay addresses all of
> > > these pain points.
> > >
> > > Another minor thing I would love to humbly mention is that, for the sake
> > > of our brand, it is our responsibility to be professional about
> > > terminology when writing an official proposal on Confluence. As one of
> > > numerous examples, the title of the proposal shocked me for a while;
> > > phrasing like "operators graph" reads oddly. Educate me if I am wrong,
> > > but the compiler community would prefer the term "intermediate
> > > representation", and the distributed systems community would prefer
> > > "dataflow graph". If you don't have knowledge of these fields, a better
> > > way to communicate efficiently is to first familiarize yourself with the
> > > most basic concepts and then have the discussion. This is a way to save
> > > your own valuable time as well.
> > >
> > > Again, thank you so much for your hard work, and I hope that we can work
> > > together to win customers in the future :-)
> > >
> > > Thanks,
> > > Junru
> > >
> > >
> > > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> > > wrote:
> > >
> > > > The core part of the proposal is to move the graph to a much more
> > > > strongly typed template class.
> > > > I think this is mainly a point of engineering taste, and both sides
> > > > have pros and cons, let me list them before I share my thoughts on
> > > > this issue:
> > > >
> > > > - Typed fields certainly enjoy more compile-time type checking; on the
> > > > other hand, it is hard to expose
> > > >   templates of explosive possibilities to frontend languages.
> > > > - More type-erased fields provide runtime flexibility to store
> > > > polymorphic types as well as extensible attributes for graph
> > > > optimization
> > > >   - It is hard to use a virtual class to expose every possible
> > > > attribute that an operator might have, such as inlining, storage
> > > > pattern, gradient etc.
> > > >   - The nature of supporting a growing set of operator attributes
> > > > requires a
> > > > 

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Anirudh Subramanian
Hi Lai,

From the discussion I had with Nvidia offline, they are aiming to push
the required changes today.
Since this is an important feature for the release, if it gets delayed and
cannot be merged by 05/17/2019,
the code freeze date may need to be changed.

Anirudh

On Wed, May 15, 2019 at 1:23 AM Lv, Tao A  wrote:

> Hi dev,
>
> We see there are several GitHub issues [1][2][3][4] about the MXNet Windows
> build experience. The team is working intensively [5][6][7] to fix some
> problems with the MKL-DNN build on Windows. We hope these fixes can make
> the code freeze and finally enter the 1.5.0 release.
>
> The PR against mshadow (#374) has already been merged, and MXNet PR #14877 is
> under review - many thanks to the CI team for helping with the MKL installation
> request. PR #14952 is a documentation change that follows the build logic
> changes in PR #14877, so I think these two PRs should be merged together.
> Currently #14877 is experiencing a CI responsiveness problem.
>
> Please take some time to have a look at these two PRs. Your comments and
> suggestions are highly appreciated.
>
> Thanks,
> -tao
>
> [1] https://github.com/apache/incubator-mxnet/issues/14670
> [2] https://github.com/apache/incubator-mxnet/issues/14335
> [3] https://github.com/apache/incubator-mxnet/issues/14203
> [4] https://github.com/apache/incubator-mxnet/issues/14085
> [5] https://github.com/apache/incubator-mxnet/pull/14877
> [6] https://github.com/dmlc/mshadow/pull/374
> [7] https://github.com/apache/incubator-mxnet/pull/14952
>
> -Original Message-
> From: Lai Wei [mailto:roywei...@gmail.com]
> Sent: Wednesday, May 15, 2019 2:57 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [DISCUSS] 1.5.0 Release Plan
>
> Hi Anirudh,
>
> I see there was an offline discussion
> <https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341>
> and I have updated the AMP feature and your project on the release tracker
> <https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status>.
> Please let me know if you have any updates.
>
> Hi @dev,
> This is a gentle reminder that the code freeze for the 1.5.0 release is on
> 05/17/2019. Please let us know if you have any WIP pull requests aiming for
> 1.5.0 that need attention.
> Please understand we already have around 650 commits in master that need
> to be released in time. We know the TensorRT test in CI is failing and
> are trying to fix it. Meanwhile, please update the tracker if there is any
> change:
>
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
>
> Thanks!
>
> Lai
>
>
> On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian
> wrote:
>
> > Hi Sheng,
> >
> > I had a discussion with Nvidia folks offline today (@ptrendx et al.).
> > I strongly feel that the AMP feature should be included as part of the
> > release: https://github.com/apache/incubator-mxnet/pull/14173 .
> > The PR is aimed for completion next week, but reviews and RFC
> > discussions may take some time. I would request extending the release
> > code freeze by 2 weeks.
> > Also, I would like to include
> > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > which depends on the AMP PR.
> > I am also aiming to add a PR by this weekend or early next week,
> > but reviews will take longer than May 17th.
> >
> > Anirudh
> >
> >
> > On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
> >
> > > Hi,
> > >
> > > While the 1.4.1 vote on general@incubator is still ongoing, I’d like to
> > > propose that we start preparing the 1.5.0 release.
> > >
> > > 1.5.0 will include changes that date back to last year, and there have
> > > been a lot of new features and improvements in it, so it will likely
> > > take us more time to prepare than 1.4.1. I propose the following timeline:
> > > - Cut release branch: release branch already cut. Will sync with
> > > master branch on 5/15/2019 EOD.
> > > - Code freeze: 5/17/2019. No more changes unless the release branch
> > > is in a broken state.
> > > - Tag and vote: 5/20/2019 onward.
> > >
> > > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> > drive
> > > this release as release manager, and I’m happy to help again as
> > committer.
> > >
> > > If you have features in progress that you’d like to include in 1.5.0:
> > > - Add your feature to the scope:
> > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > - Indicate in this thread:
> > >   - how confident you are about making it happen before the code freeze.
> > > If not confident, provide an estimate for a more manageable code freeze
> > > date so that people can discuss whether to extend the deadline or to
> > > skip one release for it.
> > > - whether your PR requires more attention to make it happen.
> > >
> > > Thanks for your attention. Comments and suggestions are also welcome.
> > >
> > > -sz
> >
>


Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Zach Kimberg
I would like to raise another option to get back on the topic of changing
the Operator graph structure. The page discussing Relay IR [1] mainly
covers the difference between a dataflow graph like we use now
and A-normal form [2], which is used in some functional compilers. Is there a
reason we do not want to use a structure based on Static Single Assignment
form (wikipedia explanation [3], lecture note explanation [4])? It is used
almost universally in the compiler community, including in LLVM (clang),
GCC, Oracle JVM, PyPy, Go, WebKit, and Swift [5]. The major reason behind
its pervasiveness is that it has proven very effective for analysis and
transformations when dealing with control flow.

One possible concern is that it might make automatic differentiation more
difficult [6]. While it certainly is more complicated than a pure
functional approach, the functional approach requires users to use
functional programming. Especially with the languages we support now, that
doesn't seem like a reasonable assumption. Given that the users are already
introducing the complexity inherent in imperative programming, we have to
deal with the increased complexity regardless. I think it might be easier
to have the tools to deal with that rather than attempting to coerce users
into a different programming paradigm or convert code between paradigms.
Furthermore, this may become more important if users are increasingly
making use of control flow like Junru said.

Zach


[1] - https://docs.tvm.ai/dev/relay_intro.html
[2] - https://en.wikipedia.org/wiki/A-normal_form
[3] - https://en.wikipedia.org/wiki/Static_single_assignment_form
[4] - https://www.cs.cmu.edu/~rjsimmon/15411-f15/lec/10-ssa.pdf
[5] - https://en.wikipedia.org/wiki/Static_single_assignment_form#Compilers_using_SSA_form
[6] - https://discuss.tvm.ai/t/choice-about-ir-ssa-or-anf/1757/2
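
To illustrate the point about control flow, here is r = max(a, b) rendered
in SSA form as runnable (toy) Python - the phi node at the merge point
selects a value based on which predecessor block executed:

    # Source:            SSA form:
    #   if a > b:          entry: t0 = a > b ; branch t0 -> then / else
    #       r = a          then:  r1 = a
    #   else:              else:  r2 = b
    #       r = b          merge: r3 = phi [(then, r1), (else, r2)]
    def max_ssa(a, b):
        t0 = a > b                            # entry block
        pred = "then" if t0 else "else"       # predecessor reaching the merge
        r1 = a                                # then block
        r2 = b                                # else block
        r3 = {"then": r1, "else": r2}[pred]   # phi: select by predecessor
        return r3

    assert max_ssa(3, 5) == 5 and max_ssa(7, 2) == 7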

On Wed, May 15, 2019 at 11:51 AM Naveen Swamy  wrote:

> Being dismissive and condescending has been exactly what is plaguing this
> project.
>
> I agree the last paragraph sounds very condescending and dismissive,
> and it breaks many of the code of conduct points listed.
>
> On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Hi Junru,
> >
> > Overall, I appreciate the points you made about the proposal.
> >
> > Having said that, I would like to remind you of the Apache Code of Conduct:
> > https://www.apache.org/foundation/policies/conduct.
> > "Be empathetic, welcoming, friendly and patient".
> >
> > I find your tone condescending. Clearly you understood what he meant from
> > the context, whether you prefer to call it IR as in compilers or dataflow
> > as in distributed systems. You could very well say "let's use this
> > terminology to have a common understanding" instead of saying "go learn
> > the basic concepts".
> > Before building a cool brand, it's important to build a healthy community.
> >
> > Anirudh
> >
> >
> > On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > I really appreciate that a diligent and talented engineer eagerly wants
> > > to improve our system, and am very thankful that you have done so much
> > > for our community. However, I do want to mention some points that I
> > > believe are worth mentioning.
> > >
> > > While I agree with Tianqi that every design has its pros and cons, I
> > > would love to emphasize that *good taste* in system design means
> > > optimizing the bottleneck and enhancing expressiveness (and usability),
> > > i.e. doing what needs doing, rather than polishing *trivial nits* that
> > > are irrelevant to either performance or expressiveness. Generally
> > > speaking, typed or untyped, shared_ptr or unique_ptr won't affect the
> > > overall performance when it comes to deep learning workloads, especially
> > > when we have an async scheduler that does good latency hiding in MXNet -
> > > to me, these are not major issues that are worth re-designing our entire
> > > system for.
> > >
> > > To benefit users - real-world ML practitioners - the main thing I would
> > > love to mention is that dataflow graph-based representations are
> > > increasingly incapable of expressing modern neural networks, because of
> > > increasingly common structures like arbitrary control flow (w/ continue,
> > > break, etc.), recursion, type conjunction and disjunction, etc.
> > > Addressing these issues will be our priority, and Relay addresses all of
> > > these pain points.
> > >
> > > Another minor thing I would love to humbly mention is that, for the sake
> > > of our brand, it is our responsibility to be professional about
> > > terminology when writing an official proposal on Confluence. As one of
> > > numerous examples, the title of the proposal shocked me for a while;
> > > phrasing like "operators graph" reads oddly. Educate me if I am wrong,
> > > but the compiler community would prefer the term "intermediate
> > > representation", and
> > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Naveen Swamy
Being dismissive and condescending has been exactly what is plaguing this
project.

I agree the last paragraph sounds very condescending and dismissive,
and it breaks many of the code of conduct points listed.

On Wed, May 15, 2019 at 11:31 AM Anirudh Subramanian 
wrote:

> Hi Junru,
>
> Overall, I appreciate the points you made about the proposal.
>
> Having said that, I would like to remind you of the Apache Code of Conduct:
> https://www.apache.org/foundation/policies/conduct.
> "Be empathetic, welcoming, friendly and patient".
>
> I find your tone condescending. Clearly you understood what he meant from
> the context, whether you prefer to call it IR as in compilers or dataflow
> as in distributed systems. You could very well say "let's use this
> terminology to have a common understanding" instead of saying "go learn
> the basic concepts".
> Before building a cool brand, it's important to build a healthy community.
>
> Anirudh
>
>
> On Wed, May 15, 2019 at 12:03 AM Junru Shao 
> wrote:
>
> > Hi Pedro,
> >
> > I really appreciate that a diligent and talented engineer eagerly wants
> > to improve our system, and am very thankful that you have done so much
> > for our community. However, I do want to mention some points that I
> > believe are worth mentioning.
> >
> > While I agree with Tianqi that every design has its pros and cons, I
> > would love to emphasize that *good taste* in system design means
> > optimizing the bottleneck and enhancing expressiveness (and usability),
> > i.e. doing what needs doing, rather than polishing *trivial nits* that
> > are irrelevant to either performance or expressiveness. Generally
> > speaking, typed or untyped, shared_ptr or unique_ptr won't affect the
> > overall performance when it comes to deep learning workloads, especially
> > when we have an async scheduler that does good latency hiding in MXNet -
> > to me, these are not major issues that are worth re-designing our entire
> > system for.
> >
> > To benefit users - real-world ML practitioners - the main thing I would
> > love to mention is that dataflow graph-based representations are
> > increasingly incapable of expressing modern neural networks, because of
> > increasingly common structures like arbitrary control flow (w/ continue,
> > break, etc.), recursion, type conjunction and disjunction, etc.
> > Addressing these issues will be our priority, and Relay addresses all of
> > these pain points.
> >
> > Another minor thing I would love to humbly mention is that, for the sake
> > of our brand, it is our responsibility to be professional about
> > terminology when writing an official proposal on Confluence. As one of
> > numerous examples, the title of the proposal shocked me for a while;
> > phrasing like "operators graph" reads oddly. Educate me if I am wrong,
> > but the compiler community would prefer the term "intermediate
> > representation", and the distributed systems community would prefer
> > "dataflow graph". If you don't have knowledge of these fields, a better
> > way to communicate efficiently is to first familiarize yourself with the
> > most basic concepts and then have the discussion. This is a way to save
> > your own valuable time as well.
> >
> > Again, thank you so much for your hard work, and I hope that we can work
> > together to win customers in the future :-)
> >
> > Thanks,
> > Junru
> >
> >
> > On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> > wrote:
> >
> > > The core part of the proposal is to move the graph to a much more
> > > strongly typed template class.
> > > I think this is mainly a point of engineering taste, and both sides
> > > have pros and cons, let me list them before I share my thoughts on this
> > > issue:
> > >
> > > - Typed fields certainly enjoy more compile-time type checking; on the
> > > other hand, it is hard to expose
> > >   templates of explosive possibilities to frontend languages.
> > > - More type-erased fields provide runtime flexibility to store
> > > polymorphic types as well as extensible attributes for graph optimization
> > >   - It is hard to use a virtual class to expose every possible
> > > attribute that an operator might have, such as inlining, storage
> > > pattern, gradient etc.
> > >   - The nature of supporting a growing set of operator attributes
> > > requires a type-erased attrs field.
> > > - In contrast to your argument (typing is a blocker to features),
> > > type-erased or typed code can both get to the same features, except
> > > that
> > >   typed code gets more compile-time errors while type-erased code gets
> > > some of them at runtime.
> > > - Templatized data structures will likely introduce additional mental
> > > burdens for developers and are not really suitable as a core data
> > > structure
> > >   - Because they imply an explosive number of possible data structures,
> > > while the core data structure should be a single one.
> > >
> > > Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> > > MXNet
> > > 

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Anirudh Subramanian
Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to remind you of the Apache Code of Conduct:
https://www.apache.org/foundation/policies/conduct.
"Be empathetic, welcoming, friendly and patient".

I find your tone condescending. Clearly you understood what he meant from
the context, whether you prefer to call it IR as in compilers or dataflow as
in distributed systems. You could very well say "let's use this terminology
to have a common understanding" instead of saying "go learn the basic
concepts". Before building a cool brand, it's important to build a healthy
community.

Anirudh


On Wed, May 15, 2019 at 12:03 AM Junru Shao  wrote:

> Hi Pedro,
>
> I really appreciate that a diligent and talented engineer eagerly wants to
> improve our system, and am very thankful that you have done so much for our
> community. However, I do want to mention some points that I believe are
> worth mentioning.
>
> While I agree with Tianqi that every design has its pros and cons, I would
> love to emphasize that *good taste* in system design means optimizing the
> bottleneck and enhancing expressiveness (and usability), i.e. doing what
> needs doing, rather than polishing *trivial nits* that are irrelevant to
> either performance or expressiveness. Generally speaking, typed or untyped,
> shared_ptr or unique_ptr won't affect the overall performance when it comes
> to deep learning workloads, especially when we have an async scheduler that
> does good latency hiding in MXNet - to me, these are not major issues that
> are worth re-designing our entire system for.
>
> To benefit users - real-world ML practitioners - the main thing I would love
> to mention is that dataflow graph-based representations are increasingly
> incapable of expressing modern neural networks, because of increasingly
> common structures like arbitrary control flow (w/ continue, break, etc.),
> recursion, type conjunction and disjunction, etc. Addressing these issues
> will be our priority, and Relay addresses all of these pain points.
>
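
A tiny illustration of the kind of control flow a static dataflow graph
cannot capture (plain Python, not MXNet API - the loop's trip count depends
on runtime values, so tracing a single run bakes in one iteration count):

    def halvings(x):
        """Count how many times x can be halved before reaching <= 1."""
        n = 0
        while x > 1.0:     # loop bound depends on the data, not the graph
            x /= 2.0
            n += 1
        return n

    # Tracing halvings(8.0) would record exactly 3 iterations, producing a
    # graph that is wrong for halvings(32.0), which needs 5.
    assert halvings(8.0) == 3 and halvings(32.0) == 5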
> Another minor thing I would love to humbly mention is that, for the sake of
> our brand, it is our responsibility to be professional about terminology
> when writing an official proposal on Confluence. As one of numerous
> examples, the title of the proposal shocked me for a while; phrasing like
> "operators graph" reads oddly. Educate me if I am wrong, but the compiler
> community would prefer the term "intermediate representation", and the
> distributed systems community would prefer "dataflow graph". If you don't
> have knowledge of these fields, a better way to communicate efficiently is
> to first familiarize yourself with the most basic concepts and then have
> the discussion. This is a way to save your own valuable time as well.
>
> Again, thank you so much for your hard work, and I hope that we can work
> together to win customers in the future :-)
>
> Thanks,
> Junru
>
>
> On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> wrote:
>
> > The core part of the proposal is to move the graph to a much more
> > strongly typed template class.
> > I think this is mainly a point of engineering taste, and both sides have
> > pros and cons, let me list them before I share my thoughts on this issue:
> >
> > - Typed fields certainly enjoy more compile-time type checking; on the
> > other hand, it is hard to expose
> >   templates of explosive possibilities to frontend languages.
> > - More type-erased fields provide runtime flexibility to store
> > polymorphic types as well as extensible attributes for graph optimization
> >   - It is hard to use a virtual class to expose every possible attribute
> > that an operator might have, such as inlining, storage pattern, gradient
> > etc.
> >   - The nature of supporting a growing set of operator attributes
> > requires a type-erased attrs field.
> > - In contrast to your argument (typing is a blocker to features),
> > type-erased or typed code can both get to the same features, except
> > that
> >   typed code gets more compile-time errors while type-erased code gets
> > some of them at runtime.
> > - Templatized data structures will likely introduce additional mental
> > burdens for developers and are not really suitable as a core data
> > structure
> >   - Because they imply an explosive number of possible data structures,
> > while the core data structure should be a single one.
> >
> > Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> > MXNet were a pure C++ project, I might lean more toward the typed approach.
> > However, MXNet is a project that supports python/scala/clojure and
> > other frontend languages.
> > The introduction of more typing may not align with that original goal,
> > given the tradeoffs I listed above.
> >
> > This proposal is really a drastic change of what NNVM does, as well as
> > the optimization passes, and given the scope it is, in your analogy, "a
> > new vehicle to solve all the problems"
> > rather than a 

Re: Python2 End of Life

2019-05-15 Thread Zach Kimberg
The website I listed earlier (https://python3statement.org/) is backed by a
git repository
(https://github.com/python3statement/python3statement.github.io) so that
projects can open a PR to add themselves to the list. Beyond that, they
also have a very nice timeline that projects can add themselves to which
details when their support ends. This might be a good first place to check
for knowing which dependencies might affect us. Here are some of the
notable projects and their support that are in the timeline:

Projects currently Python3 only: pandas, scikit-learn
Projects dropping support between now and Jan 1: IPython, XGBoost, rpy2,
dateutil
Projects dropping support on Jan 1: CPython, Numpy, Pillow, Scipy,
matplotlib, Spyder

My hope is that following this discussion, we decide on a timeline and add
ourselves to this site as well. Does anyone disagree with the choice of Jan
1?
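
For reference, one common mechanism the projects on that list use to enforce
such a cut-off is setuptools' python_requires (a generic sketch with a
hypothetical package name, not a statement about how MXNet would package it):

    # setup.py - declaring Python-3-only support; pip (>= 9) checks this at
    # install time, so Python 2 users resolve to the last compatible release
    # instead of getting a broken install.
    from setuptools import setup

    setup(
        name="example-package",   # hypothetical
        version="2.0.0",
        python_requires=">=3.5",
    )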

On Wed, May 15, 2019 at 2:40 AM Marco de Abreu 
wrote:

> +1
>
> I'd like to point out that one of our dependencies, scikit-learn, has
> already dropped support for Python 2. If more dependencies drop support
> before 1.1.2020, we might start running into further issues like we already
> did. As part of that decision, I'd propose looking at the detailed
> timelines of our dependencies and then adjusting our timeline accordingly.
>
> -Marco
>
> Pedro Larroy  schrieb am Mi., 15. Mai 2019,
> 00:15:
>
> > +1  Let Python 2 rest; let's simplify our infrastructure and drop the
> > need to support old Python versions.
> >
> > On Mon, May 13, 2019 at 1:58 PM Jake Lee  wrote:
> > >
> > > +1 Recently I upgraded the Numpy version and found out that Pylint had
> > > a false alarm on it. The Pylint fix is only available on Python3, so I
> > > changed the default python version of the 'make pylint' command to
> > > python3 (the PR hasn't been merged yet). It's time to drop support for
> > > Python2.
> > >
> > > On Mon, May 13, 2019 at 1:37 PM Junru Shao 
> > wrote:
> > >
> > > > +1
> > > >
> > > > On Mon, May 13, 2019 at 1:34 PM Aaron Markham <
> > aaron.s.mark...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 for the pledge and to start moving things to Python 3.
> > > > > I think our installation instructions and tutorials can be updated
> to
> > > > > default to Python3 and we should update Python2-only tutorials. I
> > know
> > > > > we have a handful of those, and when I spot them, I'll create an
> > > > > issue.
> > > > > I can also look at migrating the docs build to Python 3.
> > > > > Should we add a new label for issues relating to migrating to
> > Python3?
> > > > > Cheers,
> > > > > Aaron
> > > > >
> > > > > On Mon, May 13, 2019 at 12:04 PM Zach Kimberg <
> > zachary.kimb...@gmail.com
> > > > >
> > > > > wrote:
> > > > > >
> > > > > > Right now, the official date for ending support for Python 2.7
> > (and all
> > > > > of
> > > > > > python2) is set to January 1 [1]. As part of it, a number of
> > projects
> > > > > have
> > > > > > pledged to drop support for Python2 in or before 2020 including
> > > > > Tensorflow,
> > > > > > requests, pandas, ipython, numpy, pillow, and Cython [2]. I
> > believe we
> > > > > > should also join in this pledge on python3statement.org [2]
> > because it
> > > > > > would help clean up our project and it would be difficult to
> > continue
> > > > > > supporting Python2 anyway when some of our dependencies are
> > dropping
> > > > > > support.
> > > > > >
> > > > > > As a concrete step, we should decide on a date to remove all
> > usages of
> > > > > > Python2 from our CI and consider that officially dropping
> support.
> > > > > > Following that, we can expect PRs will end up breaking support
> for
> > > > > Python2.
> > > > > > I suggest just using the same date that Python is dropping
> support
> > of
> > > > > > January 1. We may also need to update some examples or scripts
> that
> > > > were
> > > > > > written only for python2 that are around the project. Any
> thoughts?
> > > > > >
> > > > > > Zach
> > > > > >
> > > > > >
> > > > > > [1] - https://www.python.org/dev/peps/pep-0373/
> > > > > > [2] - https://python3statement.org/
> > > > >
> > > >
> >
>


Re: TensorRT blocker

2019-05-15 Thread Per da Silva
Hey,

Yup - I've @'ed you on the fix PR; it would be great to get your 2c there
just to be sure it's all good.
https://github.com/apache/incubator-mxnet/pull/14960

Cheers,

Per

On Wed, May 15, 2019 at 4:14 PM Sunderland, Kellen
 wrote:

> Looks like it's merged.  Can I help with a fix Per?
>
> On May 15, 2019 3:00 AM, Per da Silva  wrote:
> Hi everyone,
>
> Could a committer please merge this PR:
> https://github.com/apache/incubator-mxnet/pull/14958
>
> It disables the TensorRT steps to unblock CI while a fix is being worked
> on.
>
> Cheers,
>
> Per
>


Re: TensorRT blocker

2019-05-15 Thread Sunderland, Kellen
Looks like it's merged.  Can I help with a fix Per?

On May 15, 2019 3:00 AM, Per da Silva  wrote:
Hi everyone,

Could a committer please merge this PR:
https://github.com/apache/incubator-mxnet/pull/14958

It disables the TensorRT steps to unblock CI while a fix is being worked on.

Cheers,

Per


TensorRT blocker

2019-05-15 Thread Per da Silva
Hi everyone,

Could a committer please merge this PR:
https://github.com/apache/incubator-mxnet/pull/14958

It disables the TensorRT steps to unblock CI while a fix is being worked on.

Cheers,

Per


Re: Python2 End of Life

2019-05-15 Thread Marco de Abreu
+1

I'd like to point out that one of our dependencies, scikit-learn, has already
dropped support for Python 2. If more dependencies drop support before
1.1.2020, we might start running into further issues like we already did. As
part of that decision, I'd propose looking at the detailed timelines of our
dependencies and then adjusting our timeline accordingly.

-Marco

Pedro Larroy  schrieb am Mi., 15. Mai 2019,
00:15:

> +1  Let Python 2 rest; let's simplify our infrastructure and drop the need
> to support old Python versions.
>
> On Mon, May 13, 2019 at 1:58 PM Jake Lee  wrote:
> >
> > +1 Recently I upgraded the Numpy version and found out that Pylint had a
> > false alarm on it. The Pylint fix is only available on Python3, so I
> > changed the default python version of the 'make pylint' command to python3
> > (the PR hasn't been merged yet). It's time to drop support for Python2.
> >
> > On Mon, May 13, 2019 at 1:37 PM Junru Shao 
> wrote:
> >
> > > +1
> > >
> > > On Mon, May 13, 2019 at 1:34 PM Aaron Markham <
> aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > > > +1 for the pledge and to start moving things to Python 3.
> > > > I think our installation instructions and tutorials can be updated to
> > > > default to Python3 and we should update Python2-only tutorials. I
> know
> > > > we have a handful of those, and when I spot them, I'll create an
> > > > issue.
> > > > I can also look at migrating the docs build to Python 3.
> > > > Should we add a new label for issues relating to migrating to
> Python3?
> > > > Cheers,
> > > > Aaron
> > > >
> > > > On Mon, May 13, 2019 at 12:04 PM Zach Kimberg <
> zachary.kimb...@gmail.com
> > > >
> > > > wrote:
> > > > >
> > > > > Right now, the official date for ending support for Python 2.7
> (and all
> > > > of
> > > > > python2) is set to January 1 [1]. As part of it, a number of
> projects
> > > > have
> > > > > pledged to drop support for Python2 in or before 2020 including
> > > > Tensorflow,
> > > > > requests, pandas, ipython, numpy, pillow, and Cython [2]. I
> believe we
> > > > > should also join in this pledge on python3statement.org [2]
> because it
> > > > > would help clean up our project and it would be difficult to
> continue
> > > > > supporting Python2 anyway when some of our dependencies are
> dropping
> > > > > support.
> > > > >
> > > > > As a concrete step, we should decide on a date to remove all
> usages of
> > > > > Python2 from our CI and consider that officially dropping support.
> > > > > Following that, we can expect PRs will end up breaking support for
> > > > Python2.
> > > > > I suggest just using the same date that Python is dropping support,
> > > > > January 1. We may also need to update some examples or scripts
> > > > > around the project that were written only for python2. Any thoughts?
> > > > >
> > > > > Zach
> > > > >
> > > > >
> > > > > [1] - https://www.python.org/dev/peps/pep-0373/
> > > > > [2] - https://python3statement.org/
> > > >
> > >
>


RE: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Lv, Tao A
Hi dev,

We see several github issues [1][2][3][4] about the mxnet windows build
experience. The team is working intensively [5][6][7] to fix some problems
with the MKL-DNN build on windows. We hope these fixes can make the code
freeze and land in the 1.5.0 release.

The PR against mshadow (#374) was already merged, and MXNet PR #14877 is
under review - many thanks to the CI team for helping with the MKL
installation request. PR #14952 is a documentation change reflecting the
build logic changes in PR #14877, so I think these two PRs should be merged
simultaneously. Currently #14877 is experiencing a CI response problem.

Please take some time to have a look at these two PRs. Your comments and
suggestions are highly appreciated.

Thanks,
-tao

[1] https://github.com/apache/incubator-mxnet/issues/14670 
[2] https://github.com/apache/incubator-mxnet/issues/14335  
[3] https://github.com/apache/incubator-mxnet/issues/14203 
[4] https://github.com/apache/incubator-mxnet/issues/14085  
[5] https://github.com/apache/incubator-mxnet/pull/14877 
[6] https://github.com/dmlc/mshadow/pull/374 
[7] https://github.com/apache/incubator-mxnet/pull/14952  

-Original Message-
From: Lai Wei [mailto:roywei...@gmail.com] 
Sent: Wednesday, May 15, 2019 2:57 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: [DISCUSS] 1.5.0 Release Plan

Hi Anirudh,

I see there was an offline discussion, and I have updated the AMP feature
and your project on the release tracker:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
Please let me know if you have any updates.

Hi @dev,
This is a gentle reminder that the code freeze for the 1.5.0 release is on
05/17/2019. Please let us know if you have any WIP pull requests aiming for
1.5.0 that need attention.
Please understand we already have around 650 commits in master that need to
be released in time. We understand the TensorRT test in CI is failing and
are trying to fix it. Meanwhile, please update the tracker if there is any
change:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status

Thanks!

Lai


On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian 
wrote:

> Hi Sheng,
>
> I had a discussion with nvidia folks offline today (@ptrendx et al.).
> I strongly feel that the AMP feature should be included as part of the
> release: https://github.com/apache/incubator-mxnet/pull/14173 .
> The PR is aimed for completion for next week but reviews and RFC 
> discussions may take some time. I would request to extend the release 
> code freeze by 2 weeks.
> Also, I would like to include
>
> https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> which depends on the AMP PR.
> I am also aiming to add a PR by this weekend or early next week,
> but reviews will take longer than May 17th.
>
> Anirudh
>
>
> On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
>
> > Hi,
> >
> > While the 1.4.1 vote on general@incubator is still ongoing, I’d like to
> > propose that we start preparing the 1.5.0 release.
> >
> > 1.5.0 will include changes that date back to last year, and there have
> > been a lot of new features and improvements in it, so it will likely
> > take us more time to prepare than 1.4.1. I propose the following timeline:
> > - Cut release branch: release branch already cut. Will sync with 
> > master branch on 5/15/2019 EOD.
> > - Code freeze: 5/17/2019. No more changes unless the release branch 
> > is in a broken state.
> > - Tag and vote: 5/20/2019 onward.
> >
> > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> drive
> > this release as release manager, and I’m happy to help again as
> committer.
> >
> > If you have features in progress that you’d like to include in 1.5.0:
> > - Add your feature to the scope:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > - Indicate in this thread:
> >   - how confident you are about making it happen before the code freeze.
> > If not confident, provide an estimate for a more manageable code freeze
> > date so that people can discuss whether to extend the deadline or to 
> > skip one release for it.
> > - whether your PR requires more attention to make it happen.
> >
> > Thanks for your attention. Comments and suggestions are also welcome.
> >
> > -sz
>


Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Junru Shao
Hi Pedro,

I really appreciate that a diligent and talented engineer eagerly wants to
improve our system, and I am very thankful that you have done so much for
our community. However, I do want to raise some points that I believe need
to be said.

While I agree with Tianqi that every design has its pros and cons, I would
love to emphasize that *good taste* in system design means optimizing the
bottleneck and enhancing expressiveness (and usability), i.e. doing what
needs doing, rather than chasing *trivial nits* that are irrelevant to
either performance or expressiveness. Generally speaking, typed or untyped,
shared_ptr or unique_ptr won't affect the overall performance when it comes
to deep learning workloads, especially when we have an async scheduler in
MXNet that does good latency hiding - to me, these are not major issues
worth re-designing our entire system for.
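
To make it concrete, here is a minimal sketch (in Python, with purely
illustrative names - this is not the actual NNVM API) of the two styles
being debated:

from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class TypedNode:
    # Typed style: every attribute is a declared field, so mistakes are
    # caught early, but adding an attribute means changing the core class.
    op_name: str
    inline: bool = False
    storage_pattern: str = "default"

class ErasedNode:
    # Type-erased style: an open attribute dictionary. New optimization
    # passes can attach arbitrary attributes without touching the core,
    # at the cost of moving the checks to runtime.
    def __init__(self, op_name):
        self.op_name = op_name
        self.attrs: Dict[str, Any] = {}

node = ErasedNode("conv2d")
node.attrs["gradient"] = lambda g: g  # attached by an autodiff pass
node.attrs["inline"] = True           # attached by an inlining pass

Neither variant sits on the hot path of tensor execution, which is exactly
the point above: the scheduler, not the graph bookkeeping, dominates
runtime.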

To benefit users - real-world ML practitioners - the main thing I would
love to mention is that dataflow graph-based representations are
increasingly incapable of expressing modern neural networks, because of
increasingly common structures like arbitrary control flow (w/ continue,
break, etc.), recursion, and type conjunction and disjunction. Addressing
these issues should be our priority, and Relay addresses all of these pain
points.
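
As a tiny illustration (plain Python/NumPy, not any particular framework
API), consider a computation whose control flow depends on the data:

import numpy as np

def adaptive_double(x, threshold=100.0):
    # The iteration count depends on runtime values, so no single static
    # dataflow graph exists for this function: a graph traced from one
    # input hard-codes the wrong number of steps for another input.
    steps = 0
    while float(np.linalg.norm(x)) < threshold:
        x = x * 2.0
        steps += 1
    return x, steps

_, n1 = adaptive_double(np.array([1.0, 2.0]))    # takes more steps
_, n2 = adaptive_double(np.array([50.0, 60.0]))  # takes fewer steps
print(n1, n2)  # different control flow for different inputs

A representation with native control flow keeps the loop as a loop instead
of unrolling one particular trace.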

Another minor thing I would love to humbly mention is that, for the sake of
our brand, it is our responsibility to be professional about terminology
when writing an official proposal on Confluence. As one of numerous
examples, the title of the proposal shocked me for a while - a phrase like
"operator graph" reads oddly. Educate me if I am wrong, but the compiler
community would prefer the term "intermediate representation", and the
distributed systems community would prefer "dataflow graph". If you don't
have background in these fields, a better way to communicate efficiently is
to first familiarize yourself with the most basic concepts and then start
the discussion. This will save your own valuable time as well.

Again, thank you so much for your hard work, and hope that we could work
together to win customers in the future :-)

Thanks,
Junru


On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
wrote:

> The core part of the proposal is to move the graph to a much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons. Let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose a template with an explosive number of
> possibilities to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization.
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient,
> etc.
>   - The nature of supporting a growing set of operator attributes requires
> a type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased or typed code can both get to the same feature, except that
> typed code gets more compile-time errors while type-erased code gets some
> of them at runtime.
> - Templatized data structures will likely introduce additional mental
> burdens for developers and are not really suitable as a core data
> structure.
>   - Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> MXNet were a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that supports python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal, given
> the tradeoffs I listed above.
>
> This proposal is really a drastic change to what NNVM does, as well as to
> the optimization passes; given the scope, it is, in your analogy, "a new
> vehicle to solve all the problems" rather than a minor patch. It will take
> a lot of engineering effort to bring in new features and adapt the
> existing ones.
> Because of that, it does merit a discussion about how we should think about
> the future MXNet 2.0.
>
> Technically, Relay is a serious candidate. Of course Relay, as well as its
> core, is in C++ but maintains the multi-language-first principle, which is
> why the example code was in python.
> See more related discussion comparing NNVMv1 and Relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet 2.0 should have
> natural support for:
> - Native support of functions, modules, and recursion
> - Control flow
> - The ability to interoperate with multi-language frontends, e.g. being
> able to prototype graph optimizations in python/scala/clojure if 

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Lai Wei
Hi Anirudh,

I see there was an offline discussion, and I have updated the AMP feature
and your project on the release tracker:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
Please let me know if you have any updates.

Hi @dev,
This is a gentle reminder that the code freeze for the 1.5.0 release is on
05/17/2019. Please let us know if you have any WIP pull requests aiming for
1.5.0 that need attention.
Please understand we already have around 650 commits in master that need to
be released in time. We understand the TensorRT test in CI is failing and
are trying to fix it. Meanwhile, please update the tracker if there is any
change:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status

Thanks!

Lai


On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian 
wrote:

> Hi Sheng,
>
> I had a discussion with nvidia folks offline today (@ptrendx et al.). I
> strongly feel that the AMP feature should be included as part of the
> release: https://github.com/apache/incubator-mxnet/pull/14173 .
> The PR is aimed for completion for next week but reviews and RFC
> discussions may take some time. I would request to extend the release code
> freeze by 2 weeks.
> Also, I would like to include
>
> https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> which
> depends on the AMP PR.
> I am also aiming to add a PR by this weekend or early next week, but
> reviews will take longer than May 17th.
>
> Anirudh
>
>
> On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
>
> > Hi,
> >
> > While the 1.4.1 vote on general@incubator is still ongoing, I’d like to
> > propose that we start preparing the 1.5.0 release.
> >
> > 1.5.0 will include changes that date back to last year, and there have
> > been a lot of new features and improvements in it, so it will likely
> > take us more time to prepare than 1.4.1. I propose the following timeline:
> > - Cut release branch: release branch already cut. Will sync with master
> > branch on 5/15/2019 EOD.
> > - Code freeze: 5/17/2019. No more changes unless the release branch is in
> > a broken state.
> > - Tag and vote: 5/20/2019 onward.
> >
> > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> drive
> > this release as release manager, and I’m happy to help again as
> committer.
> >
> > If you have features in progress that you’d like to include in 1.5.0:
> > - Add your feature to the scope:
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > - Indicate in this thread:
> >   - how confident you are about making it happen before the code freeze.
> > If not confident, provide an estimate for a more manageable code freeze date
> > so that people can discuss whether to extend the deadline or to skip one
> > release for it.
> > - whether your PR requires more attention to make it happen.
> >
> > Thanks for your attention. Comments and suggestions are also welcome.
> >
> > -sz
>