Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Anirudh Subramanian
Hi Lai,

From the discussion I had with NVIDIA offline, they are targeting to push
the required changes today.
Since this is an important feature for the release, if it gets delayed and
cannot be merged by 05/17/2019,
the code freeze date may need to change.

Anirudh

On Wed, May 15, 2019 at 1:23 AM Lv, Tao A  wrote:

> Hi dev,
>
> We see there are several GitHub issues [1][2][3][4] about the MXNet Windows
> build experience. The team is working intensively [5][6][7] to fix
> some problems with the MKL-DNN build on Windows. We hope these fixes can
> land before the code freeze and make it into the 1.5.0 release.
>
> The PR against mshadow (#374) has already been merged, and MXNet PR #14877
> is under review - many thanks to the CI team for helping with the MKL
> installation request. PR #14952 is a documentation change that reflects the
> build logic changes in PR #14877, so I think these two PRs should be merged
> simultaneously. Currently #14877 is experiencing a CI responsiveness problem.
>
> Please take some time to look at these two PRs. Your comments and
> suggestions are highly appreciated.
>
> Thanks,
> -tao
>
> [1] https://github.com/apache/incubator-mxnet/issues/14670
> [2] https://github.com/apache/incubator-mxnet/issues/14335
> [3] https://github.com/apache/incubator-mxnet/issues/14203
> [4] https://github.com/apache/incubator-mxnet/issues/14085
> [5] https://github.com/apache/incubator-mxnet/pull/14877
> [6] https://github.com/dmlc/mshadow/pull/374
> [7] https://github.com/apache/incubator-mxnet/pull/14952
>
> -Original Message-
> From: Lai Wei [mailto:roywei...@gmail.com]
> Sent: Wednesday, May 15, 2019 2:57 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [DISCUSS] 1.5.0 Release Plan
>
> Hi Anirudh,
>
> I see there was an offline discussion
> <
> https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
> >
> and I have updated the AMP feature and your project on the release tracker
> <
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> >
> Please let me know if you have any updates.
>
> Hi @dev,
> This is a gentle reminder that the code freeze for the 1.5.0 release is on
> 05/17/2019. Please let us know if you have any WIP pull requests aimed at
> 1.5.0 that need attention.
> Please understand that we already have around 650 commits in master that
> need to be released in time. We know the TensorRT test in CI is failing
> and are trying to fix it. Meanwhile, please update the tracker if there is
> any change:
>
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
>
> Thanks!
>
> Lai
>
>
> On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian  wrote:
>
> > Hi Sheng,
> >
> > I had a discussion with NVIDIA folks offline today (@ptrendx et al.).
> > I strongly feel that the AMP feature should be included as part of the
> > release: https://github.com/apache/incubator-mxnet/pull/14173 .
> > The PR is aimed to be completed next week, but reviews and RFC
> > discussions may take some time. I would request extending the release
> > code freeze by two weeks.
> > Also, I would like to include
> > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> > which depends on the AMP PR.
> > I am also aiming to add a PR by this weekend or early next week,
> > but reviews will take longer than May 17th.
> >
> > Anirudh
> >
> >
> > On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
> >
> > > Hi,
> > >
> > > While the 1.4.1 vote on general@incubator is still ongoing, I'd like
> > > to propose that we start preparing the 1.5.0 release.
> > >
> > > 1.5.0 will include changes dating back to last year, and there have
> > > been a lot of new features and improvements, so it will likely take
> > > us more time to prepare than 1.4.1. I propose the following timeline:
> > > - Cut release branch: release branch already cut. Will sync with
> > > master branch on 5/15/2019 EOD.
> > > - Code freeze: 5/17/2019. No more changes unless the release branch
> > > is in a broken state.
> > > - Tag and vote: 5/20/2019 onward.
> > >
> > > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> > drive
> > > this release as release manager, and I’m happy to help again as
> > committer.
> > >
> > > If you have features in progress that you’d like to include in 1.5.0:
> > > - Add your feature to the scope:
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > - Indicate in this thread:
> > >   - how confident you are about making it happen before the code
> > > freeze. If not confident, provide an estimate for a more manageable
> > > code freeze date so that people can discuss whether to extend the
> > > deadline or to skip one release for it.
> > >   - whether your PR requires more attention to make it happen.
> > >
> > > Thanks for your attention. Comments and suggestions are also welcome.
> > >
> > > -sz
> >
>


Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Anirudh Subramanian
Hi Junru,

Overall, I appreciate the points you made about the proposal.

Having said that, I would like to point to the Apache Code of Conduct
(https://www.apache.org/foundation/policies/conduct):
"Be empathetic, welcoming, friendly and patient".

I find your tone condescending. Clearly you understood what he meant from
the context, whether you prefer to call it IR as in compilers or dataflow
as in distributed systems. You could very well have said "let's use this
terminology to have a common understanding" instead of telling him to go
learn the basic concepts. Before building a cool brand, it's important to
build a healthy community.

Anirudh


On Wed, May 15, 2019 at 12:03 AM Junru Shao  wrote:

> Hi Pedro,
>
> I really appreciate that a diligent and talented engineer eagerly wants to
> improve our system, and I am very thankful that you have done so much for
> our community. However, I do want to mention some points that I believe
> need to be made.
>
> While I agree with Tianqi that every design has its pros and cons, I would
> love to emphasize that *good taste* in system design means optimizing the
> bottleneck and enhancing expressiveness (and usability), i.e. doing what
> needs doing, rather than *trivial nits* that are irrelevant to either
> performance or expressiveness. Generally speaking, typed or untyped,
> shared_ptr or unique_ptr won't affect overall performance when it comes to
> deep learning workloads, especially when we have an async scheduler that
> does good latency hiding in MXNet - to me, these are not major issues
> worth re-designing our entire system for.
>
> To benefit users - real-world ML practitioners - the main thing I would
> love to mention is that the dataflow graph-based representation is
> increasingly incapable of expressing modern neural networks, because of
> increasingly common structures like arbitrary control flow (w/ continue,
> break, etc.), recursion, type conjunction and disjunction, etc. Addressing
> these issues will be our priority, and this is what Relay brings - it
> addresses all of these pain points.
>
> Another minor thing I would love to humbly mention is that, for the sake
> of our brand, it is our responsibility to be professional about
> terminology when writing an official proposal on Confluence. As one of
> numerous examples, the title of the proposal really shocked me for a
> while - something like "operator graph" sounds so weird. Educate me if I
> am wrong, but the compiler community would prefer the term "intermediate
> representation", and the distributed systems community would prefer
> "dataflow graph". If you don't have knowledge of these fields, a better
> way to communicate efficiently is to first familiarize yourself with the
> most basic concepts and then discuss. This is a way to save your own
> valuable time as well.
>
> Again, thank you so much for your hard work, and hope that we could work
> together to win customers in the future :-)
>
> Thanks,
> Junru
>
>
> On Tue, May 14, 2019 at 8:03 PM Tianqi Chen 
> wrote:
>
> > The core part of the proposal is to move the graph to a much more
> > strongly typed template class.
> > I think this is mainly a point of engineering taste, and both sides have
> > pros and cons. Let me list them before I share my thoughts on this issue:
> >
> > - Typed fields certainly enjoy more compile-time type checking; on the
> >   other hand, it is hard to expose templates with an explosive number of
> >   possibilities to frontend languages.
> > - More type-erased fields provide runtime flexibility to store
> >   polymorphic types as well as extensible attributes for graph
> >   optimization.
> >   - It is hard to use a virtual class to expose every possible attribute
> >     that an operator might have, such as inlining, storage pattern,
> >     gradient, etc.
> >   - The nature of supporting a growing set of operator attributes
> >     requires a type-erased attrs field.
> > - In contrast to your argument (that typing is a blocker to features),
> >   type-erased and typed code can both reach the same features, except
> >   that typed code gets more compile-time errors while type-erased code
> >   gets some of them at runtime.
> > - Templatized data structures will likely introduce additional mental
> >   burden for developers and are not really suitable as a core data
> >   structure, because they imply an explosive number of possible data
> >   structures, while the core data structure should be a single one.
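The tradeoff in the list above can be illustrated with a toy sketch; Python stands in for the C++ template design here, and all names are made up for illustration only:

```python
# Illustrative sketch (not MXNet code): a typed node vs a type-erased
# node carrying an open-ended attrs dictionary.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TypedNode:
    # Every attribute must be declared up front; adding a new operator
    # attribute (e.g. storage pattern) means changing this class.
    op: str
    inline: bool = False


@dataclass
class ErasedNode:
    # Attributes live in a type-erased map; passes can attach anything
    # (gradient info, storage pattern, ...) without touching the class,
    # at the cost of runtime rather than compile-time checking.
    op: str
    attrs: Dict[str, Any] = field(default_factory=dict)


n = ErasedNode("conv2d")
n.attrs["inline"] = True                  # attached by one optimization pass
n.attrs["storage_pattern"] = "row_sparse"  # attached by another
```

The type-erased variant trades the compile-time guarantee (a typo in an attrs key only fails at runtime) for the open-ended extensibility the list above describes.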
> >
> > Now my view (as an MXNet PMC member) on typed vs type-erased style: if
> > MXNet were a pure C++ project, I might take more of the typed approach.
> > However, MXNet itself is a project that supports Python/Scala/Clojure
> > and other frontend languages.
> > The introduction of more typing may not align with the original goal,
> > given the tradeoffs I listed above.
> >
> > This proposal is really a drastic change to what NNVM does, as well as
> > to the optimization passes. Given the scope, it is, in your analogy, "a
> > new vehicle to solve all the problems"
> > rather than a 

Re: Requesting slack access

2019-05-08 Thread Anirudh Subramanian
Sent invite!

On Wed, May 8, 2019 at 6:43 AM Sem  wrote:

> Requesting slack access
>
>


Re: [DISCUSS] 1.5.0 Release Plan

2019-05-08 Thread Anirudh Subramanian
Hi Sheng,

I had a discussion with NVIDIA folks offline today (@ptrendx et al.). I
strongly feel that the AMP feature should be included as part of the
release: https://github.com/apache/incubator-mxnet/pull/14173 .
The PR is aimed to be completed next week, but reviews and RFC
discussions may take some time. I would request extending the release code
freeze by two weeks.
Also, I would like to include
https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
which depends on the AMP PR.
I am also aiming to add a PR by this weekend or early next week, but
reviews will take longer than May 17th.

Anirudh


On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:

> Hi,
>
> While the 1.4.1 vote on general@incubator is still ongoing, I'd like to
> propose that we start preparing the 1.5.0 release.
>
> 1.5.0 will include changes dating back to last year, and there have been
> a lot of new features and improvements, so it will likely take us more
> time to prepare than 1.4.1. I propose the following timeline:
> - Cut release branch: release branch already cut. Will sync with master
> branch on 5/15/2019 EOD.
> - Code freeze: 5/17/2019. No more changes unless the release branch is in
> a broken state.
> - Tag and vote: 5/20/2019 onward.
>
> Lai Wei (roywei@) expressed to me offline that he’s willing to help drive
> this release as release manager, and I’m happy to help again as committer.
>
> If you have features in progress that you’d like to include in 1.5.0:
> - Add your feature to the scope:
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> - Indicate in this thread:
>   - how confident you are about making it happen before the code freeze.
> If not confident, provide an estimate for a more manageable code freeze
> date so that people can discuss whether to extend the deadline or to skip
> one release for it.
>   - whether your PR requires more attention to make it happen.
>
> Thanks for your attention. Comments and suggestions are also welcome.
>
> -sz


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-04 Thread Anirudh Subramanian
No worries, maybe it's just something with my setup.
Moving my vote to +0, pending someone else's check.

On Fri, May 3, 2019 at 11:39 PM Junru Shao  wrote:

> Hi Anirudh,
>
> Thanks for reporting this!
>
> I verified on my EC2 machine for the second time. It builds perfectly with
> your commands. It is a bit weird... I noticed a subtle difference: my
> ninja progress bar reads "[xxx/506]", while yours reads "[xxx/488]". I am
> not sure if there is anything different between our setups.
>
> My understanding is that cmake should work because it is tested in our CI
> system under "ci/jenkins/incubator-mxnet" (
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.4.x/201/pipeline
> ).
>
> It will be much appreciated if someone could help confirm whether cmake
> works on their side.
>
> Thanks,
> Junru
>
>
> On Fri, May 3, 2019 at 9:43 PM Anirudh Subramanian 
> wrote:
>
> > Hi Junru,
> >
> > I am on v1.4.x, and my dmlc-core commit is this one:
> >
> >
> https://github.com/dmlc/dmlc-core/tree/0a0e8addf92e1287fd7a25c6314016b8c0138dee
> >
> > Anirudh
> >
> > On Fri, May 3, 2019 at 8:30 PM Junru Shao 
> wrote:
> >
> > > Hey Anirudh,
> > >
> > > Although the vote has been closed, I am very interested in digging into
> > > this issue.
> > >
> > > I built on my EC2 machine using your instructions, and it seems that
> > > everything is working fine...
> > >
> > > Then, I noticed that your issue seems to be related to unittests in
> > > dmlc-core, not in mxnet. Could you kindly check the submodule git hash?
> > > Also, could you check if you are testing on v1.4.x branch?
> > >
> > > Thanks,
> > > Junru
> > >
> > >
> > >
> > > On Fri, May 3, 2019 at 4:33 PM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > > wrote:
> > >
> > > > -1 (binding)
> > > >
> > > > Is the cmake build failing for the 1.4.1 release tag? Is this a
> > > > known issue?
> > > >
> > > > Did the following:
> > > >
> > > > cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON
> > -DUSE_OPENMP=ON
> > > > -DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1
> > > > -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROOT=/usr/local/cuda
> > > > -DUSE_MKLDNN=1 -DUSE_MKL_IF_AVAILABLE=1 -DUSE_MKLML_MKL=1
> -DUSE_ASAN=0
> > > > -GNinja -DUSE_OPERATOR_TUNING=1 -DUSE_CPP_PACKAGE=0
> > -DCUDA_ARCH_NAME=Auto
> > > > .. && ninja -v
> > > >
> > > > [272/488] : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0
> > > > -msse2 -std=c++11 -fopenmp -g  -pthread
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
> > > >
> > > >
> > >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFile

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-03 Thread Anirudh Subramanian
Hi Junru,

I am on v1.4.x, and my dmlc-core commit is this one:
https://github.com/dmlc/dmlc-core/tree/0a0e8addf92e1287fd7a25c6314016b8c0138dee

Anirudh
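For anyone else reproducing this, one quick sanity check is to compare the submodule hash against the commit above. A rough sketch, assuming it is run from the root of an MXNet checkout (the parsing of `git submodule status` output is an assumption about its standard format):

```python
# Rough sketch: check whether the dmlc-core submodule in a local MXNet
# checkout matches the commit referenced above. Prints a note when run
# outside a suitable repository.
import subprocess

EXPECTED = "0a0e8addf92e1287fd7a25c6314016b8c0138dee"


def submodule_hash(path="3rdparty/dmlc-core"):
    """Return the submodule's commit hash, or None if it can't be read."""
    try:
        out = subprocess.run(["git", "submodule", "status", path],
                             capture_output=True, text=True,
                             check=True).stdout
        # Output looks like " <hash> <path> (<ref>)"; a leading +/-/U
        # flags an out-of-sync or uninitialized submodule.
        return out.split()[0].lstrip("+-U")
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        return None


h = submodule_hash()
print("submodule matches" if h == EXPECTED
      else "submodule mismatch, or not run from an MXNet checkout")
```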

On Fri, May 3, 2019 at 8:30 PM Junru Shao  wrote:

> Hey Anirudh,
>
> Although the vote has been closed, I am very interested in digging into
> this issue.
>
> I built on my EC2 machine using your instructions, and it seems that
> everything is working fine...
>
> Then, I noticed that your issue seems to be related to unittests in
> dmlc-core, not in mxnet. Could you kindly check the submodule git hash?
> Also, could you check if you are testing on v1.4.x branch?
>
> Thanks,
> Junru
>
>
>
> On Fri, May 3, 2019 at 4:33 PM Anirudh Subramanian 
> wrote:
>
> > -1 (binding)
> >
> > Is the cmake build failing for the 1.4.1 release tag? Is this a known
> > issue?
> >
> > Did the following:
> >
> > cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON
> > -DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1
> > -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROOT=/usr/local/cuda
> > -DUSE_MKLDNN=1 -DUSE_MKL_IF_AVAILABLE=1 -DUSE_MKLML_MKL=1 -DUSE_ASAN=0
> > -GNinja -DUSE_OPERATOR_TUNING=1 -DUSE_CPP_PACKAGE=0 -DCUDA_ARCH_NAME=Auto
> > .. && ninja -v
> >
> > [272/488] : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0
> > -msse2 -std=c++11 -fopenmp -g  -pthread
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_optional.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_main.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_env.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_thread_group.cc.o
> > -o 3rdparty/dmlc-core/test/unittest/dmlc_unit_tests  -rdynamic
> > lib/libgtestd.a 3rdparty/dmlc-core/libdmlc.a -lpthread && :
> > FAILED: : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0
> -msse2
> > -std=c++11 -fopenmp -g  -pthread
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
> >
> >
> 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o
> >
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-03 Thread Anirudh Subramanian
-1 (binding)

Is the cmake build failing for the 1.4.1 release tag? Is this a known
issue?

Did the following:

cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON
-DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROOT=/usr/local/cuda
-DUSE_MKLDNN=1 -DUSE_MKL_IF_AVAILABLE=1 -DUSE_MKLML_MKL=1 -DUSE_ASAN=0
-GNinja -DUSE_OPERATOR_TUNING=1 -DUSE_CPP_PACKAGE=0 -DCUDA_ARCH_NAME=Auto
.. && ninja -v

[272/488] : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0
-msse2 -std=c++11 -fopenmp -g  -pthread
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_optional.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_main.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_env.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_thread_group.cc.o
-o 3rdparty/dmlc-core/test/unittest/dmlc_unit_tests  -rdynamic
lib/libgtestd.a 3rdparty/dmlc-core/libdmlc.a -lpthread && :
FAILED: : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0 -msse2
-std=c++11 -fopenmp -g  -pthread
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_optional.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_main.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_env.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_thread_group.cc.o
-o 3rdparty/dmlc-core/test/unittest/dmlc_unit_tests  -rdynamic
lib/libgtestd.a 3rdparty/dmlc-core/libdmlc.a -lpthread && :
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o:
In function `Logging_basics_Test::TestBody()':
/home/ubuntu/experimentals/master_mxnet/build/../3rdparty/dmlc-core/test/unittest/unittest_logging.cc:19:
undefined reference to `testing::internal::DeathTest::Create(char const*,
testing::internal::RE const*, char const*, int,
testing::internal::DeathTest**)'
collect2: error: ld returned 1 exit status

Anirudh
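For context, an undefined `testing::internal::DeathTest::Create` at link time usually means the googletest archive being linked was built without death-test support, or that a mismatched gtest copy was picked up. A rough diagnostic sketch - the `lib/libgtestd.a` path is taken from the log above, everything else is an assumption:

```python
# Rough diagnostic sketch (archive path assumed from the log above):
# check whether the gtest archive actually contains the death-test
# symbols that unittest_logging.cc references.
import os
import subprocess


def check_gtest_deathtest(archive="lib/libgtestd.a"):
    """Return a short diagnosis string for the given static archive."""
    if not os.path.exists(archive):
        return "archive not found - run from the MXNet build directory"
    # `nm -C` demangles C++ symbols so we can search for gtest internals.
    syms = subprocess.run(["nm", "-C", archive],
                          capture_output=True, text=True).stdout
    if "testing::internal::DeathTest::Create" in syms:
        return "death-test symbols present - likely a link-order problem"
    return "death-test symbols missing - gtest built without death tests"


print(check_gtest_deathtest())
```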

On Fri, May 3, 2019 at 8:04 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> No problem Damien, glad to have you helping us validate the release.
> Just wanted to make sure we have enough votes to pass the general vote
> (the next release step), and with Sheng I think we should.
>
> On Fri, May 3, 2019 at 7:52 AM Damien Stanton 
> wrote:
>
> > Ah, I misunderstood the binding/non-binding distinction. I am not a PPMC
> > member, so my vote is non-binding.
> >
> > Best,
> > Damien
> >
> > On Fri, May 3, 2019 at 3:19 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hi Junru, could you give a quick summary of the binding / non-binding
> > > votes?
> > >
> > > Damien, just want to confirm: are you a member of the PPMC for MXNet?
> > > Usually committers or community members (like 

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-30 Thread Anirudh Subramanian
Hi Tao,

I covered in the doc that it is specifically about inference. I can add
another section to the FAQ explaining why INT8 quantization is not included.

Anirudh

On Tue, Apr 30, 2019 at 7:59 AM Lv, Tao A  wrote:

> Thank you Anirudh! I'm just a little surprised that when we talk about
> mixed precision models we don't talk about training, and when we talk
> about inference, INT8 quantization is not mentioned~
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, April 30, 2019 8:27 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>
> Hi Zach,
>
> I checked the QuantizeGraph pass and I think it can probably benefit from
> the CSE pass to eliminate additional quantize/quantize_v2 nodes. Having
> said that, I think it may still be overkill to add another NNVM pass for
> generic common subexpression elimination. Currently, this elimination
> logic takes only 3 to 6 additional lines of code in each of the two NNVM
> passes. Also, a generic common subexpression elimination pass has its own
> maintenance costs. I think it is better to continue with the current
> approach and revisit this need in the future as we add more NNVM passes.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian  wrote:
>
> > Hi Zach,
> >
> > You raise an interesting point. Thank you for the pointer!
> >
> > Incorporating a CSE pass comes with its own cost; the advantage it
> > brings is making the ReducePrecision NNVM pass more lightweight.
> > Since the amortized cost of the ReducePrecision pass is O(1), it
> > shouldn't matter much from a performance point of view whether we add
> > it or not.
> >
> > From a maintenance point of view, I would agree that separating these
> > two pieces of logic can be helpful if we have other workflows which
> > require the original pass followed by a CSE pass. Currently, as far as
> > I know, only the ReducePrecision pass would use it. I will check
> > whether a CSE pass can benefit other NNVM passes, like the quantization
> > pass, apart from ReducePrecision, and will get back.
> >
> > Anirudh
> >
> > On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg
> > 
> > wrote:
> >
> >> I have one suggestion. In the current design, there are additional
> >> maps from each input entry to the corresponding entry cast to each
> >> target dtype, in order to avoid creating duplicate casts. Instead of
> >> creating these, another option is to apply a general-purpose Common
> >> Subexpression Elimination (CSE) [1] pass afterwards. So you would run
> >> the mixed precision pass, which creates the duplicates, and then the
> >> CSE pass, which would remove all duplicates.
> >>
> >> This design is common in existing compilers like LLVM because
> >> maintaining and testing the passes is much easier when they are kept
> >> as simple as possible. The CSE pass can also be reused for other
> >> passes that could create duplicates, or to remove duplicate
> >> expressions in general. This tutorial [2] talks about it a bit.
> >>
> >> Zach
> >>
> >> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> >> [2] - https://blog.regehr.org/archives/1603
> >>
> >> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <
> >> anirudh2...@gmail.com>
> >> wrote:
> >>
> >> > Hi Tao,
> >> >
> >> > Thanks for raising this question! I thought about the existing
> >> > quantization workflow and whether it can be included in the AMP API.
> >> > Although quantization can be considered mixed precision, there are
> >> > differences. For example, only a small number of operators can be
> >> > quantized, compared to the operators that can run in FP16 precision.
> >> > Thus, overriding the operators to run in the original dtype vs the
> >> > target dtype doesn't make much sense for quantization.
> >> > Also, the quantization workflow may require a calibration dataset
> >> > to calibrate the min and max, and a calib_mode.
> >> > Arriving at a common API for quantization with calibration and
> >> > mixed precision inference (FP16 and BF16) may make the API too
> >> > complicated and not very easy to use. I understand that this may
> >> > cause some confusion as people may try to use target_dtype of in

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-30 Thread Anirudh Subramanian
Hi Zach,

I checked the QuantizeGraph pass and I think it can probably benefit from
the CSE pass to eliminate additional quantize/quantize_v2 nodes. Having
said that, I think it may still be overkill to add another NNVM pass for
generic common subexpression elimination. Currently, this elimination
logic takes only 3 to 6 additional lines of code in each of the two NNVM
passes. Also, a generic common subexpression elimination pass has its own
maintenance costs. I think it is better to continue with the current
approach and revisit this need in the future as we add more NNVM passes.

Anirudh
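For readers skimming the thread, the tradeoff being debated - a dedicated CSE pass versus a few inline deduplication lines inside each pass - can be sketched on a toy graph. This is an illustrative model only, not the actual NNVM implementation:

```python
# Toy common-subexpression elimination on an expression DAG
# (illustrative only; NNVM's real passes operate on nnvm::Graph in C++).
def cse(nodes):
    """nodes: list of (op, input_indices) in topological order.
    Returns (deduplicated node list, old-index -> new-index map)."""
    seen = {}   # (op, mapped input indices) -> new index
    remap = {}  # old index -> new index
    out = []
    for i, (op, inputs) in enumerate(nodes):
        key = (op, tuple(remap[j] for j in inputs))
        if key not in seen:            # first occurrence of this expression
            seen[key] = len(out)
            out.append((op, key[1]))
        remap[i] = seen[key]           # duplicates collapse onto the first
    return out, remap


# A mixed-precision pass that naively inserts two identical fp16 casts of
# node 0; running CSE afterwards collapses them into one.
graph = [("data", ()), ("cast_fp16", (0,)), ("cast_fp16", (0,)), ("add", (1, 2))]
deduped, remap = cse(graph)
print(deduped)  # [('data', ()), ('cast_fp16', (0,)), ('add', (1, 1))]
```

Keeping such a pass separate (Zach's suggestion) buys reuse across passes; the inline dedup map (the current approach discussed above) avoids a second graph traversal.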

On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian 
wrote:

> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost; the advantage it brings
> is making the ReducePrecision NNVM pass more lightweight. Since the
> amortized cost of the ReducePrecision pass is O(1), it shouldn't matter
> much from a performance point of view whether we add it or not.
>
> From a maintenance point of view, I would agree that separating these two
> pieces of logic can be helpful if we have other workflows which require
> the original pass followed by a CSE pass. Currently, as far as I know,
> only the ReducePrecision pass would use it. I will check whether a CSE
> pass can benefit other NNVM passes, like the quantization pass, apart
> from ReducePrecision, and will get back.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg 
> wrote:
>
>> I have one suggestion. In the current design, there are additional maps
>> from each input entry to the corresponding entry cast to each target
>> dtype, in order to avoid creating duplicate casts. Instead of creating
>> these, another option is to apply a general-purpose Common Subexpression
>> Elimination (CSE) [1] pass afterwards. So you would run the mixed
>> precision pass, which creates the duplicates, and then the CSE pass,
>> which would remove all duplicates.
>>
>> This design is common in existing compilers like LLVM because maintaining
>> and testing the passes is much easier when they are kept as simple as
>> possible. The CSE can also be reused as necessary for other passes that
>> could create duplicates or to remove duplicate expressions in general.
>> This
>> tutorial [2] talks about it a bit.
>>
>> Zach
>>
>> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
>> [2] - https://blog.regehr.org/archives/1603
>>
>> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <
>> anirudh2...@gmail.com>
>> wrote:
>>
>> > Hi Tao,
>> >
> >> > Thanks for raising this question! I thought about the existing
> >> > quantization workflow and whether it can be included in the AMP API.
> >> > Although quantization can be considered a form of mixed precision,
> >> > there are differences. For example, only a small number of operators
> >> > can be quantized, compared to the operators that can run in FP16
> >> > precision. Thus, overriding operators to run in the original dtype
> >> > vs. the target dtype doesn't make much sense for quantization.
> >> >
> >> > Also, the quantization workflow may require a calibration dataset to
> >> > calibrate the min and max, along with a calib_mode.
> >> > Arriving at a common API for quantization with calibration and mixed
> >> > precision inference (FP16 and BF16) may make the API too complicated
> >> > and not very easy to use. I understand that this may cause some
> >> > confusion, as people may try to use a target_dtype of int8, but I
> >> > think it's still better than causing user confusion with the API
> >> > usage.
> >> >
> >> > Also, when we move the quantize_model APIs outside contrib, we can
> >> > consider adding them under the AMP namespace. The challenge would
> >> > then be to educate users on the difference between "quantize" and
> >> > "convert".
>> >
>> > Anirudh
>> >
>> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A  wrote:
>> >
>> > > Thank you for the explanation. Sorry I didn't realize the proposal is
>> for
>> > > inference only.
>> > >
>> > > Then how do you think the amp_cast and amp_multicast in this proposal
>> can
>> > > work with the existing INT8 quantization workflow which I think should
>> > also
>> > > be considered as 'mixed precision'.
>> > >
>> > > -Original Message-
>> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > Sent: Monday, April 29, 2019 10:25 PM
>> > > To: dev@

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
Hi Zach,

You raise an interesting point. Thank you for the pointer!

Incorporating the CSE pass comes with its own cost, and the advantage it
brings is making the ReducePrecision NNVM pass more lightweight. Since the
amortized cost of the ReducePrecision pass is O(1), it shouldn't matter much
from a performance point of view whether we add it or not.

From a maintenance point of view, I would agree that separating these two
pieces of logic can be helpful if we have other workflows which require the
original pass followed by the CSE pass. Currently, as far as I know, only
the ReducePrecision pass is using it. I will check whether the CSE pass can
benefit other NNVM passes, such as the quantization pass, and will get back.

Anirudh

On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg 
wrote:

> I have one suggestion. In the current design, there are additional maps
> from each input entry to its casted entry for each target dtype, in order
> to avoid creating duplicate casts. Instead of creating these, another
> option is to
> use a general purpose Common Subexpression Elimination (CSE) [1] pass to
> apply afterwards. So, you would run the mixed precision pass which creates
> the duplicates and then the CSE pass which would remove all duplicates.
>
> This design is common in existing compilers like LLVM because maintaining
> and testing the passes is much easier when they are kept as simple as
> possible. The CSE can also be reused as necessary for other passes that
> could create duplicates or to remove duplicate expressions in general. This
> tutorial [2] talks about it a bit.
>
> Zach
>
> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> [2] - https://blog.regehr.org/archives/1603
>
> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian  >
> wrote:
>
> > Hi Tao,
> >
> > Thanks for raising this question! I thought about the existing
> > quantization workflow and whether it can be included in the AMP API.
> > Although quantization can be considered a form of mixed precision, there
> > are differences. For example, only a small number of operators can be
> > quantized, compared to the operators that can run in FP16 precision.
> > Thus, overriding operators to run in the original dtype vs. the target
> > dtype doesn't make much sense for quantization.
> >
> > Also, the quantization workflow may require a calibration dataset to
> > calibrate the min and max, along with a calib_mode.
> > Arriving at a common API for quantization with calibration and mixed
> > precision inference (FP16 and BF16) may make the API too complicated and
> > not very easy to use. I understand that this may cause some confusion,
> > as people may try to use a target_dtype of int8, but I think it's still
> > better than causing user confusion with the API usage.
> >
> > Also, when we move the quantize_model APIs outside contrib, we can
> > consider adding them under the AMP namespace. The challenge would then
> > be to educate users on the difference between "quantize" and "convert".
> >
> > Anirudh
> >
> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A  wrote:
> >
> > > Thank you for the explanation. Sorry I didn't realize the proposal is
> for
> > > inference only.
> > >
> > > Then how do you think the amp_cast and amp_multicast in this proposal
> can
> > > work with the existing INT8 quantization workflow which I think should
> > also
> > > be considered as 'mixed precision'.
> > >
> > > -Original Message-
> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > > Sent: Monday, April 29, 2019 10:25 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision
> Models
> > >
> > > Hi Tao,
> > >
> > > The APIs proposed, "convert_model" and "convert_block", are mainly for
> > > inference use cases, where customers bring an FP32 model and convert
> > > it to a mixed precision model to get improved performance while not
> > > losing accuracy.
> > > The PR: https://github.com/apache/incubator-mxnet/pull/14173 is
> supposed
> > > to handle the training use cases and this proposal doesn't cover the
> AMP
> > > feature added in the PR. I think ptrendx@ and canoerst@ are better
> > > equipped to answer questions 1 and 2.
> > >
> > > > - more generally, what will be saved when users want to serialize
> > > > their
> > > model to disk?
> > >
> > > Lets say users want to save converted mixed precision model used for
> > 

Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
Hi Tao,

Thanks for raising this question! I thought about the existing quantization
workflow and whether it can be included in the AMP API. Although
quantization can be considered a form of mixed precision, there are
differences. For example, only a small number of operators can be quantized,
compared to the operators that can run in FP16 precision. Thus, overriding
operators to run in the original dtype vs. the target dtype doesn't make
much sense for quantization.

Also, the quantization workflow may require a calibration dataset to
calibrate the min and max, along with a calib_mode.
Arriving at a common API for quantization with calibration and mixed
precision inference (FP16 and BF16) may make the API too complicated and not
very easy to use. I understand that this may cause some confusion, as people
may try to use a target_dtype of int8, but I think it's still better than
causing user confusion with the API usage.

Also, when we move the quantize_model APIs outside contrib, we can consider
adding them under the AMP namespace. The challenge would then be to educate
users on the difference between "quantize" and "convert".
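As context for why the calibration dataset matters, here is a generic sketch of min/max calibration for int8 quantization. It is a simplified illustration of the idea (a "naive" global min/max calib_mode), not MXNet's quantization implementation:

```python
def calibrate_min_max(batches):
    """Track the global min/max of activations over calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi

def quantize_int8(x, lo, hi):
    """Symmetrically quantize x into [-127, 127] using the calibrated range."""
    scale = 127.0 / max(abs(lo), abs(hi))
    q = round(x * scale)
    return max(-127, min(127, q))
```

The calibrated (lo, hi) pair is what gets baked into the quantized graph; this extra input is one reason the quantization workflow does not fit naturally into the same API as FP16/BF16 conversion.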

Anirudh

On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A  wrote:

> Thank you for the explanation. Sorry I didn't realize the proposal is for
> inference only.
>
> Then how do you think the amp_cast and amp_multicast in this proposal can
> work with the existing INT8 quantization workflow which I think should also
> be considered as 'mixed precision'.
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Monday, April 29, 2019 10:25 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>
> Hi Tao,
>
> The APIs proposed, "convert_model" and "convert_block", are mainly for
> inference use cases, where customers bring an FP32 model and convert it to
> a mixed precision model to get improved performance while not losing
> accuracy.
> The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed to
> handle the training use cases, and this proposal doesn't cover the AMP
> feature added in that PR. I think ptrendx@ and canoerst@ are better
> equipped to answer questions 1 and 2.
>
> > - more generally, what will be saved when users want to serialize
> > their
> model to disk?
>
> Let's say users want to save a converted mixed precision model used for
> inference to disk. It will save both the symbol, with the amp_cast and
> amp_multicast operators, and the params (which are casted if necessary).
>
> Anirudh
>
>
> On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A  wrote:
>
> > Thank you for sharing this, Anirudh.
> >
> > Curious to know:
> > - what will be saved in a training checkpoint or snapshot? Can it be
> > resumed on another platform which might not support the lower
> > precision the previous one used?
> > - what will be saved in the final symbol.json and params file when
> > training is finished?
> > - more generally, what will be saved when users want to serialize
> > their model to disk?
> >
> > Thank you,
> > -tao
> >
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Monday, April 29, 2019 7:00 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
> >
> > Hi all,
> >
> > I have created a doc for conversion from FP32 to Mixed Precision Models:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32
> > +to+Mixed+Precision+Models
> >
> > I look forward to your feedback on the same.
> >
> > Thanks,
> > Anirudh
> >
>


Re: Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
Hi Tao,

The APIs proposed, "convert_model" and "convert_block", are mainly for
inference use cases, where customers bring an FP32 model and convert it to a
mixed precision model to get improved performance while not losing accuracy.
The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed to
handle the training use cases, and this proposal doesn't cover the AMP
feature added in that PR. I think ptrendx@ and canoerst@ are better equipped
to answer questions 1 and 2.

> - more generally, what will be saved when users want to serialize their
model to disk?

Let's say users want to save a converted mixed precision model used for
inference to disk. It will save both the symbol, with the amp_cast and
amp_multicast operators, and the params (which are casted if necessary).
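As a rough illustration of what those two artifacts contain, here is a pure-Python sketch with hypothetical, simplified contents (no MXNet dependency; a real symbol JSON has more fields): the symbol graph keeps the inserted amp_cast node, and the params carry the casted dtypes.

```python
import json

# Hypothetical, simplified symbol graph after conversion: the amp_cast
# node inserted by the pass is part of the serialized graph.
symbol = {
    "nodes": [
        {"op": "null", "name": "data"},
        {"op": "amp_cast", "name": "data_fp16", "attrs": {"dtype": "float16"}},
        {"op": "Convolution", "name": "conv0", "inputs": [1]},
    ]
}

# Hypothetical params entry: the weight was casted at conversion time.
params = {"conv0_weight": {"dtype": "float16", "shape": [64, 3, 3, 3]}}

# Round-trip through JSON, as a model-symbol.json would be written and reloaded.
reloaded = json.loads(json.dumps(symbol))
ops = [n["op"] for n in reloaded["nodes"]]
```

The point is that no conversion state lives outside these two files: reloading them reproduces the mixed precision model as-is.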

Anirudh


On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A  wrote:

> Thank you for sharing this, Anirudh.
>
> Curious to know:
> - what will be saved in a training checkpoint or snapshot? Can it be
> resumed on another platform which might not support the lower precision the
> previous one used?
> - what will be saved in the final symbol.json and params file when
> training is finished?
> - more generally, what will be saved when users want to serialize their
> model to disk?
>
> Thank you,
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Monday, April 29, 2019 7:00 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Proposal for Conversion from FP32 to Mixed Precision Models
>
> Hi all,
>
> I have created a doc for conversion from FP32 to Mixed Precision Models:
>
> https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
>
> I look forward to your feedback on the same.
>
> Thanks,
> Anirudh
>


Proposal for Conversion from FP32 to Mixed Precision Models

2019-04-29 Thread Anirudh Subramanian
Hi all,

I have created a doc for conversion from FP32 to Mixed Precision Models:
https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models

I look forward to your feedback on the same.

Thanks,
Anirudh


[Announcement] New Committer - Wang Jiajun

2019-04-16 Thread Anirudh Subramanian
Hi,

Please join me to welcome Wang Jiajun (https://github.com/arcadiaphy) as a
new committer of Apache (incubating) MXNet!

Wang has been solving some tough bugs with respect to memory leaks, process
fork handling, dependency engine issues and custom op exception handling.

Issue Involvement:
https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Aarcadiaphy

PRs authored:
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+author%3Aarcadiaphy+

Anirudh


Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
Hi Marco,

The backend private APIs in engine, executor, storage, ndarray, etc. can
still be changed.
I understand that it may introduce code duplication, but introducing
duplicate C APIs can still be better than having backend developers worry
about different frontends. Not to mention a frontend that is not yet merged
into the repo but lives in its own repo; such repos should also be
considered consumers of the MXNet API.

Anirudh

On Thu, Apr 11, 2019 at 12:12 PM Marco de Abreu 
wrote:

> Good point about the adoption speed for the different frontends, Anirudh.
> While this is a quite valid argument, I'm afraid of the complexity it might
> introduce as well as a risk of further diverging frontend functionality.
>
> I'd rather propose that we introduce a guideline to follow when changes to
> C-APIs are being made. Part of that could be starting a thread like this
> one that lays down the changes that are being made to the C-API. We could
> then coordinate the changes to the different frontends and gather people
> from the community who feel comfortable to do the changes in the respective
> frontends. If nobody speaks up, the original proposer of that change could
> be responsible to do the necessary changes.
>
> An adjacent topic for this discussion could be test coverage: We currently
> have no tools to determine which frontend hits which C-API and where
> changes have to be made. This might be a topic we should spark up again
> separately.
>
> -Marco
>
> On Thu, Apr 11, 2019 at 8:55 PM Marco de Abreu 
> wrote:
>
> > My personal opinion towards that discussion is that we should keep the
> > C-API free from semantic versioning because otherwise we're introducing
> two
> > "fronts" that we have to maintain backwards compatibility for. By the
> way,
> > currently, we have no way to verify and guarantee the compatibility of
> the
> > C-API. The major issue I'd see with adding SemVer for the C-API is that
> > this would increase the complexity of changes that are (in my opinion)
> > entirely internal to MXNet by introducing another thing that developers
> > would have to look out for - possibly introducing code duplication as
> > described by Jun while not providing any clear benefits to me.
> >
> > If there is a use-case where people can not even use our C++ package,
> then
> > we could have discussions about introducing a user-facing C-API, but
> right
> > now this approach to interface with our C-API (although I know that
> people
> > use it) seem a bit like using undocumented Windows APIs: They work, but
> > it's on your own risk, they might always break and there's no guarantee.
> >
> > -Marco
> >
> > On Thu, Apr 11, 2019 at 8:52 PM Anirudh Subramanian <
> anirudh2...@gmail.com>
> > wrote:
> >
> >> Hi Jun,
> >>
> >> Till now from what I have observed this has been an undocumented
> guideline
> >> to not break C APIs (example:
> >>
> https://github.com/apache/incubator-mxnet/pull/11429#discussion_r199564999
> >> ).
> >> Although the C APIs are supposed to serve only as bridges for frontend
> >> language bindings (exception being C Predict API), I think there are 3rd
> >> party libraries like Horovod which are starting to
> >> depend on it (https://github.com/apache/incubator-mxnet/pull/14615) .
> >>
> >> Also, since MXNet has a lot of frontend bindings ensuring backward
> >> compatibility with semver can help frontend bindings adopt the new APIs
> at
> >> their own pace.
> >>
> >> Anirudh
> >>
> >>
> >> On Thu, Apr 11, 2019 at 10:58 AM Jun Wu  wrote:
> >>
> >> > I'm not sure about whether C APIs should fall under semver. This is
> the
> >> > discussion we would like to have with the community.
> >> >
> >> > My thinking on this:
> >> > 1. In most of the cases, C APIs only serve as bridges between frontend
> >> > language bindings and C++ backend. Most of users/developers do not
> >> interact
> >> > directly with C APIs.
> >> > 2. The cases I can think of where C APIs are directly adopted in
> >> > application development are model deployment in a C/C++ environment.
> In
> >> > those cases, developers only interact with C Predict APIs, which we
> >> didn't
> >> > touch.
> >> >
> >> > If the community feel that we are obliged to keep the semver for all C
> >> > APIs, we can try to make a copy of the C APIs we intend to modify in
> >> the PR
> >> > and keep the old signatures intact, t

Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
Hi Jun,

Till now from what I have observed this has been an undocumented guideline
to not break C APIs (example:
https://github.com/apache/incubator-mxnet/pull/11429#discussion_r199564999).
Although the C APIs are supposed to serve only as bridges for frontend
language bindings (exception being C Predict API), I think there are 3rd
party libraries like Horovod which are starting to
depend on it (https://github.com/apache/incubator-mxnet/pull/14615) .

Also, since MXNet has a lot of frontend bindings ensuring backward
compatibility with semver can help frontend bindings adopt the new APIs at
their own pace.

Anirudh


On Thu, Apr 11, 2019 at 10:58 AM Jun Wu  wrote:

> I'm not sure whether C APIs should fall under semver. This is the
> discussion we would like to have with the community.
>
> My thinking on this:
> 1. In most of the cases, C APIs only serve as bridges between frontend
> language bindings and C++ backend. Most of users/developers do not interact
> directly with C APIs.
> 2. The cases I can think of where C APIs are directly adopted in
> application development are model deployment in a C/C++ environment. In
> those cases, developers only interact with C Predict APIs, which we didn't
> touch.
>
> If the community feels that we are obliged to keep the semver for all C
> APIs, we can try to make a copy of the C APIs we intend to modify in the
> PR and keep the old signatures intact, though this will introduce a lot of
> duplicate code.
>
> On Thu, Apr 11, 2019 at 8:50 AM Anirudh Subramanian  >
> wrote:
>
> > I was under the impression that the C API does fall under semver. Has
> > this been discussed somewhere before? Is this also the case for the C
> > Predict API?
> >
> > On Thu, Apr 11, 2019, 8:08 AM Marco de Abreu 
> > wrote:
> >
> > > In case only changes to the C API are being made, it doesn't fall
> > > under our semantic versioning, since that's not a user-facing API, and
> > > thus I'd be in favour of doing it as part of a minor release. If there
> > > is any behavioural change from a user perspective (a good indicator
> > > would be if tests have to be changed as a reaction to the backend
> > > changes), then I'd prefer a major release.
> > >
> > > I'd slightly prefer a minor release since this change touches quite a
> few
> > > parts and could risk being outdated/diverged as the time until 2.0
> > > progresses.
> > >
> > > -Marco
> > >
> > > Aaron Markham  schrieb am Do., 11. Apr.
> 2019,
> > > 16:28:
> > >
> > > > Just curious about when this kind of change will land. Would it wait
> > for
> > > > 2.0 or would it be in 1.5 or another minor release?
> > > >
> > > > On Thu, Apr 11, 2019, 00:15 Junru Shao 
> > wrote:
> > > >
> > > > > Really nice improvement over MXNet's usability! I suggest that we
> > could
> > > > > make numpy-compatible behavior default in 2.0.
> > > > >
> > > > > On Wed, Apr 10, 2019 at 11:34 PM Jun Wu 
> wrote:
> > > > >
> > > > > > Dear Community,
> > > > > >
> > > > > > A while ago, we sent out an RFC
> > > > > > <https://github.com/apache/incubator-mxnet/issues/14253>
> > discussing
> > > > the
> > > > > > initiative introducing NumPy compatibility into MXNet. As the
> first
> > > > > outcome
> > > > > > of this initiative, we submitted the PR
> > > > > > <https://github.com/apache/incubator-mxnet/pull/14661> providing
> > the
> > > > > > infrastructure of supporting zero-dim (scalar) and zero-size
> > tensors,
> > > > > which
> > > > > > have been long-missing in MXNet.
> > > > > >
> > > > > > In our implementation, we have put in our best effort to keep
> > > > > > the promise of backward compatibility in all the language
> > > > > > bindings.
> > Nevertheless,
> > > > we
> > > > > > still would like to call out the changes explicitly that may
> impact
> > > > your
> > > > > > existing codebases developed on top of MXNet by calling C-APIs
> > > directly
> > > > > or
> > > > > > implementing operators in your own repos.
> > > > > >
> > > > > > 1. In your application, if you called any one of the following
> > > > > shape-related
> > > > > > C-APIs, 

Re: Implementing zero-dim and zero-size tensors in MXNet and its impact on your codebases

2019-04-11 Thread Anirudh Subramanian
I was under the impression that the C API does fall under semver. Has this
been discussed somewhere before? Is this also the case for the C Predict
API?

On Thu, Apr 11, 2019, 8:08 AM Marco de Abreu 
wrote:

> In case only changes to the C API are being made, it doesn't fall under
> our semantic versioning, since that's not a user-facing API, and thus I'd
> be in favour of doing it as part of a minor release. If there is any
> behavioural change from a user perspective (a good indicator would be if
> tests have to be changed as a reaction to the backend changes), then I'd
> prefer a major release.
>
> I'd slightly prefer a minor release since this change touches quite a few
> parts and could risk being outdated/diverged as the time until 2.0
> progresses.
>
> -Marco
>
> Aaron Markham  schrieb am Do., 11. Apr. 2019,
> 16:28:
>
> > Just curious about when this kind of change will land. Would it wait for
> > 2.0 or would it be in 1.5 or another minor release?
> >
> > On Thu, Apr 11, 2019, 00:15 Junru Shao  wrote:
> >
> > > Really nice improvement over MXNet's usability! I suggest that we could
> > > make numpy-compatible behavior default in 2.0.
> > >
> > > On Wed, Apr 10, 2019 at 11:34 PM Jun Wu  wrote:
> > >
> > > > Dear Community,
> > > >
> > > > A while ago, we sent out an RFC
> > > >  discussing
> > the
> > > > initiative introducing NumPy compatibility into MXNet. As the first
> > > outcome
> > > > of this initiative, we submitted the PR
> > > >  providing the
> > > > infrastructure of supporting zero-dim (scalar) and zero-size tensors,
> > > which
> > > > have been long-missing in MXNet.
> > > >
> > > > In our implementation, we have put in our best effort to keep the
> > > > promise of backward compatibility in all the language bindings.
> > > > Nevertheless, we still would like to call out explicitly the changes
> > > > that may impact your existing codebases developed on top of MXNet by
> > > > calling C-APIs directly or implementing operators in your own repos.
> > > >
> > > > 1. In your application, if you called any one of the following
> > > shape-related
> > > > C-APIs, you will need to change the data type of shape's ndim and
> > > dim_size
> > > > from *unsigned int* to signed *int*, because we have to use -1 to
> > > represent
> > > > unknown shape information, and reserve 0 for scalar and zero-size
> > > tensors.
> > > > One example of such changes can be seen in the cpp-package
> > > > <
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/pull/14661/files#diff-c0e1fcfe1619faa4ff5f59d94e8bR183
> > > > >
> > > > calling MXSymbolInferShape.
> > > > - MXSymbolInfershape
> > > > - MXSymbolInfershapePartial
> > > > - MXExecutorSimpleBind
> > > > - MXExecutorReshape
> > > > - MXNDArrayGetShape
> > > > - MXNDArrayCreateFromSharedMem
> > > >
> > > > 2. If you have implemented operators in your own codebases, you will
> > > > probably need to change every operator's shape inference function to
> > use
> > > > the following util functions to check whether shape information is
> > known,
> > > > instead of checking against 0 directly. One example of such changes
> can
> > > be
> > > > seen in the shape inference function
> > > > <
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/pull/14661/files#diff-afa640c4653c59f00f43a84455f91ef9R35
> > > > >
> > > > of concat operator.
> > > > - shape_is_known (include/mxnet/tuple.h)
> > > > - ndim_is_known (include/mxnet/tuple.h)
> > > > - dim_size_is_known (include/mxnet/tuple.h)
> > > >
> > > > If you are interested in knowing the value of scalar tensors, and
> hence
> > > > understanding our motivation further, this thread
> > > > <
> https://discuss.mxnet.io/t/rank-0-arrays-in-mxnet-aka-pi-is-wrong/108
> > >
> > > of
> > > > discussion provides very good insights from the view of data science.
> > It
> > > > was actually related to an opportunity for MXNet becoming the backend
> > of
> > > > PyMC , but somehow it didn't go
> > > > through due to missing several key features
> > > > ,
> and
> > > > scalar tensors is one of them.
> > > >
> > > > Please leave comments in the PR
> > > >  if you have
> any
> > > > concerns or suggestions of our work.
> > > >
> > > > Thank you very much for your time and consideration.
> > > >
> > > > Best,
> > > > Jun
> > > >
> > > > *References*
> > > > [1] RFC of NumPy compatibility:
> > > > https://github.com/apache/incubator-mxnet/issues/14253
> > > > [2] Pull request of supporting scalar and zero-size tensors:
> > > > https://github.com/apache/incubator-mxnet/pull/14661
> > > > [3] The value of scalar tensors from the view of data science:
> > > >
> https://discuss.mxnet.io/t/rank-0-arrays-in-mxnet-aka-pi-is-wrong/108
> > > > 
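The shape convention described in the thread above can be mirrored in a standalone Python sketch. These are hypothetical stand-ins for the C++ utils in include/mxnet/tuple.h, showing the semantics only: ndim and dim sizes become signed, -1 means unknown, and 0 is now a legal zero-size dimension.

```python
UNKNOWN = -1  # sentinel for unknown ndim or dim size under the new convention

def ndim_is_known(ndim):
    return ndim != UNKNOWN

def dim_size_is_known(dim):
    return dim != UNKNOWN

def shape_is_known(shape):
    """shape: None when ndim itself is unknown, else a list of dim sizes."""
    if shape is None:
        return False
    return all(dim_size_is_known(d) for d in shape)
```

Under the old convention a dim size of 0 meant "unknown", so operator shape-inference checks like `shape[i] == 0` must be rewritten in terms of helpers like these.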

[Announcement] New Committer - Alex Zai

2019-03-31 Thread Anirudh Subramanian
Hi all,

Please join me to welcome Alex Zai as a new committer of Apache
(incubating) MXNet!

Alex has been instrumental in bringing MKLDNN from experimental to default
on MXNet master. This involved adding Python and C++ unit tests, improving
CI coverage for MKLDNN, testing MKLDNN on different platforms, and working
on MKLDNN-related issues.

PRs:
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+author%3Aazai91+

Issues:
https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Aazai91

Reviews:
https://github.com/apache/incubator-mxnet/pulls?page=1=is%3Apr+reviewed-by%3Aazai91=%E2%9C%93

Dev:
https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:azai91

Thanks,

Anirudh


[Announcement] New Committer - Patric Zhao

2019-03-14 Thread Anirudh Subramanian
Hi all,

Please join me to welcome Patric Zhao as a new committer of Apache
(incubating) MXNet!

Patric has put in great effort around MKLDNN integration into MXNet and has
been involved in features like quantization, graph fusion and fused RNN
operators for CPU.

Dev List activity:
https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:patric.zhao

Issues:
https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Apengzhao-intel+

PR Reviews:
https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Apengzhao-intel

Proposals involved in:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operators+for+CPU



Thanks,
Anirudh


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-04 Thread Anirudh Subramanian
-0

Thanks Steffen for your release efforts!

Build from source works with make but fails with cmake for me.

 cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON
-DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1 -GNinja .. &&
ninja -v

FAILED: : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0 -msse2
-std=c++11 -fopenmp -g  -pthread
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_lockfree.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_param.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_parser.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_array_view.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_config.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_threaditer_exc_handling.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_inputsplit.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_optional.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_main.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_env.cc.o
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_thread_group.cc.o
-o 3rdparty/dmlc-core/test/unittest/dmlc_unit_tests  -rdynamic
lib/libgtestd.a 3rdparty/dmlc-core/libdmlc.a -lpthread && :
3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o:
In function `Logging_basics_Test::TestBody()':
/home/ubuntu/experimentals/1.4_release/build/../3rdparty/dmlc-core/test/unittest/unittest_logging.cc:19:
undefined reference to `testing::internal::DeathTest::Create(char const*,
testing::internal::RE const*, char const*, int,
testing::internal::DeathTest**)'
collect2: error: ld returned 1 exit status


Anirudh

On Mon, Feb 4, 2019 at 3:09 PM Haibin Lin  wrote:

> +1 built from source on Linux and passed dist sync kvstore test.
>
> On Mon, Feb 4, 2019 at 9:54 AM Lin Yuan  wrote:
>
> > +1 build from source on MacOS 10.13.6 and tested mxnet-to-coreml
> converter.
> >
> > On Mon, Feb 4, 2019 at 9:03 AM Indhu  wrote:
> >
> > > +1
> > >
> > > Build from source and tested few examples from the examples folder.
> > >
> > > Thanks,
> > > Indu
> > >
> > >
> > >
> > > > On Fri, Feb 1, 2019 at 6:21 PM Steffen Rochel wrote:
> > >
> > > > Hi Sheng - thanks for the feedback.
> > > > TVM notice  file is missing as the 1.4.x branch/v1.4.0 release is
> using
> > > TVM
> > > > commit 0f053c8
> > > > <
> > > >
> > >
> >
> https://github.com/dmlc/tvm/commit/0f053c82a747b4dcdf49570ec87c17e0067b7439
> > > > >
> > > >  from Oct 8, 2018, which didn't have the NOTICE file. IMHO, MXNet
> > NOTICE
> > > > file is consistent with release content.
> > > > As the release started in 2018, I think it is ok to move forward
> without
> > > > updating the year to 2019.
> > > >
> > > > All -
> > > > thanks to the committers/contributors (Tao, Aaron, Kellen, Aston,
> Yuxi)
> > > who
> > > > tested and provided feedback - we have five +1 votes.
> > > > As of today, Friday Feb 1st 2019 6pm PST we have two binding votes,
> one
> > > +1
> > > > (Carin), one +0 (Sheng). The vote continues be open waiting for
> > feedback
> > > > from PMC members.
> > > > Hope you can spare some time over the weekend to provide feedback.
> > > >
> > > > Regards,
> > > > Steffen
> > > >
> > > > On Fri, Feb 1, 2019 at 12:44 AM Marco de Abreu <
> > marco.g.ab...@gmail.com>
> > > > wrote:
> > > >
> > > > > Considering the release process was started last year and the
> > code
> > > > tag
> > > > > was also based on last year, I'd say that it is not really a
> big
> > > > deal.
> > > > >
> > > > > -Marco
> > > > >
> > > > > On Fri, Feb 1, 2019, 09:33 Sheng Zha wrote:
> > > > >
> > > > > > I found an awesome checklist for incubator releases [1] so I'm
> > using
> > > it
> > > > > > here:
> > > > > >
> > > > > > -[Y] Are release files in correct location?
> > > > > > -[Y] Do release files have the word incubating in their name?
> > > > > > -[Y] Are the digital signature and hashes correct?
> > > > > > -[Y] Does DISCLAIMER file exist?
> > > > > > -[Y] Do LICENSE and NOTICE files exist?
> > > > > > -[N/A] Is the LICENSE and NOTICE text correct? (sz: did not
> finish
> > > > > > checking)
> > > > > > -[N] Is the NOTICE year correct?
> > > > > > -[N/A] Un-included software dependencies are not mentioned in
> > LICENSE
> > > > or
> > > > > > NOTICE? (sz: 

Re: [Question] UI change policy in MXNet

2018-12-20 Thread Anirudh Subramanian
On Thu, Dec 20, 2018, 1:56 PM Lin Yuan wrote:

> Hi Anirudh,
>
> Thanks a lot for your clarifications! I have some followup
> questions/comments:
>
> 1) Which guideline should we follow when updating the UI in MXNet
> operators?
> A) MXNet follows semantic versioning, so breaking changes to operator
> interfaces can be introduced only in major versions.
>
> (Lin:) My question is what style of UI guide we should follow, e.g. naming
> convention, usage mode, etc. Something like numpy's style or tensorflow's?
>
I don't think there is such a UI guide. If the operator already exists in
numpy/scipy or other frameworks, we generally tend to use similar
interfaces.

>
> 2) Who should approve the UI change?
> A) Contributors who may have worked on the operator and/or other
> contributors/committers.
>
> (Lin:) Is it too local to rely on contributors to one/a few operators to
> decide the UI? How can we make sure the consistency of UI across all
> operators in MXNet?
>
Agreed. Feel free to propose a better way.

>
> 3) In case of backward compatibility, should we favor breaking the backward
> compatibility and update the release notes or adding a newer version of the
> operator like ***_v2?
> A) If the operator interfaces are not compatible, it's fine to create an
> operator with the name "_v2". In the next major version release, you can
> add an alias for the newer implementation and deprecate the older one.
>
> (Lin) What if there is already "_v2", do we add "_v3", "_v4" as the project
> evolves?
>
This needs to be dealt with on a case-by-case basis. I haven't seen many ops which
would require three backward incompatible revisions between two major
releases.

>
> 4) Which operator should go to contrib and which be implemented as regular?
> A) I think this discussion may help:
> https://github.com/apache/incubator-mxnet/pull/5499 . To summarize:
> contrib
> was created for ops for which we provide limited guarantees with respect to
> backward compatibility, interface changes, testing etc.
>
> (Lin) This is definitely an informative discussion. It would be better if
> we can put this in a more noticeable place for developers.
>
>
> On Thu, Dec 20, 2018 at 1:39 PM Anirudh Subramanian  >
> wrote:
>
> > 1) Which guideline should we follow when updating the UI in MXNet
> > operators?
> > A) MXNet follows semantic versioning, so breaking changes to operator
> > interfaces can be introduced only in major versions.
> >
> > 2) Who should approve the UI change?
> > A) Contributors who may have worked on the operator and/or other
> > contributors/committers.
> >
> > 3) In case of backward compatibility, should we favor breaking the
> backward
> > compatibility and update the release notes or adding a newer version of
> the
> > operator like ***_v2?
> > A) If the operator interfaces are not compatible, it's fine to create an
> > operator with the name "_v2". In the next major version release, you can
> > add an alias for the newer implementation and deprecate the older one.
> >
> > 4) Which operator should go to contrib and which be implemented as
> regular?
> > A) I think this discussion may help:
> > https://github.com/apache/incubator-mxnet/pull/5499 . To summarize:
> > contrib
> > was created for ops for which we provide limited guarantees with respect
> to
> > backward compatibility, interface changes, testing etc.
> >
> > Anirudh
> >
> > On Thu, Dec 20, 2018 at 1:00 PM Lin Yuan  wrote:
> >
> > > Dear Community,
> > >
> > > As a contributor, I would like to know the current policy for updating
> UI
> > > of an operator. I understand UI change should be introduced in major
> > > release not minor release. However, it is still not quite clear to me
> > > regarding the UI change process:
> > >
> > > 1) Which guideline should we follow when updating the UI in MXNet
> > > operators?
> > > 2) Who should approve the UI change?
> > > 3) In case of backward compatibility, should we favor breaking the
> > backward
> > > compatibility and update the release notes or adding a newer version of
> > the
> > > operator like ***_v2?
> > > 4) Which operator should go to contrib and which be implemented as
> > regular?
> > >
> > > Any clarification is appreciated and it is helpful to guide PR
> reviewers
> > as
> > > well.
> > >
> > > Merry Christmas to ya'all!
> > >
> > > Lin
> > >
> >
>


Re: [Question] UI change policy in MXNet

2018-12-20 Thread Anirudh Subramanian
1) Which guideline should we follow when updating the UI in MXNet operators?
A) MXNet follows semantic versioning, so breaking changes to operator
interfaces can be introduced only in major versions.

2) Who should approve the UI change?
A) Contributors who may have worked on the operator and/or other
contributors/committers.

3) In case of backward compatibility, should we favor breaking the backward
compatibility and update the release notes or adding a newer version of the
operator like ***_v2?
A) If the operator interfaces are not compatible, it's fine to create an
operator with the name "_v2". In the next major version release, you can
add an alias for the newer implementation and deprecate the older one.

4) Which operator should go to contrib and which be implemented as regular?
A) I think this discussion may help:
https://github.com/apache/incubator-mxnet/pull/5499 . To summarize: contrib
was created for ops for which we provide limited guarantees with respect to
backward compatibility, interface changes, testing etc.
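The alias-and-deprecate step from answer 3 above can be sketched as follows. This is only an illustration of the pattern, not actual MXNet code; the operator names `batch_norm`/`batch_norm_v2` and the placeholder math are hypothetical:

```python
import warnings

def batch_norm_v2(data, eps=1e-5):
    """Hypothetical newer operator interface (placeholder math only)."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

def batch_norm(data, eps=1e-5):
    """Old name kept as a deprecated alias until the next major release."""
    warnings.warn(
        "batch_norm is deprecated; use batch_norm_v2 instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return batch_norm_v2(data, eps=eps)
```

Existing callers keep working through one release cycle while seeing a `DeprecationWarning`, and the old name can be removed in the following major release.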

Anirudh

On Thu, Dec 20, 2018 at 1:00 PM Lin Yuan  wrote:

> Dear Community,
>
> As a contributor, I would like to know the current policy for updating UI
> of an operator. I understand UI change should be introduced in major
> release not minor release. However, it is still not quite clear to me
> regarding the UI change process:
>
> 1) Which guideline should we follow when updating the UI in MXNet
> operators?
> 2) Who should approve the UI change?
> 3) In case of backward compatibility, should we favor breaking the backward
> compatibility and update the release notes or adding a newer version of the
> operator like ***_v2?
> 4) Which operator should go to contrib and which be implemented as regular?
>
> Any clarification is appreciated and it is helpful to guide PR reviewers as
> well.
>
> Merry Christmas to ya'all!
>
> Lin
>


Re: v1.4.0 status 11/29

2018-12-03 Thread Anirudh Subramanian
Hi Steffen,

I have created a PR to cherry pick the change to v1.4.x branch:
https://github.com/apache/incubator-mxnet/pull/13517

Anirudh

On Mon, Dec 3, 2018 at 11:29 AM Steffen Rochel 
wrote:

> Thanks Haibin. Anirudh - please add PR for v1.4.x for
> https://github.com/apache/incubator-mxnet/pull/13501
> Steffen
>
> On Mon, Dec 3, 2018 at 10:55 AM Haibin Lin 
> wrote:
>
> > It would also be great to include the PR that reverts a commit causing
> cpu
> > performance degradation
> > https://github.com/apache/incubator-mxnet/pull/13501,
> > where num_omp_threads decreases to 1 when multiple GPUs are used, as
> Anirudh
> > reported in
> >
> >
> https://github.com/apache/incubator-mxnet/issues/13449#issuecomment-443388522
> > <
> >
> https://github.com/apache/incubator-mxnet/issues/13449#issuecomment-443388522
> > >
> >
> > Best,
> > Haibin
> >
> > On Mon, Dec 3, 2018 at 10:50 AM Afrooze, Sina 
> wrote:
> >
> > > I would also like this PR which is already merged with master (
> > > https://github.com/apache/incubator-mxnet/pull/13426) to be included
> in
> > > 1.4.0 to avoid any potential ONNX export issues in cases where the API
> is
> > > not used strictly correctly. - Sina
> > >
> > >
> > >
> > > On 11/30/18, 2:17 PM, "Alex Zai"  wrote:
> > >
> > > PR is here https://github.com/apache/incubator-mxnet/pull/13497.
> > >
> > > On Thu, Nov 29, 2018 at 8:56 PM Lv, Tao A 
> > wrote:
> > >
> > > > Credit belongs to Alex.
> > > >
> > > > Hi Alex, would you mind porting your fix to the v1.4.x branch?
> > > >
> > > > Thanks,
> > > > -Tao
> > > >
> > > > -Original Message-
> > > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > Sent: Friday, November 30, 2018 12:48 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: v1.4.0 status 11/29
> > > >
> > > > Hi Tao - thanks for fixing the crash. Please create PR on v1.4.x
> > > branch
> > > > with [v1.4.x] in title and add me to the PR.
> > > > Steffen
> > > >
> > > > On Thu, Nov 29, 2018 at 8:44 PM Lv, Tao A 
> > > wrote:
> > > >
> > > > > Hi Steffen, I would like to have
> > > > > https://github.com/apache/incubator-mxnet/pull/13433  into the
> > > coming
> > > > > 1.4.0 release. It fixed a crash of deconvolution with certain
> > input
> > > > > size for MKL-DNN backend. This PR is well reviewed and already
> > > merged
> > > > > into the master branch. New test case is also included there.
> > > > >
> > > > > Please find the corresponding issue here:
> > > > > https://github.com/apache/incubator-mxnet/issues/13421 .
> > > > >
> > > > > Thanks,
> > > > > -Tao
> > > > >
> > > > > -Original Message-
> > > > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > Sent: Friday, November 30, 2018 12:05 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: v1.4.0 status 11/29
> > > > >
> > > > > Dear MXNet community -
> > > > > I would like to provide update on v1.4.0 status, details will
> be
> > > > > tracked here <
> > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incu
> > > > > bating%29+1.4.0+Release+Plan+and+Status
> > > > > >
> > > > > .
> > > > >
> > > > > 1. Sergey created v1.4.x branch
> > > > > 2. As expected, additional requests have been made for
> inclusion
> > in
> > > > > v1.4.0 release. Critical PR are tracked here <
> > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incu
> > > > >
> > > bating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0Re
> > > > > leasePlanandStatus-OpenPRstotrack
> > > > > >
> > > > > .
> > > > > 3. PR to update README.md is blocked by flaky test failures,
> > > > > retriggered check.
> > > > > 4. PR to upgrade version on master to v1.5.0 has been
> submitted.
> > > > > 5. CI is setup and first run passed.
> > > > >
> > > > > Note: if you want to add selected fixes or enhancements, please
> > > reply
> > > > > to this email. Please provide justification, add me as approver
> > to
> > > the
> > > > > v1.4.x PR and make sure your changes have tests included in PR
> > and
> > > get
> > > > > properly reviewed.
> > > > >
> > > > > Regards,
> > > > > Steffen
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>


Re: Adding AMD CPU to CI

2018-11-29 Thread Anirudh Subramanian
Instruction set extensions support like AVX2, AVX512 etc. can vary between
AMD and Intel and there can also be a time lag between when Intel supports
it versus when AMD supports it.
Also, in the future this setup may be useful in case MXNet supports AMD
GPUs and AWS also happens to have support for it.
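As a concrete illustration of the kind of check involved, one could compare which ISA extensions a given CPU actually reports. This is a sketch only (the helper name is hypothetical); flag names follow the `/proc/cpuinfo` conventions on Linux:

```python
def supported_isa_extensions(cpuinfo_text, extensions=("avx2", "avx512f", "f16c")):
    """Parse the 'flags' line of a /proc/cpuinfo dump and report which
    of the given ISA extensions are present. CPUs from different vendors
    (or generations) may report different subsets."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {ext: ext in flags for ext in extensions}
    return {ext: False for ext in extensions}
```

On a host missing e.g. `avx512f`, code paths compiled for that extension would raise illegal-instruction errors, which is exactly the class of difference a separate AMD CI lane could catch.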

Anirudh


On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
 wrote:

> I think it's worth a discussion to do a sanity check. While generally these
> instructions are standardized, we also made the experience with ARM that
> the theory and reality sometimes don't match. Thus, it's always good to
> check.
>
> In the next months we are going to refactor our slave creation processes.
> Chance Bair has been working on rewriting Windows slaves from scratch (we
> used images that haven't really been updated for 2 years - we still don't
> know what was done on them) and they're ready soon. In the following
> months, we will also port our Ubuntu slaves to the new method (don't have a
> timeline yet). Ideally, the integration of AMD instances will only be a
> matter of running the same pipeline on a different instance type. In that
> Case, it should not be a big deal.
>
> If there are big differences, that's already a yellow flag for
> compatibility, but that's unlikely. But in that case, we would have to make
> a more thorough time analysis and whether it's worth the effort. Maybe,
> somebody else could also lend us a hand and help us with adding AMD
> support.
>
> -Marco
>
> On Fri, Nov 30, 2018, 01:22 Hao Jin wrote:
>
> > f16c is also an instruction set supported by both brands' recent CPUs
> just
> > like x86, AVX, SSE etc., and any difference in behaviors (quite
> unlikely
> > to happen, and a major defect if it did) would most likely be caused by
> the
> > underlying hardware implementation, so still, adding AMD instances is not
> > adding much value here.
> > Hao
> >
> > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> > > thinking about AMD users in that PR.
> > >
> > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > From my perspective we're developing a few features like mf16c and
> > MKLDNN
> > > > integration specifically for Intel CPUs.  It wouldn't hurt to make
> sure
> > > > those changes also run properly on AMD cpus.
> > > >
> > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin wrote:
> > > >> I'm a bit confused about why we need extra functionality tests just
> > for
> > > >> AMD
> > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets
> as
> > > the
> > > >> Intel ones? In the very impossible case that something working on
> > Intel
> > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> > mostly
> > > >> likely be related to the underlying hardware implementation of the
> > same
> > > >> ISA, to which we definitely do not have a good solution. So I don't
> > > think
> > > >> performing extra tests on functional aspect of the system on AMD
> CPUs
> > is
> > > >> adding any values.
> > > >> Hao
> > > >>
> > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > On 11/29/18, 2:39 PM, "Alex Zai"  wrote:
> > > >> >
> > > >> > What are people's thoughts on having AMD machines tested on
> the
> > > CI?
> > > >> AMD
> > > >> > machines are now available on AWS.
> > > >> >
> > > >> > Best,
> > > >> > Alex
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>


Re: Adding AMD CPU to CI

2018-11-29 Thread Anirudh Subramanian
+1

On Thu, Nov 29, 2018 at 2:38 PM Alex Zai  wrote:

> What are people's thoughts on having AMD machines tested on the CI? AMD
> machines are now available on AWS.
>
> Best,
> Alex
>


Re: Include MKLDNN into default mxnet pip package

2018-11-27 Thread Anirudh Subramanian
Hi Tao,

I was suggesting we can start using a release tag from MKL-DNN for major and
minor releases of MXNet, starting with 1.4.0. But this would require a
versioning mechanism similar to semver for MKL-DNN, and for MKL-DNN to do
patch releases to backport bug fixes/regressions. I don't know if this is
going to happen anytime soon (it would be nice if you can obtain some
timeline from the MKL-DNN team on this). As long as pip still has two
different packages, with and without MKL, my vote is +1 for adding it as a
default.
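The semver-style rule being asked for here can be sketched in a few lines. This is a generic illustration of semantic versioning, not MKL-DNN's actual policy (the function names are hypothetical):

```python
def parse_version(v):
    """Split a 'major.minor.patch' string into an integer tuple."""
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch

def is_safe_upgrade(current, candidate):
    """Under semver, a patch release (same major.minor, higher patch)
    only backports fixes, so it could be picked up automatically; a
    minor or major bump would need explicit review."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cand[:2] == cur[:2] and cand[2] >= cur[2]
```

With such a scheme, a fix like the LSTM regression could land in e.g. a 0.17.x patch release without pulling in new bleeding-edge features from master.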

Anirudh


On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A  wrote:

> Hi Anirudh,
>
> Just to confirm, you're focusing on the 1.4.0 release of MXNet and want to
> have a release version of MKL-DNN there, right? Or do you mean all the
> development in the future should base on the release version of MKL-DNN?
> For the former one, I think 0.17 release of MKL-DNN is a good choice. But
> it will not have fix for the LSTM regression mentioned in previous email.
>
> I'm talking about the versioning mechanism with MKL-DNN maintainers and
> will be back to you if I get any response. But from the releasing history
> of MKL-DNN, I cannot find any evidence about patch release.
>
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I agree with Steffen that we can start with a stable release for MKLDNN
> for 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there
> are some regressions found like the LSTM regression, would it be possible
> to do a patch release for it or maintain a release branch for it ?
>
> Anirudh
>
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
>
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > 
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for
> MKL-DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -Original Message-
> > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've worked with
> > similar

Re: Include MKLDNN into default mxnet pip package

2018-11-26 Thread Anirudh Subramanian
Hi Tao,

I agree with Steffen that we can start with a stable release for MKLDNN for
1.4.0. For your suggestion on using 0.17, can you provide info on what
versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there
are some regressions found like the LSTM regression, would it be possible
to do a patch release for it or maintain a release branch for it ?

Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:

> Hi Steffen,
>
> I think all the commits on MKL-DNN master branch are well tested for
> MKL-DNN development team. If we really want to have a release commit in the
> coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
> > On Nov 26, 2018, at 8:09 AM, Steffen Rochel 
> wrote:
> >
> > +1 to make MKL-DNN default.
> > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 as
> > open issue to be addressed for 1.4.0
> > I do agree that we should move to a model to include released
> dependencies
> > instead of just taking bleeding edge snapshots.
> > However, speed of development is important as well.
> > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> development
> > team provide us with a well tested tag/commit id to include in 1.4.0
> > release?
> > Steffen
> >
> >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A  wrote:
> >>
> >> Thanks for the information, Kellen and Naveen.
> >>
> >> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
> >> release tags. My concern is that as MKL-DNN is still under intensive
> >> development, if it has a new feature or bug fix on its master branch,
> do we
> >> really want to wait for next release to get it supported in MXNet?
> >>
> >> Take the LSTM regression as an example, probably MKL-DNN will give a fix
> >> or improvement on its master branch soon, do we need to wait for 0.18
> >> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
> >> normal commit id, not release, as the dependency for MKL-DNN.
> >>
> >> Regarding the LSTM regression, we are using internal JIRA tickets rather
> >> than github issues to track the defects of MKL-DNN. But I agree with
> you,
> >> we need update the progress of it in Alex's issue.
> >>
> >> Thanks,
> >> -tao
> >>
> >> -Original Message-
> >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> >> Sent: Thursday, November 22, 2018 10:55 AM
> >> To: dev@mxnet.incubator.apache.org
> >> Subject: Re: Include MKLDNN into default mxnet pip package
> >>
> >> Agree with your point about other repos also not being based on
> versioning
> >> Tao.  I would point out that I've given some that I've worked with
> similar
> >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> >>
> >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy 
> wrote:
> >>>
> >>> Tao,
> >>>
> >>> You are right there are many submodules in 3rd party. We have to start
> >>> somewhere and I believe this one is a good candidate to start with.
> >>> This is not to cater to release of MXNet or to tie them with the
> >>> releases of the submodules but instead to pick only stable releases
> >>> and not to pick up bleeding edge commits from the tip of the master,
> >>> this gives us confidence in the submodule that MXNet users are
> >>> depending on that especially if we make MKLDNN the default.
> >>>
> >>> Good to know it is already known as a regression. Alex has created this
> >>> issue https://github.com/apache/incubator-mxnet/issues/13369, please
> >>> add details and link the corresponding issue in MKLDNN (I couldn't
> find).
> >>>
> >>> -Naveen
> >>>
>  On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
> 
>  Here are my answers for the questions from Kellen and Naveen about
>  MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
>  default here.
> 
>  @Kellen,
> 
>  FYI, here is a list for those platforms which are officially
>  supported by MKL-DNN.
>  https://github.com/intel/mkl-dnn#system-requirements
> 
>  Most of computation intensive kernels in MKL-DNN are JITed. So they
>  are supposed to generate code according to the platform during
>  runtime. For non-JIT code in MKL-DNN, same as other code in MXNet,
>  it will generate instructions according to the options/flags of
>  compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid
>  optimization for compiling machine. That's exactly what we are doing
> >> for MKL-DNN build in MXNet.
> >>> Even
>  without MKL-DNN, I noticed there were issues about illegal
>  instructions
> >>> of
>  MXNet when users import the pip package on a lower end machine which
>  probably only supports SSE.
> 
>  @Naveen,
> 
>  The LSTM issue has already been identified as a regression from the
> >>> recent
>  version of MKL-DNN. Hopefully it will be fixed soon with a new
>  update of MKL-DNN.
> 
>  MXNet has many submodule dependencies under 

Re: CI impaired

2018-11-21 Thread Anirudh Subramanian
Thanks for the quick response and mitigation!

On Wed, Nov 21, 2018 at 3:55 PM Marco de Abreu
 wrote:

> Hello,
>
> today, CI had some issues and I had to cancel all jobs a few minutes ago.
> This was basically caused by the high load that is currently being put on
> our CI system due to the pre-release efforts for this Friday.
>
> It's really unfortunate that we just had outages of three core components
> within the last two days - sorry about that!. To recap, we had the
> following outages (which are unrelated to the parallel refactor of the
> Jenkins pipeline):
> - (yesterday evening) The Jenkins master ran out of disk space and thus
> processed requests at reduced capacity
> - (this morning) The Jenkins master got updated, which broke our
> autoscaling's upscaling capabilities.
> - (new, this evening) The Jenkins API was unresponsive: due to the high
> number of jobs and a bad API design in the Jenkins REST API, the
> time-complexity of a simple create or delete request was quadratic, which
> resulted in all requests timing out (that was the current outage). This
> left our auto scaling unable to interface with the Jenkins master.
>
> I have now made improvements to our REST API calls, which reduced the
> complexity from O(N^2) to O(1). The reason was an underlying redirect loop
> in the Jenkins createNode and deleteNode REST API in combination with
> unrolling the entire slave and job graph (which got quite huge during
> extensive load) upon every single request. Since we had about 150
> registered slaves and 1000 jobs in the queue, the duration for a single
> REST API call rose to up to 45 seconds (we execute up to a few hundred
> queries per auto scaling loop). This led to our auto scaling timing out.
>
> Everything should be back to normal now. I'm closely observing the
> situation and I'll let you know if I encounter any additional issues.
>
> Again, sorry for any caused inconveniences.
>
> Best regards,
> Marco
>
> On Wed, Nov 21, 2018 at 5:10 PM Gavin M Bell 
> wrote:
>
> > Yes, let me add to the kudos, very nice work Marco.
> >
> >
> > "I'm trying real hard to be the shepherd." -Jules Winnfield
> >
> >
> > > On Nov 21, 2018, at 5:04 PM, Sunderland, Kellen
> >  wrote:
> > >
> > > Appreciate the big effort in bring the CI back so quickly.  Thanks
> Marco.
> > >
> > > On Nov 21, 2018 5:52 AM, Marco de Abreu wrote:
> > > Thanks Aaron! Just for the record, the new Jenkins jobs were unrelated
> to
> > > that incident.
> > >
> > > If somebody is interested in the details around the outage:
> > >
> > > Due to a required maintenance (disk running full), we had to upgrade
> our
> > > Jenkins master because it was running on Ubuntu 17.04 (for an unknown
> > > reason, it used to be 16.04) and we needed to install some packages.
> > Since
> > > the support for Ubuntu 17.04 was stopped, this resulted in all package
> > > updates and installations to fail because the repositories were taken
> > > offline. Due to the unavailable maintenance package and other issues
> with
> > > the installed OpenJDK8 version, we made the decision to upgrade the
> > Jenkins
> > > master to Ubuntu 18.04 LTS in order to get back to a supported version
> > with
> > > maintenance tools. During this upgrade, Jenkins was automatically
> updated
> > > by APT as part of the dist-upgrade process.
> > >
> > > In the latest version of Jenkins, some labels have been changed which
> we
> > > depend on for our auto scaling. To be more specific:
> > >> Waiting for next available executor on mxnetlinux-gpu
> > > has been changed to
> > >> Waiting for next available executor on ‘mxnetlinux-gpu’
> > > Notice the quote characters.
> > >
> > > Jenkins does not offer a better way than to parse these messages
> > > unfortunately - there's no standardized way to express queue items.
> Since
> > > our parser expected the above message without quote signs, this message
> > was
> > > discarded.
> > >
> > > We support various queue reasons (5 of them to be exact) that indicate
> > > resource starvation. If we run super low on capacity, the queue reason
> is
> > > different and we would still be able to scale up, but most of the cases
> > > would have printed the unsupported message. This resulted in reduced
> > > capacity (to be specific, the limit during that time was 1 slave per
> > type).
> > >
> > > We have now fixed our autoscaling to automatically strip these
> characters
> > > and added that message to our test suite.
> > >
> > > Best regards,
> > > Marco
> > >
> > > On Wed, Nov 21, 2018 at 2:49 PM Aaron Markham <
> aaron.s.mark...@gmail.com
> > >
> > > wrote:
> > >
> > >> Marco, thanks for your hard work on this. I'm super excited about the
> > new
> > >> Jenkins jobs. This is going to be very helpful and improve sanity for
> > our
> > >> PRs and ourselves!
> > >>
> > >> Cheers,
> > >> Aaron
> > >>
> > >> On Wed, Nov 21, 2018, 05:37 Marco de Abreu wrote:
> > >>> Hello,
> > >>>
> > >>> the CI is now back up and running. 
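The quadratic-cost pattern Marco describes above (unrolling the entire slave/job graph on every create or delete request) and its O(1) fix can be illustrated abstractly. This is a sketch of the caching idea only, not the actual autoscaler code; all names here are hypothetical:

```python
class JenkinsGraphCache:
    """Illustrative sketch: fetch the full node graph once per scaling
    loop and answer per-request lookups from the cache, instead of
    re-fetching the whole graph on every REST call (which makes each
    call O(N) and the full loop O(N^2))."""

    def __init__(self, fetch_graph):
        self._fetch_graph = fetch_graph  # callable returning {name: node}
        self._graph = {}

    def refresh(self):
        """One O(N) fetch at the start of each auto scaling loop."""
        self._graph = self._fetch_graph()

    def node(self, name):
        """O(1) lookup per subsequent create/delete decision."""
        return self._graph.get(name)
```

With ~150 slaves and a few hundred queries per loop, moving the graph fetch out of the per-request path is what turns 45-second calls back into constant-time lookups.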

[ANNOUNCE] Apache MXNet (incubating) 1.2.1 Release

2018-07-20 Thread Anirudh Subramanian
Hello all,

The Apache MXNet (incubating) Community announces the availability of
Apache MXNet (incubating) 1.2.1!


Apache MXNet (incubating) is a deep learning framework designed for
both efficiency and flexibility. It allows you to mix symbolic and
imperative programming to maximize efficiency and productivity.

This release contains bug fixes and performance improvements.

A full list of the changes in this release can be found in the release
notes:
https://github.com/apache/incubator-mxnet/releases/tag/1.2.1


A Link to the Download is here:
https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.2.1


If you prefer to build from source and experiment with various
compile-time configuration options, use this link to get the
instructions:
http://mxnet.incubator.apache.org/install/index.html

Or You can download and play with MXNet easily using one of the options
below:
   1. The Pip package can be found here: https://pypi.python.org/pypi/mxnet
   2. The Docker Images can be found here:
https://hub.docker.com/r/mxnet/python/

Links to published scala packages in Maven:
https://mvnrepository.com/search?q=org.apache.mxnet
https://repository.apache.org/content/repositories/releases/org/apache/mxnet/


The release tag used for the 1.2.1 release is:
https://github.com/apache/incubator-mxnet/tree/1.2.1

Some more MXNet Resources:
   1. Issues: https://github.com/apache/incubator-mxnet/issues
   2. Wiki: https://cwiki.apache.org/confluence/display/MXNET
   3. Twitter: @ApacheMXNet

   4. YouTube: Apachemxnet channel

   5. Medium: https://medium.com/apache-mxnet

   6. Reddit: /r/mxnet




If you want to learn more about MXNet visit
http://mxnet.incubator.apache.org/

Finally, you are welcome to join and also invite your friends to the
dynamic and growing MXNet community by subscribing to
dev@mxnet.incubator.apache.org

Thanks!

Apache MXNet (incubating) Team
___


DISCLAIMER:

Apache MXNet (incubating) is an effort undergoing incubation at The
Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects. While incubation status is not necessarily a
reflection of the completeness or stability of the code, it does
indicate that the project has yet to be fully endorsed by the ASF.