Re: warnings as errors

2019-05-22 Thread Pedro Larroy
I was not able to fix the warnings about unused local typedefs in the
mshadow type switch; that's one example of a warning that I would
disable. I couldn't find a way to solve that one, and I think an
unused typedef is unlikely to cause bugs in the code and is more of a
pedantic matter.

https://github.com/apache/incubator-mxnet/pull/13424
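
As an illustration, this is roughly the shape of the problem (a minimal
sketch of my own, with hypothetical names, not the actual mshadow macro):
the type switch defines a local typedef in every branch, so any branch
body that ignores it trips GCC's -Wunused-local-typedefs (Clang:
-Wunused-local-typedef) under -Wall:

    #include <cstdio>

    enum class DTypeKind { kFloat32, kInt32 };

    // Hypothetical stand-in for mshadow's type switch: each case introduces
    // a local typedef so the body can be written generically over DType.
    #define TYPE_SWITCH(kind, DType, ...)                          \
      switch (kind) {                                              \
        case DTypeKind::kFloat32: {                                \
          typedef float DType; /* unused if the body ignores it */ \
          __VA_ARGS__                                              \
          break;                                                   \
        }                                                          \
        case DTypeKind::kInt32: {                                  \
          typedef int DType;                                       \
          __VA_ARGS__                                              \
          break;                                                   \
        }                                                          \
      }

    int main() {
      // The body never mentions DType, so both expanded typedefs are
      // unused and the compiler warns at every expansion site.
      TYPE_SWITCH(DTypeKind::kFloat32, DType, std::puts("body ignores DType"););
      return 0;
    }

With -Werror on globally, this one diagnostic can then be exempted with
-Wno-error=unused-local-typedefs while everything else stays fatal.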

I think turning them on one by one is going to pollute the compilation
output unnecessarily and may even run into command-line length problems.
I think it is best to enable all warnings and errors and cherry-pick the
ones we can't fix or deliberately won't fix.

In this other case, I managed to tighten the warnings, but ASAN is
causing some problems:

https://github.com/apache/incubator-mxnet/pull/14850

I think having warning fixes reviewed and merged faster, without
triggering additional refactorings, could make this process easier.
Help and contributions in this area would also be greatly appreciated.

Pedro.

On Tue, May 21, 2019 at 3:49 PM Sheng Zha  wrote:
>
> It would be great to enforce the check for warnings and treat them as
> errors. Some questions I have:
> - what are the warnings that you think should be ignored?
> - for the rest of the warning types, can we turn them on one by one?
>
> -sz
>
> On 2019/05/21 22:33:51, Pedro Larroy  wrote:
> > Hi dev@
> >
> > I try to fix any warning that I see during compilation of MXNet on my
> > platform and with the build toggles that I care about. These seemingly
> > trivial and thankless efforts nonetheless take energy on the
> > contributor side.
> >
> > I think overall I have myself submitted more than a dozen PRs fixing
> > warnings, and I would like to call for additional help and
> > contributions in this area.
> >
> > There was a question from Lin about discussing this on the mailing
> > list. I have the feeling that everybody agrees on moving towards zero
> > warnings and treating warnings as errors. I think there are unavoidable
> > warnings that can be disabled specifically, such as the one triggered
> > by the mshadow type switch.
> >
> > Some important warnings that we are missing, such as the warning on
> > missing return values (i.e. forgetting to return from a function
> > returning non-void), cause bugs, danger and additional time spent
> > bugfixing, which could be better spent elsewhere.
> >
> > Is there a process that we can figure out, such as more expedited
> > merging of PRs fixing warnings, or a specific label?
> >
> > Some simple PRs that fix a warning can take long to merge, and
> > sometimes trigger too much discussion, making the process a bit
> > unfriendly to contributors.
> >
> > Any help or constructive ideas on this topic would be appreciated.
> >
> > Pedro.
> >


Re: [Discussion] Remove bundled llvm OpenMP

2019-05-22 Thread Pedro Larroy
Thanks Aaron and Anton! Can we rebase to update the PR? Let me know
how I can help further if you find any problems.

On Wed, May 22, 2019 at 6:49 AM Aaron Markham  wrote:
>
> I reopened it for you.
>
> On Wed, May 22, 2019, 05:25 Anton Chernov  wrote:
>
> > I don't have necessary rights to reopen this PR.
> >
> > пн, 20 мая 2019 г. в 08:00, Pedro Larroy :
> >
> > > Hi Anton, Stas.
> > >
> > > Can we reopen this PR and get it merged as per the data collected by
> > Stas?
> > >
> > > https://github.com/apache/incubator-mxnet/pull/12160
> > >
> > >
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> > >
> > > There are multiple issues that will be fixed by solving this problem.
> > >
> > >
> > > Pedro
> > >
> > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov 
> > wrote:
> > > >
> > > > I would like to propose a possible alternative solution for
> > > consideration.
> > > >
> > > > If keeping llvm OpenMP as a submodule is inevitable, one could make
> > > > the following adjustments:
> > > >
> > > > Since compilers try to find their own OpenMP library implicitly, MXNet
> > > > needs to ensure that only the bundled version is found. Therefore,
> > > > during the build and also during deployment, this library has to
> > > > provide symlinks for each possible compiler that would link to the
> > > > built artifact, i.e.:
> > > >
> > > > libiomp.so -> libgomp.so -> libomp.so
> > > >
> > > > The MKLML iomp would need to be hidden and removed as well.
> > > >
> > > > On Windows it would be a different story, but as can be seen [1],
> > > > bundled OpenMP was not included in the Windows build anyway.
> > > >
> > > > Alternatively: always use iomp (with the same symlinking trick though)
> > > > provided by the MKLML distribution [2]. This could potentially work on
> > > > Windows as well.
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > [1]
> > > >
> > >
> > https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > > [2] https://github.com/intel/mkl-dnn/releases
> > > >
> > > > вт, 12 февр. 2019 г. в 11:22, Anton Chernov :
> > > >
> > > > > Recent benchmarking results have been published here [1]. Experiments
> > > > > compare different OpenMP implementations as well as binaries compiled
> > > > > with different compilers, including GCC, Clang and ICC.
> > > > >
> > > > > During experimentation another issue with mixing up libraries was
> > > > > identified and described here [2].
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > > [2]
> > > > >
> > >
> > https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > > >
> > > > >
> > > > > вс, 9 дек. 2018 г. в 16:28, Anton Chernov :
> > > > >
> > > > >> Hi Chris,
> > > > >>
> > > > >> Following up on the issue, are all things resolved in the
> > discussion?
> > > > >>
> > > > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > > > >> changes’ status:
> > > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > >>
> > > > >> Thank you.
> > > > >>
> > > > >>
> > > > >> Best
> > > > >> Anton
> > > > >>
> > > > >>
> > > > >> вт, 27 нояб. 2018 г. в 17:15, Anton Chernov :
> > > > >>
> > > > >>> Another thing to take into consideration:
> > > > >>>
> > > > >>> All python artefacts that are created (PyPi) are built with make
> > and
> > > are
> > > > >>> not using the bundled OpenMP library.
> > > > >>>
> > > > >>> One step for the switch to CMake to happen is the approval and
> > > > >>> merging of the mentioned PR:
> > > > >>> https://github.com/apache/incubator-mxnet/pull/12160

Re: Report of MXNet NumPy Project Status

2019-05-22 Thread Pedro Larroy
Thanks, that's a nice summary. Great job, and good to know the
progress. I think we can do some exciting stuff in terms of parsing
the Python AST and converting it to a computational graph. Maybe we could
brainstorm on that further in the linked ticket.

On Wed, May 22, 2019 at 12:12 AM Jun Wu  wrote:
>
> Dear Community,
>
> A few months ago, we submitted this RFC
>  proposing
> to introduce a NumPy-compatible coding experience into MXNet. As it has been
> some time since the proposal, we would like to share the progress with the
> community and listen to feedback and suggestions to enhance the technical
> implementation as well as the way the project is operated.
>
> We set our first milestone by tackling the problem of MXNet not supporting
> scalar and zero-size tensors. Last month, we submitted the PR
>  providing the
> infrastructure to support those two types of tensors in MXNet. This work
> has affected almost every file and all language bindings in the MXNet
> codebase. It would have been impossible to provide a complete solution
> without contributions from many MXNet developers across different
> organizations.
>
> With the infrastructure of supporting scalar and zero-size tensors, we are
> currently working on implementing NumPy operators in MXNet. We created a
> list of operators 
> to be implemented from the D2L book , and hope that we
> will be able to provide full NumPy operator coverage for the book by the
> end of next month.
>
> In the future, we plan to provide NumPy operator support for GluonCV
>  and GluonNLP
> . We also intend to explore
> opportunities to extend our work to support libraries that heavily
> depend on NumPy, not only in the deep learning world but also in the
> broader data science community, where the techniques employed by deep
> learning, such as auto differentiation, symbolic programming, GPU
> computing, and so forth, can be beneficial.
>
> Thank you very much for taking the time to read this email and caring about
> our efforts to make MXNet a super user-friendly deep learning framework. We
> look forward to your comments, suggestions and contributions to this
> project.
>
> Best,
> Developers of MXNet NumPy Project
>
> References
> [1] Development branch: https://github.com/apache/incubator-mxnet/tree/numpy
> [2] PR for supporting scalar and zero-size tensors:
> https://github.com/apache/incubator-mxnet/pull/14661
> [3] First batch of NumPy operators to be implemented:
> https://github.com/apache/incubator-mxnet/issues/14327
> [4] The D2L book: https://github.com/d2l-ai/d2l-en
> [5] GluonCV: https://github.com/dmlc/gluon-cv
> [6] GluonNLP: https://github.com/dmlc/gluon-nlp


Re: New PMC member: Dick Carter

2019-05-21 Thread Pedro Larroy
Finally! Welcome!

On Tue, May 21, 2019 at 6:28 PM Steffen Rochel  wrote:
>
> Congratulations Dick!
>
> On Tue, May 21, 2019 at 2:43 PM Carin Meier  wrote:
>
> > Congrats and welcome!
> >
> > On Tue, May 21, 2019 at 4:37 PM Marco de Abreu 
> > wrote:
> >
> > > The Project Management Committee (PMC) for Apache MXNet
> > > has invited Dick Carter to become a PMC member and we are pleased
> > > to announce that he has accepted.
> > >
> > > Dick has been a great help over the past years in making MXNet as
> > > efficient and easy-to-use on GPU as possible, reducing technical debt,
> > > improving our testing experience around flaky tests and providing senior
> > > guidance within the project.
> > >
> > > Being a committer enables easier contribution to the
> > > project since there is no need to go via the patch
> > > submission process. This should enable better productivity.
> > > Being a PMC member enables assistance with the management
> > > and to guide the direction of the project.
> > >
> > > Best regards,
> > > Marco de Abreu
> > >
> >


Re: [ANNOUNCEMENT] New Committer: Przemyslaw Tredak (ptrendx)

2019-05-21 Thread Pedro Larroy
Welcome!

On Tue, May 21, 2019 at 3:55 PM Steffen Rochel  wrote:
>
> Congratulations Przemyslaw!
>
> On Tue, May 21, 2019 at 3:38 PM Marco de Abreu 
> wrote:
>
> > Welcome!
> >
> > On Tue, May 21, 2019 at 11:48 PM Carin Meier  wrote:
> >
> > > Welcome!
> > >
> > > On Tue, May 21, 2019 at 5:32 PM Naveen Swamy  wrote:
> > >
> > > > The Project Podling Management Committee (PPMC) for Apache MXNet has
> > > > invited Przemyslaw Tredak (ptrendx), based on his contribution to
> > > > MXNet, to become a committer, and we are pleased to announce that he
> > > > has accepted.
> > > >
> > > > Przemyslaw, thanks a lot for your contribution and continued effort to
> > > > support the MXNet community.
> > > >
> > > > Please join me in welcoming Przemyslaw to the project!
> > > >
> > > > Thanks, Naveen
> > > > (on behalf of Apache MXNet PPMC)
> > > >
> > >
> >


warnings as errors

2019-05-21 Thread Pedro Larroy
Hi dev@

I try to fix any warning that I see during compilation of MXNet on my
platform and with the build toggles that I care about. These seemingly
trivial and thankless efforts nonetheless take energy on the
contributor side.

I think overall I have myself submitted more than a dozen PRs fixing
warnings, and I would like to call for additional help and
contributions in this area.

There was a question from Lin about discussing this on the mailing
list. I have the feeling that everybody agrees on moving towards zero
warnings and treating warnings as errors. I think there are unavoidable
warnings that can be disabled specifically, such as the one triggered
by the mshadow type switch.

Some important warnings that we are missing, such as the warning on
missing return values (i.e. forgetting to return from a function
returning non-void), cause bugs, danger and additional time spent
bugfixing, which could be better spent elsewhere.
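
To make the danger concrete, a minimal sketch (my own, not code from our
tree): GCC and Clang report this as -Wreturn-type, which plain -Wall
leaves as a mere warning even though falling off the end of a non-void
function is undefined behaviour:

    #include <cstdio>

    // Without -Werror=return-type, deleting the last return below still
    // compiles under -Wall, and the caller silently reads a garbage value.
    int sign(int x) {
      if (x > 0) return 1;
      if (x < 0) return -1;
      return 0;  // remove this line and the compiler merely warns
    }

    int main() {
      std::printf("sign(-5) = %d\n", sign(-5));
      return 0;
    }

Building with -Werror=return-type turns exactly this class of bug into a
hard compile error.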

Is there a process that we can figure out, such as more expedited
merging of PRs fixing warnings, or a specific label?

Some simple PRs that fix a warning can take long to merge, and
sometimes trigger too much discussion, making the process a bit
unfriendly to contributors.

Any help or constructive ideas on this topic would be appreciated.

Pedro.


Re: [Discussion] Remove bundled llvm OpenMP

2019-05-20 Thread Pedro Larroy
Hi Anton, Stas.

Can we reopen this PR and get it merged as per the data collected by Stas?

https://github.com/apache/incubator-mxnet/pull/12160

https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations

There are multiple issues that will be fixed by solving this problem.
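
For anyone reproducing the data, a minimal sanity check of my own (a
sketch, not part of the PR) helps confirm which runtime is loaded:
compile it with the same toolchain as MXNet, e.g. g++ -fopenmp, then run
ldd on the binary and verify that exactly one of libomp/libgomp/libiomp5
appears:

    #include <omp.h>
    #include <cstdio>

    int main() {
      // With a healthy, single OpenMP runtime each thread reports once.
      // A mixed or double-initialized runtime tends to show up here as a
      // hang or as kmp_runtime.cpp assertions like the ones in the issues.
      #pragma omp parallel
      {
        #pragma omp critical
        std::printf("thread %d of %d\n", omp_get_thread_num(),
                    omp_get_num_threads());
      }
      return 0;
    }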


Pedro

On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov  wrote:
>
> I would like to propose a possible alternative solution for consideration.
>
> If keeping llvm OpenMP as a submodule is inevitable, one could make the
> following adjustments:
>
> Since compilers try to find their own OpenMP library implicitly, MXNet
> needs to ensure that only the bundled version is found. Therefore, during
> the build and also during deployment, this library has to provide symlinks
> for each possible compiler that would link to the built artifact, i.e.:
>
> libiomp.so -> libgomp.so -> libomp.so
>
> The MKLML iomp would need to be hidden and removed as well.
>
> On Windows it would be a different story, but as can be seen [1] bundled
> OpenMP was not included in the Windows build anyway.
>
> Alternatively: always use iomp (with the same symlinking trick though)
> provided by the MKLML distribution [2]. This could potentially work on
> Windows as well.
>
> Best
> Anton
>
> [1]
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> [2] https://github.com/intel/mkl-dnn/releases
>
> вт, 12 февр. 2019 г. в 11:22, Anton Chernov :
>
> > Recent benchmarking results have been published here [1]. Experiments
> > compare different OpenMP implementations as well as binaries compiled with
> > different compilers including GCC, Clang and ICC.
> >
> > During experimentation another issue with mixing up libraries was
> > identified and described here [2].
> >
> > Best
> > Anton
> >
> > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > [2]
> > https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> >
> >
> > вс, 9 дек. 2018 г. в 16:28, Anton Chernov :
> >
> >> Hi Chris,
> >>
> >> Following up on the issue, are all things resolved in the discussion?
> >>
> >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> >> changes’ status:
> >> https://github.com/apache/incubator-mxnet/pull/12160
> >>
> >> Thank you.
> >>
> >>
> >> Best
> >> Anton
> >>
> >>
> >> вт, 27 нояб. 2018 г. в 17:15, Anton Chernov :
> >>
> >>> Another thing to take into consideration:
> >>>
> >>> All python artefacts that are created (PyPi) are built with make and are
> >>> not using the bundled OpenMP library.
> >>>
> >>> One step for the switch to CMake to happen is the approval and merging
> >>> of the mentioned PR:
> >>>
> >>> https://github.com/apache/incubator-mxnet/pull/12160
> >>>
> >>> If there are no other objections I kindly ask Chris Olivier to remove
> >>> his 'requesting changes' veto on it to unblock the CMake overhaul work.
> >>>
> >>> Thank you.
> >>>
> >>> Best
> >>> Anton
> >>>
> >>> чт, 22 нояб. 2018 г. в 17:11, Anton Chernov :
> >>>
> 
>  Thank you for you answer, Chris.
> 
>  > The whole “mixing omp libraries” is something that occurs in
>  production
>  every day and certainly in everything that uses mkl.
> 
>  I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>  this mixture is not happening:
> 
>  "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>  runtime library to work. As different OpenMP runtimes may not be binary
>  compatible it's important to ensure that only one OpenMP runtime is used
>  throughout the application. Having more than one OpenMP runtime 
>  initialized
>  may lead to undefined behavior resulting in incorrect results or 
>  crashes."
>  [1]
> 
>  That is why 2 different MKLML libraries are provided:
> 
>  lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
>  lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>  runtime
> 
>  > is the suggestion that libiomp be removed from mkl?
> 
>  That is certainly not my suggestion.
> 
>  > have you spoken with intel? have you consulted Intel at all?
> 
>  Yes, I have asked for comments on the issue.
> 
>  > “hard to debug random crash”. you’re seeing an assertion which is
>  probably ...
> 
>  I'm seeing the result of undefined behaviour. And I want to put
>  emphasis on the following statement:
> 
>  Irrespective of whether there is a particular reason for the assert,
>  it is the result of behaviour that should not happen. There are valid ways
>  to use llvm OpenMP in MXNet, and the current way is not one of them.
> 
>  > The lack of root-causing the problem and knee-jerk solution here
>  makes me
>  uncomfortable.
> 
>  I hope that my efforts highlighting the problems reach you and mitigate
>  your discomfort.
> 

Re: [Proposal] New operator graph for MXNet

2019-05-17 Thread Pedro Larroy
> Now my view (as an MXNet PMC member) on typed vs type-erased style: If MXNet
> were a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that takes python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal, per
> the tradeoffs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as the
> optimization passes, and given the scope, in your analogy, "a new vehicle
> to solve all the problems"
> rather than a minor patch. It will take a lot of engineering effort to
> bring in new features and adapting the existing ones.
> Because of that, it does merit a discussion about how shall we think about
> the future MXNet2.0.
>
> Technically Relay is a serious candidate. Of course relay, as well as its
> core, is in C++ but maintains the multi-language first principle, that is
> why the example code was in python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet2.0 should have
> natural support for:
> - Native support of functions, modules, and recursion
> - Control flow
> - The ability of interoperation with multi-language frontends, e.g. being
> able to prototype graph optimizations in python/scala/clojure if needed.
>
> Adding this support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy 
> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed which would
> > not be possible with what I propose? I don't think my proposal has
> > any impact on language bindings, and the way I describe it
> > doesn't affect having or not having higher language bindings. Please
> > elaborate so I can understand your concern.  Maybe code examples where
> > the graph attributes are being changed from Python?  I don't think we
> > have this in MXNet. This is such a core foundation for MXNet that I
> > don't think we should compromise on it because another project not
> > directly related to MXNet might want to expose some untyped graph and
> > Node attributes.  The current status makes maintaining the code very
> > painful and is also preventing desired features such as higher-order
> > gradients from being developed. I have heard from you many times how speed
> > is critical for us to innovate in this quickly changing field.
> >
> > My proposal is limited to the graph and wouldn't change, for example,
> > the way operators are registered or the way operator arguments are
> > processed.
> >
> >
> > Regarding the second point, this is the documentation about Relay on the
> > web which I found, for example:
> >
> > https://docs.tvm.ai/dev/relay_add_op.html#
> >
> > Is somebody working on making Imperative::Backward use this API? That
> > would be a big change which I'm not aware of. And using an IR is of a
> > much bigger scope than the change I'm proposing here, for example.
> >
> > I think I'm having difficulty understanding what the arguments are
> > here. I'm saying I need to change one piece of my car, and what you are
> > selling me is a new vehicle?  Or is your suggestion that we use
> > Relay for the graph passes in MXNet?
> >
> > I would like to see C++ code examples; Python examples are not
> > sufficient when we talk about the core of MXNet.
> >
> > Pedro.
> >
> >
> >
> >
> >
> >
> > On Tue, May 14, 2019 at 5:39 PM Tianqi Chen 
> > wrote:
> > >
> > > Thanks for the proposal. Let me share some of my thoughts:
> > >
> > > Specific comments on the proposal
> > > ---
> > > The heavy use of generics in the Graph type is a huge departure from the
> > > type-erased data structure which was presented in the previous design.
> > > While we understand the advantages of typed languages (more compile-time
> > > checking) and type-erased types (more dynamism), the heavy use of
> > > templates will actually make the project solely C++ focused, making it
> > > hard to expose intermediate (templatized) data structures to
> > > other languages like python/scala/clojure.
> > >
> > > While I fully understand some of the lessons taught in programming
> > > C++ (reduce shared_ptr, more typing, etc.)
>

Re: [Proposal] New operator graph for MXNet

2019-05-15 Thread Pedro Larroy
Hi

Thanks for all the materials and key points raised. The discussion has
many ramifications; I will think about them and research them very
carefully before replying further. Please also don't quickly dismiss
the points I have raised or reduce them to typed-vs-untyped or
pedantic C++ comments; we have been debugging missing nodes and
pointers in the graph when doing second-order gradients for weeks with
no success, due to the design of the graph.

There are 60 years of software development learnings and practice behind
some of these concepts, and compiler theory that deep learning frameworks
can also take advantage of instead of rediscovering everything again until
we end up with a typed, pure functional IR.
In some of the materials linked you also point out limitations of the
current architecture. I think it's good that we raise this topic, and
it shows that we need to have a deeper and more structured conversation on
how we evolve the dataflow graph in MXNet. Maybe you can help
cross-pollinate this conversation between the TVM and MXNet projects. If
there's an intention to change from NNVM to NNVM2, I think this should
have been communicated or discussed with the community beforehand.
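
To make the typed-vs-type-erased point concrete, here is a small sketch
of my own (C++17, illustrative only, with hypothetical names, not MXNet
code): the type-erased attribute map accepts anything at compile time and
only fails when a pass reads the attribute back, which is exactly the
class of bug we were chasing:

    #include <any>
    #include <cstdio>
    #include <string>
    #include <unordered_map>

    // Type-erased style: attributes live in a string -> any map.
    struct ErasedNode {
      std::unordered_map<std::string, std::any> attrs;
    };

    // Typed style: the attribute is a first-class field, compile-time checked.
    struct TypedNode {
      int num_outputs = 1;
    };

    int main() {
      ErasedNode n;
      n.attrs["num_outputs"] = std::string("1");  // wrong type, compiles fine
      try {
        int k = std::any_cast<int>(n.attrs["num_outputs"]);  // throws here
        std::printf("num_outputs = %d\n", k);
      } catch (const std::bad_any_cast&) {
        std::printf("bad_any_cast: the bug only surfaces at runtime\n");
      }
      TypedNode t;
      // t.num_outputs = "1";  // typed version: this would not even compile
      std::printf("num_outputs = %d\n", t.num_outputs);
      return 0;
    }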

Until then.

Pedro.




On Tue, May 14, 2019 at 8:03 PM Tianqi Chen  wrote:
>
> The core part of the proposal is to move the graph to a much more strongly
> typed template class.
> I think this is mainly a point of engineering taste, and both sides have
> pros and cons; let me list them before I share my thoughts on this issue:
>
> - Typed fields certainly enjoy more compile-time type checking; on the
> other hand, it is hard to expose a template with explosive possibilities
> to frontend languages.
> - More type-erased fields provide runtime flexibility to store polymorphic
> types as well as extensible attributes for graph optimization
>   - It is hard to use a virtual class to expose every possible attribute
> that an operator might have, such as inlining, storage pattern, gradient
> etc..
>   - The nature of supporting a growing set of operator attribute requires a
> type-erased attrs field.
> - In contrast to your argument (typing is a blocker to features),
> type-erased or typed code can both get to the same feature, except that
> typed code gets more compile-time errors while type-erased code gets some
> of them at runtime.
> - Templatized data structures will likely introduce additional mental
> burden to developers and are not really suitable as a core data structure
>   - Because they imply an explosive number of possible data structures,
> while the core data structure should be a single one.
>
> Now my view (as an MXNet PMC member) on typed vs type-erased style: If MXNet
> were a pure C++ project, I might take more of the typed approach.
> However, MXNet itself is a project that takes python/scala/clojure and
> other frontend languages.
> The introduction of more typing may not align with the original goal, per
> the tradeoffs I listed above.
>
> This proposal is really a drastic change of what NNVM does, as well as the
> optimization passes, and given the scope, in your analogy, "a new vehicle
> to solve all the problems"
> rather than a minor patch. It will take a lot of engineering effort to
> bring in new features and adapt the existing ones.
> Because of that, it does merit a discussion about how we shall think about
> the future MXNet2.0.
>
> Technically Relay is a serious candidate. Of course relay, as well as its
> core, is in C++ but maintains the multi-language first principle, that is
> why the example code was in python.
> See more related discussion comparing NNVMv1 and relay:
> https://discuss.tvm.ai/t/any-materials-of-relay-for-beginners/2392/5
>
> I think the ideal graph data structure candidate for MXNet2.0 should have
> natural support for:
> - Native support of functions, modules, and recursion
> - Control flow
> - The ability of interoperation with multi-language frontends, e.g. being
> able to prototype graph optimizations in python/scala/clojure if needed.
>
> Adding this support needs significant engineering effort, and I do hope we
> only have to do it once. While I don't want to force any conclusion here,
> I do think Relay is one such candidate.
>
> Tianqi
>
>
> On Tue, May 14, 2019 at 5:58 PM Pedro Larroy 
> wrote:
>
> > Hi Tianqi
> >
> > Thanks for the quick response.
> >
> > Could you point to examples where graph.h is being exposed which would
> > not be possible with what I propose? I don't think my proposal has
> > any impact on language bindings, and the way I describe it
> > doesn't affect having or not having higher language bindings. Please
> > elaborate so I can understand your concern.  Maybe code examples where
> > the graph attributes are being changed from Python?

Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Tianqi

I thought a bit more about your comments, and I think there is a simple
way to address your concerns that satisfies both needs.

We can have a NodeAttributes template class which has a map of string
to any, as is currently the case, so the graph can still be used in the
highly dynamic scenarios that you are concerned about.
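
Roughly along these lines (an illustrative C++17 sketch of my own, with
hypothetical names, not text from the design doc):

    #include <any>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Typed part: attributes that every pass relies on, compile-time checked.
    struct OperatorAttrs {
      std::string name;
      std::vector<int> shape;
    };

    // Hybrid node attributes: typed fields plus a string -> any escape hatch
    // for the dynamic attributes that other projects may want to attach.
    template <typename TypedAttrs>
    struct NodeAttributes {
      TypedAttrs typed;
      std::unordered_map<std::string, std::any> dynamic;
    };

    int main() {
      NodeAttributes<OperatorAttrs> attrs;
      attrs.typed.name = "conv2d";
      attrs.typed.shape = {1, 3, 224, 224};
      attrs.dynamic["layout_hint"] = std::string("NCHW");  // untyped extra
      return 0;
    }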

Let me know what you think.

Pedro.


On Tue, May 14, 2019 at 5:50 PM Pedro Larroy
 wrote:
>
> Hi Tianqi
>
> Thanks for the quick response.
>
> Could you point to examples where graph.h is being exposed which would
> not be possible with what I propose? I don't think my proposal has
> any impact on language bindings, and the way I describe it
> doesn't affect having or not having higher language bindings. Please
> elaborate so I can understand your concern.  Maybe code examples where
> the graph attributes are being changed from Python?  I don't think we
> have this in MXNet. This is such a core foundation for MXNet that I
> don't think we should compromise on it because another project not
> directly related to MXNet might want to expose some untyped graph and
> Node attributes.  The current status makes maintaining the code very
> painful and is also preventing desired features such as higher-order
> gradients from being developed. I have heard from you many times how speed
> is critical for us to innovate in this quickly changing field.
>
> My proposal is limited to the graph and wouldn't change, for example,
> the way operators are registered or the way operator arguments are
> processed.
>
>
> Regarding the second point, this is the documentation about Relay on the
> web which I found, for example:
>
> https://docs.tvm.ai/dev/relay_add_op.html#
>
> Is somebody working on making Imperative::Backward use this API? That
> would be a big change which I'm not aware of. And using an IR is of a
> much bigger scope than the change I'm proposing here, for example.
>
> I think I'm having difficulty understanding what the arguments are
> here. I'm saying I need to change one piece of my car, and what you are
> selling me is a new vehicle?  Or is your suggestion that we use
> Relay for the graph passes in MXNet?
>
> I would like to see C++ code examples; Python examples are not
> sufficient when we talk about the core of MXNet.
>
> Pedro.
>
>
>
>
>
>
> On Tue, May 14, 2019 at 5:39 PM Tianqi Chen  wrote:
> >
> > Thanks for the proposal. Let me share some of my thoughts:
> >
> > Specific comments on the proposal
> > ---
> > The heavy use of generics in the Graph type is a huge departure from the
> > type-erased data structure which was presented in the previous design.
> > While we understand the advantages of typed languages (more compile-time
> > checking) and type-erased types (more dynamism), the heavy use of
> > templates will actually make the project solely C++ focused, making it
> > hard to expose intermediate (templatized) data structures to
> > other languages like python/scala/clojure.
> >
> > While I fully understand some of the lessons taught in programming
> > C++ (reduce shared_ptr, more typing, etc.),
> > we need to think about the context of the MXNet project and **the need to
> > support multi-language as a first-class concern**.
> > Some of the type-erased types are design trade-offs made to support these
> > features, and we need to think more
> > carefully instead of just applying "rules for C++", which may bring problems.
> >
> > Future of NNVM
> > --
> > Given that this thread touched upon what we should do for better
> > computational graph handling, I would recommend also taking a look at
> > NNVMv2 -- Relay.
> >
> > Relay addresses many of the wish-lists in the proposal already, such as
> > operator fusion, high order gradient, offload to hardware, isolated
> > compilation, deployment on edge and accelerators etc.
> > Relay also addresses problems not yet mentioned in the proposal,
> > including control flow and dynamic runtime, automatic layout optimization,
> > etc.
> >
> > Tianqi
> >
> > On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
> >
> > > Hi Pedro,
> > >
> > > Thanks for taking the initiative. Skimming through the design doc, I
> > > didn't see a comparison with existing solutions such as Relay in TVM,
> > > which is already a dependency of mxnet. Could you elaborate on the
> > > comparison with existing solutions in the design doc too?
> > >
> > > -sz
> > >
> > > On 2019/05/14 23:49:30, Pedro Larroy 
> > > wrote:

Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Tianqi

Thanks for the quick response.

Could you point to examples where graph.h is being exposed which would
not be possible with what I propose? I don't think my proposal has
any impact on language bindings, and the way I describe it
doesn't affect having or not having higher language bindings. Please
elaborate so I can understand your concern.  Maybe code examples where
the graph attributes are being changed from Python?  I don't think we
have this in MXNet. This is such a core foundation for MXNet that I
don't think we should compromise on it because another project not
directly related to MXNet might want to expose some untyped graph and
Node attributes.  The current status makes maintaining the code very
painful and is also preventing desired features such as higher-order
gradients from being developed. I have heard from you many times how speed
is critical for us to innovate in this quickly changing field.

My proposal is limited to the graph and wouldn't change, for example,
the way operators are registered or the way operator arguments are
processed.


Regarding the second point, this is the documentation about Relay on the
web which I found, for example:

https://docs.tvm.ai/dev/relay_add_op.html#

Is somebody working on making Imperative::Backward use this API? That
would be a big change which I'm not aware of. And using an IR is of a
much bigger scope than the change I'm proposing here, for example.

I think I'm having difficulty understanding what the arguments are
here. I'm saying I need to change one piece of my car, and what you are
selling me is a new vehicle?  Or is your suggestion that we use
Relay for the graph passes in MXNet?

I would like to see C++ code examples; Python examples are not
sufficient when we talk about the core of MXNet.

Pedro.






On Tue, May 14, 2019 at 5:39 PM Tianqi Chen  wrote:
>
> Thanks for the proposal. Let me share some of my thoughts:
>
> Specific comments on the proposal
> ---
> The heavy use of generics in the Graph type is a huge departure from the
> type-erased data structure which was presented in the previous design.
> While we understand the advantages of typed languages (more compile-time
> checking) and type-erased types (more dynamism), the heavy use of
> templates will actually make the project solely C++ focused, making it
> hard to expose intermediate (templatized) data structures to
> other languages like python/scala/clojure.
>
> While I fully understand some of the lessons taught in programming
> C++ (reduce shared_ptr, more typing, etc.),
> we need to think about the context of the MXNet project and **the need to
> support multi-language as a first-class concern**.
> Some of the type-erased types are design trade-offs made to support these
> features, and we need to think more
> carefully instead of just applying "rules for C++", which may bring problems.
>
> Future of NNVM
> --
> Given that this thread touched upon what we should do for better
> computational graph handling, I would recommend also taking a look at
> NNVMv2 -- Relay.
>
> Relay addresses many of the wish-lists in the proposal already, such as
> operator fusion, high order gradient, offload to hardware, isolated
> compilation, deployment on edge and accelerators etc.
> Relay also addresses problems not yet mentioned in the proposal,
> including control flow and dynamic runtime, automatic layout optimization,
> etc.
>
> Tianqi
>
> On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
>
> > Hi Pedro,
> >
> > Thanks for taking the initiative. Skimming through the design doc, I
> > didn't see a comparison with existing solutions such as Relay in TVM,
> > which is already a dependency of mxnet. Could you elaborate on the
> > comparison with existing solutions in the design doc too?
> >
> > -sz
> >
> > On 2019/05/14 23:49:30, Pedro Larroy 
> > wrote:
> > > Hi dev@
> > >
> > > As a result of my deep dives on the graph machinery I have created a
> > > new proposal to improve the operator graph in MXNet.
> > >
> > > This would mean superseding the use of NNVM Graph in MXNet and having
> > > a new implementation that we can use to simplify a lot of code and do
> > > powerful graph manipulation and passes such as operator fusion and
> > > other optimizations.
> > >
> > > As it would be a change with big impact and ramifications, your
> > > thoughts and feedback on the document would be highly appreciated, so
> > > we can take into account potential future interesting use cases:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> > >
> > > Pedro.
> > >
> >


Re: [Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi Sheng

Could you provide relevant links to Relay and what you would
recommend reading, so we have a focused discussion instead of me
potentially mis-searching? I probably also missed the discussion
or vote on the mailing list regarding including TVM as a dependency, or
future plans on using Relay.
As far as I know, we have TVM as a dependency because NNVM was
assimilated into it, but we are not using it directly.  Is this
correct?

This would help me add this information to the doc as you requested.

Thanks.

Pedro.

On Tue, May 14, 2019 at 5:06 PM Sheng Zha  wrote:
>
> Hi Pedro,
>
> Thanks for taking the initiative. Skimming through the design doc, I didn't
> see a comparison with existing solutions such as Relay in TVM, which is
> already a dependency of mxnet. Could you elaborate on the comparison with
> existing solutions in the design doc too?
>
> -sz
>
> On 2019/05/14 23:49:30, Pedro Larroy  wrote:
> > Hi dev@
> >
> > As a result of my deep dives on the graph machinery I have created a
> > new proposal to improve the operator graph in MXNet.
> >
> > This would mean superseding the use of NNVM Graph in MXNet and having
> > a new implementation that we can use to simplify a lot of code and do
> > powerful graph manipulation and passes such as operator fusion and
> > other optimizations.
> >
> > As it would be a change with big impact and ramifications, your
> > thoughts and feedback on the document would be highly appreciated, so
> > we can take into account potential future interesting use cases:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0
> >
> > Pedro.
> >


[Proposal] New operator graph for MXNet

2019-05-14 Thread Pedro Larroy
Hi dev@

As a result of my deep dives on the graph machinery I have created a
new proposal to improve the operator graph in MXNet.

This would mean superseding the use of NNVM Graph in MXNet and having
a new implementation that we can use to simplify a lot of code and do
powerful graph manipulation and passes such as operator fusion and
other optimizations.
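
To give a flavour of what such a pass does, here is a toy sketch (my own
illustration with hypothetical names, not the proposed API) that walks a
tiny op graph and counts the elementwise chains an operator-fusion pass
could merge into a single kernel:

    #include <cstdio>
    #include <memory>
    #include <string>
    #include <vector>

    struct Node {
      std::string op;                             // e.g. "conv", "relu", "add"
      std::vector<std::shared_ptr<Node>> inputs;  // upstream nodes
      bool elementwise;                           // fusion candidate marker
    };

    // Count edges where an elementwise op feeds another elementwise op;
    // each such edge is a chain a fusion pass could collapse.
    int CountFusableEdges(const std::shared_ptr<Node>& out) {
      int n = 0;
      for (const auto& in : out->inputs) {
        if (out->elementwise && in->elementwise) ++n;
        n += CountFusableEdges(in);
      }
      return n;
    }

    int main() {
      auto a = std::make_shared<Node>(Node{"conv", {}, false});
      auto b = std::make_shared<Node>(Node{"relu", {a}, true});
      auto c = std::make_shared<Node>(Node{"add", {b}, true});  // relu->add
      std::printf("fusable edges: %d\n", CountFusableEdges(c));
      return 0;
    }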

As it would be a change with big impact and ramifications, your
thoughts and feedback on the document would be highly appreciated, so
we can take into account potential future interesting use cases:

https://cwiki.apache.org/confluence/display/MXNET/MXVM%3A+Operator+graph+2.0

Pedro.


Re: assimilation of mshadow into the MXNet codebase

2019-05-14 Thread Pedro Larroy
Hi Sheng.

Do you need some help with this?  Do we plan to have this for 1.5?

Pedro.

On Wed, Apr 24, 2019 at 4:26 PM Pedro Larroy
 wrote:
>
> Thanks. Great to read.
>
> On Wed, Apr 24, 2019 at 2:19 PM Sheng Zha  wrote:
> >
> > The community has agreed to donate mshadow to the mxnet code base. I will 
> > start the migration and build logic changes soon.
> >
> > -sz
> >
> > On 2019/04/07 21:47:39, Sheng Zha  wrote:
> > > I agree it would make development easier to donate mshadow to mxnet code 
> > > base, since mshadow is only used in MXNet. I support donating the mshadow 
> > > code to mxnet and I started an RFC for this in mshadow [1].
> > >
> > > [1] https://github.com/dmlc/mshadow/issues/373
> > >
> > > -sz
> > >
> > > On 2019/04/06 04:38:19, Tianqi Chen  wrote:
> > > > Technically, mshadow is sufficient for MXNet. Adopting other libraries (
> > > > eigen or xtensor) will unnecessarily increase the codebase complexity
> > > > without any additional gains.
> > > >
> > > > Given that mshadow is only used by mxnet, I do support donating it into
> > > > the mxnet codebase.
> > > > To respect the original mshadow community. I would recommend starting a
> > > > community RFC In the mshadow github issue for a week, before we start 
> > > > the
> > > > migrating process.
> > > > Also, I would recommend a rebase merge just like the case of MXNet.jl 
> > > > code
> > > > base to preserve the contribution history.
> > > >
> > > > Tianqi
> > > >
> > > >
> > > > On Fri, Apr 5, 2019 at 9:25 PM Alfredo Luque
> > > >  wrote:
> > > >
> > > > > Do you have a link to both of these proposals?
> > > > >
> > > > > On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya 
> > > > > wrote:
> > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > mshadow is mostly used for tensor arithmetic. There have been 
> > > > > > discussions
> > > > > > about including it within mxnet. I think it is a good idea.
> > > > > >
> > > > > > As a more long term solution using libraries like eigen to perform 
> > > > > > linear
> > > > > > algebra operations was also suggested by anirudh2290@. I think 
> > > > > > xtensor(
> > > > > > https://github.com/QuantStack/xtensor ) can also be a candidate 
> > > > > > here.
> > > > > >
> > > > > > -
> > > > > > Anirudh
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Some developers have noticed that working in mshadow is 
> > > > > > > cumbersome as
> > > > > > > it's a 3rdparty subrepo.
> > > > > > >
> > > > > > > Since mshadow is a bunch of headers which don't have much in the
> > > > > > > way of independent tests / library functionality, other developers
> > > > > > > and I believe that it would be good to assimilate this code into
> > > > > > > the repository for ease of contribution and changes, without
> > > > > > > having to go through contortions to test PRs that modify mshadow.
> > > > > > >
> > > > > > > Would anybody oppose this change?
> > > > > > >
> > > > > > > Thanks and have a nice weekend.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >


Re: Python2 End of Life

2019-05-14 Thread Pedro Larroy
+1  Let python2 rest; let's simplify our infrastructure and remove the
need to support old Python versions.

On Mon, May 13, 2019 at 1:58 PM Jake Lee  wrote:
>
> +1 Recently I upgraded the NumPy version and found out that Pylint had a
> false alarm on it. The Pylint fix is only available on Python3. So I
> changed the default python version of the 'make pylint' command to python3
> (the PR hasn't been merged yet). It's time to drop support for Python2.
>
> On Mon, May 13, 2019 at 1:37 PM Junru Shao  wrote:
>
> > +1
> >
> > On Mon, May 13, 2019 at 1:34 PM Aaron Markham 
> > wrote:
> >
> > > +1 for the pledge and to start moving things to Python 3.
> > > I think our installation instructions and tutorials can be updated to
> > > default to Python3 and we should update Python2-only tutorials. I know
> > > we have a handful of those, and when I spot them, I'll create an
> > > issue.
> > > I can also look at migrating the docs build to Python 3.
> > > Should we add a new label for issues relating to migrating to Python3?
> > > Cheers,
> > > Aaron
> > >
> > > On Mon, May 13, 2019 at 12:04 PM Zach Kimberg  > >
> > > wrote:
> > > >
> > > > Right now, the official date for ending support for Python 2.7 (and all
> > > of
> > > > python2) is set to January 1 [1]. As part of it, a number of projects
> > > have
> > > > pledged to drop support for Python2 in or before 2020 including
> > > Tensorflow,
> > > > requests, pandas, ipython, numpy, pillow, and Cython [2]. I believe we
> > > > should also join in this pledge on python3statement.org [2] because it
> > > > would help clean up our project and it would be difficult to continue
> > > > supporting Python2 anyway when some of our dependencies are dropping
> > > > support.
> > > >
> > > > As a concrete step, we should decide on a date to remove all usages of
> > > > Python2 from our CI and consider that officially dropping support.
> > > > Following that, we can expect PRs will end up breaking support for
> > > Python2.
> > > > I suggest just using the same date that Python is dropping support of
> > > > January 1. We may also need to update some examples or scripts that
> > were
> > > > written only for python2 that are around the project. Any thoughts?
> > > >
> > > > Zach
> > > >
> > > >
> > > > [1] - https://www.python.org/dev/peps/pep-0373/
> > > > [2] - https://python3statement.org/
> > >
> >


Re: [Announcement] New Committer - Zach Kimberg

2019-05-13 Thread Pedro Larroy
Congratulations

On Thu, May 9, 2019 at 11:29 AM Chaitanya Bapat  wrote:
>
> Congratulations Zachary! Way to go!
>
> On Thu, 9 May 2019 at 14:01, Carin Meier  wrote:
>
> > Congrats!
> >
> > On Thu, May 9, 2019 at 1:41 PM Per da Silva  wrote:
> >
> > > Nice one! Congratulations =)
> > >
> > > On Thu, May 9, 2019 at 7:38 PM Jake Lee  wrote:
> > >
> > > > Congrat!
> > > >
> > > > On Thu, May 9, 2019 at 10:37 AM Yuan Tang 
> > > wrote:
> > > >
> > > > > Welcome!
> > > > >
> > > > > On Thu, May 9, 2019 at 1:36 PM Marco de Abreu <
> > marco.g.ab...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Welcome!
> > > > > >
> > > > > > Hagay Lupesko  schrieb am Do., 9. Mai 2019,
> > > 19:33:
> > > > > >
> > > > > > > Congratulations Zach - well deserved!
> > > > > > >
> > > > > > > On Thu, May 9, 2019, 13:26 Qing Lan  wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Please join me in welcoming Zach Kimberg (
> > > > https://github.com/zachgk)
> > > > > > as
> > > > > > > a
> > > > > > > > new committer.
> > > > > > > >
> > > > > > > > He has been solving some important bugs in MXNet JVM with
> > > > > > > > respect to usage improvements, build issues and a lot more. He
> > > > > > > > also created the Jenkins-based publish pipeline to give us a
> > > > > > > > standard way to build and test the static-linked package
> > > > > > > > conveniently for everyone in the community. Moreover, he solved
> > > > > > > > a bunch of license problems we had in MXNet and brought several
> > > > > > > > fixes to let us get the 1.4.0 release out on time.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Qing
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-01 Thread Pedro Larroy
+1 (non-binding)

Tried CPU build + C++ tests + 714 Python unit tests in 605s.
ARMv7 build + small unit test in QEMU + ARMv8 builds.

Thanks. Regards

Pedro.

On Wed, May 1, 2019 at 10:41 AM Qing Lan  wrote:
>
> +1 (binding)
>
> Build from source works for OSX and Ubuntu CPU.
> Scala builds/tests successfully with dynamic and static linking.
>
> Thanks,
> Qing
>
> 
> From: Sheng Zha 
> Sent: Wednesday, May 1, 2019 13:14
> To: d...@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0
>
> Hi all,
>
> Reminder that the vote for 1.4.1 release is still ongoing. If you can, please 
> help out. Thank you.
>
> -sz
>
> On 2019/04/30 06:51:45, Junru Shao  wrote:
> > Dear MXNet community,
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version v1.4.1.
> > The voting on dev@ list will start Apr 29 23:59:59 (PST) and close on May
> > 02 23:59:59.
> >
> > Below are links to
> > 1) Release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.1+Release+Notes
> > .
> > 2) Release Candidate:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.4.1.rc0.
> > 3) Source and signatures on Apache dist server:
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.1.rc0/.
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Best regards,
> > Junru Shao
> >


Re: [Announcement] New Committer - Hao Jin

2019-05-01 Thread Pedro Larroy
Congrats!

On Wed, May 1, 2019 at 11:08 AM Lin Yuan  wrote:
>
> Congrats!
>
> On Tue, Apr 30, 2019 at 11:28 PM Alex Zai  wrote:
>
> > Congrats Hao!
> >
> > On Tue, Apr 30, 2019 at 10:53 PM Steffen Rochel 
> > wrote:
> >
> > > congratulation Hao!
> > >
> > > On Tue, Apr 30, 2019 at 8:05 AM MiraiWK WKCN  wrote:
> > >
> > > > Congrats Hao! Welcome!
> > > >
> > > > 
> > > > From: Lv, Tao A 
> > > > Sent: Tuesday, April 30, 2019 11:00:33 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: RE: [Announcement] New Committer - Hao Jin
> > > >
> > > > Congratulations Hao!
> > > >
> > > > -Original Message-
> > > > From: Jun Wu [mailto:wujun@gmail.com]
> > > > Sent: Tuesday, April 30, 2019 12:29 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: [Announcement] New Committer - Hao Jin
> > > >
> > > > Please join me in welcoming Hao Jin (https://github.com/haojin2) from
> > > AWS
> > > > as a new committer.
> > > >
> > > > Hao has designed and implemented many sophisticated algorithms for
> > > > tensor operations. His work has greatly expanded the coverage of the
> > > > MXNet operator inventory and enhanced the performance of many operators
> > > > that are hard to optimize. Not only that, Hao has been active in
> > > > advocating MXNet by providing high-quality translation services for
> > > > quite a few technical articles and blog posts.
> > > >
> > >
> >


Re: Clojure MXNet Monthly Update

2019-04-29 Thread Pedro Larroy
Nice!  I would suggest that we use the Medium account for MXNet and
have a "this month in MXNet", as they do in other open source projects,
and just have a Clojure section. I think it gives nice visibility to
the project and attracts contributors. Maybe just copy your updates to
the "main one"?

Pedro.


On Fri, Apr 26, 2019 at 1:45 PM Carin Meier  wrote:
>
> I've started a monthly blog update targeted for the Clojure community but I
> thought I would share it here too :)
>
> http://gigasquidsoftware.com/blog/2019/04/26/clojure-mxnet-april-update/
>
> http://gigasquidsoftware.com/blog/2019/03/22/clojure-mxnet-march-update/
>
> Best,
> Carin


Re: docs updates

2019-04-25 Thread Pedro Larroy
Hi Aaron.

I'm no design expert, but wouldn't smoothing out the boxes a bit by
rounding the corners make the design more pleasant?

As an old fart learning from Knuth books, I'm all for having
documentation as close to the source as possible. Could you maybe give
two concrete examples, so that documentation noobs like me can
understand a bit better what you mean?

Thanks.


On Fri, Apr 19, 2019 at 6:02 PM Aaron Markham  wrote:
>
> Hello everyone!
> I've been pecking away at adding content to the beta site. Take a look
> and let me know what you think. Right now, I'm focused 100% on the
> content and the organization thereof.
>
> One feature I thought would be great for the API docs would be to
> cross-reference usage of a particular function with an example or
> tutorial. As I researched approaches I found sphinx-gallery [2]
> already supports this.
>
> When trying to apply this to some examples we have I ran across an
> issue [3] that many examples don't run out of the box. This prevents
> sphinx-gallery from doing its magic.
>
> My question to you all is: what do you think of having .py files as a
> primary source of example usage and generating notebooks and web
> pages from those? This is as opposed to .md files that get converted
> to notebooks and then html. GluonCV's site uses this and it seems to
> work pretty well for generating the tutorial web pages from the .py
> files. The implication is fixing many examples to run when executed by
> Sphinx. The remainder would have to be excluded and wouldn't be a
> source of example usage in the docs.
>
> Cheers,
> Aaron
>
> [1] https://beta.mxnet.io/
> [2] 
> https://sphinx-gallery.github.io/configuration.html#auto-documenting-your-api-with-links-to-examples
> [3] https://github.com/apache/incubator-mxnet/issues/5717


Re: [Announcement] New Committer - Wang Jiajun

2019-04-25 Thread Pedro Larroy
Welcome!

On Tue, Apr 16, 2019 at 6:34 PM kellen sunderland
 wrote:
>
> Welcome!  Very impressed with the work fixing memory leaks so far.
>
> On Tue, Apr 16, 2019 at 9:14 AM Carin Meier  wrote:
>
> > Congrats!
> >
> > On Tue, Apr 16, 2019 at 11:58 AM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Please join me to welcome Wang Jiajun (https://github.com/arcadiaphy)
> > as a
> > > new committer of Apache (incubating) MXNet!
> > >
> > > Wang has been solving some tough bugs with respect to memory leaks,
> > process
> > > fork handling, dependency engine issues and custom op exception handling.
> > >
> > > Issue Involvement:
> > >
> > >
> > https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Aarcadiaphy
> > >
> > > PRs authored:
> > >
> > >
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+author%3Aarcadiaphy+
> > >
> > > Anirudh
> > >
> >


DNS failures in jenkins

2019-04-25 Thread Pedro Larroy
Hi

I see some DNS resolution failures on Jenkins; I think this is the
cause of Jenkins sometimes not reporting the build status. What DNS
server are we using on the master? Should we add a couple of secondary
resolvers to remediate this?

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwebsite/detail/PR-14788/2/pipeline/


Thanks.

Pedro.


Re: assimilation of mshadow into the MXNet codebase

2019-04-24 Thread Pedro Larroy
Thanks. Great to read.

On Wed, Apr 24, 2019 at 2:19 PM Sheng Zha  wrote:
>
> The community has agreed to donate mshadow to the mxnet code base. I will 
> start the migration and build logic changes soon.
>
> -sz
>
> On 2019/04/07 21:47:39, Sheng Zha  wrote:
> > I agree it would make development easier to donate mshadow to mxnet code 
> > base, since mshadow is only used in MXNet. I support donating the mshadow 
> > code to mxnet and I started an RFC for this in mshadow [1].
> >
> > [1] https://github.com/dmlc/mshadow/issues/373
> >
> > -sz
> >
> > On 2019/04/06 04:38:19, Tianqi Chen  wrote:
> > > Technically, mshadow is sufficient for MXNet. Adopting other libraries (
> > > eigen or xtensor) will unnecessarily increase the codebase complexity
> > > without any additional gains.
> > >
> > > Given that mshadow is only used by mxnet, I do support donating it into
> > > the mxnet codebase.
> > > To respect the original mshadow community. I would recommend starting a
> > > community RFC In the mshadow github issue for a week, before we start the
> > > migrating process.
> > > Also, I would recommend a rebase merge just like the case of MXNet.jl code
> > > base to preserve the contribution history.
> > >
> > > Tianqi
> > >
> > >
> > > On Fri, Apr 5, 2019 at 9:25 PM Alfredo Luque
> > >  wrote:
> > >
> > > > Do you have a link to both of these proposals?
> > > >
> > > > On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya 
> > > > wrote:
> > > >
> > > > > Hi Pedro,
> > > > >
> > > > > mshadow is mostly used for tensor arithmetic. There have been 
> > > > > discussions
> > > > > about including it within mxnet. I think it is a good idea.
> > > > >
> > > > > As a more long term solution using libraries like eigen to perform 
> > > > > linear
> > > > > algebra operations was also suggested by anirudh2290@. I think 
> > > > > xtensor(
> > > > > https://github.com/QuantStack/xtensor ) can also be a candidate here.
> > > > >
> > > > > -
> > > > > Anirudh
> > > > >
> > > > >
> > > > > On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > Some developers have noticed that working in mshadow is cumbersome 
> > > > > > as
> > > > > > it's a 3rdparty subrepo.
> > > > > >
> > > > > > Since mshadow is a bunch of headers which don't have much in the
> > > > > > way of independent tests / library functionality, other developers
> > > > > > and I believe that it would be good to assimilate this code into the
> > > > > > repository for ease of contribution and changes, without having to
> > > > > > go through contortions to test PRs that modify mshadow.
> > > > > >
> > > > > > Would anybody oppose this change?
> > > > > >
> > > > > > Thanks and have a nice weekend.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > >
> > > >
> > >
> >


Re: Benchmarking MXNet with different compilers and different OpenMP implementations (results)

2019-04-12 Thread Pedro Larroy
Are there any updates on this?

This is still affecting multiprocessing; some tests hang:

rces. For information on submitting this issue, please see
https://bugs.llvm.org/.
[INFO] Setting test np/mx/python random seeds, use
MXNET_TEST_SEED=2124604270 to reproduce.
Assertion failure at kmp_runtime.cpp(6479): __kmp_thread_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6479).
OMP: Hint: Please submit a bug report with this message, compile and
run commands used, and machine configuration info including native
compiler and operating system versions. Faster response will be
obtained by including all program sources. For information on
submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_runtime.cpp(6479): __kmp_thread_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6479).
OMP: Hint: Please submit a bug report with this message, compile and
run commands used, and machine configuration info including native
compiler and operating system versions. Faster response will be
obtained by including all program sources. For information on
submitting this issue, please see https://bugs.llvm.org/.
^CException ignored in: >
Traceback (most recent call last):
  File "/home/piotr/mxnet_other/python/mxnet/gluon/data/dataloader.py",
line 595, in __del__
self._worker_pool.terminate()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 567, in terminate
self._terminate()
  File "/usr/lib/python3.6/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 597, in
_terminate_pool
cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 582, in
_help_stuff_finish
inqueue._rlock.acquire()
KeyboardInterrupt
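
A quick way to check whether more than one OpenMP runtime ends up loaded
in the same process (a plausible trigger for these kmp_runtime.cpp
assertions) is to inspect the process maps. A Linux-only sketch; the
library-name prefixes are assumptions covering GNU, LLVM and Intel OpenMP:

import os

import mxnet  # noqa: F401  (importing mxnet pulls in its OpenMP runtime)

runtimes = set()
with open("/proc/self/maps") as maps:  # Linux-specific
    for line in maps:
        path = line.split()[-1]
        if os.path.basename(path).startswith(("libgomp", "libomp", "libiomp")):
            runtimes.add(path)

print(runtimes)
# A healthy build prints exactly one runtime; seeing e.g. libgomp and
# libomp together is the duplicated-runtime situation discussed in the
# thread below.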

Pedro.

On Thu, Feb 14, 2019 at 6:30 AM Tsukrov, Stanislav
 wrote:
>
> Thanks Aaron for the feedback.
>
> > As for your next steps, would you propose that cmake be brought up to 
> > parity?
> Yes. sse2 in cmake vs sse3 in make is a minor example without high impact. 
> There are others.
>
> > It seems strange that it causes slowness and if so, it shouldn't be 
> > recommended for now.
> There are some issues in the CMake files that should be fixed. Some of
> them are worked around for the benchmark.
>
> Best Regards
>
> Stas
>
> On 14.02.19, 14:09, "Anton Chernov"  wrote:
>
> Thank you, Aaron, for your interest in the topic.
>
> My main previous proposal still stands: remove bundled OpenMP submodule 
> and
> use OpenMP provided by the environment [1]. This might lead to performance
> degradation in some cases where an old OpenMP library is used or thread
> affinity wasn't set properly. But that would be a problem of the
> environment, not MXNet.
>
> I described some alternative solutions in [1] as part of this [2] thread.
> Tricking the linker with symlinks in both cases should make it possible to
> avoid multiple OpenMP implementations being linked into MXNet
> simultaneously. Windows questions would still be open.
>
> Best
> Anton
>
> [1] https://github.com/apache/incubator-mxnet/pull/12160
> [2]
> 
> https://lists.apache.org/thread.html/007d8db15a1782e1b20896a4050b62710d4ff0908c67b94af7cb0f8b@%3Cdev.mxnet.apache.org%3E
> [3]
> 
> https://lists.apache.org/thread.html/4827f0f742b6e7e070da350ea81226d059401527f3072ce8b33c1fdf@%3Cdev.mxnet.apache.org%3E
>
>
> On Tue, 12 Feb 2019 at 16:39, Aaron Markham wrote:
>
> > This is really great research. I've often wondered what the difference
> > really is, and why it has to be so complicated. It seems the answer is
> > there isn't much difference and it shouldn't be as complex.
> > As for your next steps, would you propose that cmake be brought up to
> > parity? It seems strange that it causes slowness, and if so, it
> > shouldn't be recommended for now.
> > Also, testing for Windows compilers might be quite important, as install
> > stats suggest a significant portion of Windows users. Wouldn't this
> > nudge the decision of what to use as a rule going forward?
> > I ran into this submodule OpenMP issue on Windows myself. How does that
> > get fixed? Do we have to repackage all of the submodules to make sure
> > they use the recommended implementation, or do they use what the system
> > expects?
> >
> > Cheers,
> > Aaron
> >
> > On Tue, Feb 12, 2019, 04:37 Anton Chernov  wrote:
> >
> > > Dear MXNet community,
> > >
> > > Due to multiple problems related to OpenMP and the stale proposed
> > > change [1], we have been working on gathering performance data on the
> > > impact of using
> > > different OpenMP implementations with MXNet (great thanks to Stanislav
> > > Tsukrov for the hard work). The results can be found here [2].
> > >
> > > As a short summary of the investigation: 

Re: duplicated nnvm code

2019-04-12 Thread Pedro Larroy
I would think that if we are using nnvm from tvm we should not have
duplicated code in our repository. I think we should either use the
subrepository as a 3rdparty dependency or assimilate the code into the
codebase, as is planned with mshadow. But I guess TVM is making heavy
use of nnvm, and in this case it might make sense to reuse it across
projects. @Tianqi?

On Thu, Apr 11, 2019 at 10:16 PM Junru Shao  wrote:
>
> We should remove 3rdparty/tvm/nnvm/gradient.cc.o imo
>
> On Thu, Apr 11, 2019 at 6:44 PM Pedro Larroy 
> wrote:
>
> > Hi
> >
> > I found that src/nnvm and 3rdparty/tvm/nnvm/src/pass/ have duplicated
> > code that we are linking in:
> >
> > ./CMakeFiles/mxnet_static.dir/3rdparty/tvm/nnvm/src/pass/gradient.cc.o
> > ./CMakeFiles/mxnet_static.dir/src/nnvm/gradient.cc.o
> >
> > This can potentially cause problems when linking. The symbol that will
> > be used is left as an exercise for the reader.
> >
> > Is this intentional?  Should we address this?
> >
> > Pedro.
> >


duplicated nnvm code

2019-04-11 Thread Pedro Larroy
Hi

I found that src/nnvm and 3rdparty/tvm/nnvm/src/pass/ have duplicated
code that we are linking in:

./CMakeFiles/mxnet_static.dir/3rdparty/tvm/nnvm/src/pass/gradient.cc.o
./CMakeFiles/mxnet_static.dir/src/nnvm/gradient.cc.o

This can potentially cause problems when linking. The symbol that will
be used is left as an exercise for the reader.

Is this intentional?  Should we address this?

Pedro.


Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Pedro Larroy
I will respond in slack, so we don't derail the original thread's
topic with my points.

Looking forward to your proposal.

On Thu, Apr 11, 2019 at 1:00 PM Junru Shao  wrote:
>
> I don't have idea about the following issues:
>
> 1) Reducing the abuse of inlined code moving more logic to implementation
> files and improve scoping which will also speed up compilation
> 2) Reduce runtime of some unit tests
> 3) Improve MXNet startup time
>
> Will be super interested to hear about your ideas :-)
>
>
> On Thu, Apr 11, 2019 at 12:52 PM Junru Shao  wrote:
>
> > We have a systematic solution that avoids the ABI headache. I am
> > struggling with some errands, and will share our proposal here as soon
> > as I can. This will be a very interesting topic to discuss. Let's work
> > hard together and make it perfect :-)
> >
> > On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <
> > pedro.larroy.li...@gmail.com> wrote:
> >
> >> Thanks Marco for raising this issue. I think we can certainly do some
> >> improvements in modularization and build. At the same time Tianqi's
> >> point of view is important to consider and on point. I see a high risk
> >> of overengineering in such an endeavor.
> >>
> >> I also see increased complexity, difficulty debugging, C++ ABI
> >> headaches, API compatibility, crashes inside a binary module, etc.
> >> which I don't want to deal with as a developer or even as an MXNet
> >> user. Does somebody have answers to these problems?
> >>
> >> If somebody thinks they have a good solution, by all means propose a
> >> design in the wiki, I think we are all open. Personally I see several
> >> other lower hanging fruits which need our attention:
> >>  * Simplifying our build logic,
> >>  * Cuda selection in CMake,
> >>  * Reducing the abuse of inlined code by moving more logic to
> >> implementation files and improving scoping, which will also speed up
> >> compilation (some units take more than 5 minutes and a lot of RAM to
> >> build on a top-of-the-line CPU core)
> >>  * Reduce runtime of some unit tests
> >> And other improvements in our codebase that would bring immediate
> >> benefits without the overengineering risks of a plugin system. I
> >> also question our bandwidth for such an endeavor.
> >>  * Improve MXNet startup time.
> >>  * Thread safety
> >>
> >> I would say, let's apply the KISS principle, let's make the project
> >> fast to build, easy to work on, well documented and easy to contribute
> >> to before building the next Netscape browser. Otherwise we could save
> >> ourselves this exercise and switch to Rust directly.
> >>
> >> Pedro.
> >>
> >>
> >>
> >> On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen 
> >> wrote:
> >> >
> >> > Just to clarify. I am not questioning the usefulness of the separation.
> >> > Just want to highlight the technical challenges here based on our past
> >> > experiences.
> >> >
> >> > Crossing DLL boundaries in C++ can create quite a lot of problems,
> >> > especially when some of the dependencies use a different version of the
> >> > compiler or follow static packaging, or simply because of dynamic
> >> > linking differences on Windows. These problems could make this direction
> >> > less appealing compared to focusing effort on other things.
> >> >
> >> > Technically, as a first step, it is possible to make dependency changes
> >> > not touch the global header files, and to use registration so that
> >> > changing a certain component won't trigger a global recompile in CMake.
> >> > This is also a required step toward some modularity.
> >> >
> >> > For plugins, solutions that use C ABI can be used for certain plugin
> >> > modules.
> >> >
> >> > Some of the discussion has been tied to what the interface should look
> >> > like. I think we should use different threads for these and put in more
> >> > thought.
> >> >
> >> > Tianqi
> >> >
> >> >
> >> >
> >> > On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> >> > kellen.sunderl...@gmail.com> wrote:
> >> >
> >> > > I think we can make some incremental progress.  My thoughts were
> >> along the
> >> > > lines of plugins (thinking about what happens with the VLC proje

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

2019-04-11 Thread Pedro Larroy
Thanks Marco for raising this issue. I think we can certainly do some
improvements in modularization and build. At the same time Tianqi's
point of view is important to consider and on point. I see a high risk
of overengineering in such an endeavor.

I also see increased complexity, difficulty debugging, C++ ABI
headaches, API compatibility, crashes inside a binary module, etc.
which I don't want to deal with as a developer or even as an MXNet
user. Does somebody have answers to these problems?

If somebody thinks they have a good solution, by all means propose a
design in the wiki, I think we are all open. Personally I see several
other lower hanging fruits which need our attention:
 * Simplifying our build logic,
 * Cuda selection in CMake,
 * Reducing the abuse of inlined code by moving more logic to
implementation files and improving scoping, which will also speed up
compilation (some units take more than 5 minutes and a lot of RAM to
build on a top-of-the-line CPU core)
 * Reduce runtime of some unit tests
And other improvements in our codebase that would bring immediate
benefits without the overengineering risks of a plugin system. I
also question our bandwidth for such an endeavor.
 * Improve MXNet startup time.
 * Thread safety

I would say, let's apply the KISS principle, let's make the project
fast to build, easy to work on, well documented and easy to contribute
to before building the next Netscape browser. Otherwise we could save
ourselves this exercise and switch to Rust directly.

Pedro.



On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen  wrote:
>
> Just to clarify. I am not questioning the usefulness of the separation.
> Just want to highlight the technical challenges here based on our past
> experiences.
>
> Crossing DLL boundaries in C++ can create quite a lot of problems,
> especially when some of the dependencies use a different version of the
> compiler or follow static packaging, or simply because of dynamic linking
> differences on Windows. These problems could make this direction less
> appealing compared to focusing effort on other things.
>
> Technically, as a first step, it is possible to make dependency changes
> not touch the global header files, and to use registration so that changing
> a certain component won't trigger a global recompile in CMake. This is also
> a required step toward some modularity.
>
> For plugins, solutions that use C ABI can be used for certain plugin
> modules.
>
> Some of the discussion has been tied to what the interface should look
> like. I think we should use different threads for these and put in more
> thought.
>
> Tianqi
>
>
>
> On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > I think we can make some incremental progress.  My thoughts were along the
> > lines of plugins (thinking about what happens with the VLC project).  At
> > process launch time we could gather some information about our execution
> > environment (either through configuration, or by convention looking at our
> > folder structure and libraries available).  We could then later load the
> > components we need after understanding if we're using a CUDA backend and
> > what operators or subgraph components we would need.  Advantages would be
> > that we would move a lot of the current conditional compile logic to
> > runtime, and automate a lot of it.  It would also make packaging binaries
> > for targeted environments a little easier.  As an example we could compile
> > once, then remove CUDA focused libraries for systems that are going to run
> > on CPUs.
> >
> > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen 
> > wrote:
> >
> > > While I personally like the idea, this can be something that is fairly
> > > technically challenging, and I would caution against it versus pushing
> > > for good features and just allowing runtime configuration.
> > >
> > > The main problem here is due to the C++ ABI. There is no standard C++
> > > ABI across compilers, which means resorting to runtime DLLs and dynamic
> > > loading brings all sorts of technical problems, especially when multiple
> > > modules depend on the same third-party dependency (CUDA runtime).
> > > There is no ready-to-go solution here, especially given the explosion of
> > > backend variants and dependencies in C++.
> > > A partial solution could be achieved through the sole use of a C ABI.
> > > Combining this with code generation can result in some simplifications
> > > and enable some runtime-loadable modules. TVM does this, and perhaps
> > > MXNet could reuse some of that component for operator libraries.
> > > Similarly, having a customizable operator library that is loadable via a
> > > C ABI might be possible.
> > >
> > > So to summarize: while I really like the idea of dynamically loadable
> > > modules, my past experience suggests that this will bring a lot of
> > > additional engineering burden and technical debt without significant
> > > benefit. I would suggest starting by 

Re: assimilation of mshadow into the MXNet codebase

2019-04-08 Thread Pedro Larroy
There's a flag MSHADOW_STAND_ALONE which provides gemm but not all of
the BLAS routines, and it looks like an untested codepath. From what I
have seen I don't think we use this from MXNet, hence the need for a
BLAS implementation.


On Sun, Apr 7, 2019 at 6:16 PM Zhao, Patric  wrote:
>
> Agree.
>
> Recently, we (Tao, Shufan, Pengxin) are trying to integrate the Intel MKL 
> math functions into mshadow and MXNet.
> We have to go through two repos and make lots of tradeoffs between them.
> If we can move mshadow into MXNet, it will be more flexible to redesign and 
> refactor parts of legacy code.
>
> > -Original Message-
> > From: Sheng Zha [mailto:zhash...@apache.org]
> > Sent: Monday, April 8, 2019 5:48 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: assimilation of mshadow into the MXNet codebase
> >
> > mshadow depends on *a* BLAS library, and there's nothing inherent in
> > mshadow code base that requires OpenBLAS over MKL. The linked issue
> > #11769 seems to be more of a build logic issue.
> >
> > -sz
> >
> > On 2019/04/07 18:56:43, Aaron Markham 
> > wrote:
> > > +1
> > > Reduced complexity. Choice of math library... Hopefully you can just
> > > install MKL and not be forced into mshadow's dependency on OpenBLAS.
> > > This could make Windows setup easier.
> > > Maybe this issue will get fixed: #11769.
> > >
> > > On Sun, Apr 7, 2019, 00:51 Junru Shao  wrote:
> > >
> > > > Does merging mshadow into mxnet bring any actual benefit for
> > > > customers in terms of performance, portability, or anything else?
> > > >
> > > > On Fri, Apr 5, 2019 at 9:38 PM Tianqi Chen
> > > > 
> > > > wrote:
> > > >
> > > > > Technically, mshadow is sufficient for MXNet. Adopting other
> > > > > libraries (eigen or xtensor) will unnecessarily increase the
> > > > > codebase complexity without any additional gains.
> > > > >
> > > > > Given that mshadow is only used by mxnet, I do support donating it
> > > > > into the mxnet codebase.
> > > > > To respect the original mshadow community, I would recommend
> > > > > starting a community RFC in the mshadow GitHub issue for a week
> > > > > before we start the migration process.
> > > > > Also, I would recommend a rebase merge, just like the case of the
> > > > > MXNet.jl code base, to preserve the contribution history.
> > > > >
> > > > > Tianqi
> > > > >
> > > > >
> > > > > On Fri, Apr 5, 2019 at 9:25 PM Alfredo Luque
> > > > >  wrote:
> > > > >
> > > > > > Do you have a link to both of these proposals?
> > > > > >
> > > > > > On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Pedro,
> > > > > > >
> > > > > > > mshadow is mostly used for tensor arithmetic. There have been
> > > > > discussions
> > > > > > > about including it within mxnet. I think it is a good idea.
> > > > > > >
> > > > > > > As a longer-term solution, using libraries like eigen to
> > > > > > > perform linear algebra operations was also suggested by
> > > > > > > anirudh2290@. I think xtensor
> > > > > > > (https://github.com/QuantStack/xtensor) can also be a candidate
> > > > > > > here.
> > > > > > >
> > > > > > > -
> > > > > > > Anirudh
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy <
> > > > > > pedro.larroy.li...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Some developers have noticed that working in mshadow is
> > > > > > > > cumbersome as it's a 3rdparty subrepo.
> > > > > > > >
> > > > > > > > Since mshadow is a bunch of headers without much in the way
> > > > > > > > of independent tests or library functionality, other
> > > > > > > > developers and I believe that it would be good to assimilate
> > > > > > > > this code into the repository for ease of contribution and
> > > > > > > > changes, without having to go through contortions to test
> > > > > > > > PRs that modify mshadow.
> > > > > > > >
> > > > > > > > Would anybody oppose this change?
> > > > > > > >
> > > > > > > > Thanks and have a nice weekend.
> > > > > > > >
> > > > > > > > Pedro.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >


assimilation of mshadow into the MXNet codebase

2019-04-05 Thread Pedro Larroy
Hi

Some developers have noticed that working in mshadow is cumbersome as
it's a 3rdparty subrepo.

Since mshadow is a bunch of headers without much in the way of
independent tests or library functionality, other developers and I
believe that it would be good to assimilate this code into the
repository for ease of contribution and changes, without having to go
through contortions to test PRs that modify mshadow.

Would anybody oppose this change?

Thanks and have a nice weekend.

Pedro.


Re: Discussing plans for next MXNet releases

2019-04-02 Thread Pedro Larroy
Great initiative.

I would like to add the issue that tracks APIs that we would like to
break for 2.0 so we can take the chance to streamline and improve
customer facing code:

https://github.com/apache/incubator-mxnet/issues/9686

I would be happy to volunteer for the 2.0 release with assistance from a committer.

Pedro.

On Tue, Apr 2, 2019 at 5:06 PM Hagay Lupesko  wrote:
>
> Dear MXNet community,
>
> I wanted to initiate a discussion about the plan and scope for the next
> MXNet releases.
> I suggest we focus on three releases, and get the process going in parallel:
> (1) 1.4.1 - patch release on top of 1.4.0 to address some perf regressions
> and memory leaks I am aware of, such as the memory leak fixed on Scala [0].
> I went ahead and created a draft release proposal wiki [1].
> (2) 1.5.0 - a minor release to add new features introduced since 1.4.0
> release started (back in Nov 2018!), such as various performance
> improvements: aggregate SGD, in-place updates in optimizers, gpu support
> for image processing operators and many more features useful for MXNet’s
> users.
> (3) 2.0 - an exciting major release that will include major enhancements to
> MXNet.
>
> Timeframes will probably vary based on the scope. I think we should plan to
> start 1.4.1 release within a couple of weeks, 1.5.0 should target starting
> once we release 1.4.1, and 2.0 timeline is TBD - but such a major release
> will require more time to discuss and decide in the community.
>
> I was thinking to get started through:
> (1) Draft proposals on CWiki, where the community can add content and
> propose scope and features.
> (2) Setup online meetings, where anyone can dial into, from anywhere, where
> we will have a chance to discuss in voice+video.
> (3) With (1)+(2) have a scope and timeline that the community, in large,
> supports.
>
> Would be great to get the community's feedback and suggestions, and please
> reply if you would like to be involved in the effort of supporting the
> releases!
>
> MXNet is awesome, looking forward to working together to make it even
> better!
> Hagay
>
> [0] https://github.com/apache/incubator-mxnet/pull/14586
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/%5BDRAFT+PROPOSAL%5D+Apache+MXNet+%28incubating%29+1.4.1+Release+Plan+and+Status


Developer tools for MXNet

2019-03-28 Thread Pedro Larroy
Hi developers!

We did a session on working with developer tools for debugging and
extending the MXNet engine in CLion and posted the screencast in Vimeo
in case you are interested.

https://vimeo.com/326697272

I also added it to the wiki.

Pedro.


Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-28 Thread Pedro Larroy
Isabel:

Gluon (base) is mainly here:

https://github.com/apache/incubator-mxnet/tree/master/python/mxnet/gluon

Other packages that were mentioned are:

https://github.com/dmlc/gluon-nlp
https://gluon-cv.mxnet.io/
https://github.com/dmlc/gluon-cv


Pedro.

On Sat, Mar 23, 2019 at 2:20 AM Isabel Drost-Fromm  wrote:
>
> I'm a bit confused. Can you point me to the source files that make up Gluon? 
> Are they part of Apache MxNet?
>
> Isabel
>
>
> On 23 March 2019 at 00:01:56 CET, Pedro Larroy wrote:
> >Hi dev@
> >
> >We heard feedback from users that the Gluon name is confusing. Some of
> >them don't even know it's MXNet, and the relationship with MXNet is
> >unclear.
> >
> >Would it make sense to rebrand Gluon to just MXNet or MXNet
> >imperative? Diluting brands and names is never a good idea.
> >
> >There's also gluonhq, which is related to JavaFX and adds to the
> >confusion; search engine friendliness is not high either.
> >
> >Pedro.
>
> --
> This message was sent from my Android device with K-9 Mail.


UB, type narrowing and integer overflows (in this case in mshadow)

2019-03-28 Thread Pedro Larroy
Hi

While looking at the failures reporting in this issue:
https://github.com/apache/incubator-mxnet/issues/14522

I have noticed that in mshadow when calling the BLAS Engine we are
doing narrowing integer conversions from index_t (int64_t) to int, and
then operations on dimensions that can overflow integer arithmetic,
such as i * m * k, as seen in the second link below. When the overflowed
value is added to the pointer holding the matrix data, the result is
a) undefined behaviour, and b) on x86_64, a subtraction instead of an
addition, due to platform-dependent integer overflow semantics on x86.

I think we should address this in a twofold manner: checking the
shapes for possible overflows in the implementation (which will have
some performance impact), and second we should widen the types of
BLASEngine to index_t.

https://github.com/dmlc/mshadow/blob/master/mshadow/tensor_cpu-inl.h#L613

Which in CPU ends up calling batched_gemm:

https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L339
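
To make the failure mode concrete, here is a small repro of the narrowing
using NumPy's fixed-width integers (np.int32 stands in for the C int that
the index_t values get narrowed to; the dimensions are made up but
plausible for large tensors):

import numpy as np

i, m, k = 1, 60_000, 60_000       # hypothetical large-tensor dimensions
exact = i * m * k                 # Python ints don't overflow: 3_600_000_000

with np.errstate(over="ignore"):  # silence NumPy's overflow warning
    narrowed = np.int32(i) * np.int32(m) * np.int32(k)

print(exact)          # 3600000000
print(int(narrowed))  # -694967296: wrapped negative, so adding this offset
                      # to the data pointer walks backwards, matching (b)

The checked variant would keep the arithmetic in index_t and validate the
shape product up front, before handing it to the BLAS call.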

Let me know if you have additional thoughts on this solution or see
any blockers or better ideas; otherwise I will proceed to work on PRs
fixing this in mshadow. Since it's an issue that seems to happen often,
I think we should be really careful with integer overflows and
undefined-behaviour-related bugs, and pay attention to these kinds of
traps in CRs.

Thanks!

Pedro.


Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Pedro Larroy
+1 to MXNet Gluon given the feedbacks and explanations from everyone so far.

On Fri, Mar 22, 2019 at 5:09 PM Junru Shao  wrote:
>
> I feel like MXNet Gluon is a good name. You don't lose customers who have
> been familiar with MXNet, nor lose customers who are used to MXNet symbolic.
>
> On Fri, Mar 22, 2019 at 5:07 PM Davydenko, Denis <
> dzianis.davydze...@gmail.com> wrote:
>
> > As the subject suggests, this is a proposal for re-branding Gluon to align
> > it with MXNet. One of the common things undertaken in re-branding
> > exercises is renaming; that's the thinking behind suggesting a new name
> > for Gluon. I am sincerely curious what the alternatives would be to
> > rebrand Gluon to align it with MXNet without changing its name.
> >
> >
> > On 3/22/19, 4:57 PM, "Mu Li"  wrote:
> >
> > Are you proposing to rename Gluon? I think Pedro's opinion is about a
> > better way to communicate what's Gluon and how it's related to MXNet.
> >
> > On Fri, Mar 22, 2019 at 4:54 PM Davydenko, Denis
> > 
> > wrote:
> >
> > > I support the idea of putting the brands of MXNet and Gluon closer
> > > together. I agree with your argument, Mu, but MXNet is quite far away
> > > from TF's place at this time, so I don't know how well that argument
> > > transfers from TF's position to MXNet's position.
> > >
> > > MXNet Imperative is definitely too restrictive of a name; we can come
> > > up with a better one... MXNet-M, for example, stands for MXNet-Modified
> > > (military connotation). If naming is the only thing we need to figure
> > > out, that is a good place to be in __
> > >
> > > --
> > > Thanks,
> > > Denis
> > >
> > > On 3/22/19, 4:48 PM, "Mu Li"  wrote:
> > >
> > > Gluon is about imperative neural network training and data loading.
> > > ndarray is another large imperative module. Besides, Gluon also
> > > supports symbolic execution after hybridizing. "mxnet imperative"
> > > might not be a good name for it. Another choice is "high-level API";
> > > that's how TF talks about Keras.
> > >
> > > On Fri, Mar 22, 2019 at 4:38 PM Yuan Tang <
> > terrytangy...@gmail.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Fri, Mar 22, 2019 at 7:29 PM Lin Yuan 
> > > wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > Just to give some of my real experience:
> > > > > 1) I advertised a recent GluonNLP blog and many responses are
> > > "This seems
> > > > > nice. So is Gluon a new library to replace MXNet?"
> > > > > 2) We visited customers in a unicorn company who showed
> > interests
> > > in
> > > > MXNet
> > > > > but none of the engineers knew the relationship between
> > > GluonNLP/GluonCV
> > > > > and MXNet
> > > > > 3) When integrating MXNet to Horovod and adding examples, I
> > > received
> > > > > comments like "What is Gluon? Is it a new library in
> > addition to
> > > MXNet?"
> > > > >
> > > > > Everyone is talking about PyTorch nowadays, but not Caffe2 anymore,
> > > > > although the latter is still serving as a backend component. Maybe
> > > > > we should also double down on one brand?
> > > > >
> > > > > Lin
> > > > >
> > > > > On Fri, Mar 22, 2019 at 4:02 PM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi dev@
> > > > > >
> > > > > > We heard feedback from users that the Gluon name is confusing.
> > > > > > Some of them don't even know it's MXNet, and the relationship
> > > > > > with MXNet is unclear.
> > > > > >
> > > > > > Would it make sense to rebrand Gluon to just MXNet or MXNet
> > > > > > imperative? Diluting brands and names is never a good idea.
> > > > > >
> > > > > > There's also gluonhq, which is related to JavaFX and adds to the
> > > > > > confusion; search engine friendliness is not high either.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
> >


[DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Pedro Larroy
Hi dev@

We heard feedback from users that the Gluon name is confusing. Some of
them don't even know it's MXNet, and the relationship with MXNet is
unclear.

Would it make sense to rebrand Gluon to just MXNet or MXNet
imperative? Diluting brands and names is never a good idea.

There's also gluonhq, which is related to JavaFX and adds to the
confusion; search engine friendliness is not high either.

Pedro.


Re: [Announcement] New Committer - Nicolas Modrzyk

2019-03-21 Thread Pedro Larroy
welcome

On Fri, Feb 15, 2019 at 8:15 PM Marco de Abreu  wrote:
>
> Welcome!
>
> On Fri., 15 Feb 2019, 20:23 sandeep krishnamurthy
> <sandeep.krishn...@gmail.com> wrote:
>
> > Welcome Nicolas. Thank you for all your contributions to the community.
> >
> > On Fri, Feb 15, 2019 at 9:53 AM Lin Yuan  wrote:
> >
> > > Welcome, Nicolas! Good to have you on board.
> > >
> > > Lin
> > >
> > > On Fri, Feb 15, 2019 at 8:03 AM Carin Meier 
> > wrote:
> > >
> > > > Please join me in welcoming Nicolas Modrzyk, (@hellonico), as a new
> > > > committer.
> > > >
> > > > He has made valuable contributions to the Clojure package, especially
> > in
> > > > the areas of stability with integration tests and visualizations [1].
> > > >
> > > > We are excited to have him with us as a committer and look forward to
> > > > future growth of the MXNet Clojure package and community.
> > > >
> > > > - Carin
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+hellonico+
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >


Re: [Announcement] New Committer - Patric Zhao

2019-03-21 Thread Pedro Larroy
Congratulations

On Thu, Mar 21, 2019 at 12:04 PM Jake Lee  wrote:
>
> Congrats, Patric!
>
> Jake
>
> On Thu, Mar 21, 2019 at 12:03 PM Lin Yuan  wrote:
>
> > Congrats, Patric!
> >
> > On Thu, Mar 21, 2019 at 10:32 AM Yuxi Hu  wrote:
> >
> > > Congrats, Patric! Well deserved!
> > >
> > > On Wed, Mar 20, 2019 at 1:08 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Congrats Patric!
> > > >
> > > > On Sun, Mar 17, 2019 at 10:34 PM Hagay Lupesko 
> > > wrote:
> > > >
> > > > > Congrats Patric!
> > > > >
> > > > > On Fri, Mar 15, 2019 at 7:49 AM Joshua Z. Zhang <
> > cheungc...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >  Congrats Patrick!
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >  Zhi
> > > > > >
> > > > > > >
> > > > > > > On Mar 15, 2019 at 10:46 PM, Marco de Abreu (marco.g.ab...@gmail.com) wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >  Congratulations, great to have you on board!
> > > > > > >
> > > > > > > -Marco
> > > > > > >
> > > > > > > Lv, Tao A wrote on Fri., 15 March 2019, 15:38:
> > > > > > >
> > > > > > > >  Wow, congratulation Patric!
> > > > > > > >
> > > > > > > >  -Original Message-
> > > > > > > >  From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > > > >  Sent: Friday, March 15, 2019 10:25 PM
> > > > > > > >  To: dev@mxnet.incubator.apache.org
> > > > > > > >  Cc: patric zhao  
> > > > > > > >  Subject: Re: [Announcement] New Committer - Patric Zhao
> > > > > > > >
> > > > > > > >  Congratulation Patrick!
> > > > > > > >  Steffen
> > > > > > > >
> > > > > > > >  On Fri, Mar 15, 2019 at 5:38 AM Zhao, Patric
> > > > > > > >  <patric.z...@intel.com> wrote:
> > > > > > > >
> > > > > > > >   >  I am very glad to have this opportunity to contribute to
> > the
> > > > > > > >   >  Apache/MXNet community :)
> > > > > > > >   >
> > > > > > > >   >  Thanks all of the supports from the community and Intel.
> > > > > > > >   >
> > > > > > > >   >  BR,
> > > > > > > >   >
> > > > > > > >   >  --Patric
> > > > > > > >   >
> > > > > > > >   >
> > > > > > > >   >   >  -Original Message-
> > > > > > > >   >   >  From: MiraiWK WKCN [mailto:w...@live.cn]
> > > > > > > >   >   >  Sent: Friday, March 15, 2019 12:52 AM
> > > > > > > >   >   >  To: dev@mxnet.incubator.apache.org; patric zhao
> > > > > > > >   >   >   
> > > > > > > >   >   >  Subject: Re: [Announcement] New Committer - Patric
> > Zhao
> > > > > > > >   >   >
> > > > > > > >   >   >  Welcome Peng Zhao!
> > > > > > > >   >   >  Peng is the AI Tech Leader at Intel Corporation. We
> > > > > > > >   >   >  have had good cooperation before. He is very
> > > > > > > >   >   >  professional and contributes a lot to MXNet,
> > > > > > > >   >   >  especially deep learning acceleration on CPU.
> > > > > > > >   >   >
> > > > > > > >   >   >  
> > > > > > > >   >   >  From: Anirudh Subramanian  
> > > > > > > >   >   >  Sent: Thursday, March 14, 2019 3:54:50 PM
> > > > > > > >   >   >  To: dev@mxnet.incubator.apache.org; patric zhao
> > > > > > > >   >   >  Subject: [Announcement] New Committer - Patric Zhao
> > > > > > > >   >   >
> > > > > > > >   >   >  Hi all,
> > > > > > > >   >   >
> > > > > > > >   >   >  Please join me to welcome Patric Zhao as a new
> > > > > > > >   >   >  committer of Apache (incubating) MXNet!
> > > > > > > >   >   >
> > > > > > > >   >   >  Patric has put in great effort around MKLDNN
> > > > > > > >   >   >  integration into MXNet and has been involved in
> > > > > > > >   >   >  features like quantization, graph fusion and fused
> > > > > > > >   >   >  RNN operators for CPU.
> > > > > > > >   >   >
> > > > > > > >   >   >  Dev List activity:
> > > > > > > >   >   >  https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:patric.zhao
> > > > > > > >   >   >
> > > > > > > >   >   >  Issues:
> > > > > > > >   >   >  https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Apengzhao-intel+
> > > > > > > >   >   >
> > > > > > > >   >   >  PR Reviews:
> > > > > > > >   >   >  https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Apengzhao-intel
> > > > > > > >   >   >
> > > > > > > >   >   >  Proposals involved in:
> > > > > > > >   >   >  https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> > > > > > > >   >   >  https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operators

Re: Rust Client Lib

2019-02-19 Thread Pedro Larroy
> > > > > and while they noted
> > > > > Rust is a great choice for lots of reasons, the learning curve of the
> > > > > language is too steep... It seems like Rust isn't going to get much 
> > > > > love
> > > > > from the ML community in the places that matter.
> > > > >
> > > > > I also see that as of writing this, the Rust crate for Tensorflow has
> > > > only
> > > > > ~10,000 lifetime downloads, which is pretty low considering how much
> > > > effort
> > > > > the client library required. So the existing set of practitioners in 
> > > > > the
> > > > > language is very small, and it's unlikely to grow.
> > > > >
> > > > > Also, the benefits of Rust memory safety and ownership won't really be
> > > > > realized via a client library that uses FFI on a C API.
> > > > >
> > > > > I'm not going to move forward with this client lib. I'll check back 
> > > > > here
> > > > in
> > > > > the future and see if there's any activity... In the meantime, if 
> > > > > someone
> > > > > stumbles across this in the future and wants to pick it up, don't let 
> > > > > me
> > > > > stand in the way!
> > > > >
> > > > > - Zach
> > > > >
> > > > >
> > > > > On Wed, Jan 30, 2019 at 11:16 PM Zach Boldyga 
> > > > wrote:
> > > > >
> > > > > > Rad, thanks for the input everyone!
> > > > > >
> > > > > > I'm anticipating some friction with using FFI with the C API since 
> > > > > > it's
> > > > > > considered unsafe in Rust; difficulty of integrating will depend on 
> > > > > > the
> > > > > > nuances of the C API as HY mentioned...
> > > > > >
> > > > > > Going to go ahead and dive in. Will be back eventually for feedback 
> > > > > > /
> > > > > > input!
> > > > > >
> > > > > > Zach Boldyga
> > > > > > Scalabull  |  Founder
> > > > > > 1 (866) 846-8771 x 101
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 30, 2019 at 12:02 AM HY Chen 
> > > > wrote:
> > > > > >
> > > > > >> I have tried to create a module via existing Rust FFI generators
> > > > > >> but failed. It seems like you have to think a lot more than just
> > > > > >> translating the C API to make it work. It's better to understand
> > > > > >> the C API first and make sure it won't introduce new problems in
> > > > > >> Rust.
> > > > > >>
> > > > > >> HY
> > > > > >>
> > > > > >> Pedro Larroy wrote on Wed, Jan 30, 2019 at 4:35 AM:
> > > > > >>
> > > > > >> > I have been thinking about this and I find it really exciting
> > > > > >> > to have Rust bindings and to bring a powerful framework like
> > > > > >> > MXNet to the Rust community and to native applications in a
> > > > > >> > convenient Rust crate. I would love to see this happen. I
> > > > > >> > think basically MXNet needs to be wrapped in a Rust crate via
> > > > > >> > FFI / C bindings.
> > > > > >> >
> > > > > >> > Pedro.
> > > > > >> >
> > > > > >> > On Tue, Jan 29, 2019 at 11:05 AM Zach Boldyga 
> > > > > >> > 
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > > Hey y'all!
> > > > > >> > >
> > > > > >> > > I'm thinking about spending this week working on a rust client
> > > > lib for
> > > > > >> > > MXNet. saw a little bit of chatter about this in the github 
> > > > > >> > > issues
> > > > > >> and no
> > > > > >> > > strong existing crates at the moment. Any pointers on 
> > > > > >> > > approaching
> > > > this
> > > > > >> > in a
> > > > > >> > > way that will lead to it being adopted as an officially 
> > > > > >> > > supported
> > > > > >> client
> > > > > >> > > library? And overall yay/nay on whether adding a Rust lib makes
> > > > sense
> > > > > >> &
> > > > > >> > why
> > > > > >> > > / why not?
> > > > > >> > >
> > > > > >> > > Zach Boldyga
> > > > > >> > > Scalabull  |  Founder
> > > > > >> > > 1 (866) 846-8771 x 101
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >


Re: [Announce] Runtime feature detection

2019-02-12 Thread Pedro Larroy
An update on this topic, Sheng just merged the refinements to the
feature detection so it's now a single API call. (
https://github.com/apache/incubator-mxnet/pull/13964 ). Thank you
Sheng for the reviews.

Please use this functionality to check for capabilities of MXNet at
runtime such as Cuda, OpenCV etc. This can simplify tests and
automation in several places in the code.
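
For example, a test can skip itself when the engine lacks a capability.
A sketch against the mxfeatures API from the announcement quoted below
(the refinements in #13964 may have changed the exact names, so treat
them as assumptions):

import unittest

import mxnet.mxfeatures as mxf

def built_with(feature_name):
    # features_enabled() returns the Feature members compiled into the engine
    return any(f.name == feature_name for f in mxf.features_enabled())

class TestGpuOps(unittest.TestCase):
    @unittest.skipUnless(built_with("CUDA"), "MXNet was built without CUDA")
    def test_gpu_elementwise_add(self):
        ...  # GPU-only assertions go here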

Lin Iblis is already preparing Julia support:
https://github.com/apache/incubator-mxnet/pull/13992

This is a PR that adds documentation on the feature and explains how
to use it from Python:
https://github.com/apache/incubator-mxnet/pull/14130

Thanks.

On Fri, Jan 25, 2019 at 7:08 PM Sheng Zha  wrote:
>
> Hi Pedro,
>
> Happy to help, though I was waiting for PR comments to be addressed. 
> Currently the PR is close to complete, with some open comments to be resolved.
>
> -sz
>
> > On Jan 25, 2019, at 9:27 AM, Pedro Larroy  
> > wrote:
> >
> > That's Great! There's a PR that we should merge first which
> > internalizes the enum inside the library as per Sheng's suggestion.
> >
> > https://github.com/apache/incubator-mxnet/pull/13964
> >
> > @Sheng could we merge the PR? so we can build on top of this feature?
> > It's badly needed for tests suites etc.
> > Thanks a lot!
> >
> > Pedro.
> >
> >
> >> On Fri, Jan 25, 2019 at 2:22 PM Iblis Lin  wrote:
> >>
> >> Hi,
> >>
> >> I added the Julia binding for it.
> >> PR is here:
> >> https://github.com/apache/incubator-mxnet/pull/13992
> >>
> >> Iblis Lin
> >> 林峻頤
> >>
> >>> On 1/23/19 12:39 AM, Pedro Larroy wrote:
> >>> Hi
> >>>
> >>> I'm pleased to announce that runtime feature detection has been merged
> >>> in master, thanks to Aaron for the merge and the many reviewers who
> >>> gave feedback on the PR.  (
> >>> https://github.com/apache/incubator-mxnet/pull/13549 )
> >>>
> >>> As the functionality matures and is exposed through other bindings,
> >>> please feel free to try and use it to build on it, for example for
> >>> easier test suite selection depending on what's compiled in the
> >>> engine.
> >>>
> >>> Usage examples:
> >>>
> >>> $ ipython
> >>> In [4]: import mxnet.mxfeatures
> >>>
> >>> In [5]: mxnet.mxfeatures.features_enabled()
> >>> Out[5]:
> >>> [<Feature.CPU_SSE>,
> >>>  <Feature.CPU_SSE2>,
> >>>  <Feature.CPU_SSE3>,
> >>>  <Feature.CPU_SSE4_1>,
> >>>  <Feature.CPU_SSE4_2>,
> >>>  <Feature.CPU_AVX>,
> >>>  <Feature.F16C>,
> >>>  <Feature.BLAS_OPEN>,
> >>>  <Feature.LAPACK>,
> >>>  <Feature.SIGNAL_HANDLER>,
> >>>  <Feature.DEBUG>]
> >>>
> >>> In [6]: mxnet.mxfeatures.features_enabled_str()
> >>> Out[6]: 'CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2, CPU_AVX,
> >>> F16C, BLAS_OPEN, LAPACK, SIGNAL_HANDLER, DEBUG'
> >>>
> >>> see also: help(mxnet.mxfeatures)
> >>>
> >>> Regards.
> >>>


Re: [Announcement] New Committer -- Steffen Rochel

2019-02-06 Thread Pedro Larroy
Congrats Steffen.

On Tue, Feb 5, 2019 at 7:48 PM Yuan Tang  wrote:
>
> Welcome!
>
> On Tue, Feb 5, 2019 at 1:46 PM Gavin M. Bell 
> wrote:
>
> > Great news!
> >
> >
> > On Tue, Feb 5, 2019 at 6:16 PM Lin Yuan  wrote:
> >
> > > Welcome Steffen!
> > >
> > > Lin
> > >
> > > On Mon, Feb 4, 2019 at 7:53 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Great news.  Congrats Steffen.
> > > >
> > > > On Mon, Feb 4, 2019, 5:29 PM Thomas DELTEIL wrote:
> > > >
> > > > > Welcome Steffen!
> > > > >
> > > > > On Mon, Feb 4, 2019, 15:55 Marco de Abreu wrote:
> > > > >
> > > > > > Welcome!
> > > > > >
> > > > > > On Tue., 5 Feb 2019, 00:45 Chris Olivier
> > > > > > <cjolivie...@apache.org> wrote:
> > > > > >
> > > > > > > Dear Community:
> > > > > > >
> > > > > > > Please join me to welcome Steffen Rochel (
> > steffenroc...@gmail.com)
> > > > as
> > > > > a
> > > > > > > new
> > > > > > > committer of Apache (incubating) MXNet!
> > > > > > >
> > > > > > > Steffen has played a role in nearly every MXNet release in the
> > past
> > > > 18
> > > > > > > months, managed several of the wiki pages and has contributed in
> > > > > > expanding
> > > > > > > the community by managing and hosting meetups in different parts
> > of
> > > > the
> > > > > > > world.
> > > > > > >
> > > > > > > -Chris
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sincerely,
> > Gavin M. Bell
> >
> >  "Never mistake a clear view for a short distance."
> >   -Paul Saffo
> >


Re: Rust Client Lib

2019-01-29 Thread Pedro Larroy
I have been thinking about this and I find it really exciting to have
Rust bindings and to bring a powerful framework like MXNet to the Rust
community and to native applications in a convenient Rust crate. I
would love to see this happen. I think basically MXNet needs to be
wrapped in a Rust crate via FFI / C bindings.
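
To make the FFI idea concrete in a language-neutral way, the same C entry
points can be exercised from Python with ctypes; a Rust crate would wrap
them behind safe functions in exactly the same shape. A sketch:
MXGetVersion is declared in include/mxnet/c_api.h, while the library
name/path is an assumption to adjust per platform:

import ctypes

libmxnet = ctypes.CDLL("libmxnet.so")  # .dylib / .dll on other platforms

version = ctypes.c_int()
ret = libmxnet.MXGetVersion(ctypes.byref(version))  # 0 means success
if ret != 0:
    raise RuntimeError("MXGetVersion failed with code %d" % ret)
print(version.value)  # e.g. 10500 for 1.5.0

A Rust wrapper would declare the same symbol in an extern "C" block and
confine the unsafe call behind a safe function, which is where the
ownership and borrowing guarantees start paying off.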

Pedro.

On Tue, Jan 29, 2019 at 11:05 AM Zach Boldyga  wrote:
>
> Hey y'all!
>
> I'm thinking about spending this week working on a rust client lib for
> MXNet. saw a little bit of chatter about this in the github issues and no
> strong existing crates at the moment. Any pointers on approaching this in a
> way that will lead to it being adopted as an officially supported client
> library? And overall yay/nay on whether adding a Rust lib makes sense & why
> / why not?
>
> Zach Boldyga
> Scalabull  |  Founder
> 1 (866) 846-8771 x 101


Re: [Announce] Runtime feature detection

2019-01-25 Thread Pedro Larroy
That's Great! There's a PR that we should merge first which
internalizes the enum inside the library as per Sheng's suggestion.

https://github.com/apache/incubator-mxnet/pull/13964

@Sheng could we merge the PR? so we can build on top of this feature?
It's badly needed for tests suites etc.
Thanks a lot!

Pedro.


On Fri, Jan 25, 2019 at 2:22 PM Iblis Lin  wrote:
>
> Hi,
>
> I added the Julia binding for it.
> PR is here:
> https://github.com/apache/incubator-mxnet/pull/13992
>
> Iblis Lin
> 林峻頤
>
> On 1/23/19 12:39 AM, Pedro Larroy wrote:
> > Hi
> >
> > I'm pleased to announce that runtime feature detection has been merged
> > in master, thanks to Aaron for the merge and the many reviewers who
> > gave feedback on the PR.  (
> > https://github.com/apache/incubator-mxnet/pull/13549 )
> >
> > As the functionality matures and is exposed through other bindings,
> > please feel free to try and use it to build on it, for example for
> > easier test suite selection depending on what's compiled in the
> > engine.
> >
> > Usage examples:
> >
> > $ ipython
> > In [4]: import mxnet.mxfeatures
> >
> > In [5]: mxnet.mxfeatures.features_enabled()
> > Out[5]:
> > [<Feature.CPU_SSE>,
> >  <Feature.CPU_SSE2>,
> >  <Feature.CPU_SSE3>,
> >  <Feature.CPU_SSE4_1>,
> >  <Feature.CPU_SSE4_2>,
> >  <Feature.CPU_AVX>,
> >  <Feature.F16C>,
> >  <Feature.BLAS_OPEN>,
> >  <Feature.LAPACK>,
> >  <Feature.SIGNAL_HANDLER>,
> >  <Feature.DEBUG>]
> >
> > In [6]: mxnet.mxfeatures.features_enabled_str()
> > Out[6]: 'CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2, CPU_AVX,
> > F16C, BLAS_OPEN, LAPACK, SIGNAL_HANDLER, DEBUG'
> >
> > see also: help(mxnet.mxfeatures)
> >
> > Regards.
> >


Re: [Announce] Runtime feature detection

2019-01-23 Thread Pedro Larroy
I'm still refining the feature, given some late feedback and the fact
that it will be a public API. I guess with Aaron's help we will get some
nice documentation in, as it's not showing up in the master Python API
docs. I thought it would be generated automatically from the Python
docstrings.

Is this the correct source for the documentation format that we are
using? I know we use Sphinx, but it doesn't look like RST to me:

http://www.sphinx-doc.org/en/master/usage/quickstart.html
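
From a quick look, the Python docstrings seem to follow the NumPy
docstring convention, which a Sphinx extension converts to RST during
the build; that would explain why the sources don't look like the
quickstart's RST. A minimal sketch (the docstring content is illustrative):

def features_enabled():
    """Returns the runtime features enabled in this build of MXNet.

    Returns
    -------
    list of Feature
        The features compiled into the engine, e.g. CPU_SSE or LAPACK.
    """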

Pedro.


[Announce] Runtime feature detection

2019-01-22 Thread Pedro Larroy
Hi

I'm pleased to announce that runtime feature detection has been merged
in master, thanks to Aaron for the merge and the many reviewers who
gave feedback on the PR.  (
https://github.com/apache/incubator-mxnet/pull/13549 )

As the functionality matures and is exposed through other bindings,
please feel free to try and use it to build on it, for example for
easier test suite selection depending on what's compiled in the
engine.

Usage examples:

$ ipython
In [4]: import mxnet.mxfeatures

In [5]: mxnet.mxfeatures.features_enabled()
Out[5]:
[<Feature.CPU_SSE>,
 <Feature.CPU_SSE2>,
 <Feature.CPU_SSE3>,
 <Feature.CPU_SSE4_1>,
 <Feature.CPU_SSE4_2>,
 <Feature.CPU_AVX>,
 <Feature.F16C>,
 <Feature.BLAS_OPEN>,
 <Feature.LAPACK>,
 <Feature.SIGNAL_HANDLER>,
 <Feature.DEBUG>]

In [6]: mxnet.mxfeatures.features_enabled_str()
Out[6]: 'CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2, CPU_AVX,
F16C, BLAS_OPEN, LAPACK, SIGNAL_HANDLER, DEBUG'

see also: help(mxnet.mxfeatures)

Regards.


Re: Taxonomy on our cwiki

2019-01-19 Thread Pedro Larroy
+1

On Sat, Jan 19, 2019 at 2:51 PM Zhao, Patric  wrote:
>
> +1, Good idea.
>
> It's not very easy to find the related contents, since there are lots of
> folders on the website.
>
>
> > -Original Message-
> > From: Sheng Zha [mailto:zhash...@apache.org]
> > Sent: Saturday, January 19, 2019 3:28 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Taxonomy on our cwiki
> >
> > Hi MXNet,
> >
> > Given that currently cwiki is the only place other than mxnet website for
> > mxnet-related documentation, I'd like to request your attention to the
> > (slightly disorganized) cwiki page of MXNet. The top level folders (and 
> > their
> > contents) currently looks like this:
> > - Design Proposals* (bag of proposals, not in order)
> > - Development* (mixture of guides, roadmaps, processes)
> > - Release Process (release notes)
> > - Website (guides and proposals)
> > - MXNet Clojure (call for contribution, guides)
> > - MXNet Keras Integration (design)
> > - MXNet-ONNX Integration (design, dev status)
> > - MXNet R Package (guide, backlog)
> > - MXNet-Scala (design, dev status, guide)
> > - Content Formatting Templates (not a folder but link to two docs)
> > - How-to articles (1 guide)
> > - Community (guide on apache-related processes)
> > - Data IO (designs)
> > - Continuous Integration (guides, designs)
> > - Meetups and Hangouts (events)
> >
> > And here are two good examples from successful Apache projects:
> > - Apache Flink: an **audience-oriented** structure [1]
> >   Users (Presentations and How-to)
> >   Contributors (Dev processes and How-to)
> >   Committers (Infra, Dev processes, Release processes, Releases)
> >   Roadmaps and Feature Designs (archive)
> > - Apache OpenNLP: a **content-oriented** structure [2]
> >   Guides
> >   External Resources
> >   Proposals
> >   Releasing
> >
> > Clean organization helps content discovery and saves time on locating useful
> > content. Given that we have good amount of content on the wiki page, I
> > suggest that we decide on a cleaner taxonomy, re-organize contents
> > accordingly, and add future contents accordingly. To provide a starting 
> > point
> > for the discussion, I suggest:
> > - Given the state we are in, start with content-oriented organization, use
> > these top-level categories: Guides (including processes and how-tos),
> > Development (including designs, proposals, notes, roadmaps), Community
> > (including events, activities, external resources and contents)
> > - If people strongly prefer audience-oriented structure, later we can adopt 
> > a
> > structure similar to Flink's.
> >
> > Feel free to share your thoughts and preferences here. Thanks.
> >
> > -sz
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Homehttp
> > s://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > [2] https://cwiki.apache.org/confluence/display/OPENNLP/Index


Re: Question about notification on nightly test failures

2019-01-19 Thread Pedro Larroy
Looks fine to me.

On Wed, Jan 16, 2019 at 11:44 PM Carin Meier  wrote:
>
> The Clojure package now has turned the examples into integration tests that
> we would like to have CI run. It takes about 15 min for them to complete.
>
> We have a PR in progress
> https://github.com/apache/incubator-mxnet/pull/13624
>
> I would like to propose that we run these as part of the regular CI until
> the notifications for nightly builds get implemented. We will run on a
> whitelist or have an exclude list so that any problematic tests can be
> disabled if needed.
>
> Please let me know any feedback on this or concerns.
>
> Thanks,
> Carin
>
>
> On Tue, Jan 15, 2019 at 12:24 PM Pedro Larroy 
> wrote:
>
> > Why don't we enable a slack notifier?  I think it would be useful to
> > interact with notifications from slack directly, including the label
> > bot for example.
> >
> > Pedro.
> >
> > On Sun, Jan 13, 2019 at 1:55 AM Carin Meier  wrote:
> > >
> > > Thanks for the explanation Marco :)
> > >
> > > - Carin
> > >
> > > On Sat, Jan 12, 2019 at 7:43 PM Marco de Abreu 
> > > wrote:
> > >
> > > > Hi Carin,
> > > >
> > > > thanks for thinking about adding nightly tests to clojure, I'm sure
> > this
> > > > will be of big benefit!
> > > >
> > > > You're right, the email system is in place but we basically disabled
> > the
> > > > service December 2017 because it was flooding the inboxes of everybody.
> > > >
> > > > We've been thinking about various notification methods, but were always
> > > > afraid of making the notifications meaningless if they come too frequently
> > > > (which is WAY better now, thanks to everybody's efforts around
> > stabilizing
> > > > the tests!) because people would filter them.
> > > >
> > > > I like the idea of a specific slack channel for the notifications. Are
> > > > there any alternatives the community could think of?
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > > >
> > > > On Sat., 12 Jan 2019, 10:11 Carin Meier wrote:
> > > >
> > > > > The Clojure package is thinking of adding some nightly tests and I'd
> > like
> > > > > to understand how the notification works in case of failure.
> > > > >
> > > > > The code in the nightly Jenkins file
> > > > >
> > > > >
> > > >
> > https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/Jenkinsfile#L135
> > > > > seems to indicate that the failure is emailed. But where does this
> > email
> > > > > go?
> > > > > If you are a contributor, do you get notified of this?
> > > > >
> > > > > I don't remember seeing any notification of nightly failures, so I'm
> > > > > wondering how this works and if there are any improvements to
> > > > accessibility
> > > > > that we can make, (like maybe posting it to a slack room)?
> > > > >
> > > > > Thanks,
> > > > > Carin
> > > > >
> > > >
> >


Re: Proposal for a recurrent architecture meeting and long term direction

2019-01-19 Thread Pedro Larroy
Hi Isabel. We talked with Timur about Graphcore accelerators, and it
seems other people also missed the meeting.

For next meeting I will send an agenda of topics to discuss in advance.

Thanks.

Pedro.

On Thu, Jan 17, 2019 at 2:43 PM Isabel Drost-Fromm  wrote:
>
> On Mon, Jan 14, 2019 at 08:03:42PM +0100, Pedro Larroy wrote:
> > If you wish to join the monthly architecture meeting today, please
> > join the hangout below:
> >
> > https://hangouts.google.com/call/ZXXqJ0ZL5m_dcHOVIeTcAEEE
>
> Likely I've missed the mail - can you point me to the summary of the above
> meeting?
>
>
> Isabel


Re: CI reporting & diagnostics improvements

2019-01-19 Thread Pedro Larroy
Hi Aaron. Looking at the log, this could well be due to the Docker
container being rebuilt:

http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-website-build/986/consoleFull

Trend is around 45 min.
http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-website-build/buildTimeTrend

I think one of the problems with the documentation build is that it
rebuilds the project and, if I'm not mistaken, also runs some of the
unit tests of the Scala package.

I added ccache support to build docs faster. I looked at why Sphinx is
rebuilding the project, and it seems it just runs make on the root folder.

Pedro.

On Thu, Jan 17, 2019 at 6:07 PM Aaron Markham  wrote:
>
> I've been trying to diagnose issues with some of the CI pipelines and one
> blind spot in the reporting is in the stages - we'll have timing info on
> the first couple of parts and maybe the last part, but the lion's share is
> in one big part in the middle. It'll say 45 minutes, or jump to an hour,
> and I can't tell why because there's not enough granularity in the report.
> Is that something we can improve?
> For example, this one jumped to almost 2 hours from a normal 45 minute run:
> http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-website-build/986/timings/
>
> I'd also like to see some data dumps like memory usage and disk space, and
> whatever other diagnostics we can get at each stage.
>
> Cheers,
> Aaron


Re: Cherry pick bug fix from master branch to v1.4.x

2019-01-15 Thread Pedro Larroy
I would also add:

https://github.com/apache/incubator-mxnet/pull/13535


On Tue, Jan 15, 2019 at 4:37 PM kellen sunderland
 wrote:
>
> We may want to consider having a new code freeze deadline for RC1.  We
> could allow users to open PRs against the 1.4.x branch up until this
> deadline.
>
> One advantage is we can have a second look at some API changes which we may
> not have got 100% right before we push them out and have to support them.
> This PR I know of could benefit from this
> https://github.com/apache/incubator-mxnet/pull/13697
>
> Other PRs we may want to consider migrating because they fix functional
> issues:
> https://github.com/apache/incubator-mxnet/pull/13188
> https://github.com/apache/incubator-mxnet/pull/13727
> https://github.com/apache/incubator-mxnet/pull/13695
>
> -Kellen
>
> On Tue, Jan 15, 2019 at 12:24 AM Lv, Tao A  wrote:
>
> >
> > Hi community,
> >
> > As 1.4.0 release is still in process, I would like to propose to cherry
> > pick https://github.com/apache/incubator-mxnet/pull/13843  into the
> > v1.4.x branch. It fixed a crash in the quantized SSD example on the master
> > branch which was reported by an MXNet user. This issue also exists on the
> > v1.4.x branch. Since quantization is an important feature of the MKL-DNN
> > backend in the 1.4.0 release, I think this fix is critical and we should
> > have it in the release.
> >
> > A PR is filed to do that:
> > https://github.com/apache/incubator-mxnet/pull/13882
> >
> > Thank you,
> > -tao
> >


Re: Question about notification on nightly test failures

2019-01-15 Thread Pedro Larroy
Why don't we enable a Slack notifier?  I think it would be useful to
interact with notifications from Slack directly, including the label
bot for example.

Pedro.

On Sun, Jan 13, 2019 at 1:55 AM Carin Meier  wrote:
>
> Thanks for the explanation Marco :)
>
> - Carin
>
> On Sat, Jan 12, 2019 at 7:43 PM Marco de Abreu 
> wrote:
>
> > Hi Carin,
> >
> > thanks for thinking about adding nightly tests to clojure, I'm sure this
> > will be of big benefit!
> >
> > You're right, the email system is in place but we basically disabled the
> > service December 2017 because it was flooding the inboxes of everybody.
> >
> > We've been thinking about various notification methods, but were always
> > afraid making the notifications meaningless if they come too frequently
> > (which is WAY better now, thanks to everybody's efforts around stabilizing
> > the tests!) because people would filter them.
> >
> > I like the idea of a specific slack channel for the notifications. Are
> > there any alternatives the community could think of?
> >
> > Best regards,
> > Marco
> >
> >
> > On Sat, Jan 12, 2019 at 10:11 Carin Meier
> > wrote:
> >
> > > The Clojure package is thinking of adding some nightly tests and I'd like
> > > to understand how the notification works in case of failure.
> > >
> > > The code in the nightly Jenkins file
> > >
> > >
> > https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/Jenkinsfile#L135
> > > seems to indicate that the failure is emailed. But where does this email
> > > go?
> > > If you are a contributor, do you get notified of this?
> > >
> > > I don't remember seeing any notification of nightly failures, so I'm
> > > wondering how this works and if there are any improvements to
> > accessibility
> > > that we can make, (like maybe posting it to a slack room)?
> > >
> > > Thanks,
> > > Carin
> > >
> >


Re: Proposal for a recurrent architecture meeting and long term direction

2019-01-14 Thread Pedro Larroy
Hi

If you wish to join the monthly architecture meeting today, please
join the hangout below:

https://hangouts.google.com/call/ZXXqJ0ZL5m_dcHOVIeTcAEEE

Regards. Pedro.

On Mon, Dec 17, 2018 at 1:18 AM Pedro Larroy
 wrote:
>
> Hi
>
> I think you make good points. We can address your concerns by sending
> notes to the mailing list and using the wiki / RFCs appropriately, so the
> community can follow along and asynchronous participation is still
> possible.
>
> I would say let's try it and see if the meetings bring value. We can always
> stop if they don't, or hold them more often if the time is not enough. I
> think once a month is a conservative choice that doesn't take too much time
> from our already busy lives.
>
> Other open source projects like IPFS do these kinds of sessions, and it
> seems to work for them.
>
> It's also an opportunity to share what we are working on and collaborate more.
>
> Pedro.
>
> On Sun, Dec 16, 2018 at 7:04 PM Tianqi Chen  wrote:
> >
> > I feel that an online meeting may not address most of the issues that get
> > raised in a short amount of time (1h), and it still suffers the problem of
> > not being publicly archivable.
> >
> > Maybe we can instead try a more asynchronous way? (RFC discussions in
> > issues and/or discussion @dev). Just my two cents, and I am not blocking
> > the proposal.
> >
> > Tianqi
> >
> >
> > On Fri, Dec 14, 2018 at 5:34 AM Pedro Larroy 
> > wrote:
> >
> > > Hi MXNetters
> > >
> > > To address the project growth and increased contributions I'm
> > > proposing a monthly meeting / hangout to have community discussions
> > > about MXNet architecture and mid / longer term technical directions
> > > that require coordination beyond single PRs.
> > >
> > > TOPICS:
> > >
> > > The goal of this series is to address topics including but not limited to:
> > >  - How to best integrate features that have a big impact on the project
> > >  - Discussion about long term technical direction
> > >  - Addressing of technical debt / refactoring needed.
> > >  - Other architectures / framework support. Ex. ARM, Cuda etc.
> > >  - Build system improvements and tooling such as code coverage, static
> > > analysis etc.
> > >  - Performance discussions.
> > >  - Live discussion to address exceptional PRs with complex changes
> > > that are better discussed live than in written form.
> > >
> > > FREQUENCY:
> > >
> > > I propose to make this meeting on the second Monday of the month at
> > > 11am PST / 8pm CET
> > >
> > > So the tentative date for the first one would be on January 14th.
> > >
> > > If you think this arbitrary date is not good, please say so. In this
> > > case we can proceed to make a doodle to find a slot that works for the
> > > interested parties.
> > >
> > > I have opened a group calendar for our meetings, hangouts and other
> > > events related to MXNet.
> > >
> > >
> > > https://calendar.google.com/calendar/embed?src=6co88bqo3n4bjsbt1qrqmsvj4o%40group.calendar.google.com
> > >
> > > Pedro.
> > >


Re: Order of includes in cpplint

2019-01-08 Thread Pedro Larroy
I worked around the case. I saw several instances where project and
third-party headers were included with angle brackets instead of
quotes. Agreed, I think it is too much hassle to change the order of
includes in the whole project anyway. But this convention can cause
some headers not to be self-sufficient, as system headers are included
before other library headers, which might be using a system header
without including it themselves.
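
To illustrate with a contrived example (hypothetical files, not actual
MXNet code): a header that forgets one of its own includes compiles or
fails depending on what the consumer happened to include first.

    // widget.h -- hypothetical, non-self-sufficient header: it uses
    // std::string but does not #include <string> itself.
    #ifndef WIDGET_H_
    #define WIDGET_H_
    struct Widget {
      std::string name;
    };
    #endif  // WIDGET_H_

    // consumer.cc
    #include <string>    // system header first: masks the missing
                         // include, so widget.h happens to compile here
    #include "widget.h"

    // A consumer that includes "widget.h" first (or alone) fails to
    // compile instead, which is exactly how the missing include gets
    // caught when project headers come before system headers.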

Pedro.

On Wed, Jan 9, 2019 at 2:12 AM Qin, Zhennan  wrote:
>
> Hi Pedro,
>
> Interesting topic. Google style does have guidance for this:
>
> https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
>
> According to it, the order is,
>
> dir2/foo2.h.
> A blank line
> C system files.
> C++ system files.
> A blank line
> Other libraries' .h files.
> Your project's .h files.
>
> As MXNet follows this style, I guess we shouldn't break it unless we have 
> some problems. Do you have such a case that needs the change?
>
> Thanks,
> Zhennan
>
> -Original Message-
> From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> Sent: Wednesday, January 9, 2019 6:44 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Order of includes in cpplint
>
> Hi MXNet community
>
> cpplint seems to complain when the order of includes is not  [own, system, 
> other]
>
> But the general best practice in C++ is [own, project, 3rd party, system] for 
> the reasons explained in this stackoverflow answer:  ( 
> https://stackoverflow.com/questions/614302/c-header-order )
>
> A contribution to cpplint could be made to make this configurable:
>
> https://github.com/cpplint/cpplint/blob/master/cpplint.py#L605
>
> Thoughts?
>
> Pedro.


Order of includes in cpplint

2019-01-08 Thread Pedro Larroy
Hi MXNet community

cpplint seems to complain when the order of includes is not  [own,
system, other]

But the general best practice in C++ is [own, project, 3rd party,
system] for the reasons explained in this stackoverflow answer:  (
https://stackoverflow.com/questions/614302/c-header-order )

A contribution to cpplint could be made to make this configurable:

https://github.com/cpplint/cpplint/blob/master/cpplint.py#L605

Thoughts?

Pedro.


Re: Apache MXNet v1.4.0 release status

2018-12-17 Thread Pedro Larroy
Hi Steffen

Added some notes in your PR for the release notes.

In particular, I'm a bit concerned about the status of topology-aware
communication, since it has open issues and is not being tested in CI
(the tests also fail). I think we should announce it when it's working
properly and well tested.

Pedro.

On Sat, Dec 15, 2018 at 11:06 AM Steffen Rochel  wrote:
>
> Dear MXNet community -
> all issues besides one have been addressed. I suggest documenting the
> last remaining issue as a known problem and moving forward with the
> release.
> Please communicate if you have concerns or know about critical issues to be
> addressed before starting the vote about releasing 1.4.0 as soon as possible.
> Please also have a look at the release notes and provide feedback.
>
> I'm planning to start voting at the beginning of next week.
> Steffen
>
> On Sat, Dec 8, 2018 at 8:31 PM Steffen Rochel 
> wrote:
>
> > Hi Pedro - this are indeed the draft release notes for v1.4.0. Please add
> > description as you suggested.
> >
> > All - please have a look at the release notes and provide feedback and
> > suggestions..
> > Steffen
> > On Sun, Dec 9, 2018 at 3:30 AM Zhao, Patric  wrote:
> >
> >> Hi Steffen,
> >>
> >> I saw the draft of 1.4 release notes in here (
> >> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >> ).
> >>
> >> Is this near the final version?  I'd like to add some descriptions of new
> >> quantization features enabled in 1.4.
> >>
> >> Is it OK?
> >>
> >> Thanks,
> >>
> >> --Patric
> >>
> >>
> >> > -Original Message-
> >> > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> >> > Sent: Saturday, December 8, 2018 1:12 AM
> >> > To: dev@mxnet.incubator.apache.org
> >> > Subject: Apache MXNet v1.4.0 release status
> >> >
> >> > Dear MXNet community -
> >> > I would like to provide update on v1.4.0 status, details are tracked
> >> here
> >> >  >> > ncubating%29+1.4.0+Release+Plan+and+Status>
> >> > .
> >> >
> >> > Thank you very much for everybody effort to resolve the identified
> >> issues.
> >> > We are down to 3 open issues - for details please see
> >> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28in
> >> > cubating%29+1.
> >> > 4.0
> >> > +Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandSt
> >> > atu
> >> > +Release+Plan+and+s-OpenPRstotrack
> >> >  >> > ncubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.
> >> > 4.0ReleasePlanandStatus-OpenPRstotrack>
> >> > Please help to resolve the remaining issues and integrate to v1.4.x
> >> branch.
> >> > Current estimate to address the identified security vulnerabilities in
> >> the
> >> > Scala/Java package and merge into v1.4.x branch is end of next week
> >> > (December 14th) I will communicate as soon I have more information.
> >> >
> >> > Regards,
> >> > Steffen
> >>
> >


Re: Cambricon MLU support for MXNet.

2018-12-17 Thread Pedro Larroy
Hi Haochong

Welcome to MXNet! It's exciting to have additional hardware platforms
added and supported in the project.

The CI system for MXNet is donated by AWS to the project. We have a
small hardware lab with embedded physical hardware such as ARM boards,
including NVIDIA Jetson, which we are connecting to the CI system
(it's a WIP).

However, the bulk of the CI system runs in the AWS Cloud using Jenkins
and EC2 GPU and CPU instances. So even though all of the options you
mention are possible and could work, the order in which you listed
them is also my order of preference. Connecting a remote server or
cloud instance to the MXNet Jenkins would be the easiest option, as it
wouldn't involve hardware shipping and maintenance.

I think once you have the contribution merged and the changes ready to
be tested, we can make a plan on how to best integrate with CI. For
that, the recommendation that Hagay gave (a design proposal in the wiki)
is a good path forward, so other members of the community and the
engineers working on the CI system can participate.

Pedro.

On Mon, Dec 17, 2018 at 3:33 AM 张昊翀  wrote:
>
> Dear MXNet community,
>
> We are from Cambricon, a leading supplier of artificial intelligence chips. 
> We have two product lines, including IP products (e.g., Cambricon 1A/1H) and 
> chip products (e.g., MLU100 released in May 2018)
>
> We are now adapting MXNet on Cambricon products. During the follow-up 
> session, we plan to open source, and hope to merge these new features into 
> the master branch of MXNet and to be a part of MXNet's long-term support. We 
> firmly believe that these MLU features will promote the MXNet community 
> development.
> To this end, we are ready to accept the rigorous inspection of MXNet 
> community. In addition, we need advice from the community to achieve high 
> quality implementation. On this basis, we very much hope to reach a 
> full-scale long-term cooperation with the community.
>
> In order to achieve the above goals, we hope to keep in touch with the 
> community on some issues. Looking forward to your valuable feedback.
>
> 1. MLU100 mainly focuses on inference, and we plan to first support the 
> inference part of MXNet. The training part of MXNet on MLU will be released 
> in the future. Is that acceptable for MXNet community?
>
> 2. Though MLU can support various operators/networks, to guarantee high 
> quality, all supported operators submitted to the community should undergo 
> rigorous stress test. Thus, at the beginning, we plan to release a small 
> number of supported operators and networks, and more of them will be 
> continuously added. Is that acceptable or do we have to support all networks 
> in the ModelZoo in the first release?
>
> 3. Currently we plan to support both Python and C++ APIs. More details on 
> supported APIs will be provided in a follow-up proposal.
>
> 4. We need to modify the mShadow in order to support tensor memory operations.
>
> 5. In order to enable the community to run and fully test our code, we want 
> to provide the community with a complete test environment. At present, we are 
> considering the following three ways.
> A) Provides several remote servers for community and integrates with the 
> community's Jenkins.
> B) Provide a cloud platform to the community.
> C) Donate MLU100 to the community's testing platform. However, we don’t know 
> the specific ways of donation, and we hope to get help. We are wondering 
> about how MXNet's test servers are managed.
>
> About more technical details, a proposal will be submitted to the community 
> before releasing the code.
>
> In addition to the above points, the remaining questions and suggestions are 
> also welcome. Thanks!
>
> More about Cambricon:
> Cambricon is the artificial intelligence computing pioneer that engineers and 
> successfully commercializes world’s first dedicated machine learning 
> processor. To bring its unique AI processors from edge to cloud, enriching 
> and advancing human life, is the firm mission of the company. Dr. Tianshi 
> Chen is the founder and CEO of Cambricon, where he brings over 10 years 
> experience in the fields of micro-processor architecture and artificial 
> intelligence.
> In 2016, Cambricon released Cambricon 1A processor, the first commercial 
> machine learning specific processor in the world. Later, during the 3rd World 
> Internet Conference, Cambricon 1A processor was elected as one of “World 
> Leading Internet Scientific and Technological Achievements“. In May 2018, 
> Cambricon released MLU100, a machine learning chip which is in mass 
> production now. By offering revolutionary technology and products, Cambricon 
> has established and remains active relationships with various companies in 
> the AI industry.
>
>
> Regards,
> Haochong Zhang
> Cambricon MXNet Development Team
>
>


Re: Proposal for a recurrent architecture meeting and long term direction

2018-12-16 Thread Pedro Larroy
Hi

I think you make good points. We can address your concerns by sending
notes to the mailing list and using the wiki / RFCs appropriately, so the
community can follow along and asynchronous participation is still
possible.

I would say let's try it and see if the meetings bring value. We can always
stop if they don't, or hold them more often if the time is not enough. I
think once a month is a conservative choice that doesn't take too much time
from our already busy lives.

Other open source projects like IPFS do these kinds of sessions, and it
seems to work for them.

It's also an opportunity to share what we are working on and collaborate more.

Pedro.

On Sun, Dec 16, 2018 at 7:04 PM Tianqi Chen  wrote:
>
> I feel that an online meeting may not address most of the issues that get
> raised in a short amount of time (1h), and it still suffers the problem of
> not being publicly archivable.
>
> Maybe we can instead try a more asynchronous way? (RFC discussions in
> issues and/or discussion @dev). Just my two cents, and I am not blocking
> the proposal.
>
> Tianqi
>
>
> On Fri, Dec 14, 2018 at 5:34 AM Pedro Larroy 
> wrote:
>
> > Hi MXNetters
> >
> > To address the project growth and increased contributions I'm
> > proposing a monthly meeting / hangout to have community discussions
> > about MXNet architecture and mid / longer term technical directions
> > that require coordination beyond single PRs.
> >
> > TOPICS:
> >
> > The goal of this series is to address topics including but not limited to:
> >  - How to best integrate features that have a big impact on the project
> >  - Discussion about long term technical direction
> >  - Addressing of technical debt / refactoring needed.
> >  - Other architectures / framework support. Ex. ARM, Cuda etc.
> >  - Build system improvements and tooling such as code coverage, static
> > analysis etc.
> >  - Performance discussions.
> >  - Live discussion to address exceptional PRs with complex changes
> > that are better discussed live than in written form.
> >
> > FREQUENCY:
> >
> > I propose to make this meeting on the second Monday of the month at
> > 11am PST / 8pm CET
> >
> > So the tentative date for the first one would be on January 14th.
> >
> > If you think this arbitrary date is not good, please say so. In this
> > case we can proceed to make a doodle to find a slot that works for the
> > interested parties.
> >
> > I have opened a group calendar for our meetings, hangouts and other
> > events related to MXNet.
> >
> >
> > https://calendar.google.com/calendar/embed?src=6co88bqo3n4bjsbt1qrqmsvj4o%40group.calendar.google.com
> >
> > Pedro.
> >


Proposal for a recurrent architecture meeting and long term direction

2018-12-14 Thread Pedro Larroy
Hi MXNetters

To address the project growth and increased contributions I'm
proposing a monthly meeting / hangout to have community discussions
about MXNet architecture and mid / longer term technical directions
that require coordination beyond single PRs.

TOPICS:

The goal of this series is to address topics including but not limited to:
 - How to best integrate features that have a big impact on the project
 - Discussion about long term technical direction
 - Addressing of technical debt / refactoring needed.
 - Other architectures / framework support. Ex. ARM, Cuda etc.
 - Build system improvements and tooling such as code coverage, static
analysis etc.
 - Performance discussions.
 - Live discussion to address exceptional PRs with complex changes
that are better discussed live than in written form.

FREQUENCY:

I propose to make this meeting on the second Monday of the month at
11am PST / 8pm CET

So the tentative date for the first one would be on January 14th.

If you think this arbitrary date is not good, please say so. In this
case we can proceed to make a doodle to find a slot that works for the
interested parties.

I have opened a group calendar for our meetings, hangouts and other
events related to MXNet.

https://calendar.google.com/calendar/embed?src=6co88bqo3n4bjsbt1qrqmsvj4o%40group.calendar.google.com

Pedro.


Re: Scala standard library is included in the mxnet jar

2018-12-04 Thread Pedro Larroy
Chris,

I asked on GitHub and in other private forums before, and I wanted to
bring attention to the topic, nothing more; you are reading too much
between the lines. There's nothing impolite about my question, and I
think dev@ is the right forum for such conversations.

Pedro.
On Tue, Dec 4, 2018 at 7:59 PM Chris Olivier  wrote:
>
> Pedro,
>
> It would be polite to ask if there is a reason it is included before
> categorically declaring it is wrong.
>
> I am not involved in the scala library and what's included in it, but maybe
> there's a good reason for it. Or maybe there isn't.  Either way, it's best
> to ask first :)
>
> Thanks,
>
> -Chris


Re: Scala standard library is included in the mxnet jar

2018-12-04 Thread Pedro Larroy
In my opinion that's not a good reason, and the majority of other
libraries in Maven don't do that. It's the first time I have seen a
Scala library do that.

If every library included the standard library, aside from the
unnecessary bloat you would have the problem that at runtime, when you
mix different jars containing the same class, you don't know which
classes are going to be used (it depends on the order of the jars),
which can cause problems.

If you need the Scala standard library you should add it to your
dependency closure as any other library.

https://mvnrepository.com/artifact/org.scala-lang/scala-library

Providing an additional "assembly" or "fat jar" for convenience is ok,
but I think the vanilla distribution should not include the standard
library.

Pedro.
On Tue, Dec 4, 2018 at 8:18 PM Naveen Swamy  wrote:
>
> Pedro,
> Not everyone has Scala installed on their system, especially for example
> they are Java/Clojure users(and they don't have to). I don't expect them to
> download/install Scala libraries. IMO, this approach is correct and lowers
> the entry bar for users.
> Soon, we are going include all dependencies of MXNet in the Jar and
> publish, this is deliberately done. Hope that answers your concern.
>
> -Naveen
>
> On Tue, Dec 4, 2018 at 10:59 AM Chris Olivier  wrote:
>
> > Pedro,
> >
> > It would be polite to ask if there is a reason it is included before
> > categorically declaring it is wrong.
> >
> > I am not involved in the scala library and what's included in it, but maybe
> > there's a good reason for it. Or maybe there isn't.  Either way, it's best
> > to ask first :)
> >
> > Thanks,
> >
> > -Chris
> >


Scala standard library is included in the mxnet jar

2018-12-04 Thread Pedro Larroy
Hi

I filed this issue because I observed that the Scala standard library
is included in the mxnet jar:
https://github.com/apache/incubator-mxnet/issues/13528

I don't think this is correct. Libraries should not include the
standard library. Why are we doing this?

Pedro.


Re: v1.4.0 status December 3rd

2018-12-04 Thread Pedro Larroy
This is ready to merge:  https://github.com/apache/incubator-mxnet/pull/13487

Parent PR was just merged in master.

On Tue, Dec 4, 2018 at 6:08 AM Steffen Rochel  wrote:
>
> Dear MXNet community -
> thank you very much for everybody effort to resolve the identified issues.
> We are down to 4 open issues - for details please see
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> 
> Please help to resolve the remaining issues and integrate to v1.4.x branch.
>
> A security scan revealed a number of security vulnerabilities in the
> Scala/Java package. Contributors are assessing the situation and severity
> as well as path to resolve the most critical issues. It might be necessary
> to move the next milestones to address the most critical vulnerabilities. I
> will communicate as soon I have more information.
>
> Regards,
> Steffen


Re: [Announce] Virtualized testing on ARM with Qemu and Docker

2018-12-04 Thread Pedro Larroy
Thanks, good idea.

Pedro.
On Mon, Dec 3, 2018 at 2:15 AM Steffen Rochel  wrote:
>
> Thanks Pedro, nice work. Please consider adding instructions also to
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Development+Guide
>
> On Fri, Nov 2, 2018 at 2:35 PM Marco de Abreu
>  wrote:
>
> > Great work, Pedro! This will help to have a consistent quality on all ARM
> > devices.
> >
> > -Marco
> >
> > > On Fri, Nov 2, 2018 at 20:02 Pedro Larroy
> > > wrote:
> >
> > > Hi MXNet community
> > >
> > > AI on MCUs can enable cheaper, lower power, better privacy and lower
> > > latency applications. There’s an estimate of more than 20 billion
> > connected
> > > devices to be deployed in 2020 and a part of them will do some amount of
> > AI
> > > / ML tasks. Testing in embedded devices is very challenging and expensive
> > > due to logistics, tooling and resource constraints. Here I would like to
> > > announce a contribution I have done using the free and open-source
> > emulator
> > > QEMU and Docker to perform hardware virtualization and test MXNet on edge
> > > devices, specifically to test the MXNet artifacts on ARM such as Pip
> > wheels
> > > and run unit tests.
> > >
> > > There are brief instructions for running a virtualized environment at the
> > > bottom of the README.md in the ci folder:
> > >
> > >
> > https://github.com/apache/incubator-mxnet/tree/master/ci#testing-with-qemu
> > >
> > > I would encourage you to give it a try and report any comments or
> > feedback.
> > > The plan is to integrate it into nightly testing. We would need to narrow
> > > down a bit the scope of testing since still the full suite is just too
> > big
> > > and resource intensive to finish in a reasonable time.
> > >
> > > My idea would be to split the unit tests into different suites such as
> > core
> > > / gluon / extended. Do you have any suggestions for this split?
> > >
> > > As a cool thing to try, you can execute the following command which will
> > > give you a shell in an ARM VM (also sshable via ssh -p qemu@localhost
> > )
> > > so you can use and debug MXNet in ARM:
> > >
> > > ci/build.py -p test.arm_qemu -b && docker run -p: -ti
> > > mxnetci/build.test.arm_qemu
> > >
> > >
> > > How cool is that?   If you are curious or want to hack on it, have a look
> > > at the qemu folders under ci.
> > >
> > > Pedro.
> > >
> >


Re: Adding AMD CPU to CI

2018-11-30 Thread Pedro Larroy
I think just adding AMD is not the right abstraction level. Testing and
benchmarking with different CPU flags / -march settings (e.g. AVX2, SSE2)
brings value in my opinion. Just testing another vendor of a compatible
CPU doesn't.
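
As a sketch of why the capability matters more than the vendor (a
minimal example, not MXNet's actual dispatch code), GCC and Clang
expose a runtime check for these extensions, so behavior follows the
supported instruction set rather than the manufacturer:

    #include <cstdio>

    int main() {
      // Compiler builtin that queries CPUID at runtime. The answer
      // depends on which instruction sets the chip implements, not
      // on whether it was made by Intel or AMD.
      if (__builtin_cpu_supports("sse2"))
        std::printf("SSE2 available\n");
      if (__builtin_cpu_supports("avx2"))
        std::printf("AVX2 available\n");
      if (__builtin_cpu_supports("avx512f"))
        std::printf("AVX-512F available\n");
      return 0;
    }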

Pedro

> On 30. Nov 2018, at 19:32, kellen sunderland  
> wrote:
> 
> Damn, knew i should have double-checked!  Oh well it's also carbon neutral.
> 
> On Fri, Nov 30, 2018 at 10:27 AM Pedro Larroy 
> wrote:
> 
>> Agree with Tianqi and Hao. Adding AMD brings no value and increases
>> complexity and CI cost. The instruction sets are the same. For
>> benchmarking it might make sense though.
>> 
>> Pedro
>> 
>>> On 30. Nov 2018, at 18:19, Tianqi Chen  wrote:
>>> 
>>> I still think it is overkill to add AMD CPU to the CI, given the
>> additional
>>> cost it could bring and little additional information we can get out from
>>> it.
>>> 
>>> A middle ground is to add AMD CPU to a nightly build or a final sweep before
>>> release. If there is a case that we find that AMD CPU really makes a
>>> difference, then we add it to the CI
>>> 
>>> Tianqi
>>> 
>>>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin  wrote:
>>>> 
>>>> For CPUs, the supported instruction sets may also vary between the same
>>>> manufacturer's different product lines of the same generation
>> (Skylake-SP
>>>> versus Skylake).
>>>> For the same instruction set, the two manufacturers should both have a
>>>> working version of the hardware implementation. If any of the
>>>> implementations does not work, then the chip would not even be
>> considered
>>>> functioning properly.
>>>> If some AMD CPUs only support up to AVX2 instruction sets, they would
>> just
>>>> function in the same way as an Intel CPU that supports up to AVX2
>>>> instruction sets. The performance may vary, but the capability and
>> behavior
>>>> of the two chips would be the same when given the same machine code.
>>>> For AMD GPUs it's a totally different story, as AMD GPUs do not share
>> the
>>>> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if
>> we
>>>> do have support for them) would definitely add values.
>>>> Hao
>>>> 
>>>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <
>> anirudh2...@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Instruction set extensions support like AVX2, AVX512 etc. can vary
>>>> between
>>>>> AMD and Intel and there can also be a time lag between when Intel
>>>> supports
>>>>> it versus when AMD supports it.
>>>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>>>> GPUs and AWS also happens to have support for it.
>>>>> 
>>>>> Anirudh
>>>>> 
>>>>> 
>>>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>>>>  wrote:
>>>>> 
>>>>>> I think it's worth a discussion to do a sanity check. While generally
>>>>> these
>>>>>> instructions are standardized, we also made the experience with ARM
>>>> that
>>>>>> the theory and reality sometimes don't match. Thus, it's always good
>> to
>>>>>> check.
>>>>>> 
>>>>>> In the next months we are going to refactor our slave creation
>>>> processes.
>>>>>> Chance Bair has been working on rewriting Windows slaves from scratch
>>>> (we
>>>>>> used images that haven't really been updated for 2 years - we still
>>>> don't
>>>>>> know what was done on them) and they're ready soon. In the following
>>>>>> months, we will also port our Ubuntu slaves to the new method (don't
>>>>> have a
>>>>>> timeline yet). Ideally, the integration of AMD instances will only be
>> a
>>>>>> matter of running the same pipeline on a different instance type. In
>>>> that
>>>>>> Case, it should not be a big deal.
>>>>>> 
>>>>>> If there are big differences, that's already a yellow flag for
>>>>>> compatibility, but that's unlikely. But in that case, we would have to
>>>>> make
>>>>>> a more thorough time analysis and whether it's worth the effort.
>> Maybe,
>>>>>> somebod

Re: Adding AMD CPU to CI

2018-11-30 Thread Pedro Larroy
Agree with Tianqi and Hao. Adding AMD brings no value and increases complexity
and CI cost. The instruction sets are the same. For benchmarking it might make
sense though.

Pedro

> On 30. Nov 2018, at 18:19, Tianqi Chen  wrote:
> 
> I still think it is overkill to add AMD CPU to the CI, given the additional
> cost it could bring and little additional information we can get out from
> it.
> 
> A middle ground is to add AMD CPU to a nightly build or a final sweep before
> release. If there is a case that we find that AMD CPU really makes a
> difference, then we add it to the CI
> 
> Tianqi
> 
>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin  wrote:
>> 
>> For CPUs, the supported instruction sets may also vary between the same
>> manufacturer's different product lines of the same generation (Skylake-SP
>> versus Skylake).
>> For the same instruction set, the two manufacturers should both have a
>> working version of the hardware implementation. If any of the
>> implementations does not work, then the chip would not even be considered
>> functioning properly.
>> If some AMD CPUs only support up to AVX2 instruction sets, they would just
>> function in the same way as an Intel CPU that supports up to AVX2
>> instruction sets. The performance may vary, but the capability and behavior
>> of the two chips would be the same when given the same machine code.
>> For AMD GPUs it's a totally different story, as AMD GPUs do not share the
>> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if we
>> do have support for them) would definitely add values.
>> Hao
>> 
>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian >> 
>> wrote:
>> 
>>> Instruction set extensions support like AVX2, AVX512 etc. can vary
>> between
>>> AMD and Intel and there can also be a time lag between when Intel
>> supports
>>> it versus when AMD supports it.
>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>> GPUs and AWS also happens to have support for it.
>>> 
>>> Anirudh
>>> 
>>> 
>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>>  wrote:
>>> 
 I think it's worth a discussion to do a sanity check. While generally
>>> these
 instructions are standardized, we also made the experience with ARM
>> that
 the theory and reality sometimes don't match. Thus, it's always good to
 check.
 
 In the next months we are going to refactor our slave creation
>> processes.
 Chance Bair has been working on rewriting Windows slaves from scratch
>> (we
 used images that haven't really been updated for 2 years - we still
>> don't
 know what was done on them) and they're ready soon. In the following
 months, we will also port our Ubuntu slaves to the new method (don't
>>> have a
 timeline yet). Ideally, the integration of AMD instances will only be a
 matter of running the same pipeline on a different instance type. In
>> that
 Case, it should not be a big deal.
 
 If there are big differences, that's already a yellow flag for
 compatibility, but that's unlikely. But in that case, we would have to
>>> make
 a more thorough time analysis and whether it's worth the effort. Maybe,
 somebody else could also lend us a hand and help us with adding AMD
 support.
 
 -Marco
 
 On Fri, Nov 30, 2018 at 01:22 Hao Jin 
 wrote:
 
> f16c is also an instruction set supported by both brands' recent CPUs
 just
> like x86, AVX, SSE etc., and any difference in behaviors (quite
 impossible
> to happen or it will be a major defect) would most likely be caused
>> by
 the
> underlying hardware implementation, so still, adding AMD instances is
>>> not
> adding much value here.
> Hao
> 
> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
> 
>> Just looked at the mf16c work and wanted to mention Rahul clearly
>>> _was_
>> thinking about AMD users in that PR.
>> 
>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>> 
>>> From my perspective we're developing a few features like mf16c
>> and
> MKLDNN
>>> integration specifically for Intel CPUs.  It wouldn't hurt to
>> make
 sure
>>> those changes also run properly on AMD cpus.
>>> 
>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin > wrote:
>>> 
 I'm a bit confused about why we need extra functionality tests
>>> just
> for
 AMD
 CPUs, aren't AMD CPUs supporting roughly the same instruction
>> sets
 as
>> the
 Intel ones? In the very impossible case that something working
>> on
> Intel
 CPUs being not functioning on AMD CPUs (or vice versa), it would
> mostly
 likely be related to the underlying hardware implementation of
>> the
> same
 ISA, to which we definitely do not have a good solution. So I
>>> don't
>> think
 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Pedro Larroy
The PR is ready from my side and passes the tests; unless somebody raises
any concerns, it's good to go.
On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel  wrote:
>
> Pedro - added  to 1.4.0 tracking list
> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
>
> Do you have already ETA?
> Steffen
>
> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy 
> wrote:
>
> > Hi all.
> >
> > There are two important issues / fixes that should go in the next
> > release in my radar:
> >
> > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > There is a bug in shape inference on CPU when not using MKL, also we
> > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > I'm finishing a fix for these issues in the above PR.
> >
> > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > We are seeing crashes due to unsafe setenv in multithreaded code.
> > Setenv / getenv from multiple threads is not safe and is causing
> > segfaults. This piece of code (the handlers in pthread_atfork) already
> > caused a very difficult to diagnose hang in a previous release, where
> > a fork inside cudnn would deadlock the engine.
> >
> > I would remove setenv from 2) as a mitigation, but we would need to
> > check for regressions as we could be creating additional threads
> > inside the engine.
> >
> > I would suggest that we address these two major issues before the next
> > release.
> >
> > Pedro
> >
> >
> >
> > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel 
> > wrote:
> > >
> > > Dear MXNet community,
> > >
> > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > release.
> > > Sergey Kolychev will be co-managing the release and providing help from
> > the
> > > committers side.
> > > A release candidate will be cut on November 29, 2018 and voting will
> > start
> > > December 7, 2018. Release notes have been drafted here [1]. If you have
> > any
> > > additional features in progress and would like to include it in this
> > > release, please assure they have been merged by November 27, 2018.
> > Release
> > > schedule is available here [2].
> > >
> > > Feel free to add any other comments/suggestions. Please help to review
> > and
> > > merge outstanding PR's and resolve issues impacting the quality of the
> > > 1.4.0 release.
> > >
> > > Regards,
> > >
> > > Steffen
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > >
> > > [2]
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > >
> > >
> > >
> > >
> > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Spoke too soon[1], looks like others have been adding Turing support as
> > > > well (thanks to those helping with this).  I believe there's still a
> > few
> > > > changes we'd have to make to claim support though (mshadow CMake
> > changes,
> > > > PyPi package creation tweaks).
> > > >
> > > > 1:
> > > >
> > > >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > >
> > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > > regression in master which causes incorrect feature vectors to be
> > output
> > > > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > > > track
> > > > > down the root cause of the issue).   I'm currently blocked on a CI
> > issue
> > > > I
> > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > >
> > > > > One call-out I would make is that we currently don't support Turing
> > > > > architecture (sm_75).  I've been slowly trying to add support, but I
> > > > don't
> > > > > think I'd have capacit

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Pedro Larroy
I see. There's also an OpenMP primitive to change this. I see a way to
fix this issue with a bit of refactoring.
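
A minimal sketch of what I mean by the OpenMP primitive (assuming the
goal is only to control thread counts): the runtime API changes the
setting directly, without mutating the process environment the way
setenv("OMP_NUM_THREADS", ...) does.

    #include <omp.h>
    #include <cstdio>

    int main() {
      // Runtime equivalent of OMP_NUM_THREADS: no environment mutation.
      omp_set_num_threads(4);
      #pragma omp parallel
      {
        #pragma omp single
        std::printf("team size: %d\n", omp_get_num_threads());
      }

      // Or override per region, with no global state change at all.
      #pragma omp parallel num_threads(2)
      {
        #pragma omp single
        std::printf("team size: %d\n", omp_get_num_threads());
      }
      return 0;
    }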

Thanks.

Pedro.
On Thu, Nov 29, 2018 at 6:24 PM Chris Olivier  wrote:
>
> I don’t think that does anything at all, as stated in my other email.
> Someone can look into the omp code to be sure but my suspicion is that the
> environment variable is only read on startup, and at any rate, better to be
> set through the api at runtime
>
> On Thu, Nov 29, 2018 at 8:11 AM Pedro Larroy 
> wrote:
>
> > To be precise, what would be the consequences of not having these env
> > variables set in the engine threads related to OMP?
> > Given your experience with OpenMP I hope you can help us answer these
> > questions.
> >
> > Hopefully we can get the same effect (if any) of these setenvs using
> > some openmp call or a pragma. Definitely we shouldn't be mutating the
> > environment from a different thread from what I understand, which is
> > the likely cause of the random crashes some users are experiencing.
> >
> > Pedro
> > On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
> >  wrote:
> > >
> > > Chris.  The problem is with setenv, not with getenv. We don't want to
> > > remove any getenv call, just these misplaced setenvs:
> > >
> > >
> > >
> > https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
> > >
> > > Please check the code above carefully and give us your feedback. Based
> > > on your email I think we don't yet have a common understanding of the
> > > root cause of this issue.
> > >
> > > Pedro.
> > > On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier 
> > wrote:
> > > >
> > > > - getenv should be thread safe as long as nothing is calling
> > putenv/setenv
> > > > in another thread (the environment doesn’t change) as stated here:
> > > >
> > > > http://www.cplusplus.com/reference/cstdlib/getenv/
> > > >
> > > > it’s a simple library call, so to be sure either way, one can check the
> > > > actual source and see (in case some particular implementation is
> > acting in
> > > > a particularly thread-unsafe manner). This should be vetted before
> > making
> > > > any high-impact decisions such as trying to go remove every getenv
> > call in
> > > > the whole system.
> > > >
> > > > - locking after fork is possibly due to libgomp not supporting forking
> > such
> > > > that after a fork, a call is made to release the blocked omp threads
> > and
> > > > the main thread waits for the omp threads to finish, but the omp
> > threads
> > > > belong to the pre-forked process and thus never execute, causing that
> > > > forked process to freeze.  This behavior has been witnessed before.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all.
> > > > >
> > > > > There are two important issues / fixes that should go in the next
> > > > > release in my radar:
> > > > >
> > > > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > > There is a bug in shape inference on CPU when not using MKL, also we
> > > > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > > > I'm finishing a fix for these issues in the above PR.
> > > > >
> > > > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > > > Setenv / getenv from multiple threads is not safe and is causing
> > > > > segfaults. This piece of code (the handlers in pthread_atfork)
> > already
> > > > > caused a very difficult to diagnose hang in a previous release, where
> > > > > a fork inside cudnn would deadlock the engine.
> > > > >
> > > > > I would remove setenv from 2) as a mitigation, but we would need to
> > > > > check for regressions as we could be creating additional threads
> > > > > inside the engine.
> > > > >
> > > > > I would suggest that we address these two major issues before the
> > next
> > > > > release.
> > > > >
> > > > > Pedro
> > > > >
> 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Pedro Larroy
To be precise, what would be the consequences of not having these env
variables set in the engine threads related to OMP?
Given your experience with OpenMP I hope you can help us answer these questions.

Hopefully we can get the same effect (if any) of these setenvs using
some OpenMP call or a pragma. From what I understand, we definitely
shouldn't be mutating the environment from a different thread, which is
the likely cause of the random crashes some users are experiencing.

Pedro
On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
 wrote:
>
> Chris.  The problem is with setenv, not with getenv. We don't want to
> remove any getenv call, just these misplaced setenvs:
>
>
> https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
>
> Please check the code above carefully and give us your feedback. Based
> on your email I think we don't yet have a common understanding of the
> root cause of this issue.
>
> Pedro.
> On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier  wrote:
> >
> > - getenv should be thread safe as long as nothing is calling putenv/setenv
> > in another thread (the environment doesn’t change) as stated here:
> >
> > http://www.cplusplus.com/reference/cstdlib/getenv/
> >
> > it’s a simple library call, so to be sure either way, one can check the
> > actual source and see (in case some particular implementation is acting in
> > a particularly thread-unsafe manner). This should be vetted before making
> > any high-impact decisions such as trying to go remove every getenv call in
> > the whole system.
> >
> > - locking after fork is possibly due to libgomp not supporting forking such
> > that after a fork, a call is made to release the blocked omp threads and
> > the main thread waits for the omp threads to finish, but the omp threads
> > belong to the pre-forked process and thus never execute, causing that
> > forked process to freeze.  This behavior has been witnessed before.
> >
> >
> >
> >
> > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy 
> > wrote:
> >
> > > Hi all.
> > >
> > > There are two important issues / fixes that should go in the next
> > > release in my radar:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork) already
> > > caused a very difficult to diagnose hang in a previous release, where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the next
> > > release.
> > >
> > > Pedro
> > >
> > >
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel 
> > > wrote:
> > > >
> > > > Dear MXNet community,
> > > >
> > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > release.
> > > > Sergey Kolychev will be co-managing the release and providing help from
> > > the
> > > > committers side.
> > > > A release candidate will be cut on November 29, 2018 and voting will
> > > start
> > > > December 7, 2018. Release notes have been drafted here [1]. If you have
> > > any
> > > > additional features in progress and would like to include it in this
> > > > release, please assure they have been merged by November 27, 2018.
> > > Release
> > > > schedule is available here [2].
> > > >
> > > > Feel free to add any other comments/suggestions. Please help to review
> > > and
> > > > merge outstanding PR's and resolve issues impacting the quality of the
> > > > 1.4.0 release.
> > > >
> > > > Regards,
> > > >
> > > > Steffen
> > > >
> > > > [1]
> > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Releas

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Pedro Larroy
Chris.  The problem is with setenv, not with getenv. We don't want to
remove any getenv call, just these misplaced setenvs:


https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61

Please check the code above carefully and give us your feedback. Based
on your email I think we don't yet have a common understanding of the
root cause of this issue.

Pedro.
On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier  wrote:
>
> - getenv should be thread safe as long as nothing is calling putenv/setenv
> in another thread (the environment doesn’t change) as stated here:
>
> http://www.cplusplus.com/reference/cstdlib/getenv/
>
> it’s a simple library call, so to be sure either way, one can check the
> actual source and see (in case some particular implementation is acting in
> a particularly thread-unsafe manner). This should be vetted before making
> any high-impact decisions such as trying to go remove every getenv call in
> the whole system.
>
> - locking after fork is possibly due to libgomp not supporting forking such
> that after a fork, a call is made to release the blocked omp threads and
> the main thread waits for the omp threads to finish, but the omp threads
> belong to the pre-forked process and thus never execute, causing that
> forked process to freeze.  This behavior has been witnessed before.
>
>
>
>
> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy 
> wrote:
>
> > Hi all.
> >
> > There are two important issues / fixes that should go in the next
> > release in my radar:
> >
> > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > There is a bug in shape inference on CPU when not using MKL, also we
> > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > I'm finishing a fix for these issues in the above PR.
> >
> > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > We are seeing crashes due to unsafe setenv in multithreaded code.
> > Setenv / getenv from multiple threads is not safe and is causing
> > segfaults. This piece of code (the handlers in pthread_atfork) already
> > caused a very difficult to diagnose hang in a previous release, where
> > a fork inside cudnn would deadlock the engine.
> >
> > I would remove setenv from 2) as a mitigation, but we would need to
> > check for regressions as we could be creating additional threads
> > inside the engine.
> >
> > I would suggest that we address these two major issues before the next
> > release.
> >
> > Pedro
> >
> >
> >
> > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel 
> > wrote:
> > >
> > > Dear MXNet community,
> > >
> > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > release.
> > > Sergey Kolychev will be co-managing the release and providing help from
> > the
> > > committers side.
> > > A release candidate will be cut on November 29, 2018 and voting will
> > start
> > > December 7, 2018. Release notes have been drafted here [1]. If you have
> > any
> > > additional features in progress and would like to include it in this
> > > release, please assure they have been merged by November 27, 2018.
> > Release
> > > schedule is available here [2].
> > >
> > > Feel free to add any other comments/suggestions. Please help to review
> > and
> > > merge outstanding PR's and resolve issues impacting the quality of the
> > > 1.4.0 release.
> > >
> > > Regards,
> > >
> > > Steffen
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > >
> > > [2]
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > >
> > >
> > >
> > >
> > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Spoke too soon[1], looks like others have been adding Turing support as
> > > > well (thanks to those helping with this).  I believe there's still a
> > few
> > > > changes we'd have to make to claim support though (mshadow CMake
> > changes,
> > > > PyPi package creation tweaks).
> > > >
> > > > 1:
> > > >
> > > >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > >
> > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > kell

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Pedro Larroy
Hi all.

There are two important issues / fixes on my radar that should go in
the next release:

1) https://github.com/apache/incubator-mxnet/pull/13409/files
There is a bug in shape inference on CPU when not using MKL; also, we
are running activation on CPU via MKL when we compile with CUDNN+MKLDNN.
I'm finishing a fix for these issues in the above PR.

2) https://github.com/apache/incubator-mxnet/issues/13438
We are seeing crashes due to unsafe setenv in multithreaded code.
Setenv / getenv from multiple threads is not safe and is causing
segfaults. This piece of code (the handlers in pthread_atfork) already
caused a very difficult-to-diagnose hang in a previous release, where
a fork inside cuDNN would deadlock the engine.
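
For context, a contrived repro of the hazard (the variable name is
made up, and this is not the actual MXNet code path): POSIX gives no
thread-safety guarantee for setenv, and the pointer returned by getenv
can be invalidated while another thread rewrites the environment.

    #include <stdlib.h>
    #include <string>
    #include <thread>

    int main() {
      // Reader and writer race on the process environment. The char*
      // returned by getenv may point into memory that setenv frees or
      // reallocates, so the copy below can read freed memory.
      std::thread reader([] {
        for (int i = 0; i < 100000; ++i) {
          if (const char* v = getenv("MXNET_EXAMPLE_VAR")) {
            std::string copy(v);
          }
        }
      });
      std::thread writer([] {
        for (int i = 0; i < 100000; ++i) {
          setenv("MXNET_EXAMPLE_VAR", i % 2 ? "1" : "0", 1);
        }
      });
      reader.join();
      writer.join();
      return 0;
    }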

I would remove setenv from 2) as a mitigation, but we would need to
check for regressions as we could be creating additional threads
inside the engine.

I would suggest that we address these two major issues before the next release.

Pedro



On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel  wrote:
>
> Dear MXNet community,
>
> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
> Sergey Kolychev will be co-managing the release and providing help from the
> committers side.
> A release candidate will be cut on November 29, 2018 and voting will start
> December 7, 2018. Release notes have been drafted here [1]. If you have any
> additional features in progress and would like to include it in this
> release, please assure they have been merged by November 27, 2018. Release
> schedule is available here [2].
>
> Feel free to add any other comments/suggestions. Please help to review and
> merge outstanding PR's and resolve issues impacting the quality of the
> 1.4.0 release.
>
> Regards,
>
> Steffen
>
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>
> [2] 
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>
>
>
>
> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Spoke too soon[1], looks like others have been adding Turing support as
> > well (thanks to those helping with this).  I believe there's still a few
> > changes we'd have to make to claim support though (mshadow CMake changes,
> > PyPi package creation tweaks).
> >
> > 1:
> >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >
> > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > regression in master which causes incorrect feature vectors to be output
> > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > track
> > > down the root cause of the issue).   I'm currently blocked on a CI issue
> > I
> > > haven't seen before, but hope to have it resolved by EOW.
> > >
> > > One call-out I would make is that we currently don't support Turing
> > > architecture (sm_75).  I've been slowly trying to add support, but I
> > don't
> > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > strongly
> > > we need this in the 1.4 release?  From my perspective this will already
> > be
> > > a strong release without it.
> > >
> > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel 
> > > wrote:
> > >
> > >> Thanks Patrick, lets target to get the PR's merged this week.
> > >>
> > >> Call for contributions from the community: Right now we have 10 PR
> > >> awaiting
> > >> merge
> > >> <
> > >>
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > >> >
> > >> and
> > >> we have 61 open PR awaiting review.
> > >> <
> > >>
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > >> >
> > >> I would appreciate if you all can help to review the open PR and the
> > >> committers can drive the merge before code freeze for 1.4.0.
> > >>
> > >> The contributors on the Java API are making progress, but not all
> > >> performance issues are resolved. With some luck it should be possible to
> > >> code freeze towards end of this week.
> > >>
> > >> Are there other critical features/bugs/PR you think need to be included
> > in
> > >> 1.4.0? If so, please communicate as soon as possible.
> > >>
> > >> Regards,
> > >> Steffen
> > >>
> > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric 
> > >> wrote:
> > >>
> > >> > Thanks, Steffen. I think there is NO open issue to block the MKLDNN to
> > >> GA
> > >> > now.
> > >> >
> > >> > BTW, several quantization related PRs (#13297,#13260) are under the
> > >> review
> > >> > and I think it can be merged in this week.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --Patric
> > >> >
> > >> >
> > >> > > -Original Message-
> > >> > > From: Steffen 

Re: using conan to manage Apache Incubator MXNet project dependencies

2018-11-27 Thread Pedro Larroy
Thanks both for the detailed explanations. A couple more questions:

Is it easy to create a build which will build dependencies from source?
What guarantees do you get with conan with regard to ABI / C++ stdlib
binary compatibility of the pulled dependencies?

Just to clarify: My concerns are in terms of reproducible builds /
source only distribution and undefined behaviour due to different
compiler / stdlib versions. Are these valid or is it outdated
knowledge?

Pedro.
On Tue, Nov 27, 2018 at 2:34 PM Diego Rodriguez-Losada
 wrote:
>
> Hi Pedro,
>
> Conan is distributed. So besides building from sources the dependencies, it
> is also possible to create binaries yourself for those dependencies (with
> the existing recipes, or your own recipes), and host them in your own repo
> (Bintray OSS repo, or Artifactory).
>
> This will provide both the security that you own the dependencies binaries
> and the convenience and speed of not having to build from sources. Even if
> you provide the binaries, consumers can always fallback to build from
> sources too.
>
> Kind regards,
> Diego
>
> El mar., 27 nov. 2018 a las 13:34, Konstantin Ivlev ()
> escribió:
>
> > Hi Pedro,
> >
> > yes, you're absolutely right, by default, conan will be pulling prebuilt
> > binaries for the libraries from the bintray.
> > however, if prebuilt binaries are not available (e.g. because you use some
> > different compiler for which we don't have prebuilt binaries),
> > or if you want to build binaries yourself for some another reason,
> > then libraries always might be built from source (by passing e.g. "--build
> > always", "--build missing" or "--build " to the conan install
> > command line).
> >
> > yours sincerely, Konstantin
> >
> > вт, 27 нояб. 2018 г. в 19:27, Pedro Larroy :
> >
> > > Hi Konstantin
> > >
> > > Thanks for this contribution. With your proposed changes, when
> > > building MXNet we will be pulling binaries for the libraries managed
> > > by conan?
> > >
> > >
> > > Pedro.
> > > On Mon, Nov 26, 2018 at 11:43 AM Konstantin Ivlev 
> > > wrote:
> > > >
> > > > Hello, Ivan,
> > > >
> > > > could you possibly clarify your question (may be explaining the
> > use-case
> > > > behind it)?
> > > > Gradle appears to be a build system, AFAIK more popular in the Java world.
> > > > meanwhile, the Apache Incubator MXNet project uses CMake as its build
> > system.
> > > > please correct me if I am wrong.
> > > > in general, conan, as a package manager, is pretty
> > build-system-agnostic,
> > > > and it may work with arbitrary build systems (to count a few, CMake,
> > > > premake, qmake, full list:
> > > > https://docs.conan.io/en/latest/reference/generators.html). I don't
> > > think
> > > > Gradle is an exception here.
> > > > also, for instance, Android Studio also uses Gradle for Android C++
> > > > projects, and conan works flawlessly in this particular case (
> > > >
> > https://blog.conan.io/2018/02/13/Android-Studio-project-Conan-Boost.html
> > > ).
> > > >
> > > > yours sincerely, Konstantin
> > > >
> > > > пн, 26 нояб. 2018 г. в 16:43, Ivan Serdyuk <
> > local.tourist.k...@gmail.com
> > > >:
> > > >
> > > > > Konstantin, what about the (overall) option of using Gradle? Is your
> > > > > suggested package manager supported by Gradle?
> > > > >
> > > > > Ivan
> > > > >
> > > > > On Mon, Nov 26, 2018 at 9:43 AM Konstantin Ivlev <
> > tomsks...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > hello,
> > > > > >
> > > > > > this email is related to the following PR and JIRA ticket:
> > > > > > - [MXNET-1229] use OpenBLAS, lapack & OpenCV from conan
> > > > > > <https://github.com/apache/incubator-mxnet/pull/13400>
> > > > > > - use conan to manage project dependencies
> > > > > > <https://issues.apache.org/jira/browse/MXNET-1229>
> > > > > >
> > > > > > conan <https://conan.io> is an open-source package manager for C++
> > > > > > projects. it allows to manage project dependencies in transparent
> > and
> > > > > > declarative manner.
> > > > > >
> > > > > > cur

Re: using conan to manage Apache Incubator MXNet project dependencies

2018-11-27 Thread Pedro Larroy
Hi Konstantin

Thanks for this contribution. With your proposed changes, when
building MXNet we will be pulling binaries for the libraries managed
by conan?


Pedro.
On Mon, Nov 26, 2018 at 11:43 AM Konstantin Ivlev  wrote:
>
> Hello, Ivan,
>
> could you possibly clarify your question (may be explaining the use-case
> behind it)?
> Gradle appears to be a build system, AFAIK more popular in the Java world.
> meanwhile, the Apache Incubator MXNet project uses CMake as its build system.
> please correct me if I am wrong.
> in general, conan, as a package manager, is pretty build-system-agnostic,
> and it may work with arbitrary build systems (to count a few, CMake,
> premake, qmake, full list:
> https://docs.conan.io/en/latest/reference/generators.html). I don't think
> Gradle is an exception here.
> also, for instance, Android Studio also uses Gradle for Android C++
> projects, and conan works flawlessly in this particular case (
> https://blog.conan.io/2018/02/13/Android-Studio-project-Conan-Boost.html).
>
> yours sincerely, Konstantin
>
> пн, 26 нояб. 2018 г. в 16:43, Ivan Serdyuk :
>
> > Konstantin, what about the (overall) option of using Gradle? Is your
> > suggested package manager supported by Gradle?
> >
> > Ivan
> >
> > On Mon, Nov 26, 2018 at 9:43 AM Konstantin Ivlev 
> > wrote:
> >
> > > hello,
> > >
> > > this email is related to the following PR and JIRA ticket:
> > > - [MXNET-1229] use OpenBLAS, lapack & OpenCV from conan
> > > <https://github.com/apache/incubator-mxnet/pull/13400>
> > > - use conan to manage project dependencies
> > > <https://issues.apache.org/jira/browse/MXNET-1229>
> > >
> > > conan <https://conan.io> is an open-source package manager for C++
> > > projects. it allows to manage project dependencies in transparent and
> > > declarative manner.
> > >
> > > currently, the apache incubator-mxnet project uses the following different
> > > ways to manage its dependencies:
> > >
> > > - download GitHub archives during the build
> > > - OpenBLAS 
> > > - OpenCV 
> > > - conda  (alternative way to GitHub archives)
> > > - download from CMake
> > > - Intel Math Kernel Library 
> > > (MKL)
> > > - Git submodules
> > > - cub 
> > > - dlpack 
> > > - dmlc-core 
> > > - googletest 
> > > - mkldnn 
> > > - mshadow 
> > > - onnx-tensorrt 
> > > - openmp 
> > > - ps-lite 
> > > - tvm 
> > >
> > > this appears to be very heterogeneous and hard to manage/maintain, as
> > > multiple different commands are in use to install dependencies,
> > > and there are multiple places to look for dependency versions and their
> > > updates.
> > >
> > > with conan, it may become much more straightforward, as dependencies will
> > > be declared in single place (conanfile) and installed via single command
> > > (conan install).
> > >
> > > as the project is very complex, and has lots of dependencies, for the first
> > > prototype I've used only a few of the dependencies from conan: OpenCV
> > > , OpenBLAS
> > >  and lapack
> > > .
> > > others may then easily be added one by one, but they first have to be
> > > packaged (not all of them are packaged yet, e.g. GoogleTest
> > >  is
> > > available, while MKL is not).
> > >
> > > I attach patch which adds an initial conan support as proof of concept.
> > > also, I attach two simple build scripts, which I've used to test (for
> > > Windows and Linux / Mac OS X). Google Mail blocks .sh and .cmd
> > extensions,
> > > so you'll need to rename files.
> > > lemme know if you have any further questions.
> > >
> > > yours sincerely, Konstantin
> > >
> >
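
A minimal sketch of the kind of conanfile.py being discussed above. The
recipe references and versions below are illustrative placeholders, not
the ones from the actual PR:

from conans import ConanFile, CMake

class MXNetConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"
    # hypothetical recipe references, for illustration only
    requires = (
        "openblas/0.3.5@conan/stable",
        "opencv/3.4.5@conan/stable",
    )
    generators = "cmake"

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

With such a file, "conan install . --build missing" would fall back to
building a dependency from source whenever prebuilt binaries are not
available, as described in the thread above.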


Re: [Discussion] MXNet CMake build - raise minimal required version

2018-11-22 Thread Pedro Larroy
Thanks Anton for putting this together and your efforts here. I think
it's crucial that we maintain and bring the CMake system forward. I
have spent a lot of time dealing with CMake issues on different
platforms, we really increase developer productivity and platform
support by having a streamlined build system.
On Thu, Nov 22, 2018 at 4:06 PM Chris Olivier  wrote:
>
> what is meant by:
>
>
> *Profiling*
> The profiler is always on even for production release builds, because MXNet
> cannot be built without it [2].  ?
>
> you mean it is always built or it is turned on (ie recording and saving
> profiling information)?  I am not aware of it being turned on by default.
>
>
> profiler has no overhead when built in but not turned on.
>
>
> On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov  wrote:
>
> > Dear MXNet community,
> >
> > I propose to raise the minimal required cmake version that is needed to
> > build MXNet to 3.10 which was tagged on March 16 2018 [1].
> >
> > The effort of repairing the cmake scripts in general aims to deprecate
> > make and maintain only one build system.
> >
> > *Need*
> >
> > The build system is the foundation of every software project. Its quality
> > directly impacts the quality of the project. The MXNet build system is
> > fragile, partially broken and not maintained.
> >
> > Users of MXNet and developers are confused by the fact that 2 build systems
> > exist at the same time: make and CMake.
> >
> > The main functional areas which are impacted by the current state of the
> > cmake files are:
> >
> > *OpenMP*
> > The current CMake files mix OpenMP libraries from different compilers, which
> > is undefined behaviour. It leads to indeterministic crashes on some
> > platforms. Build and deployment are very hard. No evidence exists that
> > proves that there is any benefit of having llvm OpenMP library as a
> > submodule in MXNet.
> >
> > *BLAS and LAPACK*
> > Basic math library usage is mixed up. It is hard and confusing to configure
> > and the logic for choosing the most optimal library is not present. MKL and
> > OpenBLAS are intermixed in an unpredictable manner.
> >
> > *Profiling*
> > The profiler is always on even for production release builds, because MXNet
> > cannot be built without it [2].
> >
> > *CUDA*
> > CUDA is detected by 3 different files in the current cmake scripts and the
> > choice of those is based on obscure logic which involves different
> > versions of cmake and the platforms it's building on:
> >
> > * CMakeLists.txt
> > * cmake/FirstClassLangCuda.cmake
> > * 3rdparty/mshadow/cmake/Cuda.cmake
> >
> >
> > *Confusing and misleading cmake user options*
> > For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will do or not do
> > what they are supposed to, based on the cmake generator and version of cmake
> > [3].
> > There are currently more than 30 build parameters for MXNet, none of them
> > documented. Some of them are not even located in the main CMakeLists.txt file,
> > for example 'BLAS'.
> >
> >
> > *Issues*
> > There is a significant amount of github issues related to cmake or build in
> > general. New tickets are issued frequently.
> >
> > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702)
> >  [DISCUSSION] Should we deprecate Makefile and only use CMake?
> > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079)   troubles
> > building python interface on raspberry pi 3
> > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722)   problem:
> > compile mxnet with hdfs
> > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip
> > package can be much faster (OpenCV version?)
> > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417)
> > libomp.so
> > dependency (need REAL fix)
> > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532)
> >  mxnet-mkl
> > (v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
> > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131)
> > mxnet-cu92 low efficiency  // (indirectly)
> > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA
> > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or
> > Ninja generator
> > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in
> > cpp-package/CMakeLists.txt
> > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is
> > running again when execute make install
> > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed
> > to
> > build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unabel
> > to open file: “mxnet-cpp/op.h”: No such file or directory
> > * #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building
> > with OpenCV causes link errors
> > * #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet
> > MKLDNN build dependency/flow discussion
> > * #10009 (https://github.com/apache/incubator-mxnet/issues/10009)
> > [CMAKE][IoT] 

soft relu gradient, is it correct?

2018-11-20 Thread Pedro Larroy
I bumped into the definition of the softrelu gradient:

https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170

Which is defined as 1 - exp(-x).

As we define the forward of the softrelu as the softplus function,
shouldn't the gradient be the logistic function?

It is my understanding that the gradient of the softrelu should go down
to zero as x -> -Inf, which is not the case with the above
definition, which goes to -Inf as x -> -Inf.

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
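
As a sanity check on the math: the stored expression is consistent with
the logistic function if one assumes the backward pass applies it to the
operator's output y = softplus(x) rather than to the input x, since
1 - exp(-log(1 + e^x)) = 1 - 1/(1 + e^x) = sigmoid(x). A minimal numpy
check of that identity (a sketch, not code from the repository):

import numpy as np

x = np.linspace(-10, 10, 101)
y = np.log1p(np.exp(x))               # softplus forward
grad_from_output = 1.0 - np.exp(-y)   # 1 - exp(-v) applied to the output
sigmoid = 1.0 / (1.0 + np.exp(-x))    # expected softplus derivative
print(np.allclose(grad_from_output, sigmoid))   # prints True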


Pedro.


Re: Splitting Jenkins pipelines - stop changes to Jenkinsfiles!

2018-11-20 Thread Pedro Larroy
I think this is a big problem, which has blocked us before. I want to
point out that you are doing a great thing by avoiding everyone
getting blocked by refactoring the pipelines.

My concern is that we are kicking the can down the road and not
addressing the root cause of the problem, which is known:
https://issues.jenkins-ci.org/browse/JENKINS-37984

Pedro.


On Tue, Nov 20, 2018 at 6:08 PM Marco de Abreu
 wrote:
>
> Hello Steffen,
>
> no, there won't be any impact on the PR process or nightly regressions.
> Only the reporting will have to be updated with the new job links, but that
> should be a minor issue. To avoid any outage, I have been thinking about
> running both versions in parallel.
>
> Best regards,
> Marco
>
>
>
> On Tue, Nov 20, 2018 at 5:53 PM Steffen Rochel 
> wrote:
>
> > Hi Marco - is there any impact on reporting, the PR process or nightly
> > regression besides a reduction in TAT?  If yes, please elaborate.
> > Steffen
> >
> > On Tue, Nov 20, 2018 at 8:05 AM Marco de Abreu
> >  wrote:
> >
> > > Hello,
> > >
> > > we ran into issues around the maximum filesize of the Jenkinsfile a few
> > > times already. In order to resolve this issue, I'd like to combine this
> > > with some refactors I have planned for quite some time.
> > >
> > > The idea is basically to move away from one big Jenkinsfile and instead
> > > split it into separate jobs that run in parallel and report their status
> > > individually. Besides avoiding the size restriction, this will greatly
> > > speed up the PR validation process by reducing the critical path. Instead
> > > of having to wait for every single step within a stage to finish before
> > the
> > > next stage (e.g. tests) is getting executed, these pipelines would now be
> > > able to move forward individually. I'm still in the process of
> > refactoring
> > > and can't provide any numbers or documentation at this time, but I would
> > > like to announce this early on to avoid conflicts:
> > >
> > > Since I will remove the original Jenkinsfile, this might cause conflicts
> > > with ongoing efforts that try to change the Jenkinsfile. This poses the
> > > risk that I might forget to port a change. Thus, I'd like to ask all
> > > contributors to wait with changes of Jenkinsfile and would like to
> > request
> > > fellow-committers to wait with merging any Jenkinsfile-related PRs until
> > > further notice.
> > >
> > > I expect to finish this refactor until the end of the week. Please don't
> > > hesitate to ask if you've got further questions.
> > >
> > > Please excuse any caused inconveniences.
> > >
> > > Best regards,
> > > Marco
> > >
> >


Re: Should PR-860 (Use modernized range loops where possible) be reverted?

2018-11-20 Thread Pedro Larroy
Hi all

I think we have to make a clear separation between the thread votes
on "uniformly adopting C++11 range loops in the MXNet project" and a
PR which refactored code to be more legible and with improved variable
names.
Merging that PR doesn't imply that we have to uniformly adopt the
previous proposal.  The PR was reviewed and approved by several
people. I would keep the two topics separate, merging this PR doesn't
prescribe any particular idiom for future commits or reviews.

Pedro.

On Tue, Nov 20, 2018 at 2:58 PM Carin Meier  wrote:
>
> My intent was to be helpful, but I think I may have merged this PR
> yesterday too soon thinking it was approved and ready to merge
> https://github.com/apache/incubator-mxnet/pull/12356
>
> I didn't see the connected dev discussion
> https://lists.apache.org/thread.html/b47f285a80bef47c5ead6c361614e338a0661f6c0c76196c1e3719c5@%3Cdev.mxnet.apache.org%3E
> where there were -1 votes, which I believe are vetos?
>
> So the question to confirm is: should the PR be reverted?
>
> Sorry for any confusion,
> Carin


collecting anonymous statistics

2018-11-19 Thread Pedro Larroy
Hi folks

As you know we have a combinatorial explosion of build flavours and
different libraries / platforms / environments.

Is collecting and gathering anonymous usage statistics, as is done in
other open source projects, something which would be acceptable to the
community?

I think it would help us understand where to put efforts and how different
features are being used.

Regards.

Pedro.


Re: Catch divide-by-zero floating number exception in backend

2018-11-12 Thread Pedro Larroy
Hi

Could you be specific about the bugs? While we could use this for debugging
some particular errors as you describe, I would think that in the general
case you would want to rely on unit testing and conditional checks for very
small numbers in the denominator if you can't have a NaN. I think we should
collect some examples first and study them carefully, as fp arithmetic is
tricky. I think it is not common practice, and also not portable, to use
signals and fp exceptions, as you mentioned.

Pedro

> On 9. Nov 2018, at 00:30, Lin Yuan  wrote:
> 
> Dear MXNet Community,
> 
> I recently found the NaN errors sometimes could be due to some
> divide-by-zero float number bugs in engine backend. However, by default,
> such an exception will not be thrown. I added a signal trap to catch this
> error (https://github.com/apache/incubator-mxnet/pull/13190) and caught a
> few exceptions when running the python unit test. But this only works for
> Linux OS.
> 
> I would like to get more feedback on the best practice to catch such bugs
> in the code and if we should enforce such checks in CI. Any comment is
> appreciated.
> 
> Best Regards,
> 
> Lin
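
As an illustration of the idea on the Python side (a sketch, unrelated to
the implementation in the PR): numpy can turn a silent divide-by-zero into
a raised exception, which is portable, unlike trapping SIGFPE with signal
handlers in the backend:

import numpy as np

with np.errstate(divide='raise', invalid='raise'):
    try:
        np.array([1.0]) / np.array([0.0])
    except FloatingPointError as e:
        print("caught:", e)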


Developer setup in AWS EC2 & menu based tool

2018-11-11 Thread Pedro Larroy
Hi Folks!

We got the feedback that reproducing test results on EC2 was complex and
time consuming.

I have cleaned up my personal scripts to provision a development instance:

https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+setup+on+AWS+EC2

https://github.com/larroy/ec2_launch_scripts

My favorite is ephemeral storage provisioning with raid0 which makes IO
blazing fast. Also provisions your home files super easily.

In addition I'm working on a menu-based tool (tool.py, for lack of a better
name) which makes it quick & easy to compile MXNet, run tests, etc. Check and
review this PR:

https://github.com/apache/incubator-mxnet/pull/13202

Hopefully these tools makes life much easier for anybody developing MXNet
as this automates repetitive tasks that everyone is doing often.

Let me know what you think.

Pedro.


Unit test breakdown by time

2018-11-07 Thread Pedro Larroy
Hi

I made a quick breakdown of time spent on unit tests (in seconds) per test
and per test class. I ran the CPU tests on an m1 instance.

As you can see the slowest test classes are:

test_operator, 630.92699
test_gluon, 261.106017
test_profiler, 159.427002
test_gluon_model_zoo, 130.854
test_metric_perf, 88.016
test_gluon_data, 59.193
test_io, 58.9280004
test_optimizer, 54.630999

I think we should dive deep into some of those tests and see how we can make
them lighter.  Any thoughts?
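
One way to reproduce such a breakdown, assuming the suite is run with
nose's xunit output (nosetests --with-xunit), is to aggregate the
per-testcase times from the generated XML (a sketch):

import xml.etree.ElementTree as ET
from collections import defaultdict

totals = defaultdict(float)
for case in ET.parse('nosetests.xml').getroot().iter('testcase'):
    module = case.get('classname', '').split('.')[0]
    totals[module] += float(case.get('time', 0.0))

for module, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print('%s, %s' % (module, round(secs, 3)))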

Pedro.
test_operator, 630.92699
test_gluon, 261.106017
test_profiler, 159.427002
test_gluon_model_zoo, 130.854
test_metric_perf, 88.016
test_gluon_data, 59.193
test_io, 58.9280004
test_optimizer, 54.630999
test_loss, 38.6230005
test_contrib_control_flow, 31.102
test_sparse_operator, 28.695
test_image.TestImage, 24.4339997
test_recordio, 24.0059997
test_ndarray, 14.8279994
test_sparse_ndarray, 12.546
test_module, 8.017
test_gluon_contrib, 6.757
test_contrib_io, 3.525
test_gluon_rnn, 2.838
test_subgraph, 2.56
test_executor, 1.762
test_gluon_utils, 1.462
test_contrib_text, 1.032
test_kvstore, 0.942
test_symbol, 0.397
test_subgraph_op, 0.345
test_gluon_trainer, 0.27
test_test_utils, 0.251
test_contrib_optimizer, 0.235
test_rnn, 0.16504
test_autograd, 0.15305
test_contrib_operator, 0.14202
test_gluon_data_vision, 0.077
test_exc_handling, 0.063
test_metric, 0.043003
test_contrib_autograd, 0.039
test_init, 0.027997
test_contrib_svrg_module, 0.025
test_thread_local, 0.023
test_predictor, 0.02
test_multi_device_exec, 0.016
test_contrib_svrg_optimizer, 0.013001
test_engine, 0.012
test_infer_type, 0.012
test_infer_shape, 0.009001
test_model_parallel, 0.009
test_contrib_krprod, 0.008
test_viz, 0.006
test_attr, 0.005
test_engine_import, 0.002
test_base.MXNetDataDirTest, 0.0
nose.failure.Failure, 0.0
test_models, 130.854
test_slice_pooling2d_slice_pooling2d, 126.273
test_psroipooling, 101.859
test_pick, 99.113
test_continuous_profile_and_instant_marker, 88.328
test_metric_performance, 88.016
test_broadcast_binary_op, 87.497
test_datasets, 45.359
test_layer_norm, 42.679
test_order, 36.46
test_NDArrayIter, 32.427
test_slice_pooling2d, 29.235
test_slice_batchnorm, 28.137
test_stack, 26.771
test_slice_batchnorm_reshape_batchnorm, 25.103
test_profiler, 24.563
test_recordio_pack_label, 23.987
test_lstm_dropout, 23.58
test_adam, 23.38
test_reduce, 19.459
test_Cifar10Rec, 18.818
test_cond, 17.734
test_gru_dropout, 17.297
test_lstm_bidirectional, 16.692
test_imageiter, 16.551
test_profile_tune_pause_resume, 14.664
test_legacy_save_params, 14.139
test_gru_bidirectional, 13.197
test_laop_2, 11.906
test_lstm_sym, 10.996
test_correlation, 10.212
test_rms, 10.177
test_profile_counter, 9.983
test_symbol_block, 9.779
test_while_loop_for_foreach, 9.172
test_gru_sym, 8.293
test_conv2d_16c, 8.189
test_one_hot, 7.545
test_order, 7.424
test_profile_event, 7.347
test_profile_task, 7.298
test_synthetic_dataset_generator, 7.261
test_profile_frame, 7.226
test_laop_3, 6.976
test_binary_op, 6.563
test_elemwise_binary_ops, 6.522
test_rnnrelu_dropout, 6.42
test_datasets, 6.382
test_rnntanh_dropout, 6.345
test_multi_worker, 6.325
test_pad, 6.01
test_multi_proposal_op, 5.72
test_sparse_square_sum, 5.702
test_executor_group, 5.57
test_hybrid_static_memory, 5.43
test_MNISTIter, 5.16
test_recordimage_dataset_with_data_loader_multiworker, 4.896
test_nadam, 4.88
test_cast_storage_ex, 4.652
test_sparse_nd_broadcast, 4.591
test_batchnorm_training, 4.399
test_l2_normalization, 4.297
test_sgd, 4.192
test_poisson_nllloss_mod, 4.033
test_triplet_loss, 3.862
test_ctc_loss_train, 3.825
test_resize_short, 3.794
test_rnntanh_bidirectional, 3.721
test_unary_math_operators, 3.666
test_contrib_DataLoaderIter, 3.525
test_huber_loss, 3.433
test_crop, 3.402
test_saveload, 3.36
test_squared_hinge_loss, 3.289
test_sparse_dot, 3.25
test_signum, 3.184
test_hinge_loss, 3.138
test_diag, 3.049
test_roipooling, 3.038
test_sparse_sgd, 2.919
test_rnntanh_sym, 2.816
test_sparse_storage_fallback, 2.771
test_dtype, 2.719
test_hybrid_static_memory_switching, 2.704
test_batch_dot, 2.601
test_make_subgraph, 2.56
test_sample_weight_loss, 2.558
test_batchnorm_fallback, 2.513
test_where, 2.459
test_ftml, 2.37
test_l2_loss, 2.303
test_l1_loss, 2.296
test_dot, 2.262
test_ce_loss, 2.218
test_laop, 2.174
test_LibSVMIter, 2.13
test_bce_loss, 2.128
test_kl_loss, 2.127
test_nag, 2.023
test_broadcast, 2.019
test_ndarray_indexing, 1.945
test_sparse_retain, 1.838
test_op_roi_align, 1.797
test_factorization_machine_module, 1.577
test_bind, 1.56
test_stn, 1.486
test_multiprocessing_download_successful, 1.457
test_concat, 1.378
test_quadratic_function, 1.361
test_while_loop_simple_forward, 1.357
test_broadcast, 1.343
test_spacetodepth, 1.288
test_recordimage_dataset, 1.261
test_while_loop_rnn, 1.238
test_broadcast_binary, 1.196

Re: [DISCUSS] Speedup non-code PR in CI

2018-11-06 Thread Pedro Larroy
It has been raised but there are practical complications about introducing
an additional layer of logic for skipping CI in some scenarios.

How many of these PRs do we have which will justify investing human effort
on optimizing an automated process?
How much effort should be dedicated to this logic that could be invested
somewhere else?

If you have a proposal, you are welcome to explain.

Pedro.

On Tue, Nov 6, 2018 at 7:09 PM Lin Yuan  wrote:

> Dear Community,
>
> I recently submitted a few small PRs with only changes in README files.
> However, I noticed they still triggered the full cycle of CI including
> build and test on all platforms.
>
> Do we have a plan to speed up this process, maybe skipping non-code related
> PRs in CI? Sorry, if this topic has been raised earlier and if not I
> appreciate any comments.
>
> Cheers,
>
> Lin
>
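
For reference, the kind of gate being discussed could look like the
following sketch, assuming it runs in a git checkout; this is illustrative
and not taken from any existing PR:

import subprocess

def docs_only(base='origin/master'):
    out = subprocess.check_output(
        ['git', 'diff', '--name-only', base, 'HEAD'], text=True)
    changed = [p for p in out.splitlines() if p]
    return bool(changed) and all(
        p.endswith(('.md', '.rst')) for p in changed)

if docs_only():
    print('Docs-only change: the full build/test cycle could be skipped.')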


[Announce] Virtualized testing on ARM with Qemu and Docker

2018-11-02 Thread Pedro Larroy
Hi MXNet community

AI on MCUs can enable cheaper, lower power, better privacy and lower
latency applications. There’s an estimate of more than 20 billion connected
devices to be deployed in 2020, and a part of them will perform some amount
of AI / ML tasks. Testing in embedded devices is very challenging and expensive
due to logistics, tooling and resource constraints. Here I would like to
announce a contribution I have done using the free and open-source emulator
QEMU and Docker to perform hardware virtualization and test MXNet on edge
devices, specifically to test the MXNet artifacts on ARM such as Pip wheels
and run unit tests.

There's small instructions to run a virtualized environment on the bottom
of the README.md in the ci folder:

https://github.com/apache/incubator-mxnet/tree/master/ci#testing-with-qemu

I would encourage you to give it a try and report any comments or feedback.
The plan is to integrate it into nightly testing. We would need to narrow
down the scope of testing a bit, since the full suite is still just too big
and resource-intensive to finish in a reasonable time.

My idea would be to split the unit tests into different suites such as core
/ gluon / extended. Do you have any suggestions for this split?

As a cool thing to try, you can execute the following command which will
give you a shell in an ARM VM (also sshable via ssh -p qemu@localhost)
so you can use and debug MXNet in ARM:

ci/build.py -p test.arm_qemu -b && docker run -p: -ti
mxnetci/build.test.arm_qemu


How cool is that?   If you are curious or want to hack on it, have a look
at the qemu folders under ci.

Pedro.


Re: Coverity scan

2018-11-02 Thread Pedro Larroy
Thanks a lot, I think it is very beneficial that we invest in this kind of
tooling for code quality. As a developer I wonder: do we have actionable
items for looking at / fixing these issues, or is this right now done on an
informational / goodwill basis?

Is there a way to colorize this output?

Pedro.

On Fri, Nov 2, 2018 at 5:10 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Reference scan here (I believe I also count 5 memory violations):
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/master/runs/1856/nodes/104/log/?start=0
>
> -Kellen
>
> On Fri, Nov 2, 2018 at 9:07 AM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Hey Anton, can you provide a sample scan?  I'm interested to see if it
> > catches different memory access violations, or if it gets the same ones
> > we've already seen reported by clang-tidy.  For example are these
> > violations in the reports:
> > --
> > "/work/mxnet/3rdparty/dmlc-core/include/dmlc/concurrentqueue.h:3443:24:
> > warning: Access to field 'capacity' results in a dereference of a null
> > pointer (loaded from variable 'mainHash')
> > [clang-analyzer-core.NullDereference]"
> >
> > ---
> >
> > /work/mxnet/3rdparty/mshadow/mshadow/./tensor.h:64:23: warning: Assigned
> value is garbage or undefined [clang-analyzer-core.uninitialized.Assign]
> >   this->shape_[i] = s[i];"
> >
> > -
> >
> >
> >
> /usr/bin/../lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1/ext/atomicity.h:67:29:
> warning: Use of memory after it is freed
> [clang-analyzer-cplusplus.NewDelete]
> >
> > --
> >
> > -Kellen
> >
> >
> >
> > On Fri, Nov 2, 2018 at 2:20 AM Anton Chernov 
> wrote:
> >
> >> Dear MXNet community,
> >>
> >> I had investigated the possibility to adopt Coverity static analysis
> tools
> >> for the MXNet project and it turned out that there is a tool provided by
> >> Synopsys for open-source projects:
> >>
> >> https://scan.coverity.com
> >>
> >> The tool works nicely with GitHub [1] and I found that a scan for a fork
> >> (from @apeforest) [2] was already set up. I can not tell how long ago
> the
> >> scan was performed, but at the time of writing the project page shows 5
> >> illegal memory access errors, that I think would be worth investigating.
> >>
> >> If there is interest I would suggest that we would setup a Coverity scan
> >> for the main repository instead of a fork and people that have interest
> >> managing and fixing issues would request add them to the project.
> >>
> >> I would appreciate feedback for this proposal and help from people
> having
> >> rights for the main repository to set things up.
> >>
> >> Best regards,
> >> Anton
> >>
> >> [1] https://scan.coverity.com/github
> >> [2] https://scan.coverity.com/projects/apeforest-incubator-mxnet
> >>
> >
>


possible bug in gpu_topology.h ComputeDepth ?

2018-11-01 Thread Pedro Larroy
Hi

I'm investigating this issue:
https://github.com/apache/incubator-mxnet/issues/12994

To me this code seems suspicious, as it doesn't do what is stated in the
comment.

https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/gpu_topology.h#L577

I don't think the depth of the binary tree is calculated correctly, for
example a tree of three nodes should have two levels, but a tree of four
nodes should have three. A tree of 0 nodes should have 0.

Any ideas if this is indeed buggy, or is there something hidden I'm missing?

Test code to check:

#include <iostream>

using namespace std;

inline int ComputeDepth(int n) {
  for (int depth = 0; depth < 16; ++depth) {
    int num = 2 << depth;
    if (n <= num)
      return depth + 1;
  }
  return 0;
}

int main(int argc, char* argv[]) {
  for (size_t i = 0; i < 64; ++i)
    cout << "ComputeDepth(" << i << ") = " << ComputeDepth(i) << endl;
}


ComputeDepth(0) = 1
ComputeDepth(1) = 1
ComputeDepth(2) = 1
ComputeDepth(3) = 2
ComputeDepth(4) = 2
ComputeDepth(5) = 3
ComputeDepth(6) = 3
ComputeDepth(7) = 3
ComputeDepth(8) = 3
ComputeDepth(9) = 4
ComputeDepth(10) = 4
ComputeDepth(11) = 4
ComputeDepth(12) = 4
ComputeDepth(13) = 4
ComputeDepth(14) = 4
ComputeDepth(15) = 4
ComputeDepth(16) = 4
ComputeDepth(17) = 5
ComputeDepth(18) = 5
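
Under the reading above (0 nodes -> 0 levels, 3 -> 2, 4 -> 3), the expected
value would be floor(log2(n)) + 1 for n >= 1. A small Python sketch of that
expectation, for comparison with the output above; this is one
interpretation of the comment's intent, not necessarily the semantics
gpu_topology.h requires:

import math

def compute_depth(n):
    # number of levels of a complete binary tree holding n nodes
    if n <= 0:
        return 0
    return int(math.log2(n)) + 1

for i in range(10):
    print('compute_depth(%d) = %d' % (i, compute_depth(i)))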


Re: [VOTE] - Adopt "Become a Committer and PPMC Member" Document

2018-11-01 Thread Pedro Larroy
+1 non-binding. Thanks for driving this, looking forward to seeing the
positive impact.

On Mon, Oct 29, 2018 at 11:47 PM Carin Meier  wrote:

> This vote is to adopt the document
>
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> to replace the current document
> https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
>
> The dev discussion thread is here
>
> https://lists.apache.org/thread.html/e61ffa26af374de7a99c475d406e462a00b26cfc1155e232198dd53e@%3Cdev.mxnet.apache.org%3E
>
> The vote will be a procedural issue vote as defined
> https://www.apache.org/foundation/voting.html
>
> Votes on procedural issues follow the common format of majority rule unless
> otherwise stated. That is, if there are more favourable votes than
> unfavourable ones, the issue is considered to have passed -- regardless of
> the number of votes in each category. (If the number of votes seems too
> small to be representative of a community consensus, the issue is typically
> not pursued. However, see the description of lazy consensus
>  for a
> modifying factor.)
>
> The vote will run until Friday Nov 2nd at 6:00 am EST
>
> Thanks,
> Carin
>


[Discuss] Feature detection at runtime / test skipping depending on features

2018-11-01 Thread Pedro Larroy
Hi

There are some tests that fail when some features are not compiled in, such
as OpenCV.

In some cases we skip the test according to some precondition such as:

@unittest.skipIf(not graphviz_exists(),


I would propose that we have a Python module that exports a set of methods
to check which features are compiled in, so that tests which need a given
feature can be skipped.
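
A sketch of what such a module could look like. The helper below is
hypothetical (none of these names exist in MXNet today) and probes for
OpenCV support by provoking the error shown in the traceback below:

import unittest
import mxnet as mx

def opencv_enabled():
    # Best-effort probe: returns False when the build lacks USE_OPENCV=1.
    try:
        mx.image.imdecode(b'\x00')  # invalid buffer, used only as a probe
    except mx.base.MXNetError as e:
        if 'USE_OPENCV' in str(e):
            return False
    except Exception:
        pass  # any other failure still means the OpenCV code path exists
    return True

@unittest.skipIf(not opencv_enabled(), "requires an OpenCV-enabled build")
def test_recordimage_dataset():
    ...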



test_gluon_data.test_recordimage_dataset ... [INFO] Setting test
np/mx/python random seeds, use MXNET_TEST_SEED=1883419283 to reproduce.
ERROR
test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker ...
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in
_bootstrap
self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
  File
"/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/dataloader.py",
line 189, in worker_loop
batch = batchify_fn([dataset[i] for i in samples])
  File
"/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/dataloader.py",
line 189, in 
batch = batchify_fn([dataset[i] for i in samples])
  File
"/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/vision/datasets.py",
line 261, in __getitem__
return image.imdecode(img, self._flag), header.label
  File "/usr/local/lib/python3.5/dist-packages/mxnet/image/image.py", line
147, in imdecode
return _internal._cvimdecode(buf, *args, **kwargs)
  File "", line 36, in _cvimdecode
  File "/usr/local/lib/python3.5/dist-packages/mxnet/_ctypes/ndarray.py",
line 92, in _imperative_invoke
ctypes.byref(out_stypes)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 252, in
check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:21:42] /work/mxnet/src/io/image_io.cc:211: Build
with USE_OPENCV=1 for image io.


Pedro


Re: [DISCUSS] - Revisions to Committer Criteria

2018-10-25 Thread Pedro Larroy
This is the first hangout that I was able to attend, I liked the format and
found them valuable. Thanks for organizing and publishing the notes.
Looking forward to the next one.

Pedro

On Thu, Oct 25, 2018 at 6:44 AM Steffen Rochel 
wrote:

> Carin - please see
>
> https://cwiki.apache.org/confluence/display/MXNET/Hangout+October+24th+2018+8am+and+5pm+PDT
> :
> Discussion about committer proposal:
>
>- Proposal default should be to have separation between committer and
>PPMC election
>- Criteria are vague, should we add some example persona?
>- Spell out privileges of committer and PPMC member
>
>
> Note: I update the project proposal to address first bullet.
>
> Steffen
>
>
> On Wed, Oct 24, 2018 at 11:29 AM Carin Meier  wrote:
>
> > A request to whoever is taking notes at the MXNet Hangouts that are
> > occurring today. Could you please recap feedback from the meeting in
> > regards to document revisions here for everyone? I would like to attend
> the
> > session later today, but may not due to family obligations.
> >
> > Thanks!
> > Carin
> >
> > On Tue, Oct 23, 2018 at 2:24 PM Steffen Rochel 
> > wrote:
> >
> > > Carin - I got feedback on my proposal and made changes. I incorporated
> > > Tianqi's suggestion that we should strive to nominate committer/PPMC
> > > candidates from outside one's own organization. It should not be
> > considered
> > > as a hard rule, but a recommendation.
> > >
> > > Steffen
> > >
> > > On Mon, Oct 22, 2018 at 2:18 PM Carin Meier 
> > wrote:
> > >
> > > > Thanks Steffen helping draft up the proposal for Committer and PPMC
> > > > guidelines.
> > > >
> > > > Please everyone review and provide feedback
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+(incubating)+Committer+and+PPMC+Member+Proposal
> > > > .
> > > >
> > > > I plan to start a vote on this Friday if the discussions/revisions
> are
> > > > complete.
> > > >
> > > > - Carin
> > > >
> > > > On Fri, Oct 19, 2018 at 12:03 PM Carin Meier 
> > > wrote:
> > > >
> > > > > Great!
> > > > >
> > > > > I started a rough draft for collaboration at
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+a+Committer+Proposal
> > > > > .
> > > > >
> > > > > Everyone feel free to enhance and provide feedback.
> > > > >
> > > > > - Carin
> > > > >
> > > > > On Fri, Oct 19, 2018 at 10:55 AM Steffen Rochel <
> > > steffenroc...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> +1, great suggestion, thanks Carin!
> > > > >> I'm willing to collaborate to create a draft proposal.
> > > > >> Steffen
> > > > >>
> > > > >> On Fri, Oct 19, 2018 at 5:35 AM Carin Meier  >
> > > > wrote:
> > > > >>
> > > > >> > Background:
> > > > >> >
> > > > >> > There is a desire to increase the committer pool and grow the
> > > > community.
> > > > >> > This thread is to discuss the possibility of revision the
> current
> > > > >> committer
> > > > >> > criteria in light of the following goals:
> > > > >> >
> > > > >> > - Make it easier to newcomers to be committers
> > > > >> > - Recognize non-code contributions as paths to committership
> > > > >> > - Open the door to separating levels of committer and PMC
> > (discussed
> > > > in
> > > > >> > another thread)
> > > > >> >
> > > > >> > Current State:
> > > > >> >
> > > > >> > The current committer criteria is here
> > > > >> >
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/Becoming+a+Committer
> > > > >> as
> > > > >> > is modeled after the Hadoop committer criteria
> > > > >> > https://hadoop.apache.org/committer_criteria.html
> > > > >> >
> > > > >> > Proposal:
> > > > >> >
> > > > >> > Model the MXNet path to committership and PMC after the Apache
> > Beam
> > > > >> project
> > > > >> > https://beam.apache.org/contribute/become-a-committer/
> > > > >> >
> > > > >> > Short quote from page:
> > > > >> >   =
> > > > >> > An Apache Beam committer…
> > > > >> > <
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> > > > >> > >
> > > > >> >
> > > > >> >- Takes many forms
> > > > >> ><
> > > > >> >
> > > >
> > https://beam.apache.org/contribute/become-a-committer/#takes-many-forms
> > > > >> >
> > > > >> >- Knows, upholds, and reinforces the Apache Software
> Foundation
> > > > code
> > > > >> of
> > > > >> >conduct
> > > > >> ><
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://beam.apache.org/contribute/become-a-committer/#knows-upholds-and-reinforces-the-apache-software-foundation-code-of-conduct
> > > > >> > >
> > > > >> >- Knows, upholds, and reinforces the responsibilities of an
> > > Apache
> > > > >> >Software Foundation committer
> > > > >> ><
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://beam.apache.org/contribute/become-a-committer/#knows-upholds-and-reinforces-the-responsibilities-of-an-apache-software-foundation-committer
> > > > >> > >
> > > > 

Re: Up-streaming MXNet HIP Port

2018-10-24 Thread Pedro Larroy
Hi Srihari

Thanks for the document. We could add it to the relevant section in the
wiki so it's easier to keep track whenever you feel comfortable with it.
https://cwiki.apache.org/confluence/display/MXNET/Design+Proposals

The calls to synchronize and wait look very similar. Shall we use
polymorphism or a bridge pattern to abstract these common calls instead of
using the preprocessor? Both seem to use the same abstraction (streams).
Using the suggested pattern instead of the preprocessor would lead to code
that is easier to maintain and instrument. A shortcoming would be if the
APIs to abstract were too different.
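
As a language-agnostic sketch of that bridge idea (illustrative only; the
real abstraction would live in C++ over cudaStreamSynchronize /
hipStreamSynchronize):

from abc import ABC, abstractmethod

class StreamRuntime(ABC):
    @abstractmethod
    def synchronize(self, stream): ...

class CudaRuntime(StreamRuntime):
    def synchronize(self, stream):
        print("cudaStreamSynchronize(%s)" % stream)  # stand-in for the real call

class HipRuntime(StreamRuntime):
    def synchronize(self, stream):
        print("hipStreamSynchronize(%s)" % stream)  # stand-in for the real call

def wait_all(runtime, streams):
    # call sites depend only on the interface, not on #ifdef'd symbols
    for s in streams:
        runtime.synchronize(s)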

Pedro.

On Wed, Oct 24, 2018 at 6:13 AM Karnam, Srihari 
wrote:

> Dear All,
>
>
> Please review the design for Up-Streaming MXNet HIP Port.
>
>
>
> https://docs.google.com/document/d/1uGr1KPVDqDVUwnhM0vtmxkZZpE2Mf9slNF9e2IbwVj0/edit?usp=sharing
>
>
> Regards,
>
> Srihari Karnam
>


Re: [Discussion] PMC and Committer Courtesy: Only Propose Candidate in a Different Organization

2018-10-23 Thread Pedro Larroy
Hi Steffen

I don't see that text in the proposal. I personally think the bar should be
the same for everyone independent of their affiliation, as per Apache ways
of working, and per ethical principles of equality and meritocracy,
otherwise we open the door to discrimination.

This old movie with de Niro and Cuba Gooding comes to mind:
https://en.wikipedia.org/wiki/Men_of_Honor

Pedro.

On Tue, Oct 23, 2018 at 5:15 PM Steffen Rochel 
wrote:

> Tianqi and Pedro - I suggested the following in
>
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
>  :
> In general, a nominee for PPMC will be evaluated based on the merit and
> contribution, independent of his affiliation. However, as an exception to
> improve the affiliation diversity within the PPMC, the PPMC might apply a
> higher bar for nominees affiliated with Amazon AWS.
>
> Please comment on the proposal (see thread "[DISCUSS] - Revisions to
> Committer Criteria).
>
> Steffen
>
> On Tue, Oct 23, 2018 at 7:09 AM Pedro Larroy  >
> wrote:
>
> > Hi
> >
> > Tianqi, there's a saying that the road to hell is paved with good
> > intentions. I think most of us here are enthusiastic about increasing
> > diversity of contributions to this project and are working actively
> towards
> > this. Having any kind of positive or negative discrimination seems to me
> > like it goes against what's listed on the Apache website, under the
> section
> > "Meritocracy". https://www.apache.org/foundation/how-it-works.html
> > Several
> > Amazonians have invested time, love and resources both inside and outside
> > working hours to this project. I don't think it's fair to them that their
> > contributions are not taken impartially regardless of their current
> > employer, nor would it be against a member of any other organization,
> sex,
> > condition etc.
> >
> > -1
> >
> > Pedro.
> >
> >
> >
> >
> > On Mon, Oct 22, 2018 at 6:02 PM Tianqi Chen  wrote:
> >
> > > I want to clarify that this would not prevent Amazon contributors from
> > > being nominated. Nor it would prevent collaboration between Amazon
> > > employees. A good thing about Apache is that everything is recorded and
> > > presented to the entire community, this includes the dev list,
> > > github review/commit history, documentation, wiki.
> > >
> > > Specifically to Amazon contributors, there are non-Amazon PMCs who can
> watch
> > > this evidence and bring these contributors on board -- and I am very
> > > certain this would be the case. If the problem that there are too few
> > > non-Amazon PMCs, then it wouldn't hurt to try to get more on board, and
> > > then get them in the term to nominate PMC contributors.
> > >
> > > One of the key criteria of graduation for an Apache Project is that it
> > > should not be controlled by a single entity. I think that whether we
> can
> > > execute this guideline is exactly a good test to check if we pass that
> > bar.
> > >
> > > On Sun, Oct 21, 2018 at 9:19 PM Naveen Swamy 
> wrote:
> > >
> > > > this suggestion looks like it is putting the onus on contributors to
> > > > collaborate with contributors outside their org to get nominated to
> be
> > > > committer or a PMC of this project.
> > > > Every organization has its own business goals, on the way to meet
> their
> > > > objectives if their employees happen to be great contributors to this
> > > > project, I would expect PMC members(wearing their Apache hat) to
> > > recognize
> > > > them and give them a greater role in the project.
> > > > I would assume the responsibility of increasing the diversity is
> solely
> > > > upon the PMC members, the PMC should look ways to evangelize the
> > project,
> > > > mentor new contributors, nominate and make them a part of the
> project's
> > > > journey.
> > > > I do agree that we have to increase the diversity and suggest to
> > explore
> > > > different ways( for example collaborate with other successful Open
> > source
> > > > projects to get their members excited about MXNet).
> > > >
> > > > Guideline or not, I cannot agree to this in principle.
> > > > -1
> > > >
> > > >
> > > > On Sun, Oct 21, 2018 at 8:22 PM Tianqi Chen <
> tqc...@cs.washington.edu>
> > > > wrote:
> > > >
> > > > >

Re: [Discussion] PMC and Committer Courtesy: Only Propose Candidate in a Different Organization

2018-10-23 Thread Pedro Larroy
Hi

Tianqi, there's a saying that the road to hell is paved with good
intentions. I think most of us here are enthusiastic about increasing
diversity of contributions to this project and are working actively towards
this. Having any kind of positive or negative discrimination seems to me
like it goes against what's listed on the Apache website, under the section
"Meritocracy". https://www.apache.org/foundation/how-it-works.html  Several
Amazonians have invested time, love and resources both inside and outside
working hours to this project. I don't think it's fair to them that their
contributions are not taken impartially regardless of their current
employer, nor would it be against a member of any other organization, sex,
condition etc.

-1

Pedro.




On Mon, Oct 22, 2018 at 6:02 PM Tianqi Chen  wrote:

> I want to clarify that this would not prevent Amazon contributors from
> being nominated. Nor it would prevent collaboration between Amazon
> employees. A good thing about Apache is that everything is recorded and
> presented to the entire community, this includes the dev list,
> github review/commit history, documentation, wiki.
>
> Specifically to Amazon contributors, there are non-Amazon PMCs who can watch
> this evidence and bring these contributors on board -- and I am very
> certain this would be the case. If the problem that there are too few
> non-Amazon PMCs, then it wouldn't hurt to try to get more on board, and
> then get them in the term to nominate PMC contributors.
>
> One of the key criteria of graduation for an Apache Project is that it
> should not be controlled by a single entity. I think that whether we can
> execute this guideline is exactly a good test to check if we pass that bar.
>
> On Sun, Oct 21, 2018 at 9:19 PM Naveen Swamy  wrote:
>
> > this suggestion looks like it is putting the onus on contributors to
> > collaborate with contributors outside their org to get nominated to be
> > committer or a PMC of this project.
> > Every organization has its own business goals, on the way to meet their
> > objectives if their employees happen to be great contributors to this
> > project, I would expect PMC members(wearing their Apache hat) to
> recognize
> > them and give them a greater role in the project.
> > I would assume the responsibility of increasing the diversity is solely
> > upon the PMC members, the PMC should look ways to evangelize the project,
> > mentor new contributors, nominate and make them a part of the project's
> > journey.
> > I do agree that we have to increase the diversity and suggest to explore
> > different ways( for example collaborate with other successful Open source
> > projects to get their members excited about MXNet).
> >
> > Guideline or not, I cannot agree to this in principle.
> > -1
> >
> >
> > On Sun, Oct 21, 2018 at 8:22 PM Tianqi Chen 
> > wrote:
> >
> > > >
> > > >  Many potential committers and
> > > > PMC won’t interact with the non-Amazonians at all (since there are so
> > > few),
> > > > so they’d be relegated to obscurity and hopelessness by default.
> > > >
> > >
> > > If potential contributors do not come from Amazon, then the Amazonian
> > PMC
> > > can nominate them :)  If the potential contributors do come from
> > Amazon,
> > > then it is not a bad thing to interact with a bigger part of the
> > community. I
> > > can expect that as more non-Amazonian contributors get nonimated, this
> > > would make the process more healthy.
> > >
> > > Like neural networks, any guideline can be played in adversarial
> fashion
> > > (e.g. in the case of the gray areas). I think having a goodwill to push
> > the
> > > guideline will understandably make people work together.
> > >
> > > Afterall, this is an Apache project that should goes beyond a single
> > > company
> > >
> > > Tianqi
> > >
> > > >
> > > >
> > > >
> > > > On Sun, Oct 21, 2018 at 5:06 PM Steffen Rochel <
> > steffenroc...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Tianqi -
> > > > > +1 . I like the idea to grow diversity at the project and encourage
> > > > > communication beyond people sitting next to each other. I also
> > support
> > > > the
> > > > > way you described as guideline, not has a hard rule. I think it is
> > > > > important we focus on merit and contributions when evaluating
> nominee
> > > for
> > > > > committer and PPMC.
> > > > >
> > > > > Carin started a draft document for revised criteria for committer
> and
> > > > PPMC
> > > > > membership
> > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member+Proposal
> > > > > >.
> > > > > I suggest to contribute, provide feedback and suggestion including
> > your
> > > > > proposal.
> > > > >
> > > > > Steffen
> > > > >
> > > > > On Sun, Oct 21, 2018 at 10:22 AM Tianqi Chen 
> > > wrote:
> > > > >
> > > > > > Dear MXNet Community:
> > > > > >
> > > > > > There has been a great discussion going on in terms of
> > > 

Re: Include MKLDNN into default mxnet pip package

2018-10-19 Thread Pedro Larroy
I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and unit
tests are passing.

Is this build using AVX512? In /proc/cpuinfo I see only "avx" flag.
There's no "avx2" like on recent intel cpus.
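
A quick Linux-only way to check which SIMD extensions the CPU reports (a
one-off snippet, not part of MXNet):

flags = set()
with open('/proc/cpuinfo') as f:
    for line in f:
        if line.startswith('flags'):
            flags.update(line.split(':', 1)[1].split())
            break
print({name: name in flags for name in ('avx', 'avx2', 'avx512f')})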

Pedro.

On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko  wrote:

> Awesome collaborative effort across many contributors and companies!
>
> The boost is impressive and for MXNet users to get this boost "out of the
> box" is a great benefit and makes MXNet an even better choice.
>
> Alex - can you clarify whether there are any down sides with regards to
> non-AVX-512 architectures, AMD CPUs, etc? Will it gracefully fall back?
>
> Hagay
>
>
> On Fri, Oct 19, 2018, 15:46 Sergio Fernández  wrote:
>
> > If there is no downside on platforms not supporting AVX512 instructions,
> > then +1
> >
> >
> > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> >
> > > Hey all,
> > > We have been working hard these past few months to integrate and
> > stabilize
> > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and have made
> > > incredible progress. On CPUs with AVX512 instructions (such as c5.18x)
> we
> > > have seen performance increase up to 12x and on other platforms (Macs,
> > > AVX2) we've seen a speedup of 1.5+. Full list of benchmarks can be found
> > here
> > > (
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > >  and https://github.com/apache/incubator-mxnet/pull/12591).
> > >
> > > Currently, using this accelerator requires the developer to either pip
> > > install the mxnet-mkl version of mxnet or to build it themselves from
> > > source. Given that we should try to provide the best performance "out
> of
> > > the box” with mxnet we should include this in the default build. The
> > mkldnn
> > > library is included within the pip package build so it does not
> require
> > an
> > > external dependency.
> > >
> > > There were concerns that MKLDNN could cause regressions on certain
> > > platforms (as it did with the tensorflow version a while back); but we
> > > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off
> this
> > > feature during runtime. Please bring up any other concerns you may have
> > and
> > > your thoughts on including this accelerator in the default build.
> > >
> > > Best,
> > > Alex
> > >
> >
>
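
As a usage note on the flag mentioned above, assuming (as stated) that it
is read at runtime; setting it before importing mxnet is the safe way to
make sure it takes effect:

import os
os.environ['MXNET_MKLDNN_ENABLED'] = '0'  # disable MKLDNN acceleration
import mxnet as mx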


Re: [LAZY VOTE]: rename dockerfiles s/.build.//

2018-10-19 Thread Pedro Larroy
Allright then, I closed that PR. Thanks for your feedback.

On Wed, Oct 17, 2018 at 9:06 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> May be of interest to people that we're trying get a good set of
> production-ready Dockerfiles (which I'm referring to as runtime Dockerfiles
> in this thread) with a PR open here:
> https://github.com/apache/incubator-mxnet/pull/12791 (thanks for updating
> these Meghna).
>
> On Wed, Oct 17, 2018 at 12:00 PM Naveen Swamy  wrote:
>
> > I agree with Kellen on not renaming the CI docker files (by renaming - I
> > think it's implicit you can use these for production). I don't think we
> > should be telling our users to go use these bloated docker files; you could
> > create lean separate docker files for production use-case with only
> > necessary runtime packages.
> >
> > -1
> >
> > On Wed, Oct 17, 2018 at 11:48 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hey Pedro, sorry I still don't see a good reason to justify changing
> the
> > > filenames.  Renaming them to be less specific isn't going to explain to
> > > users what the purpose of the files is, and it could cause breakages
> with
> > > any system that refers to these files, including external companies' CI
> > > systems.  If I think of the benefits versus potential errors introduced
> > by
> > > making the change I see more potential risk than obvious benefits.  I
> > also
> > > feel that this change will make the difference between the runtime
> docker
> > > files and the CI docker files less clear to users, not more clear.  In
> > > general I think adding a descriptive README.md would serve our
> purposes
> > > better here.  Happy to hear what others think.
> > >
> > > On Wed, Oct 17, 2018 at 6:45 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Kellen, thank you for your response.
> > > >
> > > > Maybe I didn't explain myself correctly. The purpose of this
> > > infrastructure
> > > > is not changed.
> > > >
> > > > I'm not planning to use these Dockerfiles as MXNet docker containers
> > for
> > > > users to run MXNet, that is a separate concern.
> > > >
> > > > It is just that some of these Dockerfiles we use in CI to build, test
> > and
> > > > generate documentation, so are used as a runtime container as well.
> > Thus
> > > > I'm just changing the paths for semantic reasons and removing the
> > .build.
> > > > which is just noise.
> > > >
> > > > As an example I would like to explain that we are about to merge the
> PR
> > > > which uses QEMU to run the unit tests, so there's an associated
> > > Dockerfile
> > > > which hosts the QEMU runtime environment used to execute the unit
> tests
> > > in
> > > > an ARM emulated machine. Thus it makes little sense that these
> Dockerfiles
> > > are
> > > > called "build".  I don't know if my explanation changes your vote.
> > Either
> > > > way please let me know. Separating this change in a different PR was
> > > > suggested by several MXNet contributors during review.
> > > >
> > > > Pedro.
> > > >
> > > > On Wed, Oct 17, 2018 at 11:21 AM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > -1. (non-binding)
> > > > >
> > > > > These Dockerfiles are very bloated and imo only useful for
> creating a
> > > > build
> > > > > environment or running tests.  Just as you wouldn't setup a server
> > for
> > > a
> > > > > service and then install 200 packages that may or may not be used
> for
> > > the
> > > > > service I wouldn't recommend using these Dockerfiles at runtime.
> > > Runtime
> > > > > Dockerfiles should in my opinion be as lightweight and suited to
> > their
> > > > task
> > > > > as possible.
> > > > >
> > > > > On Wed, Oct 17, 2018, 1:58 AM Hagay Lupesko 
> > wrote:
> > > > >
> > > > > > The PR provides a good explanation of this change and all code
> > > updates.
> > > > > > LGTM.
> > > > > >
> > > > > > On Tue, Oct 16, 2018 at 8:41 AM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I would like to rename the dockerfiles since they are used as a
> > > > > > > runtime environment and not only for builds, as they were
> > > > > > > initially intended.
> > > > > > >
> > > > > > > More info about the change in this PR:
> > > > > > > https://github.com/apache/incubator-mxnet/pull/12423/files
> > > > > > >
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Include MKLDNN into default mxnet pip package

2018-10-18 Thread Pedro Larroy
Very nice! 

Pedro

> On 17. Oct 2018, at 23:12, Alfredo Luque  
> wrote:
> 
> This is huge. Thanks for working on this. Is there a similar plan for,
> e.g., TensorRT support being ported into the main cuda-9.x packages?
> 
> On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
> 
> Hey all,
> We have been working hard these past few months to integrate and stabilize
> Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> have seen performance increases of up to 12x, and on other platforms (Macs,
> AVX2) we have seen speedups of 1.5x or more. A full list of benchmarks can
> be found here (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> and https://github.com/apache/incubator-mxnet/pull/12591).
> 
> Currently, using this accelerator requires the developer to either pip
> install the mxnet-mkl version of mxnet or to build it themselves from
> source. Given that we should try to provide the best performance “out of
> the box” with mxnet, we should include this in the default build. The
> mkldnn library is included within the pip package build, so it does not
> require an external dependency.
> 
> There were concerns that MKLDNN could cause regressions on certain
> platforms (as it did with the TensorFlow version a while back), but we
> added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off
> this feature at runtime. Please bring up any other concerns you may have
> and your thoughts on including this accelerator in the default build.
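> 
> As a minimal sketch of how a user could toggle this (assuming the flag is
> read once when the library loads, so it has to be set before the first
> import of mxnet; the shapes below are arbitrary):
> 
>     import os
> 
>     # Assumption: the flag is read at load time, so it must be set
>     # before mxnet is first imported. "0" disables MKLDNN, "1" (the
>     # default) leaves it enabled.
>     os.environ["MXNET_MKLDNN_ENABLED"] = "0"
> 
>     import mxnet as mx
> 
>     # Sanity check: time a CPU matrix multiply with the flag set to "0"
>     # and "1" to measure the MKLDNN speedup on your platform.
>     a = mx.nd.random.uniform(shape=(1024, 1024))
>     b = mx.nd.random.uniform(shape=(1024, 1024))
>     c = mx.nd.dot(a, b)
>     c.wait_to_read()  # force the async engine to finish the computation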
> 
> Best,
> Alex
> 
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA


Re: Storing PGP Key for Publishing packages

2018-10-17 Thread Pedro Larroy
Do nightly artifacts need to be signed? For releases, what you wrote and what
Apache recommends makes total sense. It follows that artifacts coming out of
CD can't be signed manually.

Pedro

> On 17. Oct 2018, at 22:29, Naveen Swamy  wrote:
> 
> I am collaborating with Zach Kimberg and Qing to automate publishing the
> MXNet-Scala Maven package to the Apache Snapshot repo (either nightly or
> weekly); currently it's very tedious and time consuming. To publish the
> package, the artifacts need to be signed with a committer's key. However,
> Zach found that Apache seems to strictly advise against storing PGP keys,
> so I suggested looking at what Spark is doing, and he found that they
> release to Apache Snapshots as a nightly job, so they must be storing the
> credentials on the host.
> I am looking for advice from the mentors on how to proceed with this.
> 
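> For reference, a minimal sketch of the signing step that would have to run
> on the publishing host (assuming GnuPG is installed and the key is in the
> local keyring; the artifact name is a placeholder):
> 
>     import subprocess
> 
>     # Placeholder artifact name. Produces artifact.jar.asc, the
>     # ASCII-armored detached signature that Maven repositories expect.
>     subprocess.run(
>         ["gpg", "--armor", "--detach-sign", "artifact.jar"],
>         check=True,
>     )
> 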
> One option (not preferable) is to publish to a private repo or an S3
> bucket, publishing to the Apache repo only during releases, so that the
> keys remain under the committers' control.
> 
> -- Advice on PGP key storage from the Apache website --
> 
> 
> “It is recommended that you create a PGP key for your apache.org address
> now (or add that address to an existing key, if you have one). *DO NOT* create
> this key on any machine to which multiple users have access and *DO NOT*,
> ever, copy your private key to any other shared machine. Release managers
> need to take particular care of keys used to sign releases.” (
> https://www.apache.org/dev/new-committers-guide.html#set-up-security-and-pgp-keys
> )
> 
> “Strictly speaking, releases must be *verified* on
> hardware owned and controlled by the committer. That means hardware the
> committer has physical possession and control of and exclusively full
> administrative/superuser access to. That's because only such hardware is
> qualified to hold a PGP private key, and the release should be verified on
> the machine the private key lives on or on a machine as trusted as that.” (
> https://www.apache.org/legal/release-policy.html#release-signing)
> 
> ---
> 
> 
> Thanks, Naveen


Re: [LAZY VOTE]: rename dockerfiles s/.build.//

2018-10-17 Thread Pedro Larroy
Hi Kellen, thank you for your response.

Maybe I didn't explain myself correctly. The purpose of this infrastructure
has not changed.

I'm not planning to use these Dockerfiles as MXNet docker containers for
users to run MXNet; that is a separate concern.

It is just that we use some of these Dockerfiles in CI to build, test and
generate documentation, so they are used as runtime containers as well. Thus
I'm just changing the paths for semantic reasons and removing the .build.
part, which is just noise.

As an example, we are about to merge the PR which uses QEMU to run the unit
tests, so there's an associated Dockerfile which hosts the QEMU runtime
environment used to execute the unit tests in an emulated ARM machine. Thus
it makes little sense that these Dockerfiles are called "build".  I don't
know if my explanation changes your vote. Either way please let me know.
Separating this change into a different PR was suggested by several MXNet
contributors during review.

Pedro.

On Wed, Oct 17, 2018 at 11:21 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> -1. (non-binding)
>
> These Dockerfiles are very bloated and imo only useful for creating a build
> environment or running tests.  Just as you wouldn't set up a server for a
> service and then install 200 packages that may or may not be used for the
> service, I wouldn't recommend using these Dockerfiles at runtime.  Runtime
> Dockerfiles should in my opinion be as lightweight and suited to their task
> as possible.
>
> On Wed, Oct 17, 2018, 1:58 AM Hagay Lupesko  wrote:
>
> > The PR provides a good explanation of this change and all code updates.
> > LGTM.
> >
> > On Tue, Oct 16, 2018 at 8:41 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >
> > wrote:
> >
> > > Hi
> > >
> > > I would like to rename the dockerfiles since they are used as a runtime
> > > environment and not only for builds, as they were initially intended.
> > >
> > > More info about the change in this PR:
> > > https://github.com/apache/incubator-mxnet/pull/12423/files
> > >
> > >
> > > Pedro.
> > >
> >
>


Re: Call for participation: evaluate Java API

2018-10-16 Thread Pedro Larroy
To play around, do we use the Java API branch? Is there a link to some
example code?

Thanks.

On Fri, Oct 12, 2018 at 9:16 PM Davydenko, Denis <
dzianis.davydze...@gmail.com> wrote:

> Not so long ago a design was shared for the MXNet Java API: [1]
>
> In a couple of days we are going to have an initial version of its
> implementation. We are looking for users who would like to get this initial
> version and evaluate how well it suits their use cases, or just play around
> with it and provide feedback on its usability and performance. This initial
> version includes:
> - Predictor
> - ObjectDetector
> - NDArray, Context, Shape, DataDesc
> - Reference implementation of SSD
>
> If you or someone you know is interested - please do not hesitate to reach
> out!
>
> [1]:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API
>
>
>
>


Re: Reproducing test failures on CI

2018-10-16 Thread Pedro Larroy
These are two separate events. The London meetup is not related to Anton's
original email.

Regarding reproducing CI failures, I would suggest that we create some
easy-to-use scripts and templates to launch instances rather than lengthy
documentation or materials. If the process is complex, automation is always
better than lengthy instructions.

Reproducing test failures locally or in EC2 should take only a couple of
commands. I have a personal Terraform file and scripts which I use to
provision instances for MXNet work, which do all the tedious configuration.
I could polish them up a bit and create a PR. Another script would be needed
to launch build & test easily, as the complexity of the Jenkinsfiles is
currently too convoluted to reverse engineer for somebody not familiar with
CI.
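
A minimal sketch of what such a launch helper could look like, assuming
boto3 with configured AWS credentials (the AMI id, key pair and instance
type below are placeholders, not project settings):

    import boto3

    # Placeholder values: substitute an AMI with Docker preinstalled,
    # your own key pair, and whatever instance type fits the build.
    ec2 = boto3.resource("ec2", region_name="us-west-2")
    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="c5.4xlarge",
        KeyName="my-mxnet-dev-key",
        MinCount=1,
        MaxCount=1,
    )
    print("Launched instance", instances[0].id)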

There's this nice guide that Marco created:
https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results

But it seems not many people read it, and it doesn't cover provisioning the
instance or installing the initial dependencies.


Pedro.

On Mon, Oct 15, 2018 at 8:58 PM Naveen Swamy  wrote:

> Timur,
> Here is a meetup scheduled for the 23rd of October in London, where Pedro
> Larroy will talk about Deep Learning using MXNet!
>
>
> https://www.meetup.com/Deep-Learning-with-Apache-MXNet-London/events/255280739/
>
>
> -Naveen
>
> On Mon, Oct 15, 2018 at 11:18 AM Anton Chernov 
> wrote:
>
> > Sorry, Timur, I've missed that part.
> >
> > It will be during the regular user group meeting that is conducted in
> > Berlin and is streamed via Chime. You can find more information on the
> > wiki:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings
> >
> > Best
> > Anton
> >
> >
> > > Mon, 15 Oct 2018 at 18:45, Timur Shenkao :
> >
> > > Is it a London meeting?
> > > Or some other location?
> > >
> > > On Monday, October 15, 2018, Anton Chernov 
> wrote:
> > >
> > > > Dear MXNet community,
> > > >
> > > > We've noticed that there have been some difficulties setting up
> > > > environments and reproducing test results from failed builds on the
> > > > CI. We would like to offer some help to the community on that and
> > > > are therefore holding a small live-stream demo session during our
> > > > User Group Meeting on the 23rd of October.
> > > > We will be:
> > > >
> > > > * Reviewing a failure and making an initial guess at the cause
> > > > * Setting up the environment
> > > > * Reproducing the build step from the CI
> > > > * Reproducing a failure step
> > > > * Making and submitting a fix back to the community
> > > >
> > > > Feel free to propose additional topics for the stream.
> > > >
> > > > Best regards
> > > > Anton
> > > >
> > >
> >
>


[LAZY VOTE]: rename dockerfiles s/.build.//

2018-10-16 Thread Pedro Larroy
Hi

I would like to rename the dockerfiles since they are used as a runtime
environment and not only for builds, as they were initially intended.

More info about the change in this PR:
https://github.com/apache/incubator-mxnet/pull/12423/files


Pedro.


Re: Maturity Model and Graduation

2018-10-06 Thread Pedro Larroy
Thanks Steffen

This document looks great and clearly showcases the areas in which we can
improve. CO50, CO30 and IN10 stand out to me in particular.

Pedro.



On Thu, Oct 4, 2018 at 6:38 PM Steffen Rochel 
wrote:

> I started a draft assessment -
>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+Maturity+Model+Assessment+for+MXNet
> based on my personal view. Please keep in mind I'm new to the project and
> Apache (just attended my first ApacheCon!!). The items I was not sure
> about myself I marked as "???".
>
> Jim et al. - looking for your guidance on whether the assessment should be
> discussed further within the PPMC or on private@, and on next steps.
>
> Steffen
>
> On Fri, Sep 28, 2018 at 1:53 PM Pedro Larroy  >
> wrote:
>
> > So Isabel, are you saying that if we publish a clearer TODO list or
> > "contributions needed" material we might get more contributions there?
> >
> > One thing that I like from other projects is to make a list of
> > low-hanging-fruit issues or easy contributions that newcomers can pick
> > up to get familiar with the project, especially in projects like MXNet
> > in which some contributions might require significant ramp-up time,
> > technical and mathematical skills, or domain knowledge.
> >
> > Pedro.
> >
> > On Fri, Sep 28, 2018 at 3:06 AM Isabel Drost-Fromm 
> > wrote:
> >
> > >
> > >
> > > On 28/09/18 11:27, kellen sunderland wrote:
> > > > I'd love to see some more sustained contribution from other open
> > > > source communities to help us out in this area
> > >
> > > That's not exactly the model I have seen work. What I have seen work
> > > really well at other projects is pulling users in as committers in a
> > > scratch-your-own-itch kind of way. For that to work you need to make
> > > it clear what contributions you need, you need to make time to coach
> > > people to become developers, and you need to make your users
> > > accustomed to the way you work as early as possible. It also helps to
> > > ask users for contributions and offer mentoring help from your side
> > > along the way.
> > >
> > > I know that this is tedious work that needs a lot of motivating,
> > > mentoring, and explaining; however, it makes for a sustainable
> > > community of people who do the work out of self-interest.
> > >
> > >
> > >
> >
> http://blog.isabel-drost.de/posts/open-development-and-inner-source-for-fun-and-profit.html
> > >
> > >
> > > Isabel
> > >
> >
>


Re: Which merge option to use on the Import Julia binding PR?

2018-10-05 Thread Pedro Larroy
> On Fri, Sep 28, 2018 at 9:51 PM Marco de Abreu wrote:
>
> > Are we sure that this is due to lacking permissions and not because of
> > some technical limitation? If we are certain, we can ask our mentors to
> > create a ticket with Apache Infra to make that switch.
> >
> > -Marco
> >
> > Carin Meier wrote on Sat., 29 Sep. 2018, 01:17:
> >
> > > I made a test regular merge commit into a copy of master. It seemed to
> > > go fine. Here is a listing of what it will look like for everyone.
> > >
> > > https://github.com/apache/incubator-mxnet/commits/test-merge-julia-import
> > >
> > > Although, I would be happy to push the merge button. I think the most
> > > important thing is to get the PR merged, so whatever way is the best
> > > to make that happen, let's do it.
> > >
> > > So - does the regular merge seem like a good option? If so, what is
> > > the best way to make that happen?
> > >
> > > On Fri, Sep 28, 2018 at 6:05 PM Chiyuan Zhang wrote:
> > >
> > > > Agreed with Pedro. Maybe the merge-commit option from the github
> > > > interface was disabled for a reason. But as Pedro said, maybe it is
> > > > good to temporarily enable it for this PR and merge using that.
> > > >
> > > >    - It should be technically easier than rebasing due to the
> > > >    git-subtree-import issue we are c
