Re: [apache/incubator-mxnet] [RFC] Raising the toolchain requirements for MXNet 2 (#17968)

2020-07-20 Thread Leonard Lausen
Closed #17968.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17968#event-3568205765

Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 API Deprecation (#17676)

2020-07-15 Thread Leonard Lausen
NNPACK is currently only supported in the Makefile build 
(https://github.com/apache/incubator-mxnet/issues/15974), which will be 
removed. I think oneDNN (mkldnn) has replaced it, so we can remove NNPACK. Any concerns?

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17676#issuecomment-659039903

Re: [apache/incubator-mxnet] [RFC] Use TVMOp with GPU & Build without libcuda.so in CI (#18716)

2020-07-15 Thread Leonard Lausen
> Violates the effort of removing libcuda.so totally (would be great if 
> someone can elaborate the motivation behind it).

Many customers use a single MXNet build that supports GPU features and deploy 
it to both GPU and CPU machines. Due to the way CUDA containers are 
designed, libcuda.so won't be present on the CPU machines. That's why it's 
better to dlopen() libcuda only when needed. This affects not only tvmop but 
also the nvrtc feature in MXNet.
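
For illustration, a minimal sketch of the dlopen approach in Python via ctypes 
(which wraps dlopen); the driver entry points cuInit/cuDeviceGetCount are real 
CUDA driver API, but the flow only illustrates what a dlopen-based libmxnet.so 
would do internally in C++, not MXNet's actual code:

``` python
import ctypes

def cuda_available():
    """Probe for the CUDA driver at runtime instead of linking against it."""
    try:
        # dlopen underneath; raises OSError on CPU-only machines where
        # libcuda.so is absent (e.g. CPU containers).
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False
    if libcuda.cuInit(0) != 0:  # CUDA_SUCCESS == 0
        return False
    count = ctypes.c_int(0)
    libcuda.cuDeviceGetCount(ctypes.byref(count))
    return count.value > 0
```

With this pattern the same binary loads everywhere and GPU features are only 
activated when the driver is actually present.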

Using the stubs is a workaround for dlopen, but it adds the additional 
requirement of modifying LD_LIBRARY_PATH on users' CPU machines. That's not 
always feasible. For MXNet 1.6, which introduced nvrtc, users typically just 
disable the nvrtc feature so they can deploy the same libmxnet.so to both CPU 
and GPU machines. 

Why not fix the underlying problem and then enable tvmop feature?

> Also, when setting -DUSE_TVM_OP=OFF the CI checks would be stuck. 

That doesn't match my experience: we have been running CI successfully with 
TVMOp disabled for a couple of months. Maybe you ran into some unrelated 
flakiness and need to retrigger the run? 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18716#issuecomment-658846227

RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

2020-07-13 Thread Leonard Lausen
One of the selling points of MXNet is (or used to be) speed, and having multiple
releases in a row with speed regressions may not be acceptable to users who
adopted MXNet based on the speed advantage. Should we vote on a 1.7 beta release
and only vote on the 1.7 final release once the regressions have been fixed?

On Mon, 2020-07-13 at 19:33 +, Patrick Mu wrote:
> It happens only on CPU, and I did more runs and found that the runtime
> fluctuates very badly, but the average regression is ~10%.
> 
> 
> Through the previous benchmarks I also found some worse regressions comparing
> 1.6 to 1.5, like Inception inference on CPU, and those regressions were not
> caught.
> 
> My 2-cent is it might not be a blocker for the release, and we can have room
> for improvement for upcoming 2.0 and 1.7.1 if necessary
> 
> Ziyi
> 
> On 2020/07/13 08:40:32, "Chen, Ciyong"  wrote:
> > Thanks Ziyi,
> > 
> > May I know on which platform you noticed the performance regression, CPU or
> > GPU? A ~20% regression would be a large gap.
> > 
> > Thanks,
> > -Ciyong
> > 
> > -Original Message-
> > From: Patrick Mu 
> > Sent: Monday, July 13, 2020 4:13 PM
> > To: d...@mxnet.apache.org
> > Subject: Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > 
> > Hi Ciyong,
> > 
> > I have reverted the commit, and I am able to train Yolov3 with no problem.
> > 
> > However I also noticed there is a ~20% regression in 1.7 comparing with 1.6
> > in inference Yolov3 with Module API, so we are going to discuss tomorrow if
> > that would be an issue for 1.7.
> > 
> > Thanks,
> > Ziyi
> > 
> > On 2020/07/13 02:19:28, "Chen, Ciyong"  wrote:
> > > Hi Ziyi, Xingjian,
> > > 
> > > Thanks for reporting the issues from GluonCV/AutoGluon perspective.
> > > I just did a quick try by reverting the 
> > > https://github.com/apache/incubator-mxnet/pull/18358, then the behavior is
> > > same as 1.6.0 with the cases in the gist (
> > > https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > > 
> > > Considering there are many end-users using Gluon-based APIs/models, and that
> > > introducing a new patch to fix this issue could be risky, I agree that
> > > reverting this PR (#18358) might be the best option for the 1.7.0 release.
> > > But I wonder whether there are any other test cases to cover this feature,
> > > which could be helpful to track this kind of code change in future. Also,
> > > can you help to verify whether this revert does resolve the broken issue at
> > > your side?
> > > 
> > > > Thus, the real issue is: Should we support pickling a Gluon Block? If
> > > > not, should we support combining multiprocessing.pool with the Gluon
> > > > Block?
> > > Seems it's more like a new feature for MXNet Gluon Block, probably we can
> > > make it available in the next patch/minor release?
> > > 
> > > Thanks,
> > > -Ciyong
> > > 
> > > -Original Message-
> > > From: Xingjian SHI 
> > > Sent: Saturday, July 11, 2020 4:27 AM
> > > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > > 
> > > Thanks Ziyi,
> > > 
> > > I've discovered the same issue when I'm trying to use AutoGluon with
> > > 1.7.0rc0 and would like to share my finding:
> > > 
> > > Basically, I don't think Gluon Block is designed to be picklable. But
> > > pickling does work for some cases in the old version:
> > > 
> > > I've included two cases in the gist (
> > > https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > > 
> > > - Case1: we construct a gluon block, hybridize it and feed one NDArray to
> > > help initialize the block. After that, it will no longer be picklable.
> > > - Case2: we just construct a gluon block; it will be picklable in
> > > 1.6.0, but won't be picklable in 1.7.0.
> > > 
> > > Thus, the real issue is: Should we support pickling a Gluon Block? If
> > > not, should we support combining multiprocessing.pool with the Gluon
> > > Block? For reference, PyTorch supports pickling the nn.Module as shown in:
> > > https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and
> > > also in the doc (
> > > https://pytorch.org/tutorials/beginner/saving_loading_models.html).
> > > 
> > > Best,
> > > Xingjian
> > > 
> > > 
> > > On 7/10/20, 11:31 AM, "Patrick Mu"  wrote:
> > > 
> > > Hi Ciyong,
> > > 
I just discovered an issue with 1.7 which causes training with the
latest Gluon CV Yolo to fail.
> > > 
> > > The PR that causes the failure is 
> > > https://github.com/apache/incubator-mxnet/pull/18358, which
> > > modifies  basic blocks of Gluon to fix a memory leak issue.
> > > 
I talked with Leonard, the author of the PR, and he said he found the
root cause, but patching that PR would modify those Gluon basic blocks
further, which might be risky for existing models and various customer
models.
> > > 
> > > So my 2-cents is reverting this PR in 1.7, and try 

Re: [DISCUSS] When to add Apache Headers to Third Party Code [WAS] Re: [MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-23 Thread Leonard Lausen
Hi Ciyong,

the consensus passed, so we should proceed according to the consensus.

Thank you
Leonard

On Tue, 2020-06-23 at 09:04 +, Chen, Ciyong wrote:
> Hi all,
> 
> I'm wondering if there's any further concerns for this "72 hours lazy
> consensus"?
> Shall we continue with the option of "I believe PPMC would prefer to put the
> ASF header on top of the file (ie. 2 headers)"
> 
> Thanks,
> -Ciyong
> 
> -Original Message-
> From: Leonard Lausen 
> Sent: Tuesday, June 16, 2020 7:06 AM
> To: dev@mxnet.incubator.apache.org; gene...@incubator.apache.org
> Subject: Re: [DISCUSS] When to add Apache Headers to Third Party Code [WAS]
> Re: [MENTORS] PPMC case-by-case decision for major modifications of third-
> party work guidance
> 
> Thank you everyone for your valuable advice.
> 
> > so if you did want to avoid including the license in your releases you
> > would either need to rely on the file as an external dependency or
> > completely reimplement the functionality not deriving it from this
> > file.
> 
> Including the BSD-3 style license in releases wouldn't be a problem, as it's
> compatible with Apache License 2. As there are substantial changes, I believe
> PPMC would prefer to put the ASF header on top of the file (ie. 2 headers) [72
> hours lazy consensus if there are no concerns]. We still need to declare all
> the numpy einsum derived files in the LICENSE and fix the inconsistency that
> ASF header was removed in src/operator/numpy/np_einsum_op-inl.h but remains in
> src/operator/numpy/np_einsum_path_op-inl.h
> 
> Related: As PPMC strives to provide partial API compatibility with NumPy in
> MXNet 2 based on the NumPy Array Function Protocol [1], could you clarify if
> these MXNet operators should be considered derived from NumPy (thus warranting
> the BSD-3 style license headers) solely based on integrating with the NumPy
> API and providing compatible operators? Or only (as in the einsum case above),
> if the actual implementation was derived from NumPy's implementation. I
> believe it's the latter, but please clarify if that's wrong.
> 
> Should ASF update the "Do not add the standard Apache License header to the
> top of third-party source files." at [2]? This sentence was the motivation to
> open this discussion thread, and according to the current consensus here it is
> "incomplete". How about adding an "unless the third-party source file contains
> major modifications by ASF" to clarify?
> 
> Thank you
> Leonard
> 
> [1]: https://numpy.org/neps/nep-0018-array-function-protocol.html
> [2]: https://www.apache.org/legal/src-headers.html#3party
> 
> On Mon, 2020-06-15 at 09:36 -0400, John D. Ament wrote:
> > On Sat, Jun 13, 2020 at 2:19 PM Bob Paulin  wrote:
> > 
> > > Hi,
> > > 
> > > I agree there does not appear to be consensus on when it's
> > > appropriate to add Apache License Headers to Third Party code across
> > > projects.  Here is Justin's email that requests the Apache Headers be
> > > removed [1]
> > > 
> > > 
> > > 
> > > - file copyright NumPy Developers [6]: this file looks to incorrectly
> > > have an ASF header on it
> > > 6. ./src/operator/numpy/np_einsum_path_op-inl.h
> > > 
> > > 
> > > We want to make the choice that will be most sustainable for the
> > > project and most correct for the situation.
> > > 
> > > Based on the emails I linked in the prior email it does seem like
> > > the cases where dual headers are appropriate is when there are Major
> > > Modifications.  In the case of
> > > 
> > > np_einsum_path_op-inl.h
> > > 
> > > The file is derived from the implementation in Numpy [2].  If the
> > > implementation in Numpy changes, will this file change?  If so, then
> > > the community will be tasked with continually re-porting changes to a
> > > file that is always based on Numpy, so it may be more appropriate to
> > > just keep the Numpy license.
> > > 
> > > Will MXNet likely evolve this file in a way that it no longer
> > > resembles the Numpy implementation (Major Modification)?  If so it
> > > may be better to keep the Apache Header as going forward the file
> > > will represent the work of the MXNet community not that of Numpy.
> > > 
> > 
> > Keeping the (what appears to be) BSD-3 style license is perfectly fine
> > and is in fact what the NumPy license says to do.  We would only
> > change the license from the NumPy license to ALv2 if an SGA or ICLA is
> > received from all contributors.

Re: Update on non-compliant releases on repository.apache.org and how to obtain a compliant build

2020-06-19 Thread Leonard Lausen
Hi Justin,

Thank you. Please note that libgfortran.so is not subject to the GPL in this
case, as one may relicense it under the Apache License 2 based on the license
grant by the GCC developers. Thus any policy with respect to the GPL is
irrelevant here, as we are not talking about GPL software.

The ticket has been open for 7+ days, so there has been plenty of time to
voice concerns. Let's discuss on the ticket if you have concerns. If not, what's
the process to close the ticket?

Thank you
Leonard

On Fri, 2020-06-19 at 23:36 +, Justin Mclean wrote:
> 
> Hi,
> 
> > On the question of building compliant CPU convenience binaries that can be
> > distributed under the name MXNet, ASF now agrees that the previous stance of
> > "libgfortran.so" is GPL and can't be distributed is incorrect based on the
> > GCC
> > Runtime Library Exception [4].
> 
> I would wait until that legal JIRA is closed before assuming this is the case.
> I think you may have misinterpreted what Roman was saying. The current policy
> is that GPL with exceptions is not allowed, either in a release or as a
> dependency. [1]
> 
> Thanks,
> Justin
> 
> 1. https://www.apache.org/legal/resolved.html#category-x
> 



Update on non-compliant releases on repository.apache.org and how to obtain a compliant build

2020-06-18 Thread Leonard Lausen
As per the consensus in [1], I requested INFRA to delete the MXNet convenience
binaries on repository.apache.org [2].

Zach previously offered to organize a third-party Maven distribution. However,
based on recently updated version 0.4 of the draft Apache Downstream
Distribution Branding Policy published in June 2020, the ASF does not allow the
use of the MXNet for referring to a non-compliant distribution, taking the
position that this would constitute a trademark infringement. Thus such third-
party distributions are required to happen under a name not trademarked by the
ASF. [3]

On the question of building compliant CPU convenience binaries that can be
distributed under the name MXNet, ASF now agrees that the previous stance that
"libgfortran.so is GPL and can't be distributed" is incorrect, based on the GCC
Runtime Library Exception [4]. libgfortran.so itself still has a runtime
dependency on the purely GPL libquadmath.so, which can't be redistributed by ASF
projects, but (unlike libgfortran.so) libquadmath.so has a stable ABI and we can
ask users to install libquadmath.so on their systems to use the MXNet
convenience binary.
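
For illustration, a minimal sketch (assuming nothing about MXNet's actual
loader) of how one could check that libquadmath.so is resolvable before
loading such a convenience binary:

``` python
import ctypes.util

# find_library searches the system linker paths, mirroring what the dynamic
# loader does when libgfortran.so pulls in libquadmath.so at load time.
if ctypes.util.find_library("quadmath") is None:
    raise RuntimeError(
        "libquadmath.so not found; please install it (it ships with the GCC "
        "runtime packages of most distributions) before using this binary.")
```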

Instead of relying on the GNU Fortran compiler and introducing the
libquadmath.so dependency, I tried using the Flang Fortran compiler [5]. In
principle, it works well. But there are still a couple of bugs in OpenBLAS
and/or Flang which mean that one needs to build OpenBLAS with -O1 optimization
instead of -O2 to pass the test-suite. [6] This may be fixed in the next
OpenBLAS release. One reason is that Flang performs more aggressive optimization
and vectorization, meaning that once these bugs are fixed we may also find speed
improvements by switching to the Flang toolchain.

Best regards
Leonard

[1]: 
https://lists.apache.org/thread.html/r9a949761ce5b9b40fc9404b44db012797a50e490163f5ae616428096%40%3Cdev.mxnet.apache.org%3E
[2]: https://issues.apache.org/jira/browse/INFRA-20442
[3]: 
https://web.archive.org/web/20200612172050/www.apache.org/foundation/marks/downstream.html
[4]: https://issues.apache.org/jira/browse/LEGAL-523
[5]: https://github.com/apache/incubator-mxnet/pull/18513
[6]: https://github.com/xianyi/OpenBLAS/issues/2650#issuecomment-643685423



Re: [DISCUSS] When to add Apache Headers to Third Party Code [WAS] Re: [MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-16 Thread Leonard Lausen
Hi Justin,

thank you for clarifying the major modification threshold. To clarify the
scope of modification in MXNet: re-implementing the functionality in C++ is just
one aspect. MXNet generally provides more features compared to the original
implementation, such as automatic gradient calculation and GPU kernels. If we
need additional clarification on the differences to the original implementation,
we can ask Haozheng to elaborate.

Best regards
Leonard

On Tue, 2020-06-16 at 00:52 +, Justin Mclean wrote:
> Hi,
> 
> I also add that converting from one language to another is not considered a
> major modification.
> 
> Thanks,
> Justin



Re: [DISCUSS] When to add Apache Headers to Third Party Code [WAS] Re: [MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-16 Thread Leonard Lausen
actual Lawyer in Legal?  Or have these determinations been
> > low enough risk that we are comfortable with our PMC making best effort
> > decisions based on the ASF guidelines?
> > 
> > 
> > - Bob
> > 
> > 
> > [1]
> > https://lists.apache.org/thread.html/rb83ff64bdac464df2f0cf2fe8fb4c6b9d3b8fa62b645763dc606045f%40%3Cgeneral.incubator.apache.org%3E
> > 
> > [2] https://github.com/numpy/numpy/blob/master/numpy/core/einsumfunc.py
> > On 6/12/2020 7:20 PM, Leonard Lausen wrote:
> > 
> > Thank you Bob for the elaboration. PPMC would like to minimize complexity,
> > that's why we ask for your recommendation.
> > 
> > If it's easiest to just keep the original license header, we can do that. Do
> > we
> > need the contributor to re-license their contribution, or is the
> > contribution
> > already available under both licenses as both license headers were included
> > by
> > the contributor and the ASF header can simply be deleted?
> > 
> > Reading through the threads you referenced, there does not seem to be a
> > strong
> > consensus in the ASF about how to handle this situation. For example,
> > quoting
> > Roman Shaposhnik [2] in support of just putting 2 License Headers for
> > simplicity:
> > 
> > 
> > Hm. This is tricky, now that I re-read the language of the ASF license
> > header I'm not sure anymore. I *think* the language there should allow
> > you to slap said header on a compatible license code.
> > 
> > Besides, the alternative is much messier: every time somebody touches
> > that file he/she needs to decide whether it is time for an ASF header
> > or not.
> > 
> > I *think* (but I'd love for old-timers to chime in and correct me) that #3-5
> > were written from though-shall-not-fork-communities perspective.
> > 
> > Can we follow this approach (keep 2 License headers) for simplicity
> > (assuming
> > removal of ASF header will require extra steps)?
> > 
> > 
> > With respect to einsumfunc.py [5] vs np_einsum_op.cc [6] if this is in
> > fact a port where the behavior was copied/derived directly from numpy I
> > could see that as supporting Justin's case that the Apache header should
> > be removed.  However that is just my opinion.
> > 
> > Which email of Justin are you referring to?
> > 
> > Best regards
> > Leonard
> > 
> > 
> > [1]: http://www.apache.org/legal/src-headers.html#purpose
> > [2]: 
> > https://lists.apache.org/thread.html/ef46b1d0a3dd865d27a33c290430d892d3373d4bc5e27b5f06c7bcda%401451951295%40%3Cgeneral.incubator.apache.org%3E
> > 
> > 
> > On Wed, 2020-06-10 at 21:39 -0500, Bob Paulin wrote:
> > 
> > First general disclaimer: I am not a lawyer.
> > 
> > Second Disclaimer with an engineer hat on we want to avoid copying third
> > party code into the project since it increases the amount of maintenance
> > in a sense from a code standpoint and from a licensing standpoint.  If
> > at all possible it is preferable to either link or try to find a way to
> > integrate your tweaks back into the other projects before taking on the
> > burden of housing the code in MXNet.  I do hope these options were
> > considered or are being looked at for refactoring in the project since
> > it will help the long term viability of the project.
> > 
> > Now to your question.  Similar situations have been discussed both on
> > legal [1] and on incubator [2][3].  It may be useful to review some of
> > these threads to understand how other projects made this determination.
> > There are instances where other members have stated it is appropriate
> > and the dual headers have been used [4].  It seems in some of these
> > cases the PMC has reached out to the other projects to ask for
> > permission to apply the Apache license.
> > 
> > With respect to einsumfunc.py [5] vs np_einsum_op.cc [6] if this is in
> > fact a port where the behavior was copied/derived directly from numpy I
> > could see that as supporting Justin's case that the Apache header should
> > be removed.  However that is just my opinion.  If the PMC feels strongly
> > it would make sense to escalate to legal-discuss.   These are case by
> > case decisions and the more third party code that gets copied in the
> > more drag there will be on the community to deal with these issues.  I
> > would also encourage discussion of each case to remain on list so that
> > the incubator PMC can see how the PPMC is making these determinations.
> > 
> > - Bob
> > 
> > [1]
>

Re: [MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-12 Thread Leonard Lausen
Thank you Bob for the elaboration. PPMC would like to minimize complexity,
that's why we ask for your recommendation.

If it's easiest to just keep the original license header, we can do that. Do we
need the contributor to re-license their contribution, or is the contribution
already available under both licenses as both license headers were included by
the contributor and the ASF header can simply be deleted?

Reading through the threads you referenced, there does not seem to be a strong
consensus in the ASF about how to handle this situation. For example, quoting
Roman Shaposhnik [2] in support of just putting 2 License Headers for
simplicity:

> Hm. This is tricky, now that I re-read the language of the ASF license
> header I'm not sure anymore. I *think* the language there should allow
> you to slap said header on a compatible license code.
> 
> Besides, the alternative is much messier: every time somebody touches
> that file he/she needs to decide whether it is time for an ASF header
> or not.
> 
> I *think* (but I'd love for old-timers to chime in and correct me) that #3-5
> were written from though-shall-not-fork-communities perspective.

Can we follow this approach (keep 2 License headers) for simplicity (assuming
removal of ASF header will require extra steps)?

> With respect to einsumfunc.py [5] vs np_einsum_op.cc [6] if this is in
> fact a port where the behavior was copied/derived directly from numpy I
> could see that as supporting Justin's case that the Apache header should
> be removed.  However that is just my opinion.

Which email of Justin are you referring to?

Best regards
Leonard


[1]: http://www.apache.org/legal/src-headers.html#purpose
[2]: 
https://lists.apache.org/thread.html/ef46b1d0a3dd865d27a33c290430d892d3373d4bc5e27b5f06c7bcda%401451951295%40%3Cgeneral.incubator.apache.org%3E


On Wed, 2020-06-10 at 21:39 -0500, Bob Paulin wrote:
> First general disclaimer: I am not a lawyer. 
> 
> Second Disclaimer with an engineer hat on we want to avoid copying third
> party code into the project since it increases the amount of maintenance
> in a sense from a code standpoint and from a licensing standpoint.  If
> at all possible it is preferable to either link or try to find a way to
> integrate your tweaks back into the other projects before taking on the
> burden of housing the code in MXNet.  I do hope these options were
> considered or are being looked at for refactoring in the project since
> it will help the long term viability of the project.  
> 
> Now to your question.  Similar situations have been discussed both on
> legal [1] and on incubator [2][3].  It may be useful to review some of
> these threads to understand how other projects made this determination. 
> There are instances where other members have stated it is appropriate
> and the dual headers have been used [4].  It seems in some of these
> cases the PMC has reached out to the other projects to ask for
> permission to apply the Apache license.
> 
> With respect to einsumfunc.py [5] vs np_einsum_op.cc [6] if this is in
> fact a port where the behavior was copied/derived directly from numpy I
> could see that as supporting Justin's case that the Apache header should
> be removed.  However that is just my opinion.  If the PMC feels strongly
> it would make sense to escalate to legal-discuss.   These are case by
> case decisions and the more third party code that gets copied in the
> more drag there will be on the community to deal with these issues.  I
> would also encourage discussion of each case to remain on list so that
> the incubator PMC can see how the PPMC is making these determinations.
> 
> - Bob
> 
> [1]
> https://lists.apache.org/thread.html/0fc4c0e95ee0c489553373e378125a0d163bc511da2555caa68bfa87%401455903168%40%3Clegal-discuss.apache.org%3E
> 
> [2]
> https://lists.apache.org/thread.html/d00f72c4aa0b56927dac87b116e2e92fa32b7dcf447016726683cc4f@1455210877@%3Cgeneral.incubator.apache.org%3E
> 
> [3]
> https://lists.apache.org/thread.html/e743b1b1cfda2c4775c3fe509f3adc8f69d64fd2b6eb253ade311fe7%401451947855%40%3Cgeneral.incubator.apache.org%3E
> 
> [4] https://github.com/apache/trafodion/blob/master/core/sql/parser/ulexer.h
> 
> [5] https://github.com/numpy/numpy/blob/master/numpy/core/einsumfunc.py
> 
> [6]
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/np_einsum_op.cc
> 
> 
> On 6/10/2020 5:29 PM, Leonard Lausen wrote:
> > Hi Bob,
> > 
> > yes, your understanding is correct. To further give an example I'd like to
> > quote
> > Haozheng who added two of the files in question:
> > 
> > > The two files originate from
> > > https://github.com/numpy/numpy/blob/master/numpy/core/einsumfunc.py .
> > > I translated them from python to cpp.

Re: Updates for 1.7.0 minor release

2020-06-12 Thread Leonard Lausen
Thank you Ciyong. After further investigation, the build issue is not as severe
as initially claimed on Github. I checked the high-water memory usage during a
single-process build: it's 2.7GB on master. On the 1.7 release branch, high-water
usage is 2.2GB. This is much more acceptable than the previously claimed >16GB
usage and thus not a blocking issue from my perspective. I'll later also report
the numbers for 1.5 and 1.6.

Fixing the respective implementations to be more compiler-friendly would still
be good.

Looking at the parallel-build high-water memory usage on a 96-core machine, I
saw a 45% memory usage increase during build from 1.5 to 1.7.
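
For reference, a minimal sketch of how such high-water numbers can be obtained
for a single-process build (the make invocation is illustrative; on Linux,
ru_maxrss is reported in kilobytes):

``` python
import resource
import subprocess

# Run the single-process build as a child process; the kernel tracks the peak
# RSS of terminated children, which getrusage(RUSAGE_CHILDREN) then reports.
subprocess.run(["make", "-j1"], check=True)
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"high-water memory usage: {peak_kb / 1024**2:.1f} GB")
```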

Best regards
Leonard


On Fri, 2020-06-12 at 02:09 +, Chen, Ciyong wrote:
> Hi Chai,
> 
> Sorry for the late update.
> 
> Recently, several bug fixes [4] including numpy operator/batchnorm
> gradient/LSTM CPU gradient/CI/CD/license issues were back-ported into v1.7.x.
> So far, there's one build issue and two license issues being tracked.
> 1) build issue #18501 (It costs over 16GB memory to compile
> indexing_op.o), which @leezu stated is a blocker for the release[1].
> 2) license issue: multiple license header issue[2] is under
> discussion; no valid apache license header issue[3] is identified, and I'm
> working on the PR as @szha suggested.
> 
> If the community can help to expedite items [1] and [2], it would be
> greatly helpful.
> Once we've completed the above items and there are no other critical issues,
> it's ok to cut the rc0.
> 
> Thanks for your patience.
> 
> Thanks,
> -Ciyong
> 
> [1] 
> https://github.com/apache/incubator-mxnet/issues/18501#issuecomment-642785535
> [2] 
> https://github.com/apache/incubator-mxnet/issues/17329#issuecomment-641311199
> [3] 
> https://github.com/apache/incubator-mxnet/pull/18478#issuecomment-642462904
> [4] PR list:
> #18358/#18339/#18311/#18352/#18456/#18316/#18482/#18502/#18517/#18464
> 
> 
> 
> -Original Message-
> From: Chaitanya Bapat 
> Sent: Friday, June 12, 2020 1:34 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: RE: Updates for 1.7.0 minor release
> 
> Hey Ciyong,
> 
> Since the last discussion, the GPU memory regression PR has been reverted.
> Is there any update for when the rc0 for 1.7 will be cut?
> Can the community help expedite the process in any way?
> 
> Thanks
> Chai
> 
> On Wed, 13 May 2020 at 18:28, Chen, Ciyong  wrote:
> 
> > Hi Ziyi,
> > 
> > Thanks for reaching out about the known issues in the upcoming
> > release; let's fix all these potential issues before dropping the rc0
> > tag. I'll ask Tao for help to merge the PR.
> > 
> > Thanks,
> > -Ciyong
> > 
> > -Original Message-
> > From: Patrick Mu 
> > Sent: Thursday, May 14, 2020 8:58 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: RE: Updates for 1.7.0 minor release
> > 
> > Hi Ciyong,
> > 
> > We found a GPU memory usage regression issue triggered by PR
> > https://github.com/apache/incubator-mxnet/pull/17767, which was pushed
> > to the 2.0, 1.x and 1.7 branches.
> > 
> > I have reverted this commit in 2.0, but we should revert this in 1.x
> > and
> > 1.7 branches. I have made a reverting PR on 1.x
> > https://github.com/apache/incubator-mxnet/pull/18309.
> > 
> > I am wondering if you can help merge the revert into 1.x and 1.7
> > before making the rc0 tag?
> > 
> > Thanks,
> > Ziyi
> > 
> > On 2020/05/12 00:58:22, "Chen, Ciyong"  wrote:
> > > Hi Chai,
> > > 
> > > Thanks a lot for your kind help to fix this.
> > > I will continue the rest steps of release process.
> > > 
> > > Thanks,
> > > -Ciyong
> > > 
> > > -Original Message-
> > > From: Chaitanya Bapat 
> > > Sent: Tuesday, May 12, 2020 8:14 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Updates for 1.7.0 minor release
> > > 
> > > Hello Ciyong,
> > > 
> > > With https://github.com/apache/incubator-mxnet/pull/18261 merged,
> > > the nightly pipeline passes for 1.7.x. So as far as the 2 nightly test
> > > pipelines are concerned [NightlyTests and NightlyTestsForBinaries],
> > > 1.7.x is good to go!
> > > Thanks,
> > > Chai
> > > 
> > > On Sun, 10 May 2020 at 04:53, Chen, Ciyong 
> > wrote:
> > > > Hi MXNet Community,
> > > > 
> > > > Here's some updates after the code freeze.
> > > > 1. Nightly tests[1] and nightly binaries tests[2] were enabled,
> > > > many thanks to Chaitanya who helped to create and activate these
> > > > jobs for v1.7.x branch.
> > > > 2. A nightly test failure (incorrect with_seed path) was fixed by
> > > > Chaitanya [3] 3. A bug fix for external graph pass by Sam [4] 4.
> > > > Recently, there's another failed case (test_large_vector.test_nn)
> > > > in the nightly test[5], and Chaitanya is helping to address this
> > > > issue[6]
> > > > 
> > > > I'll keep monitoring the nightly test before making a rc0 tag.
> > > > Please let me know if you have any other issues that should be
> > > > included/fixed in this release.
> > > > 
> > > > Thanks,
> > > > -Ciyong
> > > > 
> > > > 

Re: [MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-10 Thread Leonard Lausen
Hi Bob,

yes, your understanding is correct. To further give an example I'd like to quote
Haozheng who added two of the files in question:

> The two files originate from
> https://github.com/numpy/numpy/blob/master/numpy/core/einsumfunc.py .
> I translated them from python to cpp. The original files are subject to
> the following license: https://github.com/numpy/numpy/blob/master/LICENSE.txt

https://github.com/apache/incubator-mxnet/issues/17329#issuecomment-640043814
 
Thank you
Leonard

On Wed, 2020-06-10 at 07:42 -0500, Bob Paulin wrote:
> Hi,
> 
> Let me restate to make sure I understand what's being asked.
> 
> 1) There is third party code in the project that has Major Modifications to
> the original third party source.
> 
> 2) The original third party code does not currently have two license headers 
> 
> (ex Third Party Code has MIT license only.  Apache License header was added
> when it was checked into MXNet repo with modifications)
> 
> 3) You are asking if the files can remain in the MXNet repository with both
> license headers.
> 
> - Bob
> 
> On 6/9/2020 5:07 PM, Leonard Lausen wrote:
> > Hi Mentors,
> > 
> > https://www.apache.org/legal/src-headers.html#3party states the 5 rules for
> > handling third-party code included in the project [1]. In particular PPMC
> > shall
> > handle major modifications on a case-by-case basis.
> > 
> > But the other rules state
> > 
> > > 1. Do not modify or remove any copyright notices or licenses within third-
> > 
> > party works.
> > 
> > and
> > 
> > > 2. Do not add the standard Apache License header to the top of third-party
> > 
> > source files.
> > 
> > The major modifications in question [2] are currently licensed under Apache
> > License but the files originate from a third-party and there are thus two
> > license headers in the files. This is in conflict with rule 2.
> > 
> > Could you clarify if rule 2 is not a rule but only a guideline that can be
> > overruled in PPMC's case-by-case decision? What's your recommendation? Ie.
> > can
> > we keep the 2 headers in place?
> > 
> > Best regards
> > Leonard
> > 
> > 
> > [1]:
> > 
> > > 0. The term "third-party work" refers to a work not submitted directly to
> > > the
> > > ASF by the copyright owner or owner's agent. This includes parts of a work
> > > submitted directly to the ASF for which the submitter is not the copyright
> > > owner or owner's agent.
> > > 1. Do not modify or remove any copyright notices or licenses within third-
> > > party works.
> > > 2. Do ensure that every third-party work includes its associated license,
> > > even
> > > if that requires adding a copy of the license from the third-party
> > > download
> > > site into the distribution.
> > > 3. Do not add the standard Apache License header to the top of third-party
> > > source files.
> > > 4. Minor modifications/additions to third-party source files should
> > > typically be licensed under the same terms as the rest of the third-party
> > > source for convenience.
> > > 5. Major modifications/additions to third-party should be dealt with on a
> > > case-by-case basis by the PMC.
> > 
> > [2]: 
> > https://github.com/apache/incubator-mxnet/issues/17329#issuecomment-641311199
> > 



[MENTORS] PPMC case-by-case decision for major modifications of third-party work guidance

2020-06-09 Thread Leonard Lausen
Hi Mentors,

https://www.apache.org/legal/src-headers.html#3party states the 5 rules for
handling third-party code included in the project [1]. In particular PPMC shall
handle major modifications on a case-by-case basis.

But the other rules state

> 1. Do not modify or remove any copyright notices or licenses within
> third-party works.

and

> 2. Do not add the standard Apache License header to the top of third-party
> source files.

The major modifications in question [2] are currently licensed under Apache
License but the files originate from a third-party and there are thus two
license headers in the files. This is in conflict with rule 2.

Could you clarify if rule 2 is not a rule but only a guideline that can be
overruled in PPMC's case-by-case decision? What's your recommendation? Ie. can
we keep the 2 headers in place?

Best regards
Leonard


[1]:

> 0. The term "third-party work" refers to a work not submitted directly to the
> ASF by the copyright owner or owner's agent. This includes parts of a work
> submitted directly to the ASF for which the submitter is not the copyright
> owner or owner's agent.
> 1. Do not modify or remove any copyright notices or licenses within third-
> party works.
> 2. Do ensure that every third-party work includes its associated license, even
> if that requires adding a copy of the license from the third-party download
> site into the distribution.
> 3. Do not add the standard Apache License header to the top of third-party
> source files.
> 4. Minor modifications/additions to third-party source files should typically
> be licensed under the same terms as the rest of the third-party
> source for convenience.
> 5. Major modifications/additions to third-party should be dealt with on a
> case-by-case basis by the PMC.

[2]: 
https://github.com/apache/incubator-mxnet/issues/17329#issuecomment-641311199



Re: Issue with releases / feedback from ASF board

2020-06-07 Thread Leonard Lausen
Thank you Bertrand for the suggestion.

I have created a pull request to update the website. Anyone interested,
please take a look and leave feedback in the pull request or via
response to this mail. There is no preview of the resulting page
available, but we can also iterate via multiple pull requests in case of
any remaining problems.

https://github.com/apache/incubator-mxnet/pull/18487

The PR is quite large, hence my reluctance to first open a PR deleting
content and then adding things back. The effort for correcting the site in
a single step is significantly lower. I hope the Incubator understands.

Thanks
Leonard

Bertrand Delacretaz  writes:
> Hi,
>
> On Thu, Jun 4, 2020 at 8:44 AM Leonard Lausen  wrote:
>> ...Does adding the following notice prior to any mention of a third-party
>> binary release work for clearly informing users?...
>
> I haven't followed all the details but IIUC what you are doing is
> linking to third-party packages that can help people get started with
> MXNet but are not provided by the ASF.
>
> If that's correct, I would phrase your disclaimer a bit differently.
>
>>
>> > WARNING: The following binary release is not provided by the Apache
>> > Software Foundation but by third-party members of the MXNet community.
>> > It may contain closed-source components with restrictive licenses.
>> > You may want to download the official Apache MXNet (incubating) source
>> > release and build from source instead.
>
> WARNING: the following links are provided for your convenience but
> they point to packages that are *not* provided nor endorsed by the
> Apache Software Foundation.
> As such, they might contain software components with more restrictive
> licenses than the Apache License and you'll need to decide whether
> they are appropriate for your usage. Like all Apache Releases, the
> official Apache MXNet (incubating) releases consist of source code
> only and are found at .
>
> -Bertrand


Re: Issue with releases / feedback from ASF board

2020-06-04 Thread Leonard Lausen
Hi Justin,

as there have been a couple of mails on the dev@ list prior to your mail
to general@ list and your mail contains a dramatic opening, I'd like to
provide some context here.

The problem currently in focus is how to ensure that the
http://mxnet.apache.org/get_started page is compliant with ASF policies.
The page currently provides names of third-party binary distributions
not controlled by the PPMC, which may confuse some users.

Let's take a look at the timeline first:

On May 5th 2020 I opened LEGAL-515 and asked (among other
questions) how the MXNet PPMC can correctly reference third-party
distributions on the website. Unfortunately that question was not
answered. In fact, the majority of questions in LEGAL-515 remained
unanswered throughout May (starting May 8th).

Note that prior to my question in LEGAL-515, the MXNet website has been
mentioning the names of third-party distributions already.

You just now stated:

> You were asked to do something about this a few weeks ago and as far
> as I can see have not done so. Please do so as soon as you can.

That's not entirely correct. I note that there are two different requests.
On May 24th you have contacted the PPMC, requesting the PPMC to (among
other things) improve the clarity of the Getting Started page:

> It also needs to be clear what a user is installed from this install
> page [http://mxnet.incubator.apache.org/get_started]

PPMC has been working on resolving this question in LEGAL-515 since May
5th and has also requested guidance from the trademark@ team. This was
still ongoing at the time of your email today.

Today you have contacted the PPMC with a different request about the
Getting Started page:

> It’s quite clear they should not be linked to from an Apache page
> like this as users will think these are Apache releases. Please remove
> them, after that bring it up on the incubator general list and we can
> discuss what needs to be done.

In response I asked you whether it wouldn't be possible to first decide
how to properly disclaim links to third parties on the website, before
removing the links and then potentially adding them back with a
disclaimer later.

This is a very simple question. It's quite late in my timezone and
updating the website will take some time. Why not update the website
once, correctly, instead of taking a route that requires multiple updates?

To resolve the situation, I suggest we start from your statement here:

> No Apache project should be distributing 3rd party releases from their
> web site without clearly informing the users of what they are getting.

Does adding the following notice prior to any mention of a third-party
binary release work for clearly informing users?

> WARNING: The following binary release is not provided by the Apache
> Software Foundation but by third-party members of the MXNet community.
> It may contain closed-source components with restrictive licenses.
> You may want to download the official Apache MXNet (incubating) source
> release and build from source instead.

If so, PPMC can initiate the process of adding this statement to the
website tomorrow. If not, do you have a better suggestion?

And in either case, if the Incubator prefers the route of updating the
website multiple times and leaving a partially empty website in the
interim, then let it be that way; the PPMC may initiate that
process tomorrow.


>> I'm not sure what you mean. Note that Github automatically creates these
>> release pages based on the presence of git tags in the version control
>> history.
>
> Yes they do, but they should consist of Apache releases; it looks like you
> have non-Apache releases there. Other projects tag these and add notes to
> make it very clear they are not Apache releases.

The context here is that I asked you to clarify your mail from
May 24th, in which you stated:

> The GitHub download page [2] is also confusing as it contains a mix of
> Apache and non-Apache releases

My understanding of your statement was that you refer to the source
archives created by Github, which are not the official ASF source
archives. MXNet project uploaded the ASF source archives in addition to
the Github source archives to ensure users can easily discover them. But
it appears this is not what you meant by "confusing".

But given your response, I now believe you may be referring to git tags
that were made prior to MXNet joining the incubator on 2017-01-23 / on
which no vote by the PPMC took place? Adding notes to those releases can
be done easily if that is what you request.

Best regards
Leonard


Re: Issue with releases / feedback from ASF board

2020-06-03 Thread Leonard Lausen
Hi Justin,

Justin Mclean  writes:
> It’s quite clear they should not be linked to from an Apache page
> like this as users will think these are Apache releases. Please remove
> them, after that bring it up on the incubator general list and we can
> discuss what needs to be done.

The status quo has been in place for a while. Do you think we have
time to first discuss the correct solution on the Incubator list before
we delete the existing pages?

>> Also I notice you referred to the Github Release page. Github will
>> automatically provide a ZIP archive ("Source code (zip)") for the commit
>> tagged as a release. PPMC has further uploaded the ASF .tar.gz, .tar.gz.asc
>> and .tar.gz.sha512. Is that what you mean by a confusing mix of "Apache and
>> non-Apache releases"?
>
> You need to mark anything that is not an Apache release very clearly,
> and if that cannot be done then it needs to be removed.

I'm not sure what you mean. Note that Github automatically creates these
release pages based on the presence of git tags in the version control
history.

I looked at a number of Apache projects and their Github Release pages.
By the very nature of how Github presents the release page, they all
contain links to download a source archive provided by Github. Different
to MXNet, these projects do not in addition provide the ASF source
archives on their Github release page, but only the Github source
archives.

- Apache Arrow: https://github.com/apache/arrow/releases
- Apache Hadoop: https://github.com/apache/hadoop/releases
- Apache Maven: https://github.com/apache/maven/releases

Most closely, the Apache Beam project includes changelog in a similar
manner as MXNet and also tags RC releases on Github:

- Apache Beam https://github.com/apache/beam/releases

So is your recommendation here to take down the ASF source archives, i.e.
the .tar.gz, .tar.gz.asc and .tar.gz.sha512 files, and only keep the
basic Github functionality? This will make it harder for users to
discover the official ASF releases, but it's certainly something we can
do.

Best regards
Leonard


Re: Issue with releases / feedback from ASF board

2020-06-03 Thread Leonard Lausen
Hi Justin,

this page currently contains some links to third-party binary distributions of
MXNet (for example at [1]). The question of what the PPMC should recommend to
those third parties to avoid trademark issues is currently being discussed on
private@ and trademark@.

With respect to the MXNet Website linking to third-parties, I haven't been able
to find a policy yet. The current plan is to add a disclaimer and bring this up
with the Incubator for review. Do you think that's sensible? Do you have any
other recommendation?

Also I notice you referred to the Github Release page. Github will automatically
provide a ZIP archive ("Source code (zip)") for the commit tagged as a release.
PPMC has further uploaded the ASF .tar.gz, .tar.gz.asc and .tar.gz.sha512. Is
that what you mean by a confusing mix of "Apache and non-Apache releases"?

Best regards
Leonard

[1]: 
https://mxnet.apache.org/get_started?platform=linux=python=gpu=pip

On Wed, 2020-06-03 at 23:50 +, Justin Mclean wrote:
> Hi,
> 
> I don't see what has been done about this [1] which I mentioned above. What is
> the planned action here?
> 
> Thanks,
> Justin
> 
> 1. https://mxnet.apache.org/get_started?
> 



Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 JVM Language development (#17783)

2020-04-23 Thread Leonard Lausen
Another data point is that we currently only support OpenJDK 8, but the JVM 
languages are broken with OpenJDK 11, which is used on Ubuntu 18.04 for example. 
See https://github.com/apache/incubator-mxnet/issues/18153

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17783#issuecomment-618745562

Re: [apache/incubator-mxnet] [RFC] Raising the toolchain requirements for MXNet 2 (#17968)

2020-04-14 Thread Leonard Lausen
https://github.com/apache/incubator-mxnet/commit/fb73a1717acad61caeaeef010faed9e9fcc05f0e
 implements the proposal, fixing a number of other issues that were blocking. 
Please see the commit message for a complete list of changes.

As a follow-up item, I suggest removing the `cpplint` we currently use in 
favor of `clang-tidy` (which we also use). cpplint enforces Google's C++ 
style guide, which is geared toward C++03. Instead we can target 
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines with enforcement 
by clang-tidy (+ potentially MSVC).
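
For illustration, a minimal `.clang-tidy` sketch enabling such checks (the
check-module names are real clang-tidy modules, but this is only an example,
not a concrete proposal for MXNet's configuration):

```
# Hypothetical .clang-tidy sketch: enable the Core Guidelines and
# modernization checks, and treat Core Guidelines violations as errors.
Checks: 'cppcoreguidelines-*,modernize-*,-modernize-use-trailing-return-type'
WarningsAsErrors: 'cppcoreguidelines-*'
HeaderFilterRegex: 'src/.*'
```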

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17968#issuecomment-613681904

Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 JVM Language development (#17783)

2020-04-12 Thread Leonard Lausen
Another data point is that all of our Scala tests fail randomly with 
`src/c_api/c_api_profile.cc:141: Check failed: 
!thread_profiling_data.calls_.empty():`, so there seem to be some underlying 
issues.

https://github.com/apache/incubator-mxnet/issues/17067

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17783#issuecomment-612726875

[apache/incubator-mxnet] [RFC] Raising the toolchain requirements for MXNet 2 (#17968)

2020-04-03 Thread Leonard Lausen
## Description
I propose to raise our toolchain requirements for the MXNet 2 development 
branch to require at minimum gcc7 or clang6 on Unix systems and MSVC 2019 on 
Windows systems. All 3 have [reasonably complete C++17 
support](https://en.cppreference.com/w/cpp/compiler_support#cpp17), and MSVC 
2019 fully supports it, so we can adopt `C++17` as the required language 
standard. gcc7 and clang6 are available in the Ubuntu 18.04 LTS release.

The benefits of adopting a more recent C++ standard should be obvious, giving 
us access to new features and abstractions that the C++ committee has worked on 
over the course of 6 years. The benefits of adopting a more recent toolchain 
should also be obvious, as newer compilers come with more optimizations than 
older ones.

There are no downsides for MXNet's users, as we can continue to build binary 
releases of MXNet on CentOS 6 that should work on any major Linux distribution 
released after 2004. This is possible thanks to the great work by RedHat to 
bring new C++ toolchains to old platforms [1]. 

With respect to Windows: MSVC 2019 is the first MSVC that uses a 64bit 
toolchain by default. You may have noticed that our Windows CI was recently 
blocked due to the use of a 32bit toolchain, and updating it to MSVC 2019 was 
chosen as the remedy (attempts to use the 64bit version of the 2017 toolchain 
failed). It also appears that MSVC 2019 16.5 is the first release to make 
proper use of advanced instruction sets such as AVX2 [2].

## References
1: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-8/
2: 
https://devblogs.microsoft.com/cppblog/avx2-floating-point-improvements-in-visual-studio-2019-version-16-5/

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17968

Re: [apache/incubator-mxnet] [RFC] New Branches for MXNet 1.x, 1.7.x, and 2.x (#17701)

2020-03-04 Thread Leonard Lausen
It's not only about the API documentation. Installation instructions or 
tutorials will change over time. Building the website independently for 
different versions may be the simplest approach. I'm also fine with any other 
approach that enables users to look up documentation and instructions for their 
respective version.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17701#issuecomment-594934176

Re: [apache/incubator-mxnet] [RFC] New Branches for MXNet 1.x, 1.7.x, and 2.x (#17701)

2020-03-02 Thread Leonard Lausen
Even for 1.x, the current instructions are not compatible with the stable 1.6 
release. We should build the website based on the 1.6 release branch until a 
version selection is available.

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17701#issuecomment-593654044

Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 API Deprecation (#17676)

2020-02-28 Thread Leonard Lausen
We may also drop ONNX in MXNet 2. I'm not aware of anyone working on ONNX in 
MXNet and TVM can be used as a replacement.

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17676#issuecomment-592782629

Re: [apache/incubator-mxnet] [RFC] Apache MXNet 2.0 Roadmap (#16167)

2020-02-25 Thread Leonard Lausen
@kalcohol please create a new issue about "static linking lib is (very) far 
away from easy to use", describing your setup in more detail and potentially 
suggestions how to improve the user experience.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16167#issuecomment-590983917

Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)

2020-01-28 Thread Leonard Lausen
> This seems to be a big change to the existing operator mode (imperative and 
> symbolic).

Essentially the motivation for deferred compute is to extend imperative mode to 
enable users to "construct a symbol" without using symbolic API. This addresses 
confusion around having two APIs and prevents divergence between imperative and 
symbolic APIs. There's no need to drop the existing imperative / symbolic APIs 
due to deferred compute.

> Could you please provide more information.

Please ask a question and I'll answer ;)

> AFAIK, symbolic API already does deferred init, imperative API is provided to 
> improve user experience. Based on this RFC, what's the advantage of this new 
> deferred_compute mode? As a user, when should I use it or not.

Based on deferred compute we can simplify the `gluon.HybridBlock` API so that 
it matches the `gluon.Block` API. For example, consider reimplementing 
`Dense(HybridBlock)` based on the extended `HybridBlock` API with deferred 
compute:

``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros', in_units=0):
        super().__init__()
        self._flatten = flatten
        self._units = units
        self.weight = gluon.Parameter(shape=(units, in_units),
                                      init=weight_initializer, dtype=dtype,
                                      allow_deferred_init=True)
        if use_bias:
            self.bias = gluon.Parameter(shape=(units,),
                                        init=bias_initializer, dtype=dtype,
                                        allow_deferred_init=True)
        else:
            self.bias = None

    def forward(self, x):  # We allow users to overwrite forward() directly.
        ctx = x.context
        bias = self.bias.data(ctx) if self.bias is not None else None
        return npx.FullyConnected(x, self.weight.data(ctx), bias,
                                  no_bias=self.bias is None,
                                  num_hidden=self._units,
                                  flatten=self._flatten, name='fwd')
```

`HybridBlock` can wrap the execution of `forward` in a deferred compute 
session, obtain a symbolic representation of the computation, and pass it to 
`CachedOp`.

There would be no reason for users to explicitly use the deferred compute API.
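
To make the tracing idea concrete, here is a toy, framework-independent sketch 
(nothing below is MXNet API): operations on proxy objects only record graph 
nodes, and the recorded graph, the analogue of the traced symbol, is evaluated 
later:

``` python
class Node:
    """A recorded operation; building expressions defers all computation."""
    def __init__(self, op, inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node('add', [self, other])

    def __mul__(self, other):
        return Node('mul', [self, other])

def evaluate(node):
    """Replay the recorded graph (the role CachedOp plays, plus optimizations)."""
    if node.op == 'leaf':
        return node.value
    lhs, rhs = (evaluate(i) for i in node.inputs)
    return lhs + rhs if node.op == 'add' else lhs * rhs

x, y = Node('leaf', [], 3.0), Node('leaf', [], 4.0)
graph = x * y + x        # only records nodes; nothing is computed yet
print(evaluate(graph))   # 15.0
```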

> Another question. We all know deferred init cause bad user experience when it 
> comes to debugging. Would this RFC address the debuggability issue?

This RFC is orthogonal to deferred init. When updating `gluon.HybridBlock` API 
based on deferred compute, one option is to require statically known shapes of 
weights at construction time **if** users implement `def forward`. For 
backwards compatibility we likely want to keep deferred init around for 
existing code relying on `mx.sym` and implementing `def hybrid_forward`.

However, the other option is to allow deferred initialization of weights and 
require users to implement `infer_shape`:

https://github.com/apache/incubator-mxnet/blob/910c608f682a47fc2c43375b5f5a426b563e5821/python/mxnet/gluon/block.py#L1073-L1075

This works around the failures of symbolic shape inference for deferred init in 
case of dynamic shape ops, while still allowing users to decide the shape of 
the weight at the first forward.

In the example above, it could look like:

``` python
class Dense(HybridBlock):
    def __init__(self, units, use_bias=True, flatten=True, dtype='float32',
                 weight_initializer=None, bias_initializer='zeros', in_units=0):
        [...]

    def infer_shape(self, x):
        self.weight.shape = (self.weight.shape[0], x.shape[1])

    def forward(self, x):
        [...]
```

> If it's about performance optimization, could we have some initial data of 
> using this new deferred mode vs. existing imperative mode?

There is the option to improve performance of imperative mode by deferring the 
computation and optimizing the computational graph before performing the 
computation. But this is not the main motivation and I haven't optimized for 
this use-case (yet). In the `gluon.HybridBlock` case, we only run with deferred 
compute once to construct the symbolic graph and then pass over to `CachedOp` 
for optimized execution.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-579529593

Re: [apache/incubator-mxnet] [RFC] Apache MXNet 2.0 Roadmap (#16167)

2019-12-27 Thread Leonard Lausen
In the past we always kept development on the master branch, thus how about 
branching out 1.7.0 release branch and keeping development on master?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16167#issuecomment-569262075

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Leonard Lausen
Would it make sense to add optional support for sparse ndarrays and gradient 
compression in `AbstractKVStore`? You mentioned not all frameworks support it. 
Do you expect the API to change in the future?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795#issuecomment-553250086

Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)

2019-10-07 Thread Leonard Lausen
Thank you @szha and @asmushetzel for looking through the RFC.

> Can you elaborate a bit more about specific use cases that this enables or 
> simplifies? Is there something that can't be done today that this would 
> enable? Are there major pain points that this would address compared to 
> hybrid-blocks? Etc..

The RFC is not so much about extending what is possible, but improving the user 
experience. A major issue of the existing API is that `mx.nd` and `mx.sym` are 
distinct and partially incompatible. The issue of both being distinct is 
partially addressed by existing `HybridBlock` at the cost of making the issue 
of their incompatibility even more severe. Some of this is tracked in [[Bug] 
Inconsistency between HybridBlock and 
Block](https://github.com/apache/incubator-mxnet/issues/16279).

Unifying symbolic and imperative mode with deferred compute also works towards 
[[RFC] Introducing NumPy-compatible coding experience into 
MXNet](https://github.com/apache/incubator-mxnet/issues/14253). While with 
deferred compute we only trace a computational graph (as with current symbolic 
API), a logical next step is to provide support for parsing the AST of user 
provided implementation and directly hybridize it without tracing. You can find 
some more discussion on it in #14253. AST transformation also benefits from a 
unified interface, as a separate imperative and symbolic frontend would be 
meaningless.

> First, should we restrict this mode to only apply to the new numpy arrays?

It may be feasible to provide support also for the normal ndarray interface. 
That said, I suggest considering such support a bonus. Providing backwards 
compatibility adds complexity for the existing ndarray, which doesn't apply to 
the new numpy arrays. The final decision could be taken later.

> Since the deferred compute mode won't support reverse shape inference, new 
> blocks that implement the forward interface will not work without 
> implementing the parameter shape inference logic in infer_shape. This also 
> applies when migrating the existing Gluon blocks in our API. Since we have 
> plan to adopt numpy array in Gluon, the two changes can potentially happen at 
> the same time.

Agreed that both should happen at the same time.


> could you elaborate on what the changes are to the `infer_shape`, especially 
> on how and when it's invoked during deferred initialization?
 
No conceptual change to the existing `infer_shape` API is required. 
The current implementation works as follows during forward, when called 
imperatively:

https://github.com/apache/incubator-mxnet/blob/4940ec0e7408fad2443f921131cf1ada72724c38/python/mxnet/gluon/block.py#L1084-L1097

where `_deferred_infer_shape` calls `infer_shape`.
Exactly the same logic applies in the proposed deferred compute mode. In line 
1091 a `DeferredInitializationError` will be caught, which is then handled by 
the user's implementation of `infer_shape`. If the user did not implement 
`infer_shape`, we raise a warning explaining that `infer_shape` must be 
implemented, given the lack of general backward shape inference support.
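
In pseudocode, the flow described above looks roughly as follows (a sketch with
simplified names; `DeferredInitializationError` and `_finish_deferred_init` are
the existing names in `gluon.parameter`, everything else is schematic):

``` python
class HybridBlock(mx.gluon.Block):  # schematic, not the actual implementation
    def __call__(self, x):
        try:
            return self.forward(x)  # may touch not-yet-initialized parameters
        except DeferredInitializationError:
            self.infer_shape(x)  # user-provided shape logic fills unknown dims
            for p in self.collect_params().values():
                p._finish_deferred_init()
            return self.forward(x)
```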

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-539227866

Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16375)

2019-10-04 Thread Leonard Lausen
Closing as issue was not picked up by the mailing list bridge.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16375#issuecomment-538595967

Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16375)

2019-10-04 Thread Leonard Lausen
Closed #16375.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16375#event-2688951413

[apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)

2019-10-04 Thread Leonard Lausen
A new **deferred computation** (DC) argument to the imperative MXNet APIs is
proposed. If enabled, memory allocation and computation are deferred as long as
possible. Users can export the computational graph recorded during deferred
computation, which enables hybridization support.

Arrays for which DC is enabled are called **lazy**. Other arrays are called
**normal**. Inplace operations on lazy arrays are unsupported.

Storage allocation and computation for lazy arrays is deferred until their
results are required by conversion to numpy or use as input to an operator
creating a normal array. Accessing attributes such as `shape` can also trigger
computation if the attribute can't be inferred.
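
A small sketch of the intended semantics, using the `deferred_compute()`
context manager from the Python example further below:

``` python
x = mx.np.ones((8, 10))
with deferred_compute():
    y = (x + 5) * (x + 5)  # y is lazy: no storage allocated, nothing computed
print(y.shape)      # shape inference succeeds without triggering computation
print(y.asnumpy())  # conversion to numpy triggers the deferred computation
```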

## C API

### Deferred Compute (DC) Mode

An “alias” to `MXImperativeInvokeEx`, `MXImperativeDeferredInvokeEx`, is
introduced which creates lazy arrays based on (normal or lazy) input arrays and
the operator:

``` c
/*!
 * \brief invoke an nnvm op and imperative function creating a lazy ndarray
 * \param creator the op
 * \param num_inputs number of input NDArrays
 * \param inputs input NDArrays
 * \param num_outputs number of output NDArrays
 * \param outputs output NDArrays
 * \param num_params number of keyword parameters
 * \param param_keys keys for keyword parameters
 * \param param_vals values for keyword parameters
 * \param out_stypes output ndarrays' stypes
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXImperativeDeferredInvokeEx(AtomicSymbolCreator creator,
                                           int num_inputs,
                                           NDArrayHandle *inputs,
                                           int *num_outputs,
                                           NDArrayHandle **outputs,
                                           int num_params,
                                           const char **param_keys,
                                           const char **param_vals,
                                           const int **out_stypes);
```

### Checks and explicit trigger

``` c
/*!
 * \brief Check if an array's computation is deferred.
 * \param handles ndarray handles to be checked
 * \param num_handles number of ndarray handles to be checked
 * \param status pointer to array of num_handles integers to hold the result.
 */
MXNET_DLL int MXNDArrayGetIsDeferredCompute(NDArrayHandle *handles,
                                            int num_handles,
                                            int *status);
/*!
 * \brief Trigger deferred computation.
 * \param handles ndarray handles to trigger computation of.
 * \param num_handles number of ndarray handles to be checked
 *
 * Deferred computation of input arrays for specified handles is triggered if
 * required. Arrays that are already computed are ignored.
 */
MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                              int num_handles);
```

### Exporting to symbol

The computational graph recorded in deferred computation mode can be exported
to symbol. Users must specify all inputs and outputs to define the part of the
graph they are interested in exporting.

It is an error if any output depends on an input that is not among, or cannot
be computed from, the specified inputs. Equally, providing an input that is not
connected to any output is an error.

``` C
/*!
 * \brief Extract the graph constructed during deferred computation mode as a
 * Symbol.
 * \param input_handles ndarray handles of inputs
 * \param output_handles ndarray handles of outputs
 * \param input_names names associated with the inputs of the returned Symbol
 * \param output_names names associated with the outputs of the returned Symbol
 * \param out grouped output symbol handle
 *
 * Construct a Symbol for the subgraph of the deferred computation graph
 * spanning from the input_handles to the output_handles. Requires that
 * input_handles and output_handles are connected in the tracked computational
 * graph. The input_handles are required to have been used as arguments to an
 * operator that is part of the tracked subgraph. All inputs of the
 * computational graph must be specified.
 */
MXNET_DLL int MXNDArrayGetDeferredComputeSymbol(NDArrayHandle *input_handles,
                                                NDArrayHandle *output_handles,
                                                const char **input_names,
                                                const char **output_names,
                                                int num_inputs,
                                                int num_outputs,
                                                SymbolHandle *out);
```

**Basic Python usage example**
Example without Gluon.

``` python
x = mx.np.arange(80).reshape((8, 10))
with deferred_compute():
    y = (x + 5) * (x + 5)
    z = x**2
s = export(inputs={'x': x}, outputs={'y': y, 'z': z})
assert s.list_inputs() == ['x']
assert s.list_outputs() == ['y', 'z']
```




Re: [DISCUSS] Apache MXNet: Path to graduation

2019-08-30 Thread Leonard Lausen
Hen  writes:
> Are you saying that trademarks for MXNet has been registered by other
> companies?

Yes, though not for the purpose of machine learning frameworks.
So I suppose there is no concern?

These are the active / pending registrations: 
- Minimax GmbH & Co. KG trademarked MXNet in Germany for a networked
  fire protection system.
- Matrikon Inc trademarked MXNet for "Computer software which
  distributes information over the World Wide Web about the state of
  processes in manufacturing plant"
- AIRBUS SLC has a pending registration in Mexico for MXNet for some
  telecommunication product

And then there are 3 expired trademarks that don't matter I suppose.

Does Apache typically start a trademarking process after incubation has
finished?

> Hen
>
> On Fri, Aug 30, 2019 at 1:27 AM Leonard Lausen  wrote:
>> The MXNet brand is currently also unregistered by Apache (but registered
>> by various other companies), whereas for example Tensorflow is
>> registered by Google LLC in a variety of jurisdictions. Trademarks
>> registered under the Madrid System can be found at
>> https://www3.wipo.int/branddb/en/
>>


Re: [DISCUSS] Apache MXNet: Path to graduation

2019-08-30 Thread Leonard Lausen
Anton Chernov  writes:
> As a physicist I would like to point out that "Gluon" means: An elementary
> particle that acts as the exchange particle for the strong force between
> quarks [1].
> As a general scientific term it can barely be seen as a candidate for
> trademark registration.

This doesn't seem to pose a barrier for trademark registration though.
There are around 30 trademarks with the name Gluon currently in the
World Intellectual Property Organization's database:
https://www3.wipo.int/branddb/en/
13 are currently "Active".

> [1] https://en.wikipedia.org/wiki/Gluon


Re: [DISCUSS] Apache MXNet: Path to graduation

2019-08-30 Thread Leonard Lausen
Carin recently noted that gluonhq.com already uses the Gluon brand for
end-to-end enterprise mobile solutions, and Marco found that they have
apparently done so since at least 2015. Do you see any impact on the Gluon
brand for deep learning models?

The MXNet brand is currently also unregistered by Apache (but registered
by various other companies), whereas for example Tensorflow is
registered by Google LLC in a variety of jurisdictions. Trademarks
registered under the Madrid System can be found at
https://www3.wipo.int/branddb/en/

Best regards
Leonard

Hen  writes:

> Amazon. Amazon created the brand. They own the first repository to use the
> term in this context ( https://github.com/gluon-api ). There was some
> involvement from Microsoft, so Microsoft's opinion may also be relevant.
> Gluon is not an Apache Software Foundation nor Apache MXNet brand.
>
> Unless it was very recent, I don't believe there have been any trademark
> registrations. If Amazon would prefer Apache control the Gluon naming, I
> think the simplest 'act' to make that so would be to move the gluon-api
> repository over to ASF control.
>
> Hen
>
> On Thu, Aug 29, 2019 at 8:27 AM Chris Olivier  wrote:
>
>> Who is the gluon “Brand Owner”?
>>
>> On Tue, Aug 27, 2019 at 10:43 AM Chris Olivier 
>> wrote:
>>
>> > Who is the gluon "brand owner"?
>> >
>> > On Tue, Aug 27, 2019 at 10:13 AM Qing Lan  wrote:
>> >
>> >> Hi Lieven,
>> >>
>> >> Thanks for your comments. After the discussion with several committers
>> >> and contributors offline, we agreed that there is space for
>> >> improvement.
>> >>
>> >>
>> >>   1.  About the Gluon naming
>> >>
>> >> As we know, Gluon is born with the unique API design pattern. It
>> >> gradually became the dominant Python front end for MXNet. I would
>> suggest
>> >> to discuss more with the Brand owner and see if there could be a further
>> >> integration with MXNet. To MXNet itself, it becomes more popular with
>> this
>> >> frontend. We lean on the strong community and improve our product
>> better by
>> >> consuming the feedback from it.
>> >>
>> >>  2. Diversity of the PMC
>> >> Currently, we have 40 PMC members from different companies, like Amazon,
>> >> Uber, NVIDIA, ByteDance and a lot more. We are trying to grow the number
>> >> and invite individuals from different companies as well as research
>> >> institutes.
>> >>
>> >> 3. Release rotation
>> >> In the history, most of the releases were done by the Amazon side.
>> >> Currently, we are moving on to rotate this responsibility with
>> >> contributors/committers not from Amazon to start working on them.
>> >>
>> >> 4. Committers from different firm/institution should have real work
>> >> on MXNet
>> >> I can tell from the issues/PRs/rfcs they submitted, and indeed
>> >> we should encourage the committers who are less active to be involved in
>> >> MXNet contribution.
>> >>
>> >> Thanks,
>> >> Qing
>> >>
>> >> 
>> >> From: Lieven Govaerts 
>> >> Sent: Saturday, August 10, 2019 5:59
>> >> To: dev@mxnet.incubator.apache.org 
>> >> Cc: d...@mxnet.apache.org 
>> >> Subject: Re: [DISCUSS] Apache MXNet: Path to graduation
>> >>
>> >> Hi Qing,
>> >>
>> >> as a user and ASF member observing this project:
>> >>
>> >> On Sat, 10 Aug 2019 at 01:44, Qing Lan  wrote:
>> >>
>> >> > Hi All,
>> >> >
>> >> > I would like to start a thread to discuss about the graduation for
>> >> Apache
>> >> > MXNet. From my time working in the community, I saw a great
>> improvement
>> >> in
>> >> > most of the area that we do to make MXNet a better place. We keep
>> >> tracking
>> >> > on all of the issues user raised and reviewing PRs. We follow the
>> Apache
>> >> > Way to release the package in official repository.
>> >> >
>> >> >
>> >> in terms of code, documentation, visibility this project is certainly
>> in a
>> >> healthy state, I see a lot of interest of companies and people, the
>> >> community is growing... As a user that gives me confidence my time
>> >> invested
>> >> in this product is well spent.
>> >>
>> >>
>> >> > In 2017, Apache MXNet joined the Apache incubation project. I think
>> now
>> >> is
>> >> > a good time to review the path to graduate MXNet and move forward to
>> it.
>> >> > Please feel free to share your thoughts on graduation and space for
>> >> > improvement.
>> >> >
>> >> >
>> >> If I may share one observation: I don't see the community working a lot
>> on
>> >> non-code topics. One example that I personally find important is the
>> >> discussion of the Gluon brand. People have expressed confusion about how
>> >> the name is used by multiple non-ASF projects, the MXNet team finds the
>> >> Gluon name very valuable yet the discussion on how to protect the name
>> and
>> >> decide on acceptable use by other projects has stalled [1]. I suggest
>> you
>> >> make a decision on this topic before you go for graduation.
>> >>
>> >> regards,
>> >>
>> >> Lieven
>> >>
>> >> [1]
>> >>
>> >>
>> 

Re: [VOTE] Python 2 Removal for MXNet 1.6

2019-08-27 Thread Leonard Lausen
Due to the References: header, the prior email was still sorted into the
discussion thread. Cancelling this and resending without that header.

Leonard Lausen  writes:

> Marco de Abreu  writes:
>> 1. Which Python version to support. 3.5 vs 3.6 is currently in the
>> discussion due to Ubuntu 16.04 being shipped with 3.5 while the biggest
>> market share being 3.6 as of now.
>
> We could drop Python 2 even before deciding when to drop 3.5.
>
>> 2. When to do the deprecation. EOY to match with official Python 2
>> deprecation, in 1.5 years to be in line with Ubuntu 16.04 LTS or with the
>> next major release (2.0) to adhere to semantic versioning.
>
> From a Semantic Versioning standpoint, "Given a version number
> MAJOR.MINOR.PATCH, increment the: MAJOR version when you make
> incompatible API changes, MINOR version when you add functionality in a
> backwards compatible manner, [...]" [1].
>
> Based on Semantic Versioning, the question is if we consider Python 2
> support to be part of our API, or rather independent. In the latter
> case, dropping for 1.6 is fine.
>
> From a user-experience perspective, users that want to continue using
> Python 2 for the next 127 days (until EOL date) currently have bigger
> worries than needing to upgrade to the next upcoming MXNet release. They
> must transition their codebase to Py3 within 127 days. For those days,
> they may just stay on MXNet 1.5?
>
> [1]: https://semver.org/
>
>> Once these points (and any future ones) have been properly discussed and
>> the community came to an agreement, we can formalize it with a voting
>> thread. Until then, I'd recommend to refrain from any actions or
>> user-facing communication regarding this topic.
>
> Thus, let's start a vote on dropping Python 2 for MXNet 1.6.
> It's fine if this vote fails, but we need to get a clear understanding
> how we want to move forward.
>
> For better visibility, I'm removing the In-Reply-To: header, which was
> pointing to cahtwjdorqsrbau0a89xjwasawgbvgz7bojsu6tkmxdl+ruh...@mail.gmail.com
>
>> On Tue, Aug 27, 2019 at 1:29 AM Pedro Larroy 
>> wrote:
>>
>>> I have sent a PR that removes Python2 from CI. But was closed. I thought
>>> everyone was +1 on this one. This would remove quite a bit of load on CI:
>>>
>>> https://github.com/apache/incubator-mxnet/pull/15990
>>>
>>> If it's not the right time to do this, what steps do we need to take?
>>>
>>> Pedro.


[VOTE] Python 2 Removal for MXNet 1.6

2019-08-27 Thread Leonard Lausen
Marco de Abreu  writes:
> 1. Which Python version to support. 3.5 vs 3.6 is currently in the
> discussion due to Ubuntu 16.04 being shipped with 3.5 while the biggest
> market share being 3.6 as of now.

We could drop Python 2 even before deciding when to drop 3.5.

> 2. When to do the deprecation. EOY to match with official Python 2
> deprecation, in 1.5 years to be in line with Ubuntu 16.04 LTS or with the
> next major release (2.0) to adhere to semantic versioning.

From a Semantic Versioning standpoint, "Given a version number
MAJOR.MINOR.PATCH, increment the: MAJOR version when you make
incompatible API changes, MINOR version when you add functionality in a
backwards compatible manner, [...]" [1].

Based on Semantic Versioning, the question is if we consider Python 2
support to be part of our API, or rather independent. In the latter
case, dropping for 1.6 is fine.

From a user-experience perspective, users that want to continue using
Python 2 for the next 127 days (until EOL date) currently have bigger
worries than needing to upgrade to the next upcoming MXNet release. They
must transition their codebase to Py3 within 127 days. For those days,
they may just stay on MXNet 1.5?

[1]: https://semver.org/

> Once these points (and any future ones) have been properly discussed and
> the community came to an agreement, we can formalize it with a voting
> thread. Until then, I'd recommend to refrain from any actions or
> user-facing communication regarding this topic.

Thus, let's start a vote on dropping Python 2 for MXNet 1.6.
It's fine if this vote fails, but we need to get a clear understanding
how we want to move forward.

For better visibility, I'm removing the In-Reply-To: header, which was
pointing to cahtwjdorqsrbau0a89xjwasawgbvgz7bojsu6tkmxdl+ruh...@mail.gmail.com

> On Tue, Aug 27, 2019 at 1:29 AM Pedro Larroy 
> wrote:
>
>> I have sent a PR that removes Python2 from CI. But was closed. I thought
>> everyone was +1 on this one. This would remove quite a bit of load on CI:
>>
>> https://github.com/apache/incubator-mxnet/pull/15990
>>
>> If it's not the right time to do this, what steps do we need to take?
>>
>> Pedro.


Update GCC 4.8 dependency?

2019-08-27 Thread Leonard Lausen
Hi,

"Currently, we only support gcc-4.8 build." [1]

Do we ever want to change this? gcc-4.8 has been available for more than
6 years now, and a lot has happened during that time. Platforms have also
upgraded their default compiler versions, and gcc-7 is now commonly
available (e.g. Ubuntu 18.04 LTS, Amazon Linux 2). With gcc-7 we could
for example rely on C++17.

Wikipedia says:
- GCC since version 7 has complete support for C++17.
- Clang 5 and later implement all the features of C++17.
- Visual Studio 2017 15.7 (MSVC 19.14) supports almost all of C++17.

As Mu mentioned, "Conservatism is not an option" if we want to bring
MXNet forward. The benefits of 6 years of work on compilers as well as
C++ ISO committee work may help us with that.

Should we adopt a newer compiler toolchain and perhaps the C++17 standard?

Best regards
Leonard

[1]: 
https://github.com/apache/incubator-mxnet/blob/681cfc4/tools/dependencies/README.md


Re: [Discuss] MXNet Python < 3.6 Support Deprecation

2019-08-26 Thread Leonard Lausen
Lieven Govaerts  writes:
> Hi,
>
> On Thu, 22 Aug 2019 at 17:01, Leonard Lausen  wrote:
>
>> Hi,
>>
>> Pedro stated "Seems 3.6 is a reasonable choice." and there have been a
>> few +1 after Chaitanya's reply to Pedro. I would like to check if these
>> only refer to Chaitanya's mail about a dedicated "improvement" effort or
>> about dropping 3.5.
>>
>> Thus two questions:
>>
>> 1) Are there any concerns about dropping Python 3.5? Now is your chance to
>> speak up if you think so.
>>
>>
> Ubuntu 16.04 LTS defaults to Python 3.5.x . The LTS releases are supported
> for 5 years, so for 16.04 LTS it ends in 1.5 years.
>
> I'm not saying you should wait for 1.5 more years, people can upgrade to
> 18.04 LTS after all, but may I suggest you make this switch in a major
> release only? More specifically, ensure that Python 3.6-only code doesn't
> accidentally gets merged into a 1.5.X patch release.
>
> thanks,
>
> Lieven

Hi Lieven,

thanks. I believe the Python version compatibility falls under the
semantic versioning umbrella of things not to break within any 1.x
release. Thus the above suggestion would be with respect to a 2.x release or
experimental / preview / new features added to 1.x, without affecting
existing 1.x features. It would not affect 1.5.x patch releases.

Best regards,
Leonard


>> 2) Should new MXNet 1.x (experimental?) functionality (for example numpy
>> compatible interface) only target the Python versions to be supported in
>> MXNet 2? The current plan is to make many MXNet 2 features available as
>> "opt-in" in MXNet 1.x. Supporting older Python versions on MXNet 1 for
>> these features may impact design and functionality and create
>> unnecessary technical debt.


Re: CI and PRs

2019-08-15 Thread Leonard Lausen
To parallelize across machines: For GluonNLP we started submitting test
jobs to AWS Batch. Just adding a for-loop over the units in the
Jenkinsfile [1] and submitting a job for each [2] works quite well (see the
sketch after the links below). Then Jenkins just waits for all jobs to finish
and retrieves their status.
This works since AWS Batch added GPU support this April [3].

For MXNet, naively parallelizing over the files defining the test cases
that are in the longest running Pipeline stage may already help?

[1]: 
https://github.com/dmlc/gluon-nlp/blob/master/ci/jenkins/Jenkinsfile_py3-master_gpu_doc#L53
[2]: https://github.com/dmlc/gluon-nlp/blob/master/ci/batch/submit-job.py
[3]: https://aws.amazon.com/blogs/compute/gpu-workloads-on-aws-batch/
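
The per-unit submission in [2] boils down to something like the following
sketch (queue, job definition and unit names here are made up):

``` python
import boto3

batch = boto3.client('batch')
# One Batch job per test unit; Jenkins then polls the job status until all
# jobs have finished.
for unit in ['test_models', 'test_data_pipeline']:  # hypothetical unit names
    batch.submit_job(
        jobName='gluon-nlp-{}'.format(unit),
        jobQueue='ci-gpu',             # hypothetical queue name
        jobDefinition='gluon-nlp-ci',  # hypothetical job definition
        containerOverrides={'command': ['pytest', 'tests/{}.py'.format(unit)]})
```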

Marco de Abreu  writes:

> The first start wrt parallelization could certainly be start adding
> parallel test execution in nosetests.
>
> -Marco
>
> Aaron Markham  schrieb am Do., 15. Aug. 2019,
> 05:39:
>
>> The PRs Thomas and I are working on for the new docs and website share the
>> mxnet binary in the new CI pipelines we made. Speeds things up a lot.
>>
>> On Wed, Aug 14, 2019, 18:16 Chris Olivier  wrote:
>>
>> > I see it done daily now, and while I can’t share all the details, it’s
>> not
>> > an incredibly complex thing, and involves not much more than nfs/efs
>> > sharing and remote ssh commands.  All it takes is a little ingenuity and
>> > some imagination.
>> >
>> > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy <
>> pedro.larroy.li...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Sounds good in theory. I think there are complex details with regards
>> of
>> > > resource sharing during parallel execution. Still I think both ways can
>> > be
>> > > explored. I think some tests run for unreasonably long times for what
>> > they
>> > > are doing. We already scale parts of the pipeline horizontally across
>> > > workers.
>> > >
>> > >
>> > > On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier 
>> > > wrote:
>> > >
>> > > > +1
>> > > >
>> > > > Rather than remove tests (which doesn’t scale as a solution), why not
>> > > scale
>> > > > them horizontally so that they finish more quickly? Across processes
>> or
>> > > > even on a pool of machines that aren’t necessarily the build machine?
>> > > >
>> > > > On Wed, Aug 14, 2019 at 12:03 PM Marco de Abreu <
>> > marco.g.ab...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > With regards to time I rather prefer us spending a bit more time on
>> > > > > maintenance than somebody running into an error that could've been
>> > > caught
>> > > > > with a test.
>> > > > >
>> > > > > I mean, our Publishing pipeline for Scala GPU has been broken for
>> > quite
>> > > > > some time now, but nobody noticed that. Basically my stance on that
>> > > > matter
>> > > > > is that as soon as something is not blocking, you can also just
>> > > > deactivate
>> > > > > it since you don't have a forcing function in an open source
>> project.
>> > > > > People will rarely come back and fix the errors of some nightly
>> test
>> > > that
>> > > > > they introduced.
>> > > > >
>> > > > > -Marco
>> > > > >
>> > > > > Carin Meier  schrieb am Mi., 14. Aug. 2019,
>> > > 21:59:
>> > > > >
>> > > > > > If a language binding test is failing for a not important reason,
>> > > then
>> > > > it
>> > > > > > is too brittle and needs to be fixed (we have fixed some of these
>> > > with
>> > > > > the
>> > > > > > Clojure package [1]).
>> > > > > > But in general, if we thinking of the MXNet project as one
>> project
>> > > that
>> > > > > is
>> > > > > > across all the language bindings, then we want to know if some
>> > > > > fundamental
>> > > > > > code change is going to break a downstream package.
>> > > > > > I can't speak for all the high level package binding maintainers,
>> > but
>> > > > I'm
>> > > > > > always happy to pitch in to provide code fixes to help the base
>> PR
>> > > get
>> > > > > > green.
>> > > > > >
>> > > > > > The time costs to maintain such a large CI project obviously
>> needs
>> > to
>> > > > be
>> > > > > > considered as well.
>> > > > > >
>> > > > > > [1] https://github.com/apache/incubator-mxnet/pull/15579
>> > > > > >
>> > > > > > On Wed, Aug 14, 2019 at 3:48 PM Pedro Larroy <
>> > > > > pedro.larroy.li...@gmail.com
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > From what I have seen Clojure is 15 minutes, which I think is
>> > > > > reasonable.
>> > > > > > > The only question is that when a binding such as R, Perl or
>> > Clojure
>> > > > > > fails,
>> > > > > > > some devs are a bit confused about how to fix them since they
>> are
>> > > not
>> > > > > > > familiar with the testing tools and the language.
>> > > > > > >
>> > > > > > > On Wed, Aug 14, 2019 at 11:57 AM Carin Meier <
>> > carinme...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Great idea Marco! Anything that you think would be valuable
>> to
>> > > > share
>> > > > > > > would
>> > > > > > > > be good. The duration of each node in the test stage 

Re: How should MXNet treat nan values?

2018-07-26 Thread Leonard Lausen
Thanks to everyone who made their opinion known. So far the consensus
is that any nan handling in MXNet should not affect performance, at
least not by default.

This still leaves the question open if we should aim for documenting the
behavior of MXNet operators under presence of nan values. For example,
should we include a sentence in the argmax and topk documentation?
Should the 1.3 release notes mention the changed behavior of topk?

So far this has not been done. Instead any change of operator behavior
with respect to nan values is treated as implementation change that is
not worth noting to the user.

As this can decrease user experience, I advocate for documenting the
current behavior and possible future changes.

In case there are no objections, is there any way to edit the changelog
for the upcoming release?


Should MXNet 1.3 contain a buggy version of nn.Embedding backward by default?

2018-07-23 Thread Leonard Lausen
Currently the default kernel of nn.Embedding backward is known to be
buggy on P3 instances or when using Cuda 9.2 (though the issue also occurs on
other instances with earlier versions of Cuda, but less often).

https://github.com/apache/incubator-mxnet/issues/11314

There is currently an opt-in for using a bug-free kernel, but it is not
the default. However, the bug-free kernel is used by default for shapes
smaller than 16384.

Should MXNet ship a more efficient but buggy kernel in v1.3 or use a
correct but less efficient kernel by default? As MXNet v1.3 is likely to
be used a lot with Cuda 9.2 I believe the default behavior should be
changed to use the bug-free but less efficient Kernel. Correctness and
providing a good user experience should be No. 1 here (?). Then users
that want a faster but buggy backward kernel can still select to do so.
Note this only affects the backward pass.

Hao did related work on improving the take operator
https://github.com/apache/incubator-mxnet/pull/11326
https://github.com/apache/incubator-mxnet/pull/11795 which also fixes
the issue, but he found it to be only "slightly faster" than the current
opt-in bug-free kernel, while also leading to CI failures on Windows.

In my experience, there is no speed difference between the current buggy and
opt-in bug-free kernel, but the GPU utilization of the latter is 100% compared
to 60% of the former (benchmark script:
https://github.com/apache/incubator-mxnet/pull/11795#issuecomment-405808567 )
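
For reference, a minimal way to exercise just the backward kernel in question
(a sketch with made-up shapes; the linked comment contains the actual
benchmark script):

``` python
import mxnet as mx

ctx = mx.gpu()
vocab, dim = 50000, 512  # made-up sizes above the 16384 threshold
data = mx.nd.random.uniform(0, vocab, shape=(1024, 32), ctx=ctx)
weight = mx.nd.random.normal(shape=(vocab, dim), ctx=ctx)
weight.attach_grad()
with mx.autograd.record():
    out = mx.nd.Embedding(data, weight, input_dim=vocab, output_dim=dim)
out.backward()   # the backward kernel affected by the bug
mx.nd.waitall()  # synchronize before measuring time or utilization
```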


How should MXNet treat nan values?

2018-07-20 Thread Leonard Lausen
Hello MXNet community,

It seems that there is currently no agreed upon principle to handle
`nan` values in operators. This has led to inconsistencies between
operators and also to inconsistency across releases. Some operators ignore
nan values (e.g. argmax), others treat it as the maximum (e.g. topk up to
mxnet v1.2), and others return “undefined” output (e.g. topk starting with
mxnet v1.3).
 
Initially the change in topk was reported as a bug
(https://github.com/apache/incubator-mxnet/issues/8510) as some users
relied on the behavior. However (and rightfully) @asmushetzel, who
contributed the improved topk operator for mxnet v1.3 pointed out that
the change did not break any documented behavior.
 
To go forward, please share your opinion on how MXNet should handle `nan`
values. Should we continue to treat the behavior as undefined, possibly
changing silently between releases? Should we define a
reasonable standard (e.g. follow numpy) and treat operators that deviate
as buggy? Should we just document how operators behave currently and
warn if the behavior changes? Something else?
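
For concreteness, this is numpy's behavior, which could serve as the reasonable
standard mentioned above (checked against recent numpy releases):

``` python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
print(np.max(a))     # nan: nan propagates through reductions
print(np.argmax(a))  # 1: the position of the nan wins the comparison
print(np.nanmax(a))  # 3.0: the explicit nan-ignoring variant
```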
 
Please make your opinion known so above issue can be resolved/closed and
general guidelines can be defined for future contributions, following
whatever consensus emerges.
 
Thanks!
Leonard